Search-and-Scoring

Search-and-scoring approaches to learning BBN structures search over a space of candidate structures and score each one. The highest-scoring structure is typically returned as the output.
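
To make the "score the candidates" idea concrete, here is a minimal sketch that scores two candidate structures over two binary variables and keeps the better one. It assumes the Bayesian information criterion (BIC) as the scoring function; the source does not say which score pyspark_bbn uses. The observations and all function names are made up for illustration.

```python
import math
from collections import Counter

# Toy, made-up observations of two binary variables (a, b); not the source data.
rows = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40


def log_lik_marginal(values):
    """Maximum-likelihood log-likelihood of a node with no parents."""
    counts = Counter(values)
    n = len(values)
    return sum(c * math.log(c / n) for c in counts.values())


def log_lik_conditional(pairs):
    """Maximum-likelihood log-likelihood of a child given one parent.

    Each pair is (parent_value, child_value).
    """
    joint = Counter(pairs)
    parent = Counter(p for p, _ in pairs)
    return sum(c * math.log(c / parent[p]) for (p, _), c in joint.items())


def bic(log_lik, n_params, n_rows):
    """BIC score: fit minus a penalty that grows with model size."""
    return log_lik - 0.5 * n_params * math.log(n_rows)


n = len(rows)
a = [r[0] for r in rows]
b = [r[1] for r in rows]

# Candidate 1: no edge (2 free parameters). Candidate 2: A -> B (3 free parameters).
scores = {
    "A  B (no edge)": bic(log_lik_marginal(a) + log_lik_marginal(b), 2, n),
    "A -> B": bic(log_lik_marginal(a) + log_lik_conditional(rows), 3, n),
}

# The search step here is trivial: pick the highest-scoring candidate.
best = max(scores, key=scores.get)
print(best, scores)
```

Because the toy data make `b` strongly dependent on `a`, the structure with the edge scores higher despite its extra parameter. With more than a handful of variables the space of structures is too large to enumerate, which is why a search strategy is needed.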

Load data

Let’s read our data into a Spark DataFrame (SDF).

[1]:
from pyspark_bbn.discrete.data import DiscreteData

# `spark` is assumed to be an active SparkSession (e.g. provided by the notebook kernel)
sdf = spark.read.csv("hdfs://localhost/data-1479668986461.csv", header=True)
data = DiscreteData(sdf)

Genetic algorithm

We use a genetic algorithm (GA) as a search-and-scoring approach to learning BBN structures. In general, a GA has the following major steps.

  • Initialization: a population of BBN structures is created

  • Fitness: the population is scored according to a fitness function and filtered

  • Crossover: two parents from the population undergo a crossover operation to produce two new offspring

  • Mutation: each offspring undergoes a mutation operation

The fitness, crossover, and mutation steps are repeated until a maximum number of iterations is reached or the search converges (a higher-scoring BBN structure cannot be found).
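
The loop described above can be sketched in plain Python. This is an illustration of the general GA steps only, not the pyspark_bbn implementation: the edge-flag encoding, the `TARGET` structure, the toy fitness function, and every function name are assumptions made for the example.

```python
import random

N_NODES = 4
# A candidate is a tuple of 0/1 flags, one per possible edge i -> j with i < j.
# Fixing the node order this way keeps every candidate acyclic.
EDGES = [(i, j) for i in range(N_NODES) for j in range(i + 1, N_NODES)]

# Hypothetical structure used only to build a toy fitness function.
TARGET = (1, 0, 0, 1, 0, 1)


def fitness(candidate):
    """Toy score: number of edge flags that agree with TARGET."""
    return sum(c == t for c, t in zip(candidate, TARGET))


def crossover(p1, p2, rng):
    """Single-point crossover: two parents produce two offspring."""
    cut = rng.randrange(1, len(p1))
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(candidate, rng, rate=0.1):
    """Flip each edge flag independently with probability `rate`."""
    return tuple(1 - bit if rng.random() < rate else bit for bit in candidate)


def run_ga(pop_size=20, max_iters=50, seed=0):
    rng = random.Random(seed)
    # Initialization: a random population of structures.
    population = [tuple(rng.randint(0, 1) for _ in EDGES) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_iters):
        # Fitness: score the population and keep the top half.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Crossover and mutation: refill the population with offspring.
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            for child in crossover(p1, p2, rng):
                children.append(mutate(child, rng))
        population = survivors + children
        best = max(population + [best], key=fitness)
        if fitness(best) == len(EDGES):
            break  # the toy score cannot go any higher
    return best


best = run_ga()
edges = [edge for edge, bit in zip(EDGES, best) if bit]
print(edges, fitness(best))
```

In a real structure learner the fitness function would score each candidate against the data rather than against a known target, and the stopping rule would detect that the best score has stopped improving.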

[2]:
from pyspark_bbn.discrete.ssslearn import Ga

# `sc` is assumed to be the active SparkContext; max_iters caps the number of GA iterations
ga = Ga(data, sc, max_iters=3)
g = ga.get_network()

[3]:
import matplotlib.pyplot as plt
import networkx as nx

fig, ax = plt.subplots(figsize=(5, 5))

nx.draw(
    g,
    with_labels=True,
    node_size=500,
    alpha=0.8,
    font_weight="bold",
    font_family="monospace",
    node_color="r",
    arrowsize=15,
    ax=ax,
)
_images/structure-ss_5_0.png