Bayesian Networks: D-Separation and Probabilistic Inference

Master d-separation algorithms for determining conditional independence in Bayesian networks and understand the foundation of probabilistic inference.

Bayesian Networks provide a powerful framework for representing and reasoning about uncertainty using directed acyclic graphs (DAGs). One of the most fundamental questions we can ask about a Bayesian network is: “Are two variables conditionally independent given some evidence?”

This lecture explores d-separation (directional separation), the algorithm that answers this question by analyzing paths through the network. Understanding d-separation is essential for efficient probabilistic inference and forms the theoretical foundation for variable elimination techniques.

Review: The Three Types of Triples

Before diving into d-separation, let’s review the three fundamental patterns (triples) that determine whether information flows along a path in a Bayesian network.

1. Causal Chain: $A \to B \to C$

Structure: A causes B, which causes C.

Conditional Independence:

  • $A \perp C \mid B$ (A and C are independent given B)

Intuition: If we know B, then learning about A provides no additional information about C. The information from A is “blocked” by observing B.

Example: $\text{Rain} \to \text{Sprinkler} \to \text{Grass Wet}$

  • If we observe the sprinkler state, knowing whether it rained provides no additional information about whether the grass is wet.

Activity Rules:

  • Path is active if B is unobserved

  • Path is inactive if B is observed (shaded)

2. Common Cause: $A \leftarrow B \to C$

Structure: B causes both A and C.

Conditional Independence:

  • $A \perp C \mid B$ (A and C are independent given B)

Intuition: A and C are correlated because they share a common cause (B). However, once we observe B, they become independent—each is determined solely by B.

Example: $\text{Allergies} \leftarrow \text{Season} \to \text{Ice Cream Sales}$

  • Allergies and ice cream sales are correlated (both increase in summer)

  • Given the season, they’re independent

Activity Rules:

  • Path is active if B is unobserved

  • Path is inactive if B is observed (shaded)

3. Common Effect (V-Structure): $A \to B \leftarrow C$

Structure: Both A and C cause B.

Conditional Independence:

  • $A \perp C$ (A and C are marginally independent)

  • $A \not\perp C \mid B$ (A and C are dependent given B)

Intuition: This is the counterintuitive case. Observing the effect B creates a dependency between the causes A and C through explaining away.

Example: $\text{Earthquake} \to \text{Alarm} \leftarrow \text{Burglary}$

  • Earthquake and burglary are independent events

  • If the alarm goes off and we learn there was an earthquake, this explains away the alarm, making burglary less likely

Activity Rules:

  • Path is inactive if B is unobserved

  • Path is active if B or any of its descendants is observed

Summary Table:

| Triple Type | Structure | Active When |
|---|---|---|
| Causal Chain | $A \to B \to C$ | B unobserved |
| Common Cause | $A \leftarrow B \to C$ | B unobserved |
| Common Effect | $A \to B \leftarrow C$ | B or a descendant of B observed |
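
These blocking rules are mechanical enough to express directly in code. Here is a minimal Python sketch; the function and its argument names are hypothetical, for illustration only:

```python
def triple_active(kind, middle_observed, descendant_observed=False):
    """Is a single triple active (does it let information flow)?

    kind: "chain"    for a causal chain   A -> B -> C,
          "fork"     for a common cause   A <- B -> C,
          "collider" for a common effect  A -> B <- C.
    middle_observed: True if the middle node B is in the evidence.
    descendant_observed: True if any descendant of B is observed
                         (only matters for colliders).
    """
    if kind in ("chain", "fork"):
        # Chains and forks are blocked exactly when B is observed.
        return not middle_observed
    if kind == "collider":
        # V-structures activate when B or any of its descendants is observed.
        return middle_observed or descendant_observed
    raise ValueError(f"unknown triple kind: {kind}")
```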

The D-Separation Algorithm

D-separation (directional separation) determines whether two variables X and Y are conditionally independent given a set of evidence variables Z.

Definition: X and Y are d-separated by Z if all paths from X to Y are blocked (inactive).

Consequence: If X and Y are d-separated by Z, then $X \perp Y \mid Z$ (conditional independence).

Algorithm Steps

Input: Bayesian network G, query variables X and Y, evidence set Z

Output: True if $X \perp Y \mid Z$, False otherwise

Procedure:

  1. Shade all observed nodes in Z on the graph

  2. Enumerate all undirected paths from X to Y

  3. For each path:

    • Decompose the path into consecutive triples (3-node segments)

    • Check each triple:

      • If any triple is inactive, the entire path is blocked (inactive)

      • If all triples are active, the path is active

  4. Determine independence:

    • If all paths are blocked, then $X \perp Y \mid Z$ ✓ (independence guaranteed)

    • If at least one path is active, then $X \not\perp Y \mid Z$ ✗ (cannot guarantee independence)

Key Insight: It only takes one inactive triple to block an entire path. Conversely, all triples must be active for a path to be active.
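
To make the procedure concrete, here is a minimal Python sketch of the algorithm exactly as stated: enumerate undirected simple paths, then test every consecutive triple. The encoding (a dict mapping each node to the set of its children) and the function names are assumptions for illustration; note also that path enumeration is exponential in the worst case, and practical implementations use the linear-time Bayes Ball reachability algorithm instead.

```python
def descendants(graph, node):
    """All nodes reachable from `node` along directed edges.
    `graph` maps each node to the set of its children."""
    seen, stack = set(), [node]
    while stack:
        for child in graph.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def d_separated(graph, x, y, z):
    """True iff x and y are d-separated given the evidence set z."""
    parents = {}
    for n, children in graph.items():
        parents.setdefault(n, set())
        for c in children:
            parents.setdefault(c, set()).add(n)

    def neighbors(n):
        return graph.get(n, set()) | parents.get(n, set())

    def triple_ok(a, b, c):
        # Count which of a, c have an edge pointing INTO b.
        into_b = sum(1 for p in (a, c) if b in graph.get(p, ()))
        if into_b == 2:  # a -> b <- c: v-structure
            return b in z or any(d in z for d in descendants(graph, b))
        return b not in z  # chain or common cause: blocked iff b observed

    def path_active(path):
        # A path is active only if every consecutive triple is active.
        return all(triple_ok(path[i], path[i + 1], path[i + 2])
                   for i in range(len(path) - 2))

    def any_active_path(path):
        # Depth-first enumeration of simple undirected paths from x to y.
        node = path[-1]
        if node == y:
            return path_active(path)
        return any(any_active_path(path + [nxt])
                   for nxt in neighbors(node) if nxt not in path)

    return not any_active_path([x])
```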

Worked Examples

Example 1: Simple Chain

Network:

```
U → T → V
```

Questions:

  1. Is $U \perp V$? (No evidence)

Answer: No. Path $U \to T \to V$ is active (T unobserved, causal chain). Information flows from U to V.

  2. Is $U \perp V \mid T$? (Given T)

Answer: Yes. Shade T. The path $U \to T \to V$ becomes inactive (T observed in causal chain blocks information flow).

Example 2: V-Structure

Network:

```
T → Y ← W
```

Questions:

  1. Is $T \perp W$? (No evidence)

Answer: Yes. The triple $T \to Y \leftarrow W$ is a v-structure. Y is unobserved, so the path is inactive. T and W are marginally independent.

  2. Is $T \perp W \mid Y$? (Given Y)

Answer: No. Shade Y. The v-structure $T \to Y \leftarrow W$ becomes active (Y observed). T and W are now dependent through explaining away.

Example 3: Complex Network

Network:

```
    U
    ↓
V ← T → X
    ↓
    Y ← W
    ↓
    Z
```


Questions:

  1. Is $V \perp W$? (No evidence)

Path Analysis:

  • Path 1: $V \leftarrow T \to Y \leftarrow W$

    • Triple 1: $V \leftarrow T \to Y$ (common cause, T unobserved → active)

    • Triple 2: $T \to Y \leftarrow W$ (v-structure, Y unobserved → inactive)

    • Path 1 is blocked

Answer: Yes, $V \perp W$ (path blocked by inactive v-structure).

  2. Is $V \perp W \mid T$? (Given T)

Path Analysis:

  • Path 1: $V \leftarrow T \to Y \leftarrow W$

    • Triple 1: $V \leftarrow T \to Y$ (common cause, T observed → inactive)

    • Path 1 is blocked

Answer: Yes, $V \perp W \mid T$ (path blocked by observing common cause).

  3. Is $Y \perp Z \mid W$? (Given W)

Path Analysis:

  • Path 1: $Y \to Z$ (direct edge)

    • Single edge (no intermediate node) is always active

    • Path is active

Answer: No, $Y \not\perp Z \mid W$ (direct connection cannot be blocked).

  4. Is $Y \perp Z \mid T$? (Given T)

Path Analysis:

  • Path 1: $Y \to Z$ (direct edge, always active)

    • Path is active

Answer: No. Observing T doesn’t block the direct connection $Y \to Z$.
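
Running the d_separated sketch from the algorithm section on this network (using the same assumed child-set encoding) reproduces these answers, and also shows the explaining-away effect of observing a descendant of a collider:

```python
# Example 3's network as a child-set dict (assumed encoding).
g = {"U": {"T"}, "T": {"V", "X", "Y"}, "W": {"Y"}, "Y": {"Z"}}

print(d_separated(g, "V", "W", set()))   # True : V ⊥ W
print(d_separated(g, "V", "W", {"T"}))   # True : V ⊥ W | T
print(d_separated(g, "Y", "Z", {"W"}))   # False: direct edge Y → Z stays active
print(d_separated(g, "V", "W", {"Z"}))   # False: Z is a descendant of the
                                         # collider Y, so observing Z
                                         # activates the v-structure
```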

Structure Implications and Independence Lists

Given a Bayesian network structure, we can systematically apply d-separation to build a complete list of conditional independencies that must hold for any probability distribution represented by the network.

Procedure:

  1. For each pair of variables $(X, Y)$

  2. For each subset $Z \subseteq V \setminus \{X, Y\}$

  3. Apply d-separation algorithm

  4. Record all cases where $X \perp Y \mid Z$

Importance: This list defines the set of probability distributions that can be faithfully represented by the network structure.

Example: For the network $X \to Y \to Z$:

  • $X \perp Z \mid Y$ (must hold)

  • $X \perp Z$ does not necessarily hold (X and Z are, in general, marginally dependent)
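
With a d-separation checker in hand (such as the d_separated sketch above), this brute-force procedure translates directly into code. It is exponential in the number of variables, which is fine for teaching-sized networks:

```python
from itertools import combinations

def all_independencies(graph, nodes):
    """Every statement X ⊥ Y | Z implied by the graph via d-separation.
    Exponential in len(nodes); fine for teaching-sized networks."""
    found = []
    for x, y in combinations(nodes, 2):
        rest = [n for n in nodes if n not in (x, y)]
        for k in range(len(rest) + 1):
            for z in combinations(rest, k):
                if d_separated(graph, x, y, set(z)):
                    found.append((x, y, set(z)))
    return found

# The chain X -> Y -> Z yields exactly the single statement above:
print(all_independencies({"X": {"Y"}, "Y": {"Z"}}, ["X", "Y", "Z"]))
# [('X', 'Z', {'Y'})]
```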

Computing All Independences

Network Structures and Their Independencies:

  1. Chain: $X \to Y \to Z$

    • $X \perp Z \mid Y$
  2. Fork (Common Cause): $X \leftarrow Y \to Z$

    • $X \perp Z \mid Y$
  3. V-Structure: $X \to Y \leftarrow Z$

    • $X \perp Z$ (marginally independent)

    • $X \not\perp Z \mid Y$ (dependent given Y)

  4. Fully Connected: $X \to Y, X \to Z, Y \to Z$

    • No conditional independencies are implied: every pair of variables is directly connected, so no evidence set can d-separate them

Probabilistic Inference: From Networks to Queries

D-separation tells us when variables are independent, but it doesn’t compute actual probabilities. For that, we need probabilistic inference.

Inference Tasks

Three main types of inference queries:

  1. Posterior Probability (most common):

    $P(Q \mid E = e)$

    • Query variables: Q

    • Evidence variables: E

    • Example: $P(\text{Burglary} \mid \text{Alarm} = \text{true}, \text{Earthquake} = \text{false})$

  2. Most Likely Explanation (MAP query):

    $\arg\max_q P(Q = q \mid E = e)$

    • Find the most probable assignment to query variables

    • Example: Most likely diagnosis given symptoms

  3. Marginal Probability:

    $P(Q)$

    • No evidence, just compute distribution over Q
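
All three queries ultimately reduce to summing and normalizing entries of the joint distribution (exact inference by enumeration). Here is a minimal sketch of a posterior query, using an assumed toy joint over T and W (the same numbers as the Factor Zoo table in the next section):

```python
# Joint distribution P(T, W) as a dict from value pairs to probabilities.
joint = {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
}

def posterior_T_given_w(joint, w):
    """P(T | W = w) by enumeration: select matching entries, then normalize."""
    selected = {t: p for (t, wv), p in joint.items() if wv == w}
    total = sum(selected.values())  # this is exactly P(W = w)
    return {t: p / total for t, p in selected.items()}

print(posterior_T_given_w(joint, "rain"))
# hot: 0.25, cold: 0.75 (up to float rounding), since P(W = rain) = 0.4
```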

Factor Zoo: Understanding Probability Tables

When working with Bayesian networks, we encounter various types of probability tables (factors):

1. Joint Distribution: $P(X, Y)$

Definition: Full probability distribution over all combinations of X and Y.

Properties:

  • Entries: $P(x, y)$ for all $x \in X, y \in Y$

  • Sums to 1: $\sum_{x, y} P(x, y) = 1$

Example: Temperature and Weather

| T | W | $P(T, W)$ |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |

2. Selected Joint: $P(x, Y)$

Definition: A slice of the joint distribution with X fixed to value x.

Properties:

  • Entries: $P(x, y)$ for fixed x, all $y \in Y$

  • Sums to $P(x)$: $\sum_y P(x, y) = P(x)$

Example: $P(\text{cold}, W)$

| T | W | P |
|---|---|---|
| cold | sun | 0.2 |
| cold | rain | 0.3 |

Sum: $0.2 + 0.3 = 0.5 = P(\text{cold})$

3. Conditional Distribution: $P(Y \mid X)$

Definition: Probability of Y given each value of X.

Properties:

  • Entries: $P(y \mid x)$ for all $x \in X, y \in Y$

  • For each value x, the entries sum to 1: $\sum_y P(y \mid x) = 1$

Example: $P(W \mid T)$

| T | W | $P(W \mid T)$ |
|---|---|---|
| hot | sun | 0.8 |
| hot | rain | 0.2 |
| cold | sun | 0.4 |
| cold | rain | 0.6 |

4. Specified Family: $P(y \mid X)$

Definition: Probability of a fixed value y given all values of X.

Properties:

  • Entries: $P(y \mid x)$ for fixed y, all $x \in X$

  • Need not sum to 1, or to any other fixed value

Example: $P(\text{rain} \mid T)$

| T | W | P |
|---|---|---|
| hot | rain | 0.2 |
| cold | rain | 0.6 |

General Factor Notation

When we write $P(Y_1, \ldots, Y_N \mid X_1, \ldots, X_M)$:

  • It represents a multi-dimensional array (factor)

  • Dimensions correspond to unassigned (uppercase) variables

  • Assigned (lowercase) variables reduce dimensionality

  • Values are $P(y_1, \ldots, y_N \mid x_1, \ldots, x_M)$
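
One way to make this concrete is to store a factor as a list of variable names plus a table from value tuples to numbers; assigning a variable then filters entries and drops a dimension. A minimal sketch, reusing the assumed P(T, W) numbers above (the representation is an illustration, not a standard library API):

```python
# A factor = (list of variable names, table from value tuples to numbers).
joint_TW = (["T", "W"], {
    ("hot", "sun"): 0.4, ("hot", "rain"): 0.1,
    ("cold", "sun"): 0.2, ("cold", "rain"): 0.3,
})

def select(factor, var, value):
    """Fix one variable to a value, dropping that dimension:
    select(P(T, W), 'T', 'cold') is the selected joint P(cold, W)."""
    names, table = factor
    i = names.index(var)
    return (names[:i] + names[i + 1:],
            {key[:i] + key[i + 1:]: p
             for key, p in table.items() if key[i] == value})

print(select(joint_TW, "T", "cold"))
# (['W'], {('sun',): 0.2, ('rain',): 0.3})  -> entries sum to P(cold) = 0.5
```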

Practice Problems

Problem 1: D-Separation

Given the network:

```
A → B → C
↓       ↓
D → E → F
```


Determine:

  1. Is $A \perp C$?

  2. Is $A \perp C \mid B$?

  3. Is $A \perp F \mid E$?

  4. Is $D \perp C \mid \{B, E\}$?

Problem 2: Identifying Triples

For each of the following, identify the triple type and determine if the path is active:

  1. $X \to Y \to Z$, Y unobserved

  2. $X \leftarrow Y \to Z$, Y observed

  3. $X \to Y \leftarrow Z$, Y unobserved

  4. $X \to Y \leftarrow Z$, Y’s child observed

Problem 3: Building Independence Lists

For the network $A \to B \leftarrow C \to D$:

List all conditional independencies of the form $X \perp Y \mid Z$ where Z can be empty or a singleton.

Hint: Consider pairs (A, D), (A, C), (B, D), etc., and test with different evidence sets.
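
If you implemented the sketches above, you can check your hand-derived list mechanically; the snippet below reuses the assumed all_independencies helper and restricts the output to empty or singleton evidence sets:

```python
# A -> B <- C -> D, encoded as a child-set dict (assumed encoding).
g = {"A": {"B"}, "C": {"B", "D"}}
singletons = [s for s in all_independencies(g, ["A", "B", "C", "D"])
              if len(s[2]) <= 1]
print(singletons)  # compare against your hand-derived list
```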

Conclusion

D-separation is the cornerstone of reasoning about conditional independence in Bayesian networks:

  1. Three Triple Types: Causal chains, common causes, and v-structures (common effects) have different blocking behaviors

  2. Blocking Rules: Chains and forks block when the middle node is observed; v-structures block when the middle node and all of its descendants are unobserved

  3. Path Activity: A path is active only if all triples are active; one inactive triple blocks the entire path

  4. Independence Guarantee: D-separation provides a graphical criterion for conditional independence

Key Takeaway: Understanding d-separation allows us to:

  • Determine which variables influence each other

  • Identify opportunities for computational efficiency in inference

  • Validate whether a network structure correctly represents domain knowledge

In the next lecture, we’ll see how d-separation insights enable variable elimination, an exact inference algorithm that dramatically improves upon naive enumeration.


This lecture covered d-separation for determining conditional independence in Bayesian networks. Next, we’ll explore variable elimination for efficient probabilistic inference.