Bayesian Networks: D-Separation and Probabilistic Inference
Master d-separation algorithms for determining conditional independence in Bayesian networks and understand the foundation of probabilistic inference.
Bayesian Networks provide a powerful framework for representing and reasoning about uncertainty using directed acyclic graphs (DAGs). One of the most fundamental questions we can ask about a Bayesian network is: “Are two variables conditionally independent given some evidence?”
This lecture explores d-separation (directional separation), the algorithm that answers this question by analyzing paths through the network. Understanding d-separation is essential for efficient probabilistic inference and forms the theoretical foundation for variable elimination techniques.
Review: The Three Types of Triples
Before diving into d-separation, let’s review the three fundamental patterns (triples) that determine whether information flows along a path in a Bayesian network.
1. Causal Chain: $A \to B \to C$
Structure: A causes B, which causes C.
Conditional Independence:
- $A \perp C \mid B$ (A and C are independent given B)
Intuition: If we know B, then learning about A provides no additional information about C. The information from A is “blocked” by observing B.
Example: $\text{Rain} \to \text{Sprinkler} \to \text{Grass Wet}$
- If we observe the sprinkler state, knowing whether it rained provides no additional information about whether the grass is wet.
Activity Rules:
- Path is active if B is unobserved
- Path is inactive if B is observed (shaded)
2. Common Cause: $A \leftarrow B \to C$
Structure: B causes both A and C.
Conditional Independence:
- $A \perp C \mid B$ (A and C are independent given B)
Intuition: A and C are correlated because they share a common cause (B). However, once we observe B, they become independent—each is determined solely by B.
Example: $\text{Season} \to \text{Allergies}, \text{Season} \to \text{Ice Cream Sales}$
- Allergies and ice cream sales are correlated (both increase in summer)
- Given the season, they’re independent
Activity Rules:
- Path is active if B is unobserved
- Path is inactive if B is observed (shaded)
3. Common Effect (V-Structure): $A \to B \leftarrow C$
Structure: Both A and C cause B.
Conditional Independence:
- $A \perp C$ (A and C are marginally independent)
- $A \not\perp C \mid B$ (A and C are dependent given B)
Intuition: This is the counterintuitive case. Observing the effect B creates a dependency between the causes A and C through explaining away.
Example: $\text{Earthquake} \to \text{Alarm} \leftarrow \text{Burglary}$
- Earthquake and burglary are independent events
- If the alarm goes off and we then learn there was an earthquake, the earthquake explains away the alarm, making burglary less likely (see the numeric sketch below)
Activity Rules:
- Path is inactive if B is unobserved (and none of its descendants are observed)
- Path is active if B or any of its descendants is observed
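To see explaining away numerically, consider a brute-force enumeration over the alarm network. This is a minimal sketch; all CPT values below are invented for illustration:

```python
# Explaining away in Earthquake -> Alarm <- Burglary.
# All probabilities below are made-up values for illustration.
P_E = {True: 0.01, False: 0.99}     # P(Earthquake)
P_B = {True: 0.02, False: 0.98}     # P(Burglary)
P_A = {(True, True): 0.95, (True, False): 0.80,     # P(Alarm=true | E, B)
       (False, True): 0.90, (False, False): 0.01}

def joint(e, b, a):
    """P(E=e, B=b, A=a) via the chain rule for this network."""
    p_alarm = P_A[(e, b)] if a else 1.0 - P_A[(e, b)]
    return P_E[e] * P_B[b] * p_alarm

def p_burglary_given(evidence):
    """P(Burglary=true | evidence), where evidence maps 'E'/'B'/'A' to bool."""
    num = den = 0.0
    for e in (True, False):
        for b in (True, False):
            for a in (True, False):
                world = {'E': e, 'B': b, 'A': a}
                if any(world[k] != v for k, v in evidence.items()):
                    continue        # inconsistent with the evidence
                p = joint(e, b, a)
                den += p
                if b:
                    num += p
    return num / den

print(p_burglary_given({'A': True}))              # ~0.51: alarm suggests burglary
print(p_burglary_given({'A': True, 'E': True}))   # ~0.02: earthquake explains it away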
Summary Table:
| Triple Type | Structure | Active When |
|---|---|---|
| Causal Chain | $A \to B \to C$ | B unobserved |
| Common Cause | $A \leftarrow B \to C$ | B unobserved |
| Common Effect | $A \to B \leftarrow C$ | B or descendant observed |
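These rules are small enough to capture in a few lines of code. Below is a minimal Python sketch; the function name and string labels are ours, chosen for illustration:

```python
def is_triple_active(kind, middle_observed, descendant_observed=False):
    """Activity rules from the summary table.

    kind: 'chain' (A -> B -> C), 'common_cause' (A <- B -> C),
          or 'v_structure' (A -> B <- C).
    middle_observed: True if B is in the evidence set.
    descendant_observed: True if any descendant of B is observed
        (only matters for v-structures).
    """
    if kind in ('chain', 'common_cause'):
        return not middle_observed
    if kind == 'v_structure':
        return middle_observed or descendant_observed
    raise ValueError(f"unknown triple type: {kind!r}")

assert is_triple_active('chain', middle_observed=False)
assert not is_triple_active('common_cause', middle_observed=True)
assert is_triple_active('v_structure', middle_observed=False, descendant_observed=True)
```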
The D-Separation Algorithm
D-separation (directional separation) determines whether two variables X and Y are conditionally independent given a set of evidence variables Z.
Definition: X and Y are d-separated by Z if all paths from X to Y are blocked (inactive).
Consequence: If X and Y are d-separated by Z, then $X \perp Y \mid Z$ (conditional independence).
Algorithm Steps
Input: Bayesian network G, query variables X and Y, evidence set Z
Output: True if $X \perp Y \mid Z$, False otherwise
Procedure:
1. Shade all observed nodes in Z on the graph
2. Enumerate all undirected paths from X to Y
3. For each path, decompose it into consecutive triples (3-node segments) and check each triple:
   - If any triple is inactive, the entire path is blocked (inactive)
   - If all triples are active, the path is active
4. Determine independence:
   - If all paths are blocked, then $X \perp Y \mid Z$ ✓ (independence guaranteed)
   - If at least one path is active, then $X \not\perp Y \mid Z$ ✗ (cannot guarantee independence)
Key Insight: It only takes one inactive triple to block an entire path. Conversely, all triples must be active for a path to be active.
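The procedure above translates almost line for line into code. The sketch below assumes the network is given as a dict mapping each node to its list of children; it enumerates simple paths, which is exponential in the worst case, so real systems use the linear-time reachability ("Bayes ball") formulation instead:

```python
def descendants(children, node):
    """All nodes reachable from `node` along directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def d_separated(children, x, y, evidence):
    """True iff every undirected path from x to y is blocked given `evidence`.

    children: dict mapping each node to a list of its children (the DAG).
    evidence: set of observed node names.
    """
    nodes = set(children) | {c for cs in children.values() for c in cs}
    adj = {n: set() for n in nodes}            # undirected adjacency
    for parent, kids in children.items():
        for kid in kids:
            adj[parent].add(kid)
            adj[kid].add(parent)

    def has_edge(a, b):                        # directed edge a -> b?
        return b in children.get(a, ())

    def path_active(path):
        # Check every consecutive triple; one inactive triple blocks the path.
        for a, b, c in zip(path, path[1:], path[2:]):
            if has_edge(a, b) and has_edge(c, b):          # v-structure a -> b <- c
                if b not in evidence and not (descendants(children, b) & evidence):
                    return False
            elif b in evidence:                            # chain or common cause
                return False
        return True

    # Enumerate all simple undirected paths from x to y (exponential, but clear).
    stack = [[x]]
    while stack:
        path = stack.pop()
        if path[-1] == y:
            if path_active(path):
                return False                   # an active path: not d-separated
            continue
        for nxt in adj[path[-1]]:
            if nxt not in path:
                stack.append(path + [nxt])
    return True

# Examples 1 and 2 from the worked examples below:
chain = {'U': ['T'], 'T': ['V']}               # U -> T -> V
vstruct = {'T': ['Y'], 'W': ['Y']}             # T -> Y <- W
assert not d_separated(chain, 'U', 'V', set())     # active chain
assert d_separated(chain, 'U', 'V', {'T'})         # blocked by observing T
assert d_separated(vstruct, 'T', 'W', set())       # inactive v-structure
assert not d_separated(vstruct, 'T', 'W', {'Y'})   # activated by observing Y
```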
Worked Examples
Example 1: Simple Chain
Network:
U → T → V
Questions:
- Is $U \perp V$? (No evidence)
Answer: No. Path $U \to T \to V$ is active (T unobserved, causal chain). Information flows from U to V.
- Is $U \perp V \mid T$? (Given T)
Answer: Yes. Shade T. The path $U \to T \to V$ becomes inactive (T observed in causal chain blocks information flow).
Example 2: V-Structure
Network:
T → Y ← W
Questions:
- Is $T \perp W$? (No evidence)
Answer: Yes. The triple $T \to Y \leftarrow W$ is a v-structure. Y is unobserved, so the path is inactive. T and W are marginally independent.
- Is $T \perp W \mid Y$? (Given Y)
Answer: No. Shade Y. The v-structure $T \to Y \leftarrow W$ becomes active (Y observed). T and W are now dependent through explaining away.
Example 3: Complex Network
Network:
    U
    ↓
V ← T → X
    ↓
    Y ← W
    ↓
    Z
Questions:
- Is $V \perp W$? (No evidence)
Path Analysis:
- Path 1: $V \leftarrow T \to Y \leftarrow W$
  - Triple 1: $V \leftarrow T \to Y$ (common cause, T unobserved → active)
  - Triple 2: $T \to Y \leftarrow W$ (v-structure, neither Y nor its descendant Z observed → inactive)
  - Path 1 is blocked ✗
Answer: Yes, $V \perp W$ (path blocked by inactive v-structure).
- Is $V \perp W \mid T$? (Given T)
Path Analysis:
- Path 1: $V \leftarrow T \to Y \leftarrow W$
  - Triple 1: $V \leftarrow T \to Y$ (common cause, T observed → inactive)
  - Path 1 is blocked ✗
Answer: Yes, $V \perp W \mid T$ (path blocked by observing common cause).
- Is $Y \perp Z \mid W$? (Given W)
Path Analysis:
- Path 1: $Y \to Z$ (direct edge)
  - A single edge (no intermediate node) is always active
  - Path is active ✓
Answer: No, $Y \not\perp Z \mid W$ (direct connection cannot be blocked).
- Is $Y \perp Z \mid T$? (Given T)
Path Analysis:
- Path 1: $Y \to Z$ (direct edge, always active)
- Path is active ✓
Answer: No. Observing T doesn’t block the direct connection $Y \to Z$.
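Reusing the `d_separated` sketch from the algorithm section, all four queries can be checked mechanically; the child-list encoding of the network is our own choice of representation:

```python
# Example 3 as a child-list dict, checked with the d_separated sketch above.
g = {'U': ['T'], 'T': ['V', 'X', 'Y'], 'W': ['Y'], 'Y': ['Z']}

assert d_separated(g, 'V', 'W', set())       # v-structure at Y blocks the only path
assert d_separated(g, 'V', 'W', {'T'})       # observing the common cause T blocks it
assert not d_separated(g, 'Y', 'Z', {'W'})   # the direct edge Y -> Z is always active
assert not d_separated(g, 'Y', 'Z', {'T'})   # ...regardless of which node we observe
```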
Structure Implications and Independence Lists
Given a Bayesian network structure, we can systematically apply d-separation to build a complete list of conditional independencies that must hold for any probability distribution represented by the network.
Procedure:
1. For each pair of variables $(X, Y)$
2. For each subset $Z \subseteq V \setminus \{X, Y\}$
3. Apply the d-separation algorithm
4. Record all cases where $X \perp Y \mid Z$
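A brute-force sketch of this procedure, again reusing the `d_separated` function from the algorithm section; it is exponential in the number of variables, so it is only practical for small networks:

```python
from itertools import combinations

def all_independencies(children):
    """Enumerate every (X, Y, Z) with X d-separated from Y given Z."""
    nodes = sorted(set(children) | {c for cs in children.values() for c in cs})
    found = []
    for i, x in enumerate(nodes):
        for y in nodes[i + 1:]:
            rest = [n for n in nodes if n not in (x, y)]
            for r in range(len(rest) + 1):
                for z in combinations(rest, r):
                    if d_separated(children, x, y, set(z)):
                        found.append((x, y, set(z)))
    return found

# Chain X -> Y -> Z: the only implied independence is X ⊥ Z | Y.
print(all_independencies({'X': ['Y'], 'Y': ['Z']}))   # [('X', 'Z', {'Y'})]
```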
Importance: This list defines the set of probability distributions that can be faithfully represented by the network structure.
Example: For the network $X \to Y \to Z$:
- $X \perp Z \mid Y$ (must hold)
- $X \perp Z$ does not necessarily hold (X and Z may be marginally dependent)
Computing All Independences
Network Structures and Their Independencies:
- Chain: $X \to Y \to Z$
  - $X \perp Z \mid Y$
- Fork (Common Cause): $X \leftarrow Y \to Z$
  - $X \perp Z \mid Y$
- V-Structure: $X \to Y \leftarrow Z$
  - $X \perp Z$ (marginally independent)
  - $X \not\perp Z \mid Y$ (dependent given Y)
- Fully Connected: $X \to Y$, $X \to Z$, $Y \to Z$
  - No conditional independencies are implied by the structure
Probabilistic Inference: From Networks to Queries
D-separation tells us when variables are independent, but it doesn’t compute actual probabilities. For that, we need probabilistic inference.
Inference Tasks
Three main types of inference queries:
- Posterior Probability (most common):
  - Query variables: Q
  - Evidence variables: E
  - Example: $P(\text{Burglary} \mid \text{Alarm} = \text{true}, \text{Earthquake} = \text{false})$
- Most Likely Explanation (MAP query):
  - Find the most probable assignment to query variables
  - Example: Most likely diagnosis given symptoms
- Marginal Probability:
  - No evidence, just compute distribution over Q
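A toy sketch of all three query types on a two-variable network Disease → Symptom; the CPT numbers are invented for illustration:

```python
# Disease -> Symptom, with made-up CPT values.
P_D = {True: 0.1, False: 0.9}       # prior P(Disease)
P_S = {True: 0.8, False: 0.05}      # P(Symptom=true | Disease=d)

def joint(d, s):
    p_s = P_S[d] if s else 1.0 - P_S[d]
    return P_D[d] * p_s

# 1. Posterior probability: P(Disease | Symptom=true)
den = joint(True, True) + joint(False, True)
posterior = {d: joint(d, True) / den for d in (True, False)}

# 2. MAP query: most probable disease value given the symptom
map_value = max(posterior, key=posterior.get)

# 3. Marginal probability: P(Symptom), with no evidence
marginal = {s: joint(True, s) + joint(False, s) for s in (True, False)}

print(posterior)     # ≈ {True: 0.64, False: 0.36}
print(map_value)     # True
print(marginal)      # ≈ {True: 0.125, False: 0.875}
```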
Factor Zoo: Understanding Probability Tables
When working with Bayesian networks, we encounter various types of probability tables (factors):
1. Joint Distribution: $P(X, Y)$
Definition: Full probability distribution over all combinations of X and Y.
Properties:
- Entries: $P(x, y)$ for all $x \in X, y \in Y$
- Sums to 1: $\sum_{x, y} P(x, y) = 1$
Example: Temperature and Weather
| T | W | P(T, W) |
|---|---|---|
| hot | sun | 0.4 |
| hot | rain | 0.1 |
| cold | sun | 0.2 |
| cold | rain | 0.3 |
2. Selected Joint: $P(x, Y)$
Definition: A slice of the joint distribution with X fixed to value x.
Properties:
- Entries: $P(x, y)$ for fixed x, all $y \in Y$
- Sums to $P(x)$: $\sum_y P(x, y) = P(x)$
Example: $P(\text{cold}, W)$
| T | W | P |
|---|---|---|
| cold | sun | 0.2 |
| cold | rain | 0.3 |
Sum: $0.2 + 0.3 = 0.5 = P(\text{cold})$
3. Conditional Distribution: $P(Y \mid X)$
Definition: Probability of Y given each value of X.
Properties:
- Entries: $P(y \mid x)$ for all $x \in X, y \in Y$
- Each row sums to 1: $\sum_y P(y \mid x) = 1$ for each x
Example: $P(W \mid T)$
| T | W | P(W \| T) |
|---|---|---|
| hot | sun | 0.8 |
| hot | rain | 0.2 |
| cold | sun | 0.4 |
| cold | rain | 0.6 |
4. Specified Family: $P(y \mid X)$
Definition: Probability of a fixed value y given all values of X.
Properties:
- Entries: $P(y \mid x)$ for fixed y, all $x \in X$
- Does not sum to any particular value
Example: $P(\text{rain} \mid T)$
| T | W | P |
|---|---|---|
| hot | rain | 0.2 |
| cold | rain | 0.6 |
General Factor Notation
When we write $P(Y_1, \ldots, Y_N \mid X_1, \ldots, X_M)$:
- It represents a multi-dimensional array (factor)
- Dimensions correspond to unassigned (uppercase) variables
- Assigned (lowercase) variables reduce dimensionality
- Values are $P(y_1, \ldots, y_N \mid x_1, \ldots, x_M)$
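To make the factor notation concrete, here is a minimal dict-based sketch that reproduces the four tables above; a real factor implementation would track variable names and dimensions explicitly:

```python
# The four factor types, using the Temperature/Weather numbers from the tables.
joint = {('hot', 'sun'): 0.4, ('hot', 'rain'): 0.1,
         ('cold', 'sun'): 0.2, ('cold', 'rain'): 0.3}     # 1. joint P(T, W)

# 2. Selected joint P(cold, W): slice the joint at T = cold.
selected = {w: p for (t, w), p in joint.items() if t == 'cold'}
assert abs(sum(selected.values()) - 0.5) < 1e-9            # sums to P(cold)

# 3. Conditional P(W | T): normalize each T-slice of the joint.
p_t = {t: sum(p for (t2, _), p in joint.items() if t2 == t)
       for t in ('hot', 'cold')}
conditional = {(t, w): p / p_t[t] for (t, w), p in joint.items()}
assert abs(conditional[('hot', 'sun')] - 0.8) < 1e-9       # matches the table

# 4. Specified family P(rain | T): fix W = rain, leave T free.
family = {t: conditional[(t, 'rain')] for t in ('hot', 'cold')}
print(family)   # {'hot': 0.2, 'cold': 0.6} -- sums to no particular value
```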
Practice Problems
Problem 1: D-Separation
Given the network:
A → B → C
↓   ↓
D → E → F
Determine:
1. Is $A \perp C$?
2. Is $A \perp C \mid B$?
3. Is $A \perp F \mid E$?
4. Is $D \perp C \mid \{B, E\}$?
Problem 2: Identifying Triples
For each of the following, identify the triple type and determine if the path is active:
1. $X \to Y \to Z$, Y unobserved
2. $X \leftarrow Y \to Z$, Y observed
3. $X \to Y \leftarrow Z$, Y unobserved
4. $X \to Y \leftarrow Z$, Y’s child observed
Problem 3: Building Independence Lists
For the network $A \to B \leftarrow C \to D$:
List all conditional independencies of the form $X \perp Y \mid Z$ where Z can be empty or a singleton.
Hint: Consider pairs (A, D), (A, C), (B, D), etc., and test with different evidence sets.
Conclusion
D-separation is the cornerstone of reasoning about conditional independence in Bayesian networks:
- Three Triple Types: Causal chains, common causes, and v-structures (common effects) have different blocking behaviors
- Blocking Rules: Chains and forks block when the middle node is observed; v-structures block when the middle node and all of its descendants are unobserved
- Path Activity: A path is active only if all triples are active; one inactive triple blocks the entire path
- Independence Guarantee: D-separation provides a graphical criterion for conditional independence
Key Takeaway: Understanding d-separation allows us to:
- Determine which variables influence each other
- Identify opportunities for computational efficiency in inference
- Validate whether a network structure correctly represents domain knowledge
In the next lecture, we’ll see how d-separation insights enable variable elimination, an exact inference algorithm that dramatically improves upon naive enumeration.
Further Reading
- Probabilistic Graphical Models by Koller and Friedman - Comprehensive treatment of Bayesian networks
- Artificial Intelligence: A Modern Approach by Russell and Norvig (Chapter 13) - Accessible introduction to probabilistic reasoning
- “d-Separation Without Tears” by Judea Pearl - A short explanatory note on d-separation
- CS 188: Introduction to AI (UC Berkeley) - Course materials on Bayesian networks
- pgmpy: Python Library for Probabilistic Graphical Models - Practical implementation
This lecture covered d-separation for determining conditional independence in Bayesian networks. Next, we’ll explore variable elimination for efficient probabilistic inference.