Adversarial Search

Zero-Sum Games

  • Agents have opposite utilities (values on outcomes).
  • We can think of a single value that one agent maximizes and the other minimizes.
  • Adversarial, pure competition.

Minimax

Minimax Values

Policy: the agent should choose an action leading to the state with the largest value. (A recursive sketch of these value equations follows the list below.)

  • State under agent's control: \(V(s) = \max_{s' \in \text{successors}(s)} V(s')\).
  • State under opponent's control: \(V(s) = \min_{s' \in \text{successors}(s)} V(s')\).
  • Terminal state: \(V(s) = \text{known}\).
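
A minimal recursive sketch of these equations, assuming a hypothetical `game` interface with `is_terminal(s)`, `utility(s)`, `to_move(s)`, and `successors(s)` (all names are illustrative, not from the original notes):

```python
def minimax_value(state, game):
    """Minimax value V(s): max over successors at MAX nodes,
    min over successors at MIN nodes, known utility at terminals.
    `game` is a hypothetical interface, assumed for illustration."""
    if game.is_terminal(state):
        return game.utility(state)                # V(s) = known
    values = [minimax_value(s2, game) for s2 in game.successors(state)]
    if game.to_move(state) == 'MAX':              # state under agent's control
        return max(values)
    return min(values)                            # state under opponent's control
```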

Adversarial Search

  • A state-space search tree.
  • Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary.

Problem: in realistic games, we cannot search all the way to the leaves!

Solution: Depth-limited search.

Evaluation Functions

  • Evaluation functions score non-terminals in depth-limited search.
  • Ideal function: returns the actual minimax value of the position.
  • A simple solution in practice: a weighted linear sum of features (a depth-limited sketch follows this list):
\[ \text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s) \]
  • Monte Carlo Tree Search is an alternative to depth-limited search with a hand-crafted evaluation.
  • The evaluation function itself can be learned, e.g. by a convolutional neural network.
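
A sketch combining the two ideas above, over the same hypothetical `game` interface: a weighted linear sum of (placeholder) feature functions scores non-terminal states once the depth limit is reached:

```python
def linear_eval(state, weights, features):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s).
    `weights` is a list of numbers, `features` a list of callables
    (both illustrative placeholders)."""
    return sum(w * f(state) for w, f in zip(weights, features))

def depth_limited_minimax(state, game, depth, evaluate):
    """Minimax that scores non-terminals with `evaluate`
    when the depth budget runs out."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)                    # cannot search to the leaves
    values = [depth_limited_minimax(s2, game, depth - 1, evaluate)
              for s2 in game.successors(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)
```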

Alpha-Beta Pruning

General configuration (MIN version)

  • We're computing the MIN-VALUE at some node \(n\).
  • We're looping over \(n\)'s children, so \(n\)'s estimate is decreasing.
  • Let \(\alpha\) be the best value that MAX can get at any choice point along the current path from the root.
  • If \(n\) becomes worse than \(\alpha\), we can stop considering \(n\)'s other children.
  • Reason: if \(n\) were eventually chosen, the nodes along the path to the root would all take \(n\)'s value; but \(n\) is worse than \(\alpha\), so MAX would never choose this path.

The MAX version is symmetric, pruning against \(\beta\) (the best value MIN can get along the current path); a sketch of both cases follows.
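
A sketch of alpha-beta over the same hypothetical `game` interface. In `min_value`, once the running estimate falls to \(\alpha\) or below, the remaining children are pruned, exactly as argued above; `max_value` prunes symmetrically against \(\beta\):

```python
import math

def max_value(state, game, alpha, beta):
    """MAX's value; alpha/beta bound the window of values that matter."""
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for s2 in game.successors(state):
        v = max(v, min_value(s2, game, alpha, beta))
        if v >= beta:                 # MIN above would never allow this
            return v                  # prune remaining children
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    """MIN's value; stop once the estimate is no better than alpha."""
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for s2 in game.successors(state):
        v = min(v, max_value(s2, game, alpha, beta))
        if v <= alpha:                # worse than MAX's best alternative
            return v                  # stop considering n's other children
        beta = min(beta, v)
    return v
```

The search starts from the root with the full window, e.g. `max_value(root, game, -math.inf, math.inf)`.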

Expectimax Search

  • Compute the average score under optimal play.

    • Max nodes as in minimax search.
    • Chance nodes are like min nodes but the outcome is uncertain.
    • Calculate their expected utilities, i.e. take the weighted average (expectation) of the children's values (see the sketch below).
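
A sketch of this recursion, assuming chance nodes expose outcome states with probabilities via a hypothetical `chance_successors(s)` returning `(state, probability)` pairs:

```python
def expectimax_value(state, game):
    """Max at MAX nodes; weighted average (expectation) at chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == 'MAX':              # max nodes as in minimax
        return max(expectimax_value(s2, game)
                   for s2 in game.successors(state))
    # chance node: expected utility = sum of p * V(outcome) over outcomes
    return sum(p * expectimax_value(s2, game)
               for s2, p in game.chance_successors(state))
```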
