Adversarial Search¶
Zero-Sum Games¶
- Agents have opposite utilities (values on outcomes).
- We can think of a single value that one agent maximizes and the other minimizes.
- Adversarial, pure competition.
 
Minimax¶
Minimax Values¶
Policy: the agent should choose an action leading to the state with the largest value (a sketch of the recursion follows the list).
- State under agent's control: \(V(s) = \max_{s' \in \text{successors}(s)} V(s')\).
- State under opponent's control: \(V(s) = \min_{s' \in \text{successors}(s)} V(s')\).
- Terminal state: \(V(s) = \text{known}\).
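A minimal sketch of this recursion in Python, assuming a hypothetical `game` object exposing `is_terminal(s)`, `utility(s)`, `to_move(s)`, and `successors(s)` (these names are illustrative, not from the notes):

```python
def minimax_value(state, game):
    """Minimax value of `state`; `game` is a hypothetical interface."""
    if game.is_terminal(state):
        return game.utility(state)            # terminal: value is known
    values = [minimax_value(s, game) for s in game.successors(state)]
    if game.to_move(state) == "MAX":          # state under agent's control
        return max(values)
    return min(values)                        # state under opponent's control
```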
 
Adversarial Search¶
- A state-space search tree.
- Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary.
 

Problem: in realistic games, we cannot search to the leaves!
Solution: Depth-limited search.
Evaluation Functions¶
- Evaluation functions score non-terminal states in depth-limited search.
- Ideal function: returns the actual minimax value of the position.
- A simple solution in practice: a weighted linear sum of features (sketched in code after this list):
 
\[
\text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s)
\]
- More advanced alternatives: Monte Carlo Tree Search.
- Evaluation functions learned with a Convolutional Neural Network.
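A minimal sketch of depth-limited search with such a weighted linear evaluation, reusing the hypothetical `game` interface above; `weights` and `features` are illustrative placeholders:

```python
def eval_fn(state, weights, features):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for w, f in zip(weights, features))

def depth_limited_value(state, game, depth, weights, features):
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:                                # cannot search to the leaves:
        return eval_fn(state, weights, features)  # score this non-terminal instead
    values = [depth_limited_value(s, game, depth - 1, weights, features)
              for s in game.successors(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```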
 
Alpha-Beta Pruning¶
General configuration (MIN version)
- We're computing the MIN-VALUE at some node \(n\).
- We're looping over \(n\)'s children, so \(n\)'s estimate is only decreasing.
- Let \(\alpha\) be the best value that MAX can already get at any choice point along the current path from the root.
- If \(n\)'s value becomes worse than \(\alpha\), we can stop considering \(n\)'s other children.
- Reason: if \(n\) were eventually chosen, the nodes along the path would all take on \(n\)'s value; but \(n\) is worse than \(\alpha\), so MAX would never choose this path.
 
The MAX version is symmetric (both are sketched below).
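A minimal sketch of both versions under the same hypothetical `game` interface; the cutoff test in `min_value` is exactly the condition described above, and `max_value` mirrors it with \(\beta\):

```python
import math

def max_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for s in game.successors(state):
        v = max(v, min_value(s, game, alpha, beta))
        if v >= beta:                 # already better than what MIN will allow
            return v                  # prune the remaining children
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for s in game.successors(state):
        v = min(v, max_value(s, game, alpha, beta))
        if v <= alpha:                # n is now worse than alpha for MAX
            return v                  # stop considering n's other children
        beta = min(beta, v)
    return v
```

The search starts from the root with `max_value(root, game, -math.inf, math.inf)`.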

Expectimax Search¶
- Compute the average score under optimal play.
- Max nodes behave as in minimax search.
- Chance nodes are like min nodes, but the outcome is uncertain.
- Calculate their expected utilities, i.e. take the weighted average (expectation) of the children's values (see the sketch below).
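A minimal sketch of the expectimax recursion, assuming the hypothetical interface additionally exposes `is_chance(s)` and `outcomes(s)` yielding `(probability, successor)` pairs:

```python
def expectimax_value(state, game):
    if game.is_terminal(state):
        return game.utility(state)
    if game.is_chance(state):         # chance node: weighted average of children
        return sum(p * expectimax_value(s, game)
                   for p, s in game.outcomes(state))
    values = [expectimax_value(s, game) for s in game.successors(state)]
    return max(values) if game.to_move(state) == "MAX" else min(values)
```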
 
 
