Adversarial Search

Zero-Sum Games

  • Agents have opposite utilities (values on outcomes).
  • We can think of a single value that one agent maximizes and the other minimizes.
  • Adversarial, pure competition.

Minimax

Minimax Values

Policy: the agent should choose an action leading to the state with the largest value. (A recursive sketch of these value equations follows the list below.)

  • State under agent's control: \(V(s) = \max_{s' \in \text{successors}(s)} V(s')\).
  • State under opponent's control: \(V(s) = \min_{s' \in \text{successors}(s)} V(s')\).
  • Terminal state: \(V(s) = \text{known}\).
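
A minimal recursive sketch of these equations, assuming a hypothetical `game` interface with `is_terminal(s)`, `utility(s)`, `to_move(s)`, and `successors(s)` (all names are illustrative, not from the original notes):

```python
def minimax_value(state, game):
    """Minimax value V(s): max over successors at MAX nodes,
    min over successors at MIN nodes, known utility at terminals.
    `game` is a hypothetical interface, assumed for illustration."""
    if game.is_terminal(state):
        return game.utility(state)                # V(s) = known
    values = [minimax_value(s2, game) for s2 in game.successors(state)]
    if game.to_move(state) == 'MAX':              # state under agent's control
        return max(values)
    return min(values)                            # state under opponent's control
```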

Adversarial Search

  • A state-space search tree.
  • Compute each node's minimax value: the best achievable utility against a rational (optimal) adversary.

Problem: in realistic games, we cannot search all the way to the leaves!

Solution: Depth-limited search.

Evaluation Functions

  • Evaluation functions score non-terminals in depth-limited search.
  • Ideal function: returns the actual minimax value of the position.
  • A simple solution in practice: a weighted linear sum of features (a depth-limited sketch follows this list):
\[ \text{Eval}(s) = w_1 f_1(s) + w_2 f_2(s) + \cdots + w_n f_n(s) \]
  • Monte Carlo Tree Search is an alternative to depth-limited search with a hand-crafted evaluation.
  • The evaluation function itself can be learned, e.g. by a convolutional neural network.
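
A sketch combining the two ideas above, over the same hypothetical `game` interface: a weighted linear sum of (placeholder) feature functions scores non-terminal states once the depth limit is reached:

```python
def linear_eval(state, weights, features):
    """Eval(s) = w1*f1(s) + ... + wn*fn(s).
    `weights` is a list of numbers, `features` a list of callables
    (both illustrative placeholders)."""
    return sum(w * f(state) for w, f in zip(weights, features))

def depth_limited_minimax(state, game, depth, evaluate):
    """Minimax that scores non-terminals with `evaluate`
    when the depth budget runs out."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)                    # cannot search to the leaves
    values = [depth_limited_minimax(s2, game, depth - 1, evaluate)
              for s2 in game.successors(state)]
    return max(values) if game.to_move(state) == 'MAX' else min(values)
```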

Alpha-Beta Pruning

General configuration (MIN version)

  • We're computing the MIN-VALUE at some node \(n\).
  • We're looping over \(n\)'s children, so \(n\)'s estimate is decreasing.
  • Let \(\alpha\) be the best value that MAX can get at any choice point along the current path from the root.
  • If \(n\) becomes worse than \(\alpha\), we can stop considering \(n\)'s other children.
  • Reason: if \(n\) were eventually chosen, the nodes along the path to the root would all take \(n\)'s value; but \(n\) is worse than \(\alpha\), so MAX would never choose this path.

The MAX version is symmetric, pruning against \(\beta\) (the best value MIN can get along the current path); a sketch of both cases follows.
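
A sketch of alpha-beta over the same hypothetical `game` interface. In `min_value`, once the running estimate falls to \(\alpha\) or below, the remaining children are pruned, exactly as argued above; `max_value` prunes symmetrically against \(\beta\):

```python
import math

def max_value(state, game, alpha, beta):
    """MAX's value; alpha/beta bound the window of values that matter."""
    if game.is_terminal(state):
        return game.utility(state)
    v = -math.inf
    for s2 in game.successors(state):
        v = max(v, min_value(s2, game, alpha, beta))
        if v >= beta:                 # MIN above would never allow this
            return v                  # prune remaining children
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    """MIN's value; stop once the estimate is no better than alpha."""
    if game.is_terminal(state):
        return game.utility(state)
    v = math.inf
    for s2 in game.successors(state):
        v = min(v, max_value(s2, game, alpha, beta))
        if v <= alpha:                # worse than MAX's best alternative
            return v                  # stop considering n's other children
        beta = min(beta, v)
    return v
```

The search starts from the root with the full window, e.g. `max_value(root, game, -math.inf, math.inf)`.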

Expectimax Search

  • Compute the average score under optimal play.

    • Max nodes as in minimax search.
    • Chance nodes are like min nodes but the outcome is uncertain.
    • Calculate their expected utilities, i.e. take the weighted average (expectation) of the children's values (see the sketch below).
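
A sketch of this recursion, assuming chance nodes expose outcome states with probabilities via a hypothetical `chance_successors(s)` returning `(state, probability)` pairs:

```python
def expectimax_value(state, game):
    """Max at MAX nodes; weighted average (expectation) at chance nodes."""
    if game.is_terminal(state):
        return game.utility(state)
    if game.to_move(state) == 'MAX':              # max nodes as in minimax
        return max(expectimax_value(s2, game)
                   for s2 in game.successors(state))
    # chance node: expected utility = sum of p * V(outcome) over outcomes
    return sum(p * expectimax_value(s2, game)
               for s2, p in game.chance_successors(state))
```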
