Randomized algorithms

A randomized algorithm is an algorithm that is allowed access to a source of independent, unbiased, random bits, which could influence its computation. Random bits are a resource whose use involves a non-trivial cost (consider picking a random real number). Also, there is sometimes a non-trivial computational overhead associated with sampling from a seemingly well-behaved distribution (consider sampling uniformly from a set whose cardinality is not a power of 2). An easy way to sample a random element from a set \(S\) is to choose \(O(\log |S|)\) random bits and then use these bits to index an element in the set.
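For instance, sampling uniformly from a set whose cardinality is not a power of 2 can be done by rejection: draw \(\lceil \log_2 |S| \rceil\) random bits to form an index and retry whenever the index falls outside the set. A minimal Python sketch (the function names are illustrative, not from the text):

```python
import random

def uniform_index(n, randbit=lambda: random.getrandbits(1)):
    """Sample an index in {0, ..., n-1} uniformly using only fair coin flips.

    Draws ceil(log2(n)) bits to form an index and rejects (retries) any
    index >= n; the expected number of rounds is less than 2.
    """
    bits = max(1, (n - 1).bit_length())  # ceil(log2(n)) for n >= 2
    while True:
        index = 0
        for _ in range(bits):
            index = (index << 1) | randbit()
        if index < n:
            return index

def sample(S):
    """Pick a uniformly random element from a non-empty sequence S."""
    return S[uniform_index(len(S))]
```

The rejection loop is what makes the cost of "sampling one element" non-trivial: the number of random bits consumed is itself a random variable.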

Advantages of randomized algorithms: 1. Performance: for many problems, randomized algorithms run faster than the best known deterministic algorithms. 2. Simplicity: many randomized algorithms are simpler to describe and implement than deterministic ones.

An example of a randomized algorithm is Randomized Quicksort (RandQS). To analyse a randomized algorithm, we bound its expected running time, and this bound holds for every input. The expectation depends only on the random choices made by the algorithm, not on any assumptions about the distribution of the input. The behavior of a randomized algorithm can vary even on a single input, from one execution to another; the running time becomes a random variable whose distribution we must analyse.
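A minimal Python sketch of RandQS (not an in-place implementation; the point is that the pivot is drawn uniformly at random, so the expected number of comparisons is \(O(n \log n)\) on every input):

```python
import random

def rand_quicksort(a):
    """Randomized Quicksort: the pivot is chosen uniformly at random,
    so the expectation is over the algorithm's coin flips, not the input."""
    if len(a) <= 1:
        return list(a)
    pivot = a[random.randrange(len(a))]
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return rand_quicksort(less) + equal + rand_quicksort(greater)
```

No input, not even an already-sorted one, can force the quadratic behavior that a fixed-pivot deterministic Quicksort exhibits.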

Probabilistic analysis of an algorithm is based on the distribution of the inputs. The algorithm itself may be deterministic.

A Las Vegas algorithm is a randomized algorithm that always gives the correct solution, while its running time is a random variable. A Monte Carlo algorithm is a randomized algorithm that does not always produce the correct solution, but the probability of an incorrect solution can be bounded. Here the quality of the solution is a random variable.

For decision problems (which produce the answer yes or no) there are two kinds of Monte Carlo algorithms:

  1. algorithms with one-sided error, for which the probability of error is zero for at least one of the possible outputs yes or no
  2. algorithms with two-sided error, which have a non-zero probability of error on either output yes or no

A Las Vegas algorithm is efficient if its expected running time on any input is bounded by a polynomial function of the input size. A Monte Carlo algorithm is efficient if its worst-case running time is bounded by a polynomial function of the input size.

Game-theoretic techniques

The game-theoretic viewpoint enables us to think of a randomized algorithm as a probability distribution on deterministic algorithms. Yao's Minimax Principle then yields lower bounds on the performance of randomized algorithms.

Game tree evaluation

A game tree is a rooted tree in which internal nodes at even distance from the root perform the MIN operation, and internal nodes at odd distance perform the MAX operation. Each leaf has a value, which is represented by a real number. Given a tree with values at the leaves, the evaluation problem is to determine the value returned by the root.

We will limit the discussion to a special case where the values at the leaves are bits, 0 and 1. Therefore, for this special case each MIN node performs a Boolean AND operation, and each MAX node performs a Boolean OR operation.

Let \(T_{d,k}\) denote a uniform tree in which the root and every internal node has \(d\) children, and every leaf is at distance \(2k\) from the root. Therefore, any root-to-leaf path passes through \(k\) AND nodes (including the root itself) and \(k\) OR nodes, and there are \(d^{2k}\) leaves. We are interested in calculating the number of steps taken by a given algorithm to evaluate any instance of \(T_{d,k}\).

The algorithm begins by choosing the leaf whose value is read first. A deterministic algorithm chooses each subsequent leaf as a deterministic function of the values seen so far; for a randomized algorithm, this choice may be randomized. It can be shown that for any deterministic algorithm, there is an instance of \(T_{d,k}\) that forces the algorithm to read the values of all \(d^{2k}\) leaves.

We simplify the analysis by taking \(d=2\). Any deterministic algorithm inspects leaves in an order determined by the values seen so far, and can be made to evaluate all \(2^{2k}=4^k\) leaves on some instance of \(T_{2,k}\): at every node, an adversary can hide the value that decides the node in whichever child the algorithm explores second.
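The adversary argument can be illustrated on the smallest tree \(T_{2,1}\), assuming a hypothetical deterministic evaluator that always reads leaves left to right (the code and variable names are ours, for illustration only):

```python
def eval_left_to_right(leaves, reads):
    """Evaluate T_{2,1} = AND(OR(l0, l1), OR(l2, l3)) reading leaves in a
    fixed left-to-right order with the usual short-circuiting, recording
    in `reads` the indices of the leaves actually read."""
    def read(i):
        reads.append(i)
        return leaves[i]
    left = read(0) or read(1)    # OR short-circuits on a 1
    if left == 0:
        return 0                 # AND short-circuits on a 0
    return read(2) or read(3)

reads = []
value = eval_left_to_right([0, 1, 0, 1], reads)
# the adversarial instance (0, 1, 0, 1) forces this evaluator
# to read all four leaves
```

Placing the decisive 1 at the second leaf of each OR node defeats the short-circuiting, and the same construction generalizes to every fixed reading order.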

If, however, the algorithm reads the two leaves in random order, then with probability \(1/2\) it reads the decisive leaf first, so the expected number of steps to evaluate the node is \(3/2\), which is better than the worst case for any deterministic algorithm.

The randomized algorithm proceeds as follows. To evaluate an AND node \(v\), the algorithm chooses one of its children (a sub-tree rooted at an OR node) uniformly at random and evaluates it by recursively invoking the algorithm. If the sub-tree returns 1, the algorithm proceeds to evaluate the other child; if it returns 0, the algorithm returns 0 for \(v\). To evaluate an OR node, the procedure is the same with the roles of 0 and 1 interchanged.
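The recursion can be sketched in Python (a minimal illustration, not from the text: the \(4^k\) leaf bits of \(T_{2,k}\) are passed left to right in a flat list, and the function also counts how many leaves it reads):

```python
import random

def evaluate(leaves, k):
    """Randomized evaluation of T_{2,k}: `leaves` lists the 4**k leaf bits
    left to right.  Children are evaluated in random order, short-circuiting
    an AND on a 0 child and an OR on a 1 child.
    Returns (value, number_of_leaves_read)."""
    assert len(leaves) == 4 ** k
    reads = 0

    def node(lo, hi, is_and):
        nonlocal reads
        if hi - lo == 1:                 # a leaf
            reads += 1
            return leaves[lo]
        mid = (lo + hi) // 2
        first, second = random.sample([(lo, mid), (mid, hi)], 2)
        v = node(*first, not is_and)
        if v == (0 if is_and else 1):    # short-circuit: sibling is skipped
            return v
        return node(*second, not is_and)

    return node(0, len(leaves), True), reads
```

Averaging the read counts over many runs on a fixed instance gives an empirical estimate of the expected cost analysed below.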

Claim: the expected cost of evaluating any instance of \(T_{2,k}\) is at most \(3^k\).

Proof by induction. For the base case \(k=1\): if the AND root evaluates to 1, both its OR children evaluate to 1, and each is evaluated at an expected cost of at most \(3/2\) (with probability \(1/2\) the algorithm reads a leaf with value 1 first), so the expected cost is at most \(2 \times 3/2 = 3 = 3^1\). If the root evaluates to 0, a child that evaluates to 0 is chosen first with probability \(1/2\), and the expected cost is at most \(2 + \frac{1}{2} \times \frac{3}{2} = \frac{11}{4} < 3\).

Assume now that the expected cost of evaluating any instance of \(T_{2,k-1}\) is at most \(3^{k-1}\); we establish the inductive step. Consider first a tree \(T\) whose root is an OR node, each of whose children is the root of a copy of \(T_{2,k-1}\). If the root of \(T\) evaluates to 1, at least one of its children returns 1. With probability \(1/2\) this child is chosen first, incurring (by the inductive hypothesis) an expected cost of at most \(3^{k-1}\) in evaluating \(T\). With probability \(1/2\) both sub-trees are evaluated, incurring a net cost of at most \(2 \times 3^{k-1}\). Therefore, the expected cost of evaluating \(T\) is at most

$$\frac{1}{2} \times 3^{k-1} + \frac{1}{2} \times 2 \times 3^{k-1} = \frac{3}{2} \times 3^{k-1}$$

If, on the other hand, the OR were to evaluate to 0, both children must be evaluated, incurring a cost of at most

$$2 \times 3^{k-1}$$

Now consider the root of the tree \(T_{2,k}\), an AND node. If it evaluates to 1, then both of its sub-trees rooted at OR nodes return 1. The expected cost of evaluating \(T_{2,k}\) to 1 is therefore at most

$$2 \times \frac{3}{2} \times 3^{k-1} = 3^{k}$$

On the other hand, if the instance of \(T_{2,k}\) evaluates to 0, at least one of its sub-trees rooted at OR nodes returns 0. With probability \(1/2\) it is chosen first, thus the expected cost of evaluating \(T_{2,k}\) is at most

$$2 \times 3^{k-1} + \frac{1}{2} \times \frac{3}{2} \times 3^{k-1} \le 3^{k}$$

Here the first term bounds the cost of evaluating both sub-trees of the OR node that returns 0; the second term accounts for the fact that, with probability \(1/2\), an additional cost of \((3/2)3^{k-1}\) may be incurred in evaluating its sibling, which returns 1.

Theorem. Given any instance of \(T_{2,k}\), the expected number of steps for the above randomized algorithm is at most \(3^k\).

Since \(n=4^k \implies k=\log_4 n\), the expected running time for the randomized algorithm is

$$3^k = 3^{\log_4 n} = 3^{\frac{\log_3 n}{\log_3 4}} = n^{\frac{1}{\log_3 4}} = n^{\log_4 3} \le n^{0.793}$$

Therefore, the expected number of steps is smaller than the worst case for any deterministic algorithm. Note that the randomized algorithm above is a Las Vegas algorithm, which always produces the correct answer.
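As a quick numerical sanity check of the exponent (a throwaway Python snippet):

```python
import math

k = 8
n = 4 ** k
# 3^k and n^{log_4 3} are the same quantity, written differently
assert math.isclose(3 ** k, n ** math.log(3, 4))
# log_4 3 ≈ 0.7925, so the exponent is indeed below 0.793
assert math.log(3, 4) < 0.793
```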

The minimax principle

We have established an upper bound on the running time of the randomized algorithm. Now we are interested in a matching lower bound. For this purpose, we need the minimax principle, which is the only known general technique for proving lower bounds on the running times of randomized algorithms. Note: this technique applies only to algorithms that terminate in finite time on all inputs and all sequences of random choices.

Consider the standard scissors-paper-stone game, a two-person zero-sum game whose payoff matrix \(M\) (rows for R, columns for C) is given by:

$$M = \begin{array}{c|ccc}
 & \text{Scissors} & \text{Paper} & \text{Stone} \\
\hline
\text{Scissors} & 0 & 1 & -1 \\
\text{Paper} & -1 & 0 & 1 \\
\text{Stone} & 1 & -1 & 0
\end{array}$$

The set of all possible strategies of the row player R is in correspondence with the rows of \(M\), and likewise for the strategies of the column player C. The entry \(M_{ij}\) is the amount paid by C to R when R chooses strategy \(i\) and C chooses strategy \(j\).

The goal of the row (column) player is to maximize (minimize) the payoff. Assume that this is a zero-information game, in which neither player has any information about the opponent's strategy. If R chooses strategy \(i\), then she is guaranteed a payoff of \(\min_j M_{ij}\) regardless of C's strategy. An optimal strategy for R is an \(i\) that maximizes \(\min_j M_{ij}\). Let \(V_R = \max_i \min_j M_{ij}\) denote the lower bound on the value of the payoff to R when she uses an optimal strategy. An optimal strategy for C is a \(j\) that gives the best possible upper bound on the payoff from C to R, denoted by \(V_C = \min_j \max_i M_{ij}\).

$$\begin{align} \text{lower bound} \quad V_R = \max_i \min_j M_{ij}\\[2ex] \text{upper bound} \quad V_C = \min_j \max_i M_{ij} \end{align}$$

The game is said to have a solution (saddle point) when \(V_R = V_C\) with the value of the game \(V = V_R = V_C\). An example of a game with a solution is a modified scissors-paper-stone game:

$$M = \begin{array}{c|ccc}
 & \text{Scissors} & \text{Paper} & \text{Stone} \\
\hline
\text{Scissors} & 0 & 1 & 2 \\
\text{Paper} & -1 & 0 & 1 \\
\text{Stone} & -2 & -1 & 0
\end{array}$$
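The pure-strategy bounds \(V_R\) and \(V_C\) for both payoff matrices can be computed directly (a short Python check; the function name is ours):

```python
def pure_bounds(M):
    """Pure-strategy bounds: V_R = max_i min_j M[i][j],
    V_C = min_j max_i M[i][j]."""
    V_R = max(min(row) for row in M)
    V_C = min(max(col) for col in zip(*M))
    return V_R, V_C

rps = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]       # scissors-paper-stone
modified = [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]]  # modified game

# the original game has no saddle point; the modified one has value 0
assert pure_bounds(rps) == (-1, 1)
assert pure_bounds(modified) == (0, 0)
```

In the original game \(V_R = -1 < 1 = V_C\), so no pure-strategy solution exists; in the modified game \(V_R = V_C = 0\).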

When the game has no solution, we can introduce a randomization in the choice of strategies. A mixed strategy is a probability distribution on the set of possible strategies. The row player picks a vector \(p=(p_1, \ldots, p_n)\), where \(p_i\) is the probability that R will choose strategy \(i\). In a similar way, the column player has a vector \(q=(q_1, \ldots, q_m)\). The payoff is now a random variable, and its expectation is given by

$$E[\text{payoff}] = p^T Mq = \sum_{i=1}^n \sum_{j=1}^m p_i M_{ij} q_j$$
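For example, a direct Python transcription of this formula (the function name is ours):

```python
def expected_payoff(p, M, q):
    """E[payoff] = p^T M q = sum_i sum_j p_i * M_ij * q_j."""
    return sum(p[i] * M[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))

M = [[0, 1, -1], [-1, 0, 1], [1, -1, 0]]   # scissors-paper-stone
uniform = [1 / 3, 1 / 3, 1 / 3]
# playing uniformly at random guarantees expected payoff 0 in this game
assert abs(expected_payoff(uniform, M, uniform)) < 1e-12
```

With pure strategies (a distribution putting all mass on one row and one column), the expectation reduces to a single entry \(M_{ij}\).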

As before, let \(V_R\) denote the best possible lower bound on the expected payoff to R achievable by a mixed strategy \(p\), and \(V_C\) the best possible upper bound on the expected payoff from C to R achievable by a mixed strategy \(q\):

$$\begin{align} \text{lower bound} \quad V_R = \max_p \min_q p^T Mq\\[2ex] \text{upper bound} \quad V_C = \min_q \max_p p^T Mq \end{align}$$

Theorem (von Neumann's Minimax Theorem). For any two-person zero-sum game specified by a matrix \(M\),

$$\max_p \min_q p^T Mq = \min_q \max_p p^T Mq$$

Yao's technique

We use the previous results to prove lower bounds on the performance of randomized algorithms. The idea is to view the algorithm designer as the column player C and the adversary choosing the input as the row player R. The columns correspond to the set of all possible deterministic algorithms that produce a correct solution; the rows correspond to the set of all possible inputs (of fixed size). The payoff from C to R is some real-valued measure of the performance of an algorithm. We assume that the payoff refers to the running time (positive integers). The algorithm designer would like to choose an algorithm that minimizes the payoff, while the adversary would like to maximize the payoff.

Consider a problem where the number of distinct inputs of a fixed size is finite, as is the number of distinct (deterministic, terminating, and always correct) algorithms for solving that problem. A pure strategy for C corresponds to the choice of a deterministic algorithm, while a pure strategy for R corresponds to a specific input. An optimal strategy for C corresponds to an optimal deterministic algorithm, and \(V_C\) is the worst-case running time of any deterministic algorithm for the problem, called the deterministic complexity of the problem. \(V_R\) is related to non-deterministic complexity of the problem.

A mixed strategy for C is a probability distribution over the space of always correct deterministic algorithms (Las Vegas randomized algorithm). A mixed strategy for R is a distribution over the space of all inputs.

We define the distributional complexity of the problem as the expected running time of the best deterministic algorithm for the worst distribution on the inputs. Since the deterministic algorithm may be tailored to the known input distribution, the distributional complexity is no larger than the deterministic complexity.

Loomis's theorem implies that the distributional complexity equals the least possible worst-case expected running time achievable by any randomized algorithm.

Now let \(\Pi\) be a problem with a finite set \(\mathscr{I}\) of input instances (of a fixed size), and a finite set of deterministic algorithms \(\mathscr{A}\). For input \(I \in \mathscr{I}\) and algorithm \(A \in \mathscr{A}\), let \(C(I,A)\) denote the running time of algorithm \(A\) on input \(I\). For probability distributions \(p\) over \(\mathscr{I}\) and \(q\) over \(\mathscr{A}\), let \(I_p\) denote a random input chosen according to \(p\) and \(A_q\) denote a random algorithm chosen according to \(q\). Then,

$$\max_p \min_q E[C(I_p, A_q)] = \min_q \max_p E[C(I_p, A_q)]$$

and

$$\max_p \min_{A \in \mathscr{A}} E[C(I_p, A)] = \min_q \max_{I \in \mathscr{I}} E[C(I, A_q)]$$

Yao's Minimax Principle. For all distributions \(p\) over \(\mathscr{I}\) and \(q\) over \(\mathscr{A}\),

$$\min_{A \in \mathscr{A}} E[C(I_p, A)] \le \max_{I \in \mathscr{I}} E[C(I, A_q)]$$

In other words, the expected running time of the optimal deterministic algorithm for an arbitrarily chosen input distribution \(p\) is a lower bound on the expected running time of the optimal (Las Vegas) randomized algorithm for \(\Pi\). Thus, to prove a lower bound on the randomized complexity, it suffices to choose any distribution \(p\) on the inputs and prove a lower bound on the expected running time of deterministic algorithms for that distribution. Note that the deterministic algorithm may be chosen with knowledge of the distribution \(p\).