Imagine a robot walking around on a hill. The hill is a metaphorical posterior density, and the steps the robot takes are, metaphorically, the Markov chain. A Markov chain is a series of states in which the next state depends only on the current state. This property of being memoryless is known as the Markov property. The other part of the name (Monte Carlo) comes from the random nature of the process (after the city in Monaco known for its gambling casinos). The term Monte Carlo generally denotes any kind of computer simulation that involves drawing random (or at least pseudorandom) numbers.

The figure to the right illustrates the simple rules that the robot follows as it explores the landscape. Each state of the Markov chain corresponds to a position along the x-axis where the robot is standing. The robot proposes a new step by choosing a random direction (in this case left or right) and a random distance (e.g. a gamma random deviate). The spot where the robot is proposing to go is indicated in the figure by a dotted line. The robot computes a ratio R by dividing the height of the proposed spot by the height of its current position. This ratio, if less than or equal to 1.0, can be interpreted directly as the probability that the robot will move to the proposed spot (i.e. take the step). The robot draws a uniform(0,1) random deviate, and if the deviate is less than R, it takes the step. If R > 1, the robot always takes the step because any random number between 0 and 1 is guaranteed to be less than R in this case. (The probability of accepting the proposal is thus actually min(R, 1).) If the proposal is not accepted, the robot stays where it is and tries again. Any proposal (accepted or not accepted) adds 1 to the total step count.

What is the effect of following these rules? Amazingly, these simple rules are sufficient to guarantee that the proportion of steps that fall in a specified interval along the x-axis approximately equals the posterior probability corresponding to that interval. This is an amazing thing! If the “hill” was the posterior density from the coin flipping example above, then one could approximate the probability that the coin was “fair” (i.e. p between 0.45 and 0.55) by counting the number of steps that fall in the interval (0.45, 0.55) and dividing by the total number of steps taken by the robot. Yes, it is that easy!