The principle of maximum entropy

The word "entropy" denotes missing information, and it could equally be replaced by the word "uncertainty". Entropy is a single number that measures the total amount of uncertainty represented by a probability distribution.

Suppose we have \(N\) propositions \(n=1,2,\cdots,N\), exactly one of which is true, and we have assigned to them a set of probabilities \(p_n\) that encode all our knowledge relevant to choosing one rather than another. The amount of uncertainty, according to Shannon, is then represented by the formula

$$S = -k \sum_{n=1}^N p_n \ln p_n \, ,$$

where \(k\) is an arbitrary positive constant. With a suitable choice of \(k\), the result can be interpreted as an estimate of the number of questions (having yes/no answers) that would be needed to isolate the true proposition. When all \(p_n = 1/N\), i.e. for the uniform distribution of probabilities, the entropy is largest and the most questions would be needed. According to Jaynes, the formula reflects in a quantitative way the total amount of uncertainty remaining after all relevant information about some situation has been taken into account.
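To make the formula concrete, here is a minimal Python sketch (the function name `shannon_entropy` and the example distributions are illustrative, not taken from the source) that evaluates \(S\) with \(k = 1/\ln 2\), so the result is expressed in bits, i.e. yes/no questions.

```python
import math

def shannon_entropy(p, k=1.0):
    """S = -k * sum_n p_n * ln(p_n); terms with p_n = 0 contribute nothing."""
    return -k * sum(q * math.log(q) for q in p if q > 0)

# Choosing k = 1/ln(2) measures the entropy in bits (yes/no questions).
k_bits = 1.0 / math.log(2)

N = 8
uniform = [1.0 / N] * N                      # maximal uncertainty
skewed = [0.5, 0.25, 0.125, 0.0625,
          0.03125, 0.015625, 0.0078125, 0.0078125]  # some propositions favored

print(shannon_entropy(uniform, k_bits))  # ~3.0 bits: log2(8) questions needed
print(shannon_entropy(skewed, k_bits))   # ~1.98 bits: less remaining uncertainty
```

As the printed values suggest, the uniform distribution attains the maximum of \(S\), while any distribution that favors some propositions over others leaves less uncertainty to resolve.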