The principle of maximum entropy

The word "entropy" denotes missing information, and it could equally be replaced by the word "uncertainty". Entropy is a single number that measures the total amount of uncertainty represented by a probability distribution.

Suppose we have \(N\) propositions \(n=1,2,\cdots,N\), exactly one of which is true, and we have assigned to them a set of probabilities \(p_n\) that encode all our knowledge relevant to choosing one rather than another. The amount of uncertainty, according to Shannon, is then represented by the formula

$$S = -k \sum_{n=1}^N p_n \ln p_n \, ,$$

where \(k\) is an arbitrary positive constant. With a suitable choice of \(k\), the result can be interpreted as an estimate of the number of questions (having yes/no answers) that would be needed to isolate the true proposition. When all \(p_n = 1/N\), i.e. for the uniform distribution of probabilities, the entropy is largest and the most questions would be needed. According to Jaynes, the formula reflects in a quantitative way the total amount of uncertainty remaining after all relevant information about some situation has been taken into account.
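To make the formula concrete, here is a minimal Python sketch (the function name `shannon_entropy` and the example distributions are illustrative, not taken from the source) that evaluates \(S\) with \(k = 1/\ln 2\), so the result is expressed in bits, i.e. yes/no questions.

```python
import math

def shannon_entropy(p, k=1.0):
    """S = -k * sum_n p_n * ln(p_n); terms with p_n = 0 contribute nothing."""
    return -k * sum(q * math.log(q) for q in p if q > 0)

# Choosing k = 1/ln(2) measures the entropy in bits (yes/no questions).
k_bits = 1.0 / math.log(2)

N = 8
uniform = [1.0 / N] * N                      # maximal uncertainty
skewed = [0.5, 0.25, 0.125, 0.0625,
          0.03125, 0.015625, 0.0078125, 0.0078125]  # some propositions favored

print(shannon_entropy(uniform, k_bits))  # ~3.0 bits: log2(8) questions needed
print(shannon_entropy(skewed, k_bits))   # ~1.98 bits: less remaining uncertainty
```

As the printed values suggest, the uniform distribution attains the maximum of \(S\), while any distribution that favors some propositions over others leaves less uncertainty to resolve.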