Entropy

Intuitive

Entropy is a measure of the chaos or randomness in a system. An important property of entropy is that it increases over time: in practice, every system becomes more disordered over time unless we spend energy to bring it into order.

An explanation for this steady increase of entropy is that there are far more possible states of high entropy than there are states of low entropy.

Therefore, it is much more likely that a system will end up in a state of higher entropy.

A familiar example is sand. It is far more likely to find sand lying randomly around than in an ordered form like a sand castle.


Concrete


The observation that the entropy of a system always increases was originally purely empirical, but, as Jaynes demonstrated, it can be deduced from general logical arguments.

Entropy is a macroscopic notion, like temperature, and is used when we do not have exact knowledge of the microscopic configuration of the system.

Boltzmann's interpretation of entropy is that it is a measure of the "number of ways" in which a macrostate can be realized in terms of microstates. If there are many microstates that yield the same macrostate, this macrostate has a high probability.

In this sense entropy gets maximized because this is the most probable macroscopic configuration of the system.
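
As a minimal numerical sketch of this counting argument (a toy model added here for illustration, not taken from the references): consider $N$ grains that can each lie in one of two heaps. The macrostate only records how many grains are in the left heap, and its multiplicity is a binomial coefficient.

```python
from math import comb

# Toy model of Boltzmann's counting: N grains, each lying in one of two heaps.
# A macrostate records only how many grains are in the left heap; its
# multiplicity is the number of microstates (grain assignments) realizing it.
N = 100
all_in_one_heap = comb(N, 0)      # perfectly "ordered" macrostate: 1 microstate
evenly_mixed = comb(N, N // 2)    # half-half split: the most probable macrostate

print("microstates for 'all grains in one heap':", all_in_one_heap)
print("microstates for 'half in each heap':     ", evenly_mixed)  # ~1e29 for N = 100
```

Even for only 100 grains the mixed macrostate can be realized in roughly $10^{29}$ more ways than the perfectly ordered one, which is why we essentially never see the ordered configuration arise by chance.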

Jaynes goes one step further and argues that entropy is a tool that explicitly takes into account that what we can predict depends on our state of knowledge. While what happens in the real world is governed by the physical laws, what matters for us is what we can actually predict, and predictions necessarily rely on what we know about the given system.

In his own words:

Instead of asking, "What do the laws of physics require the system to do?", which cannot be answered without knowledge of the exact microstate, Gibbs asked a more modest question, which can be answered: "What is the best guess we can make, from the partial information that we have?"

https://pdfs.semanticscholar.org/d7ff/97069799d3a912803ddd2266cdf573c2461d.pdf

In combination with Boltzmann's interpretation a nice logical picture emerges:

The fact that we always observe that entropy increases is not something that nature does, but a result of how we make predictions for physical systems with limited knowledge. For macroscopic systems the microscopic details are usually much too complicated to know exactly. Nevertheless, we want to make predictions. Our best guess in such situations is the most probable outcome, i.e. the macroscopic configuration that can be realized by the largest number of microscopic configurations, which is precisely the macroscopic configuration with maximal entropy.

In this sense, the maximization of entropy is a general principle for reasoning in situations where we do not know all the details.

For a nice simple example of this kind of reasoning, see http://www-mtl.mit.edu/Courses/6.050/2003/notes/chapter9.pdf
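
In the same spirit, here is a hedged sketch of such an inference, with numbers chosen purely for illustration (this is not the specific example from the linked notes): suppose all we know about a six-sided die is that its long-run average is 4.5. The maximum-entropy principle then selects, among all distributions with that mean, the least biased one, which has the exponential form $p_i \propto e^{-\lambda i}$.

```python
import numpy as np
from scipy.optimize import brentq

# Maximum-entropy inference for a six-sided die about which we only know
# that its long-run average is 4.5 (illustrative numbers only).
# The maximizing distribution has the form p_i ∝ exp(-lam * i); we solve
# for the Lagrange multiplier lam that reproduces the known average.
faces = np.arange(1, 7)
target_mean = 4.5

def mean_for(lam):
    w = np.exp(-lam * faces)
    p = w / w.sum()
    return p @ faces

lam = brentq(lambda l: mean_for(l) - target_mean, -5.0, 5.0)
p = np.exp(-lam * faces)
p /= p.sum()
print("maximum-entropy probabilities:", np.round(p, 3))
print("their mean:", round(float(p @ faces), 3))
```

If we instead only knew that the mean is 3.5, the same procedure would return the uniform distribution: the principle adds no structure that the given information does not force on us.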

If this point of view is correct, an obvious question pops up: Why then is the principle of maximum entropy so uniformly successful?

The reason is that the multiplicity of the macroscopic configurations, i.e. the number of ways in which they can be realized in terms of microscopic configurations, has an extremely sharp maximum. This can be calculated explicitly, as shown, for example, on page 7 of the Jaynes paper linked below. There it is found for a simple system that "not only is $E'$ the value of $E_1$ that can happen in the greatest number of ways for given total energy $E$; the vast majority of all possible microstates with total energy $E$ have $E_1$ very close to $E'$. Less than 1 in $10^8$ of all possible states have $E_1$ outside the interval ($E' \pm 6 \sigma$), far too narrow to measure experimentally".
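
This sharpness is easy to reproduce in a toy model (my own sketch, much smaller than the system in the quoted example): two identical subsystems of $N$ two-state units share a fixed total number of excited units, and we ask what fraction of all microstates lies more than $6\sigma$ away from the most probable split.

```python
from math import comb

# Two identical subsystems of N two-state units share H excited units in total.
# The macrostate is labelled by h1, the number of excited units in subsystem 1;
# its multiplicity is W(h1) = C(N, h1) * C(N, H - h1).
N, H = 1000, 1000
hs = range(H + 1)
W = [comb(N, h1) * comb(N, H - h1) for h1 in hs]
total = sum(W)

# location and width of the peak (exact integer arithmetic throughout)
mean = sum(h1 * w for h1, w in zip(hs, W)) / total
var = sum(h1 * h1 * w for h1, w in zip(hs, W)) / total - mean ** 2
sigma = var ** 0.5

outside = sum(w for h1, w in zip(hs, W) if abs(h1 - mean) > 6 * sigma)
print(f"most probable split: h1 ≈ {mean:.0f}, width sigma ≈ {sigma:.1f}")
print(f"fraction of microstates with h1 outside ±6 sigma: {outside / total:.1e}")
```

Already for $N = 1000$ only a vanishingly small fraction of the microstates lies outside the $6\sigma$ window, and the peak becomes ever sharper relative to the system size as $N$ grows.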

If it were otherwise, for example if the maximum were broad or if there were many local maxima, the principle of maximum entropy would not be so powerful.

Thus even

if we had more information we would seldom do better in prediction of reproducible phenomena, because those are the same for virtually all microstates in an enormously large class C; and therefore also in virtually any subset of C. […] Knowledge of the "data" E alone would not enable us to choose among the different values of $E_1$ allowed by [energy conservation]; the additional information contained in the entropy functions, nevertheless leads us to make one definite choice as far more likely than any other, on the information supposed.

https://pdfs.semanticscholar.org/d7ff/97069799d3a912803ddd2266cdf573c2461d.pdf

Shannon Entropy

The Shannon entropy is often taken as codifying the amount of information given in a probability distribution. The idea is that the most informative distribution gives probability 1 to some value (and 0 to all the others); and the least informative gives equal probability to all values (in this case $1/m$ for all $m$ values).

https://adamcaulton.files.wordpress.com/2013/05/thermo51.pdf2
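
For reference, the standard definition (not spelled out in the quote): for a distribution $p_1, \dots, p_m$ the Shannon entropy is

$$ H(p) = - \sum_{i=1}^{m} p_i \log p_i \, ,$$

which is zero for a distribution that puts probability $1$ on a single value and takes its maximal value $\log m$ for the uniform distribution $p_i = 1/m$.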

Boltzmann Entropy

$$S = k \log W $$ This is such a strikingly simple relation that one can hardly avoid jumping to the conclusion that it must be true in general; i.e., the entropy of any macroscopic thermodynamic state A is a measure of the phase volume $W_A$ occupied by all microstates compatible with A. It is convenient verbally to say that S measures the "number of ways" in which the macrostate A can be realized. This is justified in quantum theory, where we learn that a classical phase volume W does correspond to a number of global quantum states $n = W/h^{3N}$. So if we agree, as a convention, that we shall measure classical phase volume in units of $h^{3N}$, then this manner of speaking will be appropriate in either classical or quantum theory.

We feel quickly that the conjectured generalization of (17) must be correct, because of the light that this throws on our problem. Suddenly, the mysteries evaporate; the meaning of Carnot's principle, the reason for the second law, and the justification for Gibbs' variational principle, all become obvious. Let us survey quickly the many things that we can learn from this remarkable discovery. Given a "choice" between going into two macrostates A and B, if $S_A < S_B$, a system will appear to show an overwhelmingly strong preference for B, not because it prefers any particular microstate in B, but only because there are so many more of them.

As noted in Appendix C, an entropy difference ($S_B - S_A$) corresponding to one microcalorie at room temperature indicates a ratio $W_B / W_A > \exp(10^{15})$. Thus violations are so improbable that Carnot's principle, or the equivalent Clausius statement (14), appear in the laboratory as absolutely rigid "stone wall" constraints suggesting a law of physics rather than a matter of probability.

https://pdfs.semanticscholar.org/d7ff/97069799d3a912803ddd2266cdf573c2461d.pdf
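
A quick back-of-the-envelope check of the quoted estimate (my own arithmetic, assuming a room temperature of about 300 K): with $S = k \log W$, the ratio of phase volumes follows from $\ln(W_B/W_A) = (S_B - S_A)/k$.

```python
# Back-of-the-envelope check of the quoted "one microcalorie" estimate,
# using S = k ln W, so that ln(W_B / W_A) = (S_B - S_A) / k.
k_B = 1.380649e-23        # Boltzmann constant in J/K
Q = 1e-6 * 4.184          # one microcalorie expressed in joules
T = 300.0                 # room temperature in kelvin (assumed)

delta_S = Q / T           # entropy difference S_B - S_A in J/K
log_ratio = delta_S / k_B # natural logarithm of W_B / W_A
print(f"ln(W_B / W_A) ≈ {log_ratio:.2e}")   # comes out on the order of 10^15
```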

Gibbs Entropy

In the beginning there was just Clausius' weak statement that the entropy of a system tends to increase:

$$ S_{initial} \leq S_{final} .$$

This old statement has been replaced by the modern view of Gibbs and Jaynes:

Instead of Clausius' weak statement that the total entropy of all bodies involved "tends" to increase, Gibbs made the strong prediction that it will increase, up to the maximum value permitted by whatever constraints (conservation of energy, volume, mole numbers, etc.) are imposed by the experimental arrangement and the known laws of physics. Furthermore, the systems for which this is predicted can be more complicated than those envisaged by Clausius; they may consist of many different chemical components, free to distribute themselves over many phases. Gibbs' variational principle resolved the ambiguity: Given the initial macroscopic data defining a nonequilibrium state, there are millions of conceivable final equilibrium macrostates to which our system might go, all permitted by the conservation laws. Which shall we choose as the most likely to be realized?

Although he gave a definite answer to this question, Gibbs noted that his answer was not found by deductive reasoning. Indeed, the problem had no deductive solution because it was ill-posed. There are initial microstates, allowed by the data and the laws of physics, for which the system will not go to the macrostate of maximum entropy. There may be additional constraints, unknown to us, which make it impossible for the system to get to that state; for example new "constants of the motion". So on what grounds could he justify making that choice in preference to all others?

At this point thermodynamics takes on a fundamentally new character. We have to recognize the distinction between two different kinds of reasoning; deduction from the laws of physics, and human inference from whatever information you or I happen to have. Instead of asking, "What do the laws of physics require the system to do?", which cannot be answered without knowledge of the exact microstate, Gibbs asked a more modest question, which can be answered: "What is the best guess we can make, from the partial information that we have?"

[…]

Gibbs said almost nothing about what entropy really means. He showed, far more than anyone else, how much we can accomplish by maximizing entropy. Yet we cannot learn from Gibbs: "What are we actually doing when we maximize entropy?" For this we must turn to Boltzmann.

https://pdfs.semanticscholar.org/d7ff/97069799d3a912803ddd2266cdf573c2461d.pdf
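
To make the variational principle concrete, here is a sketch of the standard calculation (my summary, not a quote from the paper): the Gibbs entropy of a probability assignment $p_i$ over microstates is $S = -k \sum_i p_i \ln p_i$. Maximizing it subject to normalization $\sum_i p_i = 1$ and a fixed average energy $\sum_i p_i E_i = U$, with one Lagrange multiplier per constraint, gives

$$ \frac{\partial}{\partial p_i}\Big( -k \sum_j p_j \ln p_j - \alpha \sum_j p_j - k \beta \sum_j p_j E_j \Big) = 0 \quad \Rightarrow \quad p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_i e^{-\beta E_i} .$$

The equilibrium (canonical) distribution is thus nothing but the least biased distribution compatible with the information we actually have.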

Abstract

A great discussion of common misconceptions by E. T. Jaynes can be found here https://pdfs.semanticscholar.org/d7ff/97069799d3a912803ddd2266cdf573c2461d.pdf

See also "Where do we stand on maximum entropy" by Jaynes.

Why is it interesting?