Sunday 16 December 2012
When they're rather young, children learn about hot and cold things. They learn to say "heiss" when their tongue gets burned a little bit and "brrr" when they're freezing. Much like our ancestors, the Monkey Americans, little kids learn that they may heat objects up by fire; and they may cool them down in the refrigerator if they have one.



In ancient Greece about 2,000 years ago, Philo of Byzantium and Hero of Alexandria constructed the first thermometers when they observed that materials generally expand when the temperature is higher. People would eventually learn that the feeling of "how hot or cold" an object is may be expressed by a quantity, the temperature, and that objects in mutual contact love to converge to the state in which their temperatures are equal.

The industrial revolution ignited the expansion of heat engines, steam engines, and many other kinds of engines. During the 19th century, a discipline of physics called thermodynamics began to thrive. It is a different type of physics than the physics in which one tries to find the most elementary laws that determine, as uniquely as possible, the evolution of Nature. Instead, thermodynamics is about some properties of temperature, heat, energy, entropy, and related quantities that are inevitably "emergent" but that emerge rather simply in any sufficiently complicated or macroscopic physical system.




Albert Einstein counted thermodynamics in the same group as relativity, the group of "principle-based theories" in which one looks for general consequences of some universal laws, axioms, and principles; the other class consisted of "constructive theories". It may sound surprising that relativity and thermodynamics are clumped together but note that both of them are meant to describe just some properties of the system and constraints restricting its evolution rather than the exact evolution of the system under any circumstances.

Thermodynamics is based on several basic laws, especially the first and the second one. There's also the third law of thermodynamics – saying that the entropy of a system at the absolute zero temperature has to go to zero (in classical physics, the entropy was determined up to an overall additive shift only, so instead of saying \(S=0\), the kosher claim was that the entropy was minimized). I won't discuss the third law again (except for one sentence that you will find many paragraphs below); let me just say that its derivation from statistical mechanics is kind of clear and boils down to the fact satisfied by many materials that the ground state of a physical system (with the minimal possible energy) is essentially unique. This ground state is picked at \(T=0\) and the uniqueness implies \(S=0\).

Thermodynamics isn't just about the three laws; it also manages to determine universal constraints and identities that are satisfied at special points (temperatures, pressure etc.) when various functions hit zero, e.g. phase transitions, even without knowing how to derive the relevant functions (e.g. heat capacity as a function of the temperature) from some underlying microscopic laws. It's important to emphasize that such conclusions may be reached without any knowledge of the microscopic theory, just from the "vague effective" description provided by thermodynamics. The qualitative behavior of many quantities, especially their sign (and therefore inequalities in general) are omnipresent in thermodynamics.

The first two laws are more important than the third one. The first law of thermodynamics is the conservation of energy. It's the same law that holds in the microscopic, constructive theories – and whose validity is linked, via Emmy Noether's theorem, to the time-translational symmetry of the laws of physics (they don't change if you test them later). But in thermodynamics, the energy conservation law is expressed in a slightly different language, with slightly different goals.

The most important new twist in the energy conservation law in thermodynamics is the incorporation of heat. In the 1840s, James Joule figured out that heat could be created by mechanical work and, vice versa, heat could be used to do some work, too. There was an equivalence. It was a remarkable unification – although one that appears in the emergent, principle-based theories, not the microscopic constructive ones – because heat and work had previously been considered independent. In particular, many people would measure heat in calories. The international unit of both heat and work is called one joule today, to celebrate Joule's contributions.

In 1850, Rudolf Clausius used Joule's insights to formulate the first law of thermodynamics. In the form of an equation, it said\[

dU = \delta Q +\delta W, \quad \delta W \equiv - p\,dV.

\] So the energy of some object may change, so \(dU\) may be nonzero, but the change must be exactly equivalent to the energy that the object got from the environment. This includes the infinitesimal amount of work \(\delta W\) that the rest of the world did on the system (e.g. by decreasing its volume if it is a vessel with gas) as well as the infinitesimal amount of heat \(\delta Q\) that flowed to the object from the environment. You see that the forms of energy appreciated in this version of the energy conservation law include the internal energy \(U\) that is already stored in the object, including its thermal motion; the heat, which may be traced by watching the temperature changes and multiplying them by the heat capacities; and the mechanical work \(\delta W=-p\,dV\), which tracks the change of a pressure-related potential energy of other objects. Note that \(p\,dV\) is replaced by \(\vec F\cdot d\vec s\) if the work is done on solid bodies rather than gases.
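
Just to make the bookkeeping tangible, here is a tiny numerical sketch of the first law (my own illustration, assuming a monatomic ideal gas with \(U=\tfrac{3}{2}NkT\) and \(p=NkT/V\); the numbers are made up):

    # Minimal sketch of dU = delta_Q + delta_W for a monatomic ideal gas.
    # The model and the numbers are illustrative, not a general engine model.
    k = 1.380649e-23                     # Boltzmann constant [J/K]

    def heat_absorbed(N, T, V, dT, dV):
        """Heat delta_Q the gas must absorb in a small step (dT, dV)."""
        dU = 1.5 * N * k * dT            # change of the internal energy
        p = N * k * T / V                # ideal-gas pressure
        dW = -p * dV                     # work done ON the gas by the outside
        return dU - dW                   # first law: delta_Q = dU - delta_W

    # Example: 10^22 molecules at 300 K expanding slightly at constant T
    print(heat_absorbed(N=1e22, T=300.0, V=1e-3, dT=0.0, dV=1e-6))  # ~0.04 J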

Why did he decide that the law had to hold? Well, it was because Clausius and others were already convinced that it was impossible to construct a "perpetual motion machine of the first kind", something that would eternally do mechanical work without consuming anything. Why did they think it was impossible? Wasn't it just due to some sort of pessimism? If God exists and if He is great, He should allow such free workers, shouldn't He? Well, they believed such a machine was impossible because all of them failed to construct it, despite many attempts; more importantly, they knew from Newton's laws of mechanics that the total energy was conserved. The first law of thermodynamics only differed by appreciating heat as a form of energy. They knew that when heat flows from A to B, something changes about the "intrinsic character and properties" of both A and B – and in particular, their internal energies have to change as well. So even when we don't know yet how this thermal energy is stored, we may see that the energy conservation law holds even when thermal phenomena and heat transfer are allowed.

While the first law of thermodynamics is just a reformulation of a law we know from the microscopic theories, the second law of thermodynamics that was appreciated a few years later is completely new. It's a genuine flagship of thermodynamics. It says that the heat may only move from a warmer body to a cooler one but will never spontaneously go in the opposite direction. In other words, temperatures of objects tend to get closer as time goes by. Equivalently, the total entropy never goes down. Well, the equivalence is a bit nontrivial but we will discuss it momentarily.

Why did they decide this law was valid? Well, they decided that it must be impossible to construct another perpetual motion machine, the perpetual motion machine of the second kind. It would do work indefinitely; the only price would be that it would be cooling an object or a reservoir all the time. In some sense, it would be even better than the perpetual motion machine of the first kind because no one would have to work again and global warming would get slowed down or reverted. ;-)

The fact that the heat always goes from a warmer body to a cooler one – so the perpetual motion machine above will simply stop working when the reservoir is cooled down to the same temperature as the engine's temperature – was empirically observed. At the same moment, people decided that this alternative "simple solution to all problems with work" was also impossible because they were modest and skeptical. However, their guess was right. No perpetual motion machine of the second kind may be constructed. You may also say that if it were possible, some animals would have already found the trick.

Entropy

Except for one sentence that you may overlook for a while, the text above said nothing about the entropy. This concept was quite an ingenious invention of Rudolf Clausius from the 1850s and 1860s when he was building on some intuition about heat transfer coming from Lazare Carnot and his son Sadi Carnot in the early 19th century. They observed that most of the macroscopic processes in Nature such as heat transfer were irreversible. And the most natural mathematical incarnation of irreversibility is the condition that a quantity describing the "state of the system" must be an increasing function of time. Processes such as heat transfer can't be reverted simply because such a reversal would mean returning to the previous state, a state with a lower value of that quantity – yes, the entropy – and a decrease of the entropy is impossible.

This main story about entropy – the axiom that it can't decrease – is pretty much inseparable from its existence. The second law of thermodynamics in the "entropy form" says that the entropy never decreases and this second law is pretty much a defining property of the concept of "entropy". In a particular system, the entropy must be a property of the physical system that is defined in whatever way is necessary to guarantee that it won't be allowed to decrease.

This condition is a bit ambiguous because, at the very least, you could redefine the entropy to an arbitrary increasing function of it,\[

S\to S'=f_{\rm increasing}(S),

\] but most of this freedom may be eliminated if you also demand that the entropy is extensive or additive so for two independent objects A and B, \[

S(A+B) = S(A)+S(B).

\] With this extra condition, there's only "one right entropy" and the only allowed redefining functions \(f(S)\) are linear increasing functions. The scaling may be fixed by \(dS=\delta Q/T\), a relation I will discuss soon, and the additive shift may be determined by requiring that crystals etc. have \(S=0\) for the minimum possible temperature \(T=0\). Alternatively, the additive constant may be obtained from quantum statistical physics where the entropy is effectively \(k_B\) times the logarithm of an integer – the integer may also be interpreted as the number of cells in the phase space. In quantum mechanics, the natural volume of a phase cell is \((2\pi\hbar)^N\) which picks a preferred additive shift to the entropy, one that agrees with the third law of thermodynamics.

Fine, I have mentioned that for the heat transfer, the entropy of an object at some temperature that is absorbing some heat is changing via\[

dS = \frac{\delta Q}{T}.

\] It's a very simple-looking insight that pretty much defines the entropy in thermodynamics – before we start to explain thermodynamics by statistical physics. Clausius didn't just guess it; much like string theorists who are learning important things about the Universe by doing thought experiments and computing quite detailed properties of various objects in string theory, Clausius found the expression by studying a particular technical situation – namely the Carnot cycle going back to Sadi Carnot in the 1820s and Benoit Clapeyron in the following two decades. In this cycle, we isothermally (=at unchanging temperature) take the heat from a hot reservoir, reduce the temperature adiabatically (=so that no heat flows from anyone to anyone else), deposit the heat isothermally to a cold reservoir, and return to the higher temperature adiabatically.

Out of the temperature difference of the two reservoirs, we may extract some useful energy that may do mechanical work. This work may be shown to be at most\[

W = \left(1 - \frac{T_{2}}{T_{1}} \right) Q_1

\] where the indices 1 and 2 refer to the higher- and lower-temperature reservoir, respectively. The parenthesis is the "Carnot efficiency". At the same time, energy conservation implies that the extracted work is \(W=Q_1-Q_2\). Eliminating \(W\) from both formulae (in the reversible case when the maximum work is extracted), we see that\[

\frac{Q_1}{T_1} = \frac{Q_2}{T_2}.

\] That must be the formula equivalent to the reversibility of the Carnot cycle. Note that all four stages of the cycle are reversible because the heat is either not transferred at all; or it's being transferred between two bodies whose temperature is [almost] the same, which is also enough to believe that the entropy increase is [approximately] zero.

Because the formula above is what shows the reversibility of the cycle, it must be that the desired change of the entropy during the isothermal transfers is \(dS = \delta Q/T\). If we assign \(\delta Q\) with the right sign according to some convention (whether the heat goes in or out), the two isothermal transfers in the Carnot cycle give opposite and mutually cancelling contributions to the total \(\Delta S\). The entropy doesn't change.
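
Here is a minimal numerical check of this bookkeeping (my own sketch; the reservoir temperatures and the heat are arbitrary illustrative numbers):

    # Reversible Carnot cycle: W = (1 - T2/T1) Q1, W = Q1 - Q2, Q1/T1 = Q2/T2.
    T1, T2 = 500.0, 300.0          # hot and cold reservoir temperatures [K]
    Q1 = 1000.0                    # heat taken from the hot reservoir [J]

    W = (1.0 - T2 / T1) * Q1       # maximal (Carnot) work, here 400 J
    Q2 = Q1 - W                    # heat deposited into the cold reservoir
    print(Q1 / T1, Q2 / T2)        # both 2.0 J/K: the entropy taken from the
                                   # hot reservoir equals the entropy given to
                                   # the cold one, so the total is unchanged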

Note that the entropy does increase when the (negative, by convention) heat \(\delta Q_{\lt 0}\) is transferred from a strictly warmer body to a cooler one because \[

\frac{\delta Q_{\lt 0}}{T_{\rm higher}} - \frac{\delta Q_{\lt 0}}{T_{\rm lower}}\gt 0.

\] That's true because the second term has a greater absolute value than the first one and it's positive. This proof that the entropy increases if the temperatures become more uniform would work even if \(\delta Q\) were multiplied by an arbitrary decreasing function of \(T\), for example \(1/T^2\), but \(1/T\) is the only right choice to guarantee that \(\Delta S = 0\) for reversible processes such as those in the Carnot cycle.
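
The same sign argument may be checked numerically (again my own sketch; here the heat is counted as positive when it leaves the hot body, i.e. the opposite sign convention to the formula above):

    # Irreversible heat flow from a hotter to a colder body increases entropy.
    T_hot, T_cold = 400.0, 300.0   # arbitrary temperatures [K]
    dQ = 10.0                      # heat leaving the hot body [J]

    dS_hot = -dQ / T_hot           # the hot body loses some entropy
    dS_cold = +dQ / T_cold         # the cold body gains more entropy
    print(dS_hot + dS_cold)        # ~ +0.0083 J/K: the total entropy grows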

Statistical physics

Everything above was pure thermodynamics, a science about macroscopic objects or materials. We didn't have to study what the thermal energy, heat, and entropy were "made of". We didn't really need to know because we were describing all these things phenomenologically, by an effective theory valid for big objects. But in the end, we want to know where those things come from. A detailed "constructive, microscopic theory of everything" should predict the particular evolution of physical systems including their thermal properties, and the observations of thermodynamics should be among the insights – the universal ones, which may be proved for pretty much any system – that one may derive from the "constructive, microscopic theories of everything".

Needless to say, the discipline of physics that gives the microscopic explanation of the thermodynamic laws and concepts is known as statistical physics or statistical mechanics. It finds out that the thermal energy is mostly the kinetic energy of moving and vibrating atoms – we didn't have to use any properties, and maybe not even the existence, of atoms in the discussion about the perpetual motion machines and Carnot cycles above but we will use the compositeness of matter now.

And a key new question of statistical physics is how the entropy is encoded in the behavior of the atoms. The answer is that it's some kind of (lost) information that the atoms might have carried but that has become practically impossible to retrieve.

First, once you appreciate that the matter is made out of atoms, it's clear that they may vibrate and contribute e.g. their kinetic energy \(mv^2/2\) to the total energy \(U\). For various models such as ideal gas or simple models of crystals, you may figure out how the total energy is divided. What we really want to know is how much energy an atom carries at a given temperature \(T\). If we start with the ideal gas, we may see both experimentally and theoretically that the ideal gas obeys \(pV=nRT\) for some constant \(R\) assuming that \(n\) is the amount of matter in moles. The constant \(R\) may be measured experimentally, once we have agreed on what one mole is.

If we avoid these obsolete units and count the amount of matter in atoms and molecules, the equation for the ideal gas is \(pV = NkT\) where \(N\) is simply the number of atoms or molecules and \(k\), Boltzmann's constant, may be measured. Well, if you can prepare a countable number \(N\) of atoms or molecules of a gas at temperature \(T\) and measure the pressure \(p\), you may experimentally measure the constant \(k\). Theoretically, you may interpret \(pV/Nk\) as a definition of the temperature \(T\). You may derive that the kinetic energy of one gas molecule is therefore proportional to \(T\).

There are lots of "stories" one could tell about what constants may be derived from what assumptions and measurements assuming some choices of units or no choices of units, and so on, but I don't want to get drowned in various orderings of the story. It's more important to note that from the ideal gas considerations, it may be seen that the energy per degree of freedom is \(kT/2\) and this result holds more generally.
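
A short sketch of this logic (my own illustration with textbook-like numbers, not a real measurement): extract \(k\) from \(pV=NkT\) and evaluate the average kinetic energy \(\tfrac{3}{2}kT\) of a monatomic molecule.

    # Ideal gas: p V = N k T; a monatomic molecule has 3 degrees of freedom,
    # so its average kinetic energy is (3/2) k T. Numbers are illustrative.
    p, V = 101325.0, 0.0224        # pressure [Pa] and volume [m^3]
    T = 273.15                     # temperature [K]
    N = 6.022e23                   # number of molecules (one mole here)

    k_measured = p * V / (N * T)   # ~1.38e-23 J/K, Boltzmann's constant
    print(k_measured, 1.5 * k_measured * T)   # k and the energy per molecule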

The statistical interpretation of the entropy is more interesting. It's essentially the logarithm of the number of microscopic arrangements in which the physical system may be found so that it still looks the same at the macroscopic level:\[

S = k \ln N_{\rm arrangements}

\] This formula is written on Boltzmann's tomb. You may say that the precise identification of the microstate – one of the \(N\) microstates – represents some information and we have just written down how large the information is (in the natural units known as "nats": one bit is \(\ln 2\) nats, one byte is \(\ln 256\) nats, and so on). When we only say that the system is "in one of the microstates", it really means that we are ignorant about the information whose magnitude is \(S\) – in other words, this much information has been lost in the chaotic evolution of the atoms we can't trace. Again, the defining property of the entropy is that it must be a function of the state that never decreases; in other words, an increasing part of the information about the state is being lost in the arrangement and motion of the atoms we can't trace. We will see that the number of arrangements never decreases (macroscopically). The logarithm is there because\[

\ln(N_1 N_2) = \ln(N_1)+\ln(N_2)

\] and the logarithm (up to a normalization) is the unique function with this property that guarantees the additivity of the entropy. And the constant \(k\) is there to agree with the previous normalizations of the entropy that we have known from thermodynamics – a constant we need in order to assign the same entropy to an object regardless of whether we use the old methods of thermodynamics or the new methods of statistical physics.
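
To make the counting tangible, here is a toy sketch (with arbitrary microstate counts) converting the number of arrangements into an entropy and checking the additivity:

    import math

    k = 1.380649e-23                       # Boltzmann constant [J/K]
    N1, N2 = 10**6, 10**9                  # microstate counts of two toy systems

    S1 = k * math.log(N1)                  # entropy of system 1 [J/K]
    S2 = k * math.log(N2)                  # entropy of system 2 [J/K]
    S_total = k * math.log(N1 * N2)        # combined system: the counts multiply
    print(math.isclose(S_total, S1 + S2))  # True: the entropies add

    print(math.log(N1 * N2) / math.log(2)) # the same information in bits, ~49.8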

In classical physics, the number of arrangements may be represented by the volume of a region in the (highly) multi-dimensional phase space – or its surface which doesn't really change much. In quantum mechanics, it's the number of macroscopically indistinguishable (pure) microstates.

So if we have a density matrix whose only nonzero eigenvalues are all equal to \(1/N\), and there are \(N\) such nonzero eigenvalues (note that the trace equals one), then the entropy is\[

S = k\ln N.

\] There's nothing fundamentally special about density matrices whose nonzero eigenvalues are equal to each other. We surely want to generalize the expression to an arbitrary density matrix with eigenvalues \(p_i\):\[

S = - k\sum_i p_i\cdot\ln(p_i) = -k\,{\rm Tr}(\rho\cdot \ln \rho).

\] The first form of the entropy is (\(k\) times) the Shannon entropy and may also be used for a classical system with possible classical states distinguished by the index \(i\); the latter, trace-based form is (\(k\) times) the von Neumann entropy and is characteristically quantum mechanical.
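
A minimal numerical sketch (using numpy and scipy, with an arbitrary made-up density matrix) showing that the eigenvalue form and the trace form give the same number:

    import numpy as np
    from scipy.linalg import logm          # matrix logarithm

    rho = np.array([[0.7, 0.2],            # an arbitrary valid density matrix:
                    [0.2, 0.3]])           # Hermitian, positive, trace one

    p = np.linalg.eigvalsh(rho)            # eigenvalues p_i
    S_shannon = -np.sum(p * np.log(p))     # -sum_i p_i ln p_i, in units of k
    S_vn = -np.trace(rho @ logm(rho)).real # -Tr(rho ln rho), in units of k
    print(S_shannon, S_vn)                 # the two values coincide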

There are many – morally equivalent – ways to derive the von Neumann entropy. But we just want something that only counts "information", regardless of where it's stored, so it must depend on the spectrum of \(\rho\) only. Moreover, the entropy has to be additive and the von Neumann entropy is additive because if \[

\rho = \rho_1\otimes \rho_2,

\] we may see that\[

{\rm Tr}(\rho \ln \rho) = {\rm Tr}[(\rho_1\otimes \rho_2)(\ln\rho_1\otimes 1 + 1\otimes \ln\rho_2)]

\] which is indeed equal to the sum of the (minus) entropies of system 1 and system 2, in units of \(k\).
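
The additivity may also be verified numerically (a sketch with two arbitrary diagonal density matrices; np.kron builds the tensor product):

    import numpy as np

    def entropy(rho):
        """von Neumann entropy -Tr(rho ln rho), in units of k."""
        p = np.linalg.eigvalsh(rho)
        p = p[p > 1e-12]                   # drop numerically zero eigenvalues
        return -np.sum(p * np.log(p))

    rho1 = np.diag([0.6, 0.4])             # two arbitrary density matrices
    rho2 = np.diag([0.5, 0.3, 0.2])
    rho = np.kron(rho1, rho2)              # the composite system's matrix

    print(np.isclose(entropy(rho), entropy(rho1) + entropy(rho2)))   # True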

(There also exist generalized entropies that are not additive or extensive. However, they're much less important in our world because our world ultimately obeys locality or the mutual independence of faraway objects. This locality requires the entropy to be additive for all the laws to simply split to the laws constraining each system separately. The non-additive generalized entropies are useful for systems whose states are constrained by extra conditions whose number depends on the size.)

Now, the entropy wants to get maximized. A physical system is ultimately converging to thermal equilibrium. You may ask what the density matrix must be so that the entropy is maximized among all states with the [approximately] same value of energy \(H\). This may be answered by the variational calculus. Require \(\delta S = k\beta\, \delta H\) (the condition that appears once the energy constraint is imposed by a Lagrange multiplier) and the answer is\[

\rho = C\cdot \exp(-\beta H)

\] where the constants \(C,\beta\) are undetermined. Well, \(C\) is determined by \({\rm Tr}(\rho)=1\), the usual normalization condition, and \(\beta\) is undetermined but we may see that it is linked to the temperature via \(\beta=1/kT\). The term \(\beta H\) appeared in the exponent due to the Lagrange multiplier imposing the fixed value of \(H\). If there are other conserved quantities, they will produce additional terms such as \(\mu N\) (chemical potential times the conserved number of particles) in the exponent.

It's left as an exercise for the reader to carry out the maximization explicitly.
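
For concreteness, here is one way the variational step may be sketched (a standard Lagrange-multiplier argument, nothing beyond what was assumed above). Maximize \(S=-k\,{\rm Tr}(\rho\ln\rho)\) subject to \({\rm Tr}(\rho)=1\) and \({\rm Tr}(\rho H)=E\); the variation of the Lagrange function gives\[

{\rm Tr}\left[\delta\rho\,\left(-k\ln\rho - k - \lambda_1 - \lambda_2 H\right)\right] = 0

\] for an arbitrary \(\delta\rho\), so the bracket has to vanish and\[

\rho = \exp\left(-1-\frac{\lambda_1}{k}\right)\cdot\exp\left(-\frac{\lambda_2}{k}\,H\right) \equiv C\cdot \exp(-\beta H)

\] with \(\beta=\lambda_2/k\) fixed by the energy constraint and \(C\) by the normalization.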

There is another way to derive the \(\exp(-\beta H)\) form of the thermal density matrix. It must be the "final" density matrix of a system at equilibrium – when it is no longer changing. But the Schrödinger-like equation for the density matrix says\[

i\hbar\, \frac{d\rho(t)}{dt} = [H,\rho(t)].

\] Because the equilibrium condition says that the left-hand side is zero, the right-hand side has to vanish, too. It means that \(\rho\) has to be a function of \(H\) (and/or other conserved observables, i.e. operators that commute with \(H\)) and the exponential dependence on \(H\) may again be determined from the condition that the thermal density matrix \(\rho\) for a composite system is the tensor product \(\rho_1\otimes\rho_2\). Indeed, the exponential obeys this condition because\[

\exp(-\beta(H_1+H_2)) = \exp(-\beta H_1)\exp(-\beta H_2).

\] Also, one may prove the thermodynamic relationship \(dS=\delta Q/T\) from these formulae. There are lots of variations of all these calculations and assumptions and they may be performed both in quantum physics as well as classical physics; the essence is pretty much the same. I don't want to get lost in this ocean of possible comments.
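
Here is a sketch of that proof (under the simplifying assumption that the Hamiltonian is fixed, so no work is done and the whole energy change is heat). With \(\rho=e^{-\beta H}/Z\) and \(Z={\rm Tr}\,e^{-\beta H}\), one gets\[

S = -k\,{\rm Tr}(\rho\ln\rho) = k\left(\ln Z + \beta E\right), \qquad E = -\frac{\partial\ln Z}{\partial\beta},

\] and differentiating,\[

dS = k\left(\frac{\partial\ln Z}{\partial\beta}\,d\beta + E\,d\beta + \beta\,dE\right) = k\beta\,dE = \frac{\delta Q}{T}

\] because the first two terms cancel and \(dE=\delta Q\) when no work is done.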

Instead, let me finally say and prove that the entropy defined statistically is never decreasing. Let's consider the evolution from the ensemble A to ensemble B. By ensembles, I mean some states roughly given by their macroscopic properties which may be represented by one of many "practically indistinguishable" microstates.

The probability of the evolution from A to B is\[

P(A\to B) = \sum_{a\in A}\sum_{b\in B} P(a)\cdot P(a\to b)

\] where \(a,b\) go over microstates included in the ensembles \(A,B\), respectively. This notation is valid both for classical physics and quantum mechanics (where the microstates refer to basis vectors in an orthonormal basis of the ensemble subspace); in quantum mechanics, we could write the expression in a more basis-independent way using traces, too. If we compute the probabilities for ensembles, we must sum the probabilities over all possible final microstates \(b\) because we're calculating the probability of "one possible \(b\)" or "another possible \(b\)" and the word "OR" corresponds to the addition of probabilities.

On the other hand, we have included \(P(a)\), the probability that the initial microstate is a particular \(a\), because the probability of evolution \(P(a\to b)\) is only relevant when the initial state actually is the particular \(a\). Now, we will simplify our life and assume that all microstates \(a\) are equally likely so \(P(a)=1/N(A)\), the inverse number of microstates in the ensemble \(A\).

It's also possible to calculate the probability of the time-reversed process \(B^*\to A^*\). The stars mean that the signs of the velocities get inverted as well; in quantum field theory, the star would represent the complete CPT conjugate of the original ensemble because CPT is the only reliably universal symmetry involving the time reversal. Clearly, the probability is\[

P(B^*\to A^*) = \sum_{a^*\in A^*} \sum_{b^*\in B^*} P(b^*) P(a\to b)

\] where I used that \(P(a\to b)=P(b^*\to a^*)\) from the time-reversal symmetry (or, more safely, CPT symmetry) of the evolution followed by the microstates. Note that the two probabilities are almost given by the same formula but in the latter one, we see the factor \(P(b^*)=1/N(B^*)=1/N(B)\) in our simplified "equal odds" setup. At any rate, the ratio is\[

\frac{P(A\to B)}{P(B^*\to A^*)} = \frac{P(a)}{P(b^*)} = \frac{1/N(A)}{1/N(B)} = \exp\left(\frac{S_B-S_A}{k}\right)

\] where I used \(N(A)=\exp(S_A/k)\) – the inverted formula from Boltzmann's tomb – and similarly for \(B\). But now realize that \(k\) is extremely tiny in the macroscopic SI-like units while \(S_B-S_A\), the entropy change, is finite. So the ratio \((S_B-S_A)/k\) is, for all practical purposes, either plus infinity or minus infinity. It follows that one of the probabilities, \(P(A\to B)\) or \(P(B^*\to A^*)\), is greater than the other one by a factor that is essentially infinite (the exponential of a huge number), which means that only one of the processes may occur with a non-negligible probability (both probabilities are at most one, so the smaller one is suppressed by the huge exponential). If you look at the sign, the only possible (non-negligible) transition between ensembles is one for which the final entropy is greater than the initial one. If \(S_B-S_A\gt 0\), the numerator \(P(A\to B)\) is vastly greater than the denominator \(P(B^*\to A^*)\), which means that only the numerator is allowed to be non-negligible. In other words, \(A\to B\) may occur while \(B^*\to A^*\) is prohibited.

(When you consider finite entropy changes, comparable to \(k\), both processes may occur but the decreasing-entropy process is less likely. The ratio of the likelihoods becomes infinite once the entropy increase/decrease becomes macroscopic.)
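
A toy numerical version of the whole argument (entirely my own illustration): pick two ensembles with \(N(A)\) and \(N(B)\) microstates, a time-reversal-symmetric set of microscopic transition probabilities, and uniform ignorance about the initial microstate; the ratio of the ensemble-level probabilities comes out as \(N(B)/N(A)\), i.e. \(\exp[(S_B-S_A)/k]\).

    import numpy as np

    rng = np.random.default_rng(0)
    NA, NB = 2, 10                    # microstate counts of ensembles A and B

    # Microscopic transition probabilities with P(a->b) = P(b->a): the same
    # matrix entry is used in both directions. The overall scale is irrelevant.
    P_micro = rng.random((NA, NB))

    P_AtoB = np.sum(P_micro / NA)     # average over initial a, sum over final b
    P_BtoA = np.sum(P_micro.T / NB)   # average over initial b, sum over final a

    print(P_AtoB / P_BtoA, NB / NA)   # both equal 5.0 = N(B)/N(A)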

There exist numerous "mechanistic" ways to argue that the entropy defined by the tools of statistical physics never decreases. One may try to follow a "ball" in the phase space that becomes a long and chaotic string of spaghetti which covers a much larger phase space when it's "blurred" a little bit (without the blurring, the volume of the spaghetti remains constant, by Liouville's theorem). And many other classical or quantum calculations are doing pretty much the same thing.

But I find my method based on the time reversal to be the conceptually clearest way to prove that the entropy can never decrease. The time reversal is used because the entropy is about irreversibility. So we attempt to perform the time reversal on the situation and we carefully watch what goes wrong. What goes wrong is that the probabilities must be summed over the final states but averaged – because of the \(1/N(A)\) or \(1/N(B)\) factors – over the initial states. The asymmetric roles played by the final state and the initial state are what decide about the asymmetric outcome: the entropy must never decrease although you could argue that there is a "symmetry" between both kinds of evolution.

However, the symmetry wasn't there at all – and Loschmidt's reversibility "paradox" is complete rubbish – because of the logical arrow of time. Even before one considers any particular microscopic dynamical laws of physics, the past and the future play a different role in mathematical logic and in the probability calculus. The past has to be assumed and when the initial state is described by an ensemble, we don't know which microstate we started with. So we must average over their probabilities. On the other hand, the future is being calculated or expected and when the final state is described by an ensemble, we don't care which of them will appear, which is why the probabilities are simply being added.

Addition differs from the averaging. But addition is linked with the final microstates and the averaging is linked with the initial state. That's why the initial and final state play different roles in the calculation of the probabilities: the formula for the probability is asymmetric in the initial and final states as soon as we start to talk about ensembles given by "one microstate OR another".

As soon as statistical claims of this sort are being discussed, there's simply no symmetry between the past and the future. At most, you could switch the terminology and use the word "past" for the "future" and vice versa. But it would be stupid. You know that the future has an invariant meaning – it's whatever is evolving (or predicted) from the past. If you try to determine the past from the future, it's a totally different problem – a reverse engineering of a sort, and you need retrodictions which are a form of Bayesian inference whose results always depend on subjective priors. (Retrodictions are generally hard due to the irreversibility of the macroscopic processes, too: you may predict when the temperature of your soup will drop to 23 °C but when it's already matching the temperature of the desk, it's hard to retrodict when the soup's temperature was 40 °C: it was sometime in the past.)

Only the evolution in one direction may follow an objective prescription for the probabilities (either the prescribed probabilities that the laws of Nature and quantum mechanics dictate; or the predicted probabilities we calculate when we try to emulate Nature's job) – and by definition, we say that it's the evolution from the past to the future. And because of this convention, we're able to prove that the entropy of the final state is never lower than the entropy of the initial state.
