Monday 31 December 2012

Happy New Year 2013 = 33 * 61!

The last day of the year is a natural moment for a blog entry about time. At various moments, I wanted to write about the things that the year 2012 brought us.

The most important event in science was the discovery of the \(126\GeV\) Higgs boson (something that made me $500 richer but that's of course the least important consequence of the discovery) but those of us who were following the events and thinking about them rationally have known about the \(126\GeV\) Higgs boson since December 2011.

Lots of other generic popular science sources recall the landing of Curiosity and other things. But let's discuss something else. Something related to time.

Cara Santa Maria of The Huffington Post (I thought that Santa Maria was a ship, not a car) posted an article about the arrow of time and embedded the following video interview with Sean Carroll.

[Embedded video: Sean Carroll interviewed by Cara Santa Maria on the arrow of time.]

Clearly, he hasn't learned or understood anything at all over those years. Maybe it is difficult to get a man to understand something when his job depends on not understanding it. ;-) Once again, we hear that the hottest thing in cosmology is the fact that the early Universe had a low entropy (in reality, this fact follows from a defining property of the entropy, one that has been known ever since entropy was introduced in the 19th century).

The picture with the most concentrated wrongness appears around 2:24 in the video above:

[Screenshot from the video: Carroll's entropy graph, with a dot at the "present", rising "predicted future" and "predicted past" curves, and the decreasing "actual past" curve.]

Starting from the dot at the "present", Carroll proposes to predict the future and to "predict the past" [sic]. In both cases, the entropy increases relative to the entropy of the present state.

A very similar picture appears in Brian Greene's book The Fabric of the Cosmos. Brian's picture is even worse because he suggests that the graph of the entropy is smooth, like \(S=(t-t_0)^2\), so its derivative vanishes at \(t=t_0\); it surely has no reason to vanish. Moreover, Brian omits the helpful "actual past" part of the graph.

Now, look at the picture again. You see that Carroll "predicts the past" but his "prediction" for the entropy completely and severely disagrees with the "actual past". However he actually determined that the entropy was lower in the past, he wasn't able to derive this elementary fact: his derivation led to the wrong result, the "predicted past". He must have some above-the-science method to find the right answers without science, one that works even when his scientific methods produce wrong predictions.

Prague clearly resembled a military front again last night.

In science, when your prediction disagrees with the facts, you must abandon your theory. Instead, Sean Carroll just doesn't care. He isn't thinking as a scientist at all. The disagreement between his predictive framework and the empirical facts means nothing to him; he just continues to use and promote his wrong predictive framework anyway.

It's easy to see why his "prediction" of the past is wrong. The reason is that he is using the same method – prediction – that we use to predict the future. He thinks about the past as if it were the future. However, the very term
"prediction of the past"
is a logical oxymoron. It is exactly as inconsistent a sequence of words as
"sweeten your tea by adding lemon".
You just can't make your tea any sweeter by adding lemon! Instead, you need sugar, stupid. In the same way, it is wrong to use the particular method of "prediction" when you want to say/guess/reconstruct/determine something about the past. The method of "prediction" is, by definition, only good for learning something about the moment \(t_2\) out of the data about the physical system at time \(t_1\) when \(t_2\gt t_1\): you may only predict a later moment (a moment in the future, if we talk about predictions that are being made now) out of an earlier one, not vice versa!

All successfully verified predictions in science – where we use the usual methodology of predictions – satisfy this property: the predicted moment occurs later than the moment(s) at which some facts are known and inserted as input to the problem. If you use the methodology in the opposite direction, it just doesn't work! This method of determining the past is as wrong as an attempt to sweeten your tea with lemon. The wrong graph of the entropy in the past in the picture above is the easiest – and a rather universal – way to see that the methodology doesn't work for "predictions of the past".
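
It's easy to make this failure quantitative in a toy model. Below is a minimal Python sketch – the ring of "microstates", the lazy-walk transition matrix, and every number in it are my hypothetical illustrations, not anything taken from Carroll's video. The dynamics is symmetric and doubly stochastic, a stand-in for reversible microscopic laws, and the point is that the predictive machinery makes the entropy grow no matter which direction of time you aim it at – reproducing exactly the wrong "predicted past" curve:

```python
import numpy as np

# Hypothetical toy model: a lazy symmetric random walk on a ring of N
# "microstates". T is symmetric and doubly stochastic, so it plays the
# role of reversible (time-reversal-invariant) microscopic dynamics.
N = 60
T = np.zeros((N, N))
for i in range(N):
    T[i, i] = 0.5
    T[i, (i - 1) % N] = 0.25
    T[i, (i + 1) % N] = 0.25

def entropy(p):
    q = p[p > 0]
    return -(q * np.log(q)).sum()

present = np.zeros(N)
present[:3] = 1.0 / 3.0                  # a low-entropy "present" state

def run(M, steps=300):
    p, curve = present.copy(), [entropy(present)]
    for _ in range(steps):
        p = M @ p                        # one step of "prediction"
        curve.append(entropy(p))
    return curve

future = run(T)      # predicting the future: the entropy rises, correctly
past = run(T.T)      # "predicting the past": T.T == T here, so the same
                     # rising curve comes out -- the wrong "predicted past"

print(f"S(present) = {future[0]:.2f}, S(300 steps away) = {future[-1]:.2f}")
```

Both runs climb from \(\ln 3\approx 1.1\) toward \(\ln 60\approx 4.1\); nothing in this forward-style machinery can ever reproduce the decreasing "actual past".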

Instead, if you want to say something valid about the past, you need to use a different methodology: retrodiction. But retrodictions obey completely different rules than predictions. Predictions produce objective values of probabilities of future events out of known facts about the past; in this sense, predictions "emulate" what Nature Herself is doing when She actually decides what to do with the world at a later moment out of the state at an earlier moment, i.e. when She is evolving the world. On the other hand, retrodictions can never produce any objective probabilities at all. The reason is that retrodictions are a form of Bayesian inference.

Bayesian inference is a method to update our opinions about the probability of a hypothesis once we see some new evidence. Now, the state (or a statement about some properties) of the physical system in the past is an example of a "hypothesis" and the data collected now (at a later moment) are an example of the "evidence".

What's important is that the Bayesian inference is a "reverse process" or a solution to an "inverse problem". The straightforward calculation starts from a hypothesis (an initial state is a part of a hypothesis about evolution) and this hypothesis predicts objective probabilities for the later moment, for the future, if you wish. These probabilities are objectively calculable because the future literally evolves out of the earlier moment (the past).

But it is not guaranteed that you may revert this evolution – or this reasoning. And indeed, in general, you can't. In statistical physics, you can't. And in quantum physics, you can't do it, either. The reason is that whenever you discuss the fate of any facts or measurements that may only be predicted statistically – and this is true both in quantum mechanics and in statistical physics (even in classical statistical physics) – things are simply irreversible.

If you start with a hot tea on the table, you may predict when the tea-desk temperature difference drops below one degree Celsius. However, if you start with a tea that is as cold as the desk, you can't say when it was 60 °C. This problem simply has no unique solution because the evolution isn't one-to-one; it isn't reversible. Whatever the moment when the boiling tea was poured into the cup, it will ultimately end up as cold tea.
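
Here is a back-of-the-envelope version of the tea example, assuming Newton's law of cooling with made-up numbers (the 95 °C pouring temperature, the 20 °C desk, and the rate constant are all hypothetical illustrations):

```python
import numpy as np

# Newton's law of cooling, dT/dt = -k (T - T_desk), with made-up numbers.
T_desk, T0, k = 20.0, 95.0, 0.05         # desk °C, pouring °C, 1/minute

def temp(minutes_after_pouring):
    """Tea temperature a given number of minutes after pouring."""
    return T_desk + (T0 - T_desk) * np.exp(-k * minutes_after_pouring)

# The forward problem is well posed: the tea-desk difference drops below
# 1 °C at a unique, computable moment.
t_1deg = np.log((T0 - T_desk) / 1.0) / k
print(f"difference < 1 °C after {t_1deg:.0f} minutes")    # about 86 minutes

# The reverse problem is ill posed: once the tea is as cold as the desk
# (to within any thermometer's resolution), wildly different pouring
# times are all compatible with the same present data.
for hours_ago in (3, 72, 26280):          # 3 hours, 3 days, ~3 years
    print(f"poured {hours_ago:>5} hours ago -> now {temp(60 * hours_ago):.6f} °C")
```

The map from the pouring time to the present temperature stops being one-to-one, so the question "when was it 60 °C?" has no unique answer without extra (prior) information.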

People such as Sean Carroll or Brian Greene correctly notice that the microscopic laws of Nature are time-reversal-invariant (more precisely, CPT-invariant if we want to include subtle asymmetries of the weak nuclear force) but they're overinterpreting or misinterpreting this fact. This symmetry doesn't mean that every statement about the future and past may be simply reverted upside down. It only means that the microscopic evolution of particular microstates – pure states – to particular other microstates – pure states – may be reverted.

But no probabilistic statements may actually be reverted in this naive way. They can't be reverted for the same reason why \(A\Rightarrow B\) is inequivalent to the logical proposition \(B\Rightarrow A\). The laws of Nature imply facts of the type \({\rm Past}\Rightarrow{\rm Future}\) but these facts can't be translated to \({\rm Future}\Rightarrow{\rm Past}\) because you would have to check all other conceivable initial states in the past and prove that all of them imply something about the future (i.e. evolve to states in the future that still obey a certain special condition) – which is virtually never the case. The past and the future play asymmetric roles in mathematical logic because of the \(A\)-\(B\) asymmetry of the logical proposition \(A\Rightarrow B\), the implication.

To deal with the microstates only – for which the time-reversal symmetry holds – means to deal with equivalences \(A\Leftrightarrow B\) only. But this template doesn't allow us to make any realistic statements about physics because the pure states "equivalent" to some states in the past (the future states that evolve from them) are complicated probabilistic superpositions or mixtures that can't be measured. Whenever we make some measurement, we need to talk about states that aren't equivalent to some natural states/information at an earlier moment, which is why we need the statements of the type \(A\Rightarrow B\) almost all the time, and these implications simply violate the \(A\)-\(B\) symmetry.

In particular, if you fail to specify the precise coordinates and velocities of all atoms in your tea, or if you're talking about a large/nonzero entropy of your tea at all, then you are clearly not talking about a particular microstate. You are only talking about some ensembles of operationally indistinguishable microstates (which is why the entropy is nonzero) or, equivalently, about partial, probably macroscopic properties of your tea. And statements of this sort – for example all statements about the entropy of the tea or the tea-desk temperature difference – simply refuse to be time-reversal-invariant!

Lots of friction forces, viscosity, diffusion, and other first-time-derivative terms breaking the time reversal symmetry inevitably emerge in the effective laws controlling these quantities and propositions. All the laws that govern the macroscopic quantities average and/or sum over the microstates and the right way to do so inevitably breaks the past-future symmetry "maximally". For example (and it is the most important example), the entropy-decreasing processes are exponentially less likely than their time-reversed partners that increase the entropy.

As I have emphasized many times, the asymmetry arises because the calculated probabilities must be averaged over the initial microstates but summed over the final microstates. Averaging and summing isn't quite the same thing and this difference is what favors the higher-entropy final states.
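
The asymmetry fits in two lines. Take two macrostates \(A\) and \(B\) containing \(N_A\) and \(N_B\) microstates, so that \(S_{A,B}=k\ln N_{A,B}\) – a standard textbook count, not anything specific to the pictures above. Averaging over the initial microstates and summing over the final ones gives\[

P(A\to B) = \frac{1}{N_A}\sum_{i\in A}\sum_{j\in B} P(i\to j),\qquad
P(B\to A) = \frac{1}{N_B}\sum_{j\in B}\sum_{i\in A} P(j\to i).

\] The microscopic time-reversal symmetry guarantees \(P(i\to j)=P(j\to i)\) (identifying each microstate with its time-reversed partner, which lies in the same macrostate for the coarse macrostates considered here), so the double sums are equal and cancel in the ratio:\[

\frac{P(A\to B)}{P(B\to A)} = \frac{N_B}{N_A} = \exp\left(\frac{S_B-S_A}{k}\right).

\] A process lowering the entropy by \(\Delta S\) is therefore suppressed by \(\exp(-\Delta S/k)\) relative to its time-reversed partner, which is an absurdly tiny factor for any macroscopic \(\Delta S\).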

There is one more consequence I have emphasized less often. The averaging (over initial state) requires "weights". If you have a finite number \(N\) of microstates, you may assign the weights \(p_i=1/N\) to each of them. However, it's not necessarily the choice you want to make or believe. There may exist evidence that the actual probabilities of initial microstates \(p_i\) – the prior probabilities – are not equal to each other. The only thing that will hold is\[

\sum_i p_i = 1.

\] The possible initial microstates differ, at least in principle. You may accumulate evidence \(E\) – that is, a logical proposition you know to be true because you just observed something that proves it – which will force you to change your beliefs about the probabilities of possible initial states according to Bayes' theorem:\[

P(H_i|E) = \frac{P(H_i)\cdot P(E|H_i)}{P(E)}

\] The vertical line means "given". So the probability of the \(i\)-th hypothesis (the hypothesis that the initial state was the \(i\)-th state) given the evidence (i.e. after the evidence was taken into account) equals the prior probability \(P(H_i)\) of the initial state (the probability believed before the evidence was taken into account), multiplied by the probability \(P(E|H_i)\) that the just observed evidence \(E\) occurs according to the hypothesis \(H_i\), and divided by the normalization factor \(P(E)\), the "marginal likelihood", which must be chosen so that the total probability of all mutually exclusive hypotheses remains equal to one:\[

\sum_i P(H_i|E) = \sum_i \frac{P(H_i)\cdot P(E|H_i)}{P(E)} = 1.

\] Note that \(P(H_i|E)\) and \(P(E|H_i)\) aren't the same thing (another potential critical mistake that the people believing in a naive "time reversal symmetry" are probably making all the time as well) but they're proportional to each other. The hypothesis (initial microstate) for which the observed evidence is more likely becomes more likely by itself; the initial states that imply that the evidence (known to be true) cannot occur at all are excluded.
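
A minimal numerical sketch of such an inference, reusing the hypothetical ring-of-microstates toy from above (the evidence "the system is now observed in microstate 7", the number of steps, and both priors are arbitrary illustrative choices, not anyone's actual data):

```python
import numpy as np

# Same hypothetical lazy symmetric walk on a ring as in the sketch above.
N, steps = 60, 40
T = np.zeros((N, N))
for i in range(N):
    T[i, i], T[i, (i - 1) % N], T[i, (i + 1) % N] = 0.5, 0.25, 0.25

# Hypotheses H_i: "the initial microstate was i".
# Evidence E: "after `steps` steps the system sits in microstate 7".
M = np.linalg.matrix_power(T, steps)
likelihood = M[7, :]                     # P(E|H_i) = (T^steps)[7, i]

def bayes(prior):
    posterior = prior * likelihood       # P(H_i|E) ∝ P(H_i) P(E|H_i)
    return posterior / posterior.sum()   # the division implements 1/P(E)

uniform = np.full(N, 1.0 / N)            # one subjective prior...
lopsided = np.zeros(N)
lopsided[20:40] = 1.0 / 20.0             # ...and a very different one

print(bayes(uniform).argmax())           # -> 7: initial states near the
                                         #    evidence are favored
print(bayes(lopsided).argmax())          # -> 20: same evidence, same
                                         #    likelihoods, another answer
```

The likelihoods \(P(E|H_i)\) are objective – they follow from the dynamics – but the retrodicted probabilities \(P(H_i|E)\) visibly change with the prior, which is exactly the subjectivity discussed below.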

A particular observer has collected certain kinds of evidence \(E_j\) and he has some subjective knowledge which determines \(P(H_i|E_{\rm all})\). It's important that these probabilities of the hypotheses are subjective, they depend on the evidence that a particular observer has accumulated and labeled trustworthy and legitimate. They become prior probabilities when a new piece of evidence emerges. And indeed, one of the most notorious properties of the prior probabilities is that they are totally subjective and there's no way for everyone to agree about the "right priors". There aren't any objective "right priors".

Except for the Czechoslovak communist department stores, Priors, which had to be believed to be objectively right. However, Prior was an acronym for "Přijdeš rychle i odejdeš rychle" (you arrive quickly and you depart quickly, too), which aptly characterized the product selection.

That's why the retrodicted probabilities of initial states \(p_i=P(H_i)\) always depend on some subjective choices. What we think about the past inevitably depends on other things we have learned about the past. This is a totally new property of retrodictions that doesn't exist for predictions. Predictions may be probabilistic (and in quantum mechanics and statistical physics, they are inevitably "just" probabilistic) but the predicted probabilities are objectively calculable for certain input data. The formulae that objectively determine these probabilities are known as the laws of physics. But the retrodicted probabilities of the past are not only probabilistic; their values inevitably depend on the subjective knowledge, too!

Of course, when the past is determined by the correct method – the method of retrodictions which is a form of Bayesian inference – we will find out that the lower-entropy states are exponentially favored. We won't be able to become certain about any property of the Universe in the past but some most universal facts such as the increasing entropy will of course follow from this Bayesian inference. In particular, the correctly "retrodicted past entropy" will more or less coincide with the "actual past" curve.

I think that even laymen implicitly know how to reconstruct the past. They know that it's a "reverse problem" of a sort and they secretly use Bayes' theorem even if they don't know the formula and other pieces of the mathematics. They are aware of the fact that the tea-desk temperature difference was higher in the past exactly because this difference is decreasing with time. More generally, they know that the entropy was lower in the past exactly because the entropy is increasing, was increasing, and will be increasing with time. They know that determining the past by the same logic by which we predict or expect the future is wrong, stupid, and contradicts common sense.

Too bad that Sean Carroll hasn't been able to get this basic piece of common sense yet, after a decade of futile attempts to understand the basics of statistical physics.

And that's the memo.
