Higgs: Does science require hiding data?

Collider-related, Wednesday: Lyn Evans was chosen as the director of the CLIC+ILC linear collider unification governance body. I think it's a very good choice.

Unofficial reports say that the Higgs signals after more than 5/fb of the 8 TeV 2012 data boast the same strength that you would expect based on the 2011 5/fb 7 TeV data.

Up to this point, almost 12.5/fb of data were delivered to each detector in the 2010-2012 runs and about 92%, or 11.5/fb, have been recorded.

In particular, the diphoton ($\gamma\gamma$) channel shows the same strength as it did in 2011, reaching 4 sigma per detector just from the 2012 data. This fact makes it more likely that at least one of the major detectors – ATLAS or CMS – will be able to announce a 5-sigma discovery at the ICHEP 2012 conference in Melbourne, which starts on Independence Day and lasts for a week.

Yes, over 50% of TRF readers are Americans.

The probability of an "individual detector's Summer 2012 discovery" has increased substantially (a discovery using one detector with a combination of channels, or one detector with the diphoton channel only). Although we don't know too many details, I would say that the suspicion that the diphoton channel is stronger than expected from the Standard Model will increase, too.

Of course, the Higgs discovery – more precisely, a discovery of something that surely differs from a Higgsless part of the Standard Model and that is close to the Standard Model Higgs but may be something else (and may gradually acquire properties that are significantly and demonstrably different) – is just a formality. For people who follow the published data and sensibly evaluate them, the existence of a 124-126 GeV Higgs boson has been a sure thing at least since December 14th, 2011.

If you wanted to say that your humble correspondent seems to be always right and always ahead of others, then you're very correct, too! ;-)

JollyJoker has pointed out the following article on these developments by Prof Dr Matt Strassler CSc:

New Higgs Rumors Have Arrived

JollyJoker has reacted to some of Matt's words, especially those about withholding scientific information and asymmetric ways to use the data. I still think that Matt is a highly technically potent scientist – but make no mistake about it, JollyJoker is right.

Various independent physicists and science fans – especially climate realists – know very well that I am not one of the most passionate advocates of the "publish all scientific data for everyone" movement, although I usually agree with it – and this Higgs saga is no exception.

Off-topic, related to HEP: the media are full of the deviation from the Standard Model measured by BaBar today. I already wrote about it a month ago...

In fact, I think it would be great if the fresh data from all the LHC collisions could be made completely public and perhaps even user-friendly. We could easily find that there are many people outside the LHC physics teams who are able to write much cleverer, faster, and more accurate algorithms to deal with the information than the hired experimental physicists. And that would be a good thing – for science.

To write something balanced, I am ready to acknowledge the following justifications for withholding data in some situations:

practicality – raw data may often be so complicated and unreadable that only an inner circle of scientists around a given research project understands them; it may be too much work (and an expensive procedure) to make them readable for everyone, and scientists may naturally be unwilling to publish a mess
credit – individual scientists or scientific teams often take credit for their discoveries, which have some value; for obvious reasons, they don't want to publish data from their work in progress that would help others to scoop them (and they have some moral right not to publish such data)
protection of scientists against laymen's attacks – scientists require some peace and a calm atmosphere to do their research impartially; data from work in progress may be enough for others to scream about but may still be incomplete, and the screaming may make completing the work harder; I could also write a separate point about protection against invalid interpretations – it's still better if the data are mostly interpreted by experts because they have (or they should have) a higher chance of avoiding wrong interpretations – but I will include this observation in this point, too.

All other arguments support the idea that freely accessible data and information are good for science.

The reasons are obvious. Science depends on reproducibility and verification. The more accessible the data are (incidentally, I prefer to insist that "data" is a Latin noun in the plural, while the singular is "datum"), the more people are able to reproduce the research, verify it, and perhaps go beyond it. Freely available data also make biased research less likely. If a group of scientists has a bias or an agenda, others may discover and correct the bias, assuming that they have access to the data.

Prof Matt Strassler also wrote:

This is especially true since we learned last year that some well-known non-particle-physicist bloggers have information pipelines directly into the experiments. It is perhaps inevitable that there are scientists who see it in their best interest to subvert the scientific process.

I only have direct onymous pipelines to theory and phenomenology – and occasionally, anonymous pipelines to experiments. My contacts in the ATLAS and CMS teams remain 100% silent when it comes to the communication about the LHC findings (people like Dorigo may need an extra discussion). The adjectives such as "non-particle-physicist" above suggest that the writer primarily talks about Not Even Wrong even though it's likely that the experimenters revealing the data to that website are anonymous, too (again, except for some CMS bloggers who probably reveal the information onymously yet privately).

It surely sounds crazy for me to defend Peter Woit, but much like JollyJoker, I simply don't believe that the publication of information "subverts the scientific process" as long as we are talking about a "scientific process" that is highly compatible with good scientific manners. Quite the contrary: science depends on knowledge – as much knowledge as we can get – so not having enough data always undermines one's ability to participate in the scientific process. The inaccessibility of information cools down science as much as liquid nitrogen.


There have been several debates on whether or not the ATLAS and CMS physicists "possess" the data from their detectors in the copyright sense. I find such a suggestion incredible. The LHC has cost something like $10 billion and the data from the detectors are the only "products" of the collider. So it makes sense to conjecture that the data collected over the lifetime of the LHC are worth $10 billion or so, too. Even if you divide that among the 6,000 ATLAS and CMS physicists, you get roughly $1.7 million per person. Was each LHC physicist given a gift this expensive? I don't believe so. They are just serving the public – and especially the scientific public – and were hired to handle data that belong to the taxpayers, the main sponsors of the LHC.
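The per-person figure is a one-line division. The sketch below just makes it explicit; the $10 billion cost and the 6,000 physicists are the round numbers from the text, treated here as assumptions rather than audited accounting.

```python
# Round figures quoted in the text (assumptions, not audited numbers).
LHC_COST_USD = 10e9   # total cost of the collider, roughly $10 billion
PHYSICISTS = 6_000    # ATLAS + CMS members, roughly


def value_per_person(total_value: float, people: int) -> float:
    """Split a total value evenly among a number of people."""
    return total_value / people


share = value_per_person(LHC_COST_USD, PHYSICISTS)
print(f"${share / 1e6:.2f} million per person")  # prints "$1.67 million per person"
```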

It's true that they got the right to hide their data but the justification can't be that they possess the data in the copyright sense. The arguments favoring secrecy must belong to the list of three "secrecy may be good" arguments. To claim that secrecy is essential for the scientific process is preposterous. A scientific collaboration may have some secretive internal rules and the members may cherish them – it's OK as long as the rules are legal – but it's pretentious and dishonest to sell these secretive rules as "principles of science" which they're surely not.

Asymmetric usage of rumors, data

Also, Matt Strassler is trying to find a way to escape from his previous, long-held position that the Higgs boson remained uncertain even after December 2011 and so on – because he actually realizes that the newest data probably make the "No Higgs near 125 GeV" attitude truly indefensible. It seems clear to me that the rumors are actually being used in his text, perhaps including the numbers on the statistical significance. Still, he tries to hide those from others and doesn't even tell his readers what the statistical significance seems to be. I just don't think it's fair.

Agent Higgs is a cool $1 roadblock puzzle-style game for your iPhone or other iDevice, available via iTunes. I've actually bought it, it's nice – and I am at level 10 now. Meanwhile, acknowledging the formidable European competition, Fermilab has changed its primary specialization and welcomed five new bison calves. Via Katie Yurkewicz (Twitter).

Another point on which I unambiguously agree with JollyJoker is that one must use the data symmetrically when it comes to the refutation or confirmation of theories – and one should use all the relevant data. Cutting a subset of the data away from the picture means denying some evidence – and this denial may be used to make either the "Yes" answer or the "No" answer more convincing. It's wrong in both cases.

One may also mention that it's not too natural to use a subset of the data to take care of some possible problems – e.g. use the 2011 data to eliminate the look-elsewhere effect – but it may be done and similar strategies are common. Still, this attitude may heavily underestimate the strength of the signal because the 2011 signal near 125 GeV is much stronger than what is needed "just to erase" the look-elsewhere ambiguity.
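To make the look-elsewhere effect concrete, here is a minimal sketch assuming the crude "trials factor" picture: the search scans N effectively independent mass windows, so the global p-value is 1 - (1 - p_local)^N. Real LHC analyses estimate the trials factor more carefully, and N = 100 below is a hypothetical value chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_from_z(z: float) -> float:
    """One-sided p-value corresponding to a significance of z sigma."""
    return 1.0 - STD_NORMAL.cdf(z)


def z_from_p(p: float) -> float:
    """Significance in sigma corresponding to a one-sided p-value."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def global_z(local_z: float, n_trials: int) -> float:
    """Crude look-elsewhere correction assuming n_trials independent mass windows."""
    p_local = p_from_z(local_z)
    p_global = 1.0 - (1.0 - p_local) ** n_trials
    return z_from_p(p_global)


# Hypothetical example: a 4-sigma local excess, ~100 independent mass windows.
print(f"{global_z(4.0, 100):.2f} sigma globally")
# With the mass fixed in advance (n_trials = 1), no significance is lost.
print(f"{global_z(4.0, 1):.2f} sigma")
```

Under these assumptions a 4-sigma local excess falls below 3 sigma globally, which is the price of scanning the whole mass range. Fixing the mass in advance avoids that price, but if it is done by setting aside a strong earlier signal, the significance thrown away can exceed what the look-elsewhere correction would have cost.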

There's one more point on which I agree with JollyJoker: Matt hugely exaggerates how difficult it is to combine various datasets to get an accurate enough idea about the strength of a signal. He even describes the difference between 7 TeV and 8 TeV as hugely complicated, etc. But the difference is small even if you neglect it and interpret all the collisions as 7.5 TeV collisions. Moreover, we know the (near) power laws by which various cross sections depend on the energy in the models we care about, so we may easily be more accurate. Also, Phil Gibbs has shown that one may get visually indistinguishable combinations by the most straightforward formulae you may think of. They only start to break down when the confidence level is really small, but in that case, there's not much to talk about anyway. Matt is simply creating mysterious dragons where there are almost none.
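A "straightforward formula" of this kind can be sketched as follows, assuming each dataset yields an independent Gaussian measurement of the signal strength μ (the observed rate divided by the Standard Model prediction at that dataset's collision energy). Because μ is normalized separately at 7 TeV and at 8 TeV, the energy difference is absorbed into each dataset's normalization. This is the textbook inverse-variance weighted average, not necessarily the exact formula Phil Gibbs used, and the input numbers are hypothetical.

```python
from math import sqrt


def combine(measurements):
    """Inverse-variance weighted combination of independent (mu, sigma) pairs.

    Returns the combined signal strength, its uncertainty, and the
    significance mu / sigma of the excess over the background-only
    hypothesis (mu = 0).
    """
    weights = [1.0 / s**2 for _, s in measurements]
    total_weight = sum(weights)
    mu = sum(w * m for w, (m, _) in zip(weights, measurements)) / total_weight
    sigma = 1.0 / sqrt(total_weight)
    return mu, sigma, mu / sigma


# Hypothetical inputs: two datasets, each measuring mu = 1.0 +- 0.25 (4 sigma each).
mu, sigma, z = combine([(1.0, 0.25), (1.0, 0.25)])
print(f"mu = {mu:.2f} +- {sigma:.3f}, combined significance {z:.2f} sigma")
# prints "mu = 1.00 +- 0.177, combined significance 5.66 sigma"
```

When the datasets agree on the signal strength, the significances simply add in quadrature, so two 4-sigma results give about 4√2 ≈ 5.7 sigma.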

There are many things to discuss here, but JollyJoker is right in his major points. It's not good for science if a selected ad hoc group of scientists is given enough room to hide some data (e.g. new data) and publish other data, and if it is given a monopoly to interpret them (new data as well as old data) in its preferred way. This is not what good science should look like, because such secrecy and such a monopoly reduce the efficiency and balance in the evaluation of the data and in the determination of their consequences. And if this secretive "scientific process" may be undermined, it's a good thing to undermine it, indeed.

And that's the memo.

Remotely related: This touching story about a Second World War widow searching for her husband (6-minute video) – which culminates in a bombshell – may also teach us a lesson about the results of withholding the information and some people's "monopoly" (or "self-believed monopoly") over this information. Thanks to Gene for the link.

Well, if you haven't noticed, Echo comments may still be displayed but no new Echo comments are being accepted. Use DISQUS only.

I needed to stop the changes to the Echo comments files because they add lots of mutations and random errors to the migration process. Lots of hours of my CPU time – plus lots of my man-hours – are needed to fill in the missing URLs for the 73,842 Echo comments, the most important non-automatic part of the migration process.

You may pray that it will essentially work tonight. If that is the case, I will improve the XML file used to import the Echo comments into DISQUS just a little bit, and that will probably be the last import of Echo data into DISQUS. So the only question I leave to a vote is whether I should re-enable new Echo comments after the import – with the condition that all the new Echo comments would be lost without a trace on October 1st. My vote is No.

At this moment, it seems that with a few exceptions, the Echo comments should appear under the right blog entries as DISQUS comments, with the correct author names (including Guests, whose author names had to be inserted manually into 2,000 comments where no name was present), but they will probably not have the right avatars and they will not be "owned" by the corresponding DISQUS users.
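Filling in absent author names can also be scripted rather than done by hand. The sketch below assumes a hypothetical export in which each comment is a `<comment>` element with an optional `<author>` child; these element names are illustrative only and are not the actual Echo or DISQUS schema, and the placeholder name "Guest" is likewise an assumption.

```python
import xml.etree.ElementTree as ET

DEFAULT_AUTHOR = "Guest"  # hypothetical placeholder for comments without a name


def fill_missing_authors(xml_text: str) -> tuple[str, int]:
    """Give every <comment> a non-empty <author>; return the new XML and the number fixed."""
    root = ET.fromstring(xml_text)
    fixed = 0
    for comment in root.iter("comment"):
        author = comment.find("author")
        if author is None:
            # No <author> element at all: create an empty one to fill below.
            author = ET.SubElement(comment, "author")
        if not (author.text or "").strip():
            author.text = DEFAULT_AUTHOR
            fixed += 1
    return ET.tostring(root, encoding="unicode"), fixed


sample = """<comments>
  <comment><author>JollyJoker</author><body>Nice.</body></comment>
  <comment><body>No name here.</body></comment>
  <comment><author> </author><body>Blank name.</body></comment>
</comments>"""

new_xml, count = fill_missing_authors(sample)
print(count)  # prints 2
```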

The pictures attached to Echo comments should be stored, too – with a link to a Dropbox copy of these 171 files added to the relevant Echo-turned-DISQUS comments. Let's see whether it works.

Update:

I've completed a test import into a different account. Some random threads (fewer than 10% or so, e.g. Celebrating Grassmann Numbers) seem not to show up. I don't know why. Otherwise everything is OK, except that the reply structure (nesting) isn't preserved; this method of import can't preserve it. Also, all the Echo comments will show up as comments from named but "anonymous" users with the universal default avatar. I am giving myself some time to figure out what to do with these imperfections, but at some moment, I may give up and import the XML with these problems.
