Friday, July 25, 2008

A modest proposal

This is brilliant:

If you could choose anyone, from any walk of life, to be Prime Minister, who would you choose?

Bob Geldof. I would then make it my life's mission to hold a yearly concert in a field and denounce him as a 'typical politician' to a bunch of wristband-wearing teenagers who have a vague idea that they are doing their bit for world peace/climate change/Make Poverty History by turning up to listen to Natasha Bedingfield and waving their lighters in the air. My version of 'Tell Me Why / I Don't Like Mondays' isn't bad either.

That's the normblog profile of Sadie Smith. It also contains some interesting Berlin-esque thoughts on negative liberty.

Her extremely funny blog is here.

Thursday, July 17, 2008

Let me tell you a story
It has a Hero, an Adventure, and a surprising amount of Nuclear Waste

From CommentIsFree:

A couple of years ago the US Congress established an expert commission to develop a language or symbolism capable of warning against the threats posed by American nuclear waste dumps 10,000 years from now. The problem to be solved was: how must concepts and symbols be designed in order to convey a message to future generations, millennia from now? The commission included physicists, anthropologists, linguists, neuroscientists, psychologists, molecular biologists, classical scholars, artists, and so on.

The experts looked for models among the oldest symbols of humanity. They studied the construction of Stonehenge and the pyramids and examined the historical reception of Homer and the Bible. But these reached back at most a couple of thousand years, not 10,000. The anthropologists recommended the symbol of the skull and crossbones.

The date normally given for Stonehenge is 3100 BC (5,000 years ago), the pyramid of Khufu was finished in 2560 BC (over 4,500 years ago)[1], and even if we accept the controversial belief that there were "many Homers", the Iliad and Odyssey were composed in the 9th century BC [2]. To be fair, the age she gives for surviving written documents of Biblical texts is reasonably accurate (some portions of the Qumran scrolls date to 200 BC, although clearly the words, like those of Homer, are much older; I would be surprised if the oldest Homer manuscript isn't AD).

This shouldn't annoy me. But it does.

In any case, here's a picture of some older writing:
This is a tablet from the 7th century BC [Epic of Gilgamesh, tablet XI] recording the flood narrative.

Which brings me to my point. There are basically two ways of passing on information over vast stretches of time:

(a) Cultivating a tradition in which information is passed on, with strict injunctions that it should remain unchanged. There are plenty of examples of this in religious and secular literature.

(b) Creating an oral story which, although it may receive embellishments, will be passed down in popular memory. Even though the language itself changes, the story remains the same.

Or, to put it another way, the nuclear industry needs to hire either Talmudic Scholars, or Steven Moffat.

[1] Herodotus discusses his visit to the Giza pyramid complex in his Histories (Book II,124). He was writing in around 440BC - so the construction of Great pyramid was almost as distant from Herodotus as he is from us.

[2] Herodotus' Histories (Book II, 54) places Homer 400 years before Herodotus, although modern scholarship disagrees.

Wednesday, July 09, 2008

No Scientific Method in his Madness
In which we discover that Feyerabend has much to answer for

I am a Very Silly Man. Here I am, trying to finish up my PhD, all set to spend my career toiling in the vineyards of condensed matter physics, and what do those clever so-and-so's at the Googleplex go and do? Only go and render "the Scientific Method obsolete", that's what. This is quite a striking claim, and one that merits examining in some detail: after all, if true, it will have serious consequences, not least of which is that I shall abandon my work and seek alternative employment, possibly with a travelling circus.

The structure of Chris Anderson's argument in Wired is as follows:
He quotes the statistician George Box: "All models are wrong, but some are useful", and proclaims the truth of this sentiment. "Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now." Because now, we "don't have to settle for models at all." There follows a panegyric on computers in general and the "Petabyte Age" in particular. "Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right." And not just about advertising: "Google can translate languages without actually 'knowing' them". What fields will Google be overthrowing next? "Linguistics to sociology. Forget taxonomy, ontology, and psychology." He offers two examples - one from physics, one from biology - and argues that "Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."

Oh dear.

We shall briefly note that the statement "All models are wrong, but some are useful" is itself a model (of an epistemological system, with many competing models) and thus is paradoxical, being true only if it isn't. Moreover, although it asserts directly that some models are useful and indirectly that others are not, the statement tells us nothing as to which is which, so it is not, itself, useful.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.

If everyone knew one thing about stats, it would be "correlation is not causation"; it would be nice if everyone knew two things about stats, specifically WHY correlation is not causation. Are there confounding factors? Could the result have arisen by chance? If you have twenty variables, then even if there is no causal relationship, on average one will be significant at the 5% level; and if you have petabyte-scale quantities of variables, chance correlations are far more likely. [It would, of course, be nicer still if everyone knew three things about stats: that chanting "correlation is not causation", or putting it in a comment box, is not a devastating critique of a report of a correlation.]
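The multiple-comparisons point is easy to see in a quick simulation. Here is a minimal pure-Python sketch (all names and parameters are my own invention, not from Anderson's article): generate twenty pure-noise "variables", correlate each against a noise "outcome", and count how many clear the conventional 5% bar, using the large-sample approximation that |r| > 1.96/sqrt(n) is "significant".

```python
# Sketch: with enough unrelated variables, some will correlate
# "significantly" with an outcome purely by chance.
import math
import random

random.seed(42)

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

n_points = 100   # observations per variable
n_vars = 20      # candidate "explanatory" variables, all pure noise
outcome = [random.gauss(0, 1) for _ in range(n_points)]

# Large-sample rule of thumb: |r| > 1.96/sqrt(n) is "significant" at 5%
r_crit = 1.96 / math.sqrt(n_points)

false_positives = sum(
    1
    for _ in range(n_vars)
    if abs(pearson_r([random.gauss(0, 1) for _ in range(n_points)], outcome)) > r_crit
)
print(f"'Significant' correlations out of {n_vars} pure-noise variables: {false_positives}")
```

Averaged over many runs, the count hovers around one in twenty, exactly as the 5% threshold predicts; scale the number of candidate variables up by a few orders of magnitude and spurious "discoveries" become a certainty rather than a risk.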

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Quite apart from the phenomenally irritating assumption that all or most physicists are engaged in "theoretical speculation" or in high energy physics - a popular assumption pandered to by the press, which seldom reports on the majority of physics research, which is not "fundamental" - describing quantum mechanics as a "caricature of a more complex underlying reality" is a bit peculiar, and in any case does not tell us how "Google searches", "throw[ing] the numbers into the biggest computing clusters the world has ever seen" and "statistical algorithms" will replace quantum mechanics.

The second example is likewise flawed:

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.

No, this is exactly like "Darwin and the drawing of finches". What Venter has done is gather together vast amounts of data - just as natural historians, biologists and so forth used to make collections and draw connections between them. The theory came later - but still it came. Also, from theories we derive testable hypotheses, and then test them. Here lies the difference between science and advertising algorithms, however successful the latter may be. Mr. Anderson concludes:

"There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?"

No, it's time to ask what this scientist can learn from Google. And what he learns is:

Chris Anderson, Wired's waggle-eared rock-star editor, has been dropping hints left and right about the relaunch of HotWired, a faded Web property Conde Nast picked up along with Webmonkey last month. The rumor we've heard: That Wired is relaunching the site as a news-focused social network like Digg.

Hmmm. Mr. Anderson is "dropping hints ... about the relaunch of HotWired", and Mr. Anderson writes an article almost designed to get up the noses of scientists, science bloggers, and science fetishists alike. "Correlation is enough", Mr. Anderson?

[via The Register ]

[Edited fur spolling]

Sunday, July 06, 2008

I love Sheffield!

I have just returned from a conference there: it is a fantastic city.

That is all