Wednesday, July 09, 2008

No Scientific Method in his Madness
In which we discover that Feyerabend has much to answer for

I am a Very Silly Man. Here I am, trying to finish up my PhD, all set to spend my career toiling in the vineyards of condensed matter physics, and what do those clever so-and-so's at the Googleplex go and do? Only go and render "the Scientific Method obsolete", that's what. This is quite a striking claim, and one that merits examining in some detail: after all, if true, it will have serious consequences, not least of which is that I shall abandon my work and seek alternative employment, possibly with a travelling circus.

The structure of Chris Anderson's argument in "Wired" is as follows:
He quotes the statistician George Box: "All models are wrong, but some are useful", and proclaims the truth of this sentiment. "Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now." Because now, we "don't have to settle for models at all." There follows a panegyric on computers in general and the "Petabyte Age" in particular. "Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising - it just assumed that better data, with better analytical tools, would win the day. And Google was right." And not just about advertising: "Google can translate languages without actually 'knowing' them". What fields will Google be overthrowing next? "linguistics to sociology. Forget taxonomy, ontology, and psychology." He offers two examples - one from physics, one from biology - and argues that "Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all."

Oh dear.

We shall briefly note that the statement "All models are wrong, but some are useful" is itself a model (of an epistemological system, with many competing models) and thus is paradoxical, being true only if it isn't. Moreover, although it asserts directly that some models are useful and indirectly that others are not, the statement tells us nothing as to which is which, so it is not, itself, useful.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.

Whilst if everyone knew one thing about stats, it would be "correlation is not causation", it would be nice if everyone knew two things about stats: specifically, WHY "correlation is not causation". Are there confounding factors? Could the result have arisen by chance? If you have twenty variables, then even if there is no causal relationship, on average one will be significant at the 5% level, and if you have "peta"scale quantities of variables, chance correlations are far more likely. [It would, of course, be nicer still if everyone knew three things about stats: that chanting "correlation is not causation", or putting it in a comment box, is not a devastating critique of a report of a correlation.]
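For anyone who wants to check the twenty-variables arithmetic, here is a rough sketch in Python. It assumes the tests are independent and that there is genuinely nothing to find, so each p-value is uniform between 0 and 1; the function names are mine, invented for illustration. Under those assumptions the expected number of "significant" results among twenty tests is one, and the chance of getting at least one is about 64%.

```python
import random

ALPHA = 0.05  # the 5% significance threshold mentioned above


def expected_false_positives(n_tests, alpha=ALPHA):
    """Expected number of tests that look 'significant' by chance alone."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha=ALPHA):
    """Chance that at least one of n independent null tests looks 'significant'."""
    return 1.0 - (1.0 - alpha) ** n_tests


def simulate(n_tests, n_trials=20000, alpha=ALPHA, seed=0):
    """Monte Carlo check. Assumption: under the null, each p-value is uniform on [0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # One "study": n_tests unrelated variables, each tested at level alpha.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    for n in (1, 20, 100, 1000):
        print(n, expected_false_positives(n), round(prob_at_least_one(n), 4))
    print("simulated, 20 tests:", simulate(20))
```

Real variables are seldom independent, so treat the numbers as a guide to the size of the problem rather than an exact figure. The direction is not in doubt, though: the more variables you trawl through, the more spurious hits you should expect.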

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the "beautiful story" phase of a discipline starved of data) is that we don't know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Quite apart from the phenomenally irritating assumption that all or most physicists are engaged in "theoretical speculation" or in high energy physics - a popular assumption pandered to by the press, which seldom reports on the majority of physics research which is not "fundamental" - describing "quantum mechanics" as a "caricature of a more complex underlying reality" is a bit peculiar, and in any case does not tell us how "Google searches", "throw[ing] the numbers into the biggest computing clusters the world has ever seen" and "statistical algorithms" will replace quantum mechanics.

The second example is likewise flawed:

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It's just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.

No, this is exactly like "Darwin and drawings of finches". What Venter has done is gather together vast amounts of data - just as natural historians, biologists and so forth used to make collections and draw connections between them. The theory came later - but still it came. Also, from theories we derive testable hypotheses, and then test them. Here lies the difference between science and advertising algorithms, however successful they may be. Mr. Anderson concludes:

"There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?"

No, it's time to ask what this scientist can learn from Google. And what he learns is:

Chris Anderson, Wired's waggle-eared rock-star editor, has been dropping hints left and right about the relaunch of HotWired, a faded Web property Conde Nast picked up along with Webmonkey last month. The rumor we've heard: That Wired is relaunching the site as a news-focused social network like Digg.

Hmmm. Mr. Anderson is "dropping hints ... about the relaunch of HotWired", and Mr. Anderson writes an article almost designed to get up the noses of scientists, science bloggers, and science fetishists alike. "Correlation is enough.", Mr. Anderson?

[via The Register]

[Edited fur spolling]

12 comments:

jdc said...

Nice post Political Scientist. It was only a matter of time before someone claimed correlation superseded causation. I just expected it to be someone from JABS.

LemmusLemmus said...

Great post, I'll link to it.

As for Feyerabend, that was one of the most disappointing books I have ever read. When I was in my first and second semesters he was presented as this very important philosopher of science. Then you actually read the book and he just drones on and on about how in the past scientists didn't always follow Popper's prescriptions.

You don't say.

Political Scientist said...

Hello Guys,

jdc: Glad you liked it. I note that the JABS "enthusiasts" are currently claiming that epidemiology is useless - except when it isn't. I think it's pretty difficult to reason with people who will accept or reject evidence on the basis of whether it agrees with their conclusions...
BTW, I've scribbled some thoughts about your post on the Bible. When they're in a more coherent form, I'll drop you an email.

LemmusLemmus: Thanks for linking.
I, too, count time spent reading Feyerabend as time lost. You might enjoy "Four modern irrationalists: Popper and After" by David Stove. Stove was an Australian philosopher but firmly in the tradition of the British skeptical empiricists. His "Helps for Young Authors, after the manner of the best authorities" is screamingly funny. He did some excellent stuff on Hume and induction back in the 70s. Sadly, he went a bit bonkers at the end, but he was always readable.

Political Scientist said...

Stove on grammar in philosophy of science:

http://web.maths.unsw.edu.au/~jim/stovehelp.html

LemmusLemmus said...

"Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right."

That is a particularly disingenuous example. The right-wing blogger Steve Sailer once had an hilarious post about how awfully badly Google ads are matched to content on Blogger sites. I think he mentioned the example that he had had ads for joining the communist party just because he had recently written a post which mentioned The Clash.

More generally, what's "applied mathematics" in this context? A mathematical model, maybe? One based on assumptions? Could we say there's a theory behind the formula? Maybe?

LemmusLemmus said...

Oh, I forgot to thank you for the cites. So far I've read the piece you linked to. Those aren't real quotes, right? Right?

Political Scientist said...

"Those aren't real quotes, right? Right?"
*grins* No, but if you can bear to read some Feyerabend, the quote comes close... [Also, I think I've read the "financial criticism" quote attributed to Feyerabend, but I can't find it on Google Books]

"Could we say there's a theory behind the formula? Maybe?"
Yeah, the whole Wired article is just completely absurd. I think Mr. Anderson either (a) was caught up in the hype, or (b) wrote it to attract controversy to Wired.

PS:
I found Sailer via your blogroll - he's good fun, and very good on processing large amounts of data and telling complex stories very simply and clearly. Don't always agree with him, but he's a fine writer.

LemmusLemmus said...

1. Phew!

The way I remember it (but it's been a few years), the famous Feyerabend book was not actually badly written* and had bits that were, say, historically interesting in their own right. The problem I had with the book was that I was waiting for the actual argument to start for at least about 200 pages. I waited and waited - and then the book was over.

2. (c) his usual stupid self? (d) drunk?

3. I disagree with Sailer on many political issues (but that may be a plus given that I like having my views challenged), I wonder why he's so obsessed with Barack Obama (I skip all of those posts) and I often think that he puts too much certainty in his common sense and everyday observations, but he certainly has interesting ideas and knows how to present them.

*As someone with a training in sociology, I'm pretty hard-boiled, though. In particular, I've read all of Niklas Luhmann's Social Systems. The first chapter starts out with a classic definition along the lines of "We'll speak of a social system when we find attributes in the absence of which we would not speak of a social system" and then things get much, much worse. Naturally, he is recognized as a great thinker in Germany.

Political Scientist said...

1] '"We'll speak of a social system when we find attributes in the absence of which we would not speak of a social system" and then things get much, much worse. Naturally, he is recognized as a great thinker in Germany.'
Heh, maybe I'm being unfair to Feyerabend: it could have been worse.
Cos I come from a physical sciences background, I like sentences to be short and declarative.*
However, I'm mindful that the social sciences are dealing with phenomena far more complicated, and subject to far more influences, than my research, so I'm not surprised that books about social sciences are more complicated.

2] Maybe (c) and (d)!

3] I suspect that he's trying to counter the rest of the media's take on Obama (honestly, I half expect to hear Obama can walk on water!). Also, his stuff on politics-as-psychodrama leaves me cold: the relationship between George HW and George W is not a plausible explanation of the last 7 years. But most of the rest is interesting. One of the things I like best about reading blogs is that you can read things written from a very different perspective: I read a fair few leftist, libertarian and atheist blogs, even though these are pretty antithetical to my own position. Like you, I like having my ideas challenged: it sharpens up my thinking, and occasionally changes my mind.


*Would that my scribblings at this blog lived up to that ideal...

LemmusLemmus said...

That humans, contrary to atoms, have a mind of their own, and the social sciences thus can't be as exact as the natural ones, is a standard argument in introductory social science textbooks. I think the point is a good one.

Given that, I think that people should be even more concerned about expressing themselves clearly. Many do.

Seriously, if your local uni library stocks Social Systems, I suggest you have a look at it; it's an interesting experience. The introduction and first chapter are sort of o.k., after that it gets interesting.

As for Obama, it's one of those cases in which I've long had a post in the back of my mind, but can't find a proper punchline.

Political Scientist said...

"Seriously, if your local uni library stocks Social Systems, I suggest you have a look at it; it's an interesting experience. The introduction and first chapter are sort of o.k., after that it gets interesting."

Yes, that may very well be a treat for post-thesis. I know almost nothing about sociology, and this Will Not Do, so I shall correct it. The other thing I'm keen to learn more about is stats: because I've focused almost entirely on pure maths and mechanics, my stats education has been sadly neglected. Can you recommend any good books in English?

LemmusLemmus said...

Just to clarify: You're going to learn nothing of interest about sociology from Social Systems. I just thought getting an impression of its impenetrability might be an amusing experience if one knows one does not have to finish it and will not be tested on the subject. (Yes, it still hurts.)

As for good sociology textbooks, Anthony Giddens' imaginatively entitled Sociology is a good overview of topics and findings - long, but very readable. If you want to go down a more theoretical route, Raymond Boudon's The Logic of Social Action is recommended. Somewhat more advanced: James S. Coleman, Foundations of Social Theory. No need to read the mathematical (fifth?) part, though.

As for statistics, I've never read an English-language textbook, but Econometric Analysis by William H. Greene is cited over and over again, so it can't be all crap. ("Econometric analysis" or "econometrics" is simply statistics when used by economists. Of course, these are largely the same techniques used by other disciplines.)