XVII. How I corrupted the youth, and why I intend to continue
When I still worked in the Department of Parasitology, I often had to defend my research methods and approach to molecular biologists and biochemists, who gradually grew to predominate there. Finally, I gave up my efforts and moved with my entire team to the Department of Philosophy and History of Natural Sciences, which has a branch of theoretical and evolutionary biology, where my approach didn’t stick out. My colleagues in the Department of Parasitology often reproached my results for being inexact and ambiguous. They used to say that there was nothing or almost nothing to conclude from them – the performance of Toxo positives and Toxo negatives differs by only a couple percent, and only after running the data through a complicated statistical test, can we discover a relationship. When we conduct the exact same study on a different subject group, the relationship we originally discovered appears completely different – sometimes even opposite. When we use another, more sophisticated statistical test, we suddenly find a relationship we didn’t see before. In short, they say that our data allows for too much casual interpretation, and sometimes complete subjectivity. In contrast, the sort of results that biochemists and molecular biologists get are clearly unambiguous and objective, which must mean that they are perfectly exact.
From personal experience researching in the field of molecular biology and immunology, I well know that such results only seem unambiguous. It’s because biochemists and molecular biologists use very simplified systems, which are standardized to the extent that they will always give the same results. But in reality, just by using reagents from a different company, they will also get different results for the same experiment. And this is not even the biggest problem. Besides, in today’s globalized world, the number of independent producers of research reagents is rapidly dwindling, so researchers are often forced to buy chemicals from the same sources anyway, so their results are more comparable. A much larger problem is the biological material. If we use a different strain of lab mice, then the results of the same experiment (especially in my favorite field: immunology) will often be completely opposite to those obtained with the other strain. To get rid of the nuisance of ambiguous results, immunologists always try to use a system standardized to the smallest detail, that corresponds with the system used by colleagues in other work places.
But it’s a question whether this effort for maximum standardization doesn’t come at too high a price, and sacrifice some of the quality of the research. It’s great that molecular biologists, unlike scientists such as experimental psychologists, can repeat the same experiment a hundred times over and get the same results, provided they pick any one of the 200 strains of laboratory mice and stick with it. But it’s less impressive that if they conducted their experiment only twice, but each time on a different strain of mice, then they might find that their “highly reproducible” results looked quite different. The reproducibility of results in current molecular biology and biochemistry is often achieved at the cost of their validity (Box 75 What is test validity, and what does the increase of interleukin 12 in Toxo positive mice indicate?).
Results obtained under highly standardized, and often also highly artificial conditions are clear and reproducible. However, they may tell us very little about the real biological significance of the observed phenomenon. Let me explain. Probably the most interesting molecules in organisms are enzymes – biological catalysts (Box 76 Why are enzymes such lousy catalysts?). When biochemists try to describe a newly isolated enzyme, they must first purify it from the rest of the cellular homogenate, and then find its physical and biochemical characteristics, including various enzymatic constants and the reaction kinetics. Few people realize that the standard conditions under which this purified enzyme is studied, are drastically different from the conditions which exist in a live cell.
Box 75 What is test validity, and what does the increase of interleukin 12 in Toxo positive mice indicate?
The most important measure of the quality of any study is the validity of its data. Validity is how closely something represents reality. A test which gives us reproducible results, and is highly sensitive and specific, but actually measures a different variable than we’d like it to, is quite useless. If we want to know whether latent toxoplasmosis causes immunosuppression or immunostimulation in mice, it does us no good that we can measure precisely and reproducibly how much the production of interleukin 12 (a protein which certain white blood cells use to communicate with other parts of the immune system) has increased two months after infection in the BALB/c strain. It’s very likely that in one strain of mice the levels of interleukin 12 will increase to ten times their original amount, but not change at all in another strain. And that goes without mentioning other complications. Although a significant increase in interleukin 12 levels should indicate that toxoplasmosis stimulates the mouse’s immune system, further experiments may reveal (as ours really did) that in this case the increased interleukin is the result of immunosuppression rather than immunostimulation. In infected mice, the production of interleukin 12 is most likely increased because important populations of white blood cells (WBCs), which should react to the interleukin 12 and the presence of foreign antigens by proliferating and producing defense substances, are actually significantly suppressed and hence don’t react to these stimuli. So the “frustrated” regulatory WBCs produce more and more interleukin 12, all in vain (86).
So while biochemists are able to measure the characteristics of an enzyme, these characteristics may be completely different in the cell. Under cellular conditions, an enzyme which, according to biochemical tests, catalyzes a reaction in one direction, may actually catalyze it in the opposite direction, or also be the catalyst for a different reaction.
Box 76 Why are enzymes such lousy catalysts?
Because the cell doesn’t need better ones. But let’s start from the beginning. From school, most of us get the impression that a catalyst speeds up chemical reactions. Like much of what we take from school science, this impression is false. In reality, a catalyst doesn’t speed up a reaction (usually, it even slows the reaction), but only influences which of many possible reactions will actually happen. Without the catalyst, this reaction may have occurred so rarely that we wouldn’t take notice of its product. Enzymes are good catalysts in the sense that they can very specifically support certain reactions while suppressing many others. Yet they are also lousy catalysts in the sense that the reactions which they supposedly “speed up” are usually unbelievably slow. Of course there are exceptions, such as the enzyme catalase which can enable the decay of a 10 molecule in one second, but a large number of enzymes enact one reaction per second. For example, the usual rate of an enzyme participating in energy metabolism is 100-1000 reactions per second, which is a snail’s pace for a chemical reaction (87). How come evolution didn’t develop more efficient enzymes? Probably because they wouldn’t help the organism. The speed of an enzymatic reaction is not limited by how quickly an enzyme “turns,” but by how quickly it receives molecular substrates. The enzyme generally receives these substrates by diffusion (or enzymes of the same metabolic pathway pass them to each other when coming into physical contact – but in that case the enzymes themselves must diffuse, or at least each enzyme must change position to transfer the product of its reaction to the next enzyme). Diffusion, in comparison to a chemical reaction, is unbearably slow, so it’s no wonder that evolution didn’t bother developing enzymes that acted more quickly. It’s more likely that evolution worked on speeding up or getting around the process of diffusion.
In many cases, for transporting reactants, it probably replaced diffusion with electrophoresis or even isoelectric focusing (see Box 37 Does a cell conduct isoelectric focusing?).
In other words, the exactness of current molecular biology and biochemistry is only skin-deep. Even these fields are prey to the arbitrary interpretation of the researcher. The difference is that the evolutionary psychologist can’t buy a genetically identical group of human subjects from a specialized firm, a group of subjects who were exposed to the same conditions their entire life. If we’re studying humans, we must accept that we’re working with a genetically diverse species, whose individuals undergo unique and disparate experiences. That is why the same experimental or observational study conducted on the students of the Department of Natural Sciences in 1995, 2005 and 2010 can give us different results. The population changed over the years, and the different cohorts of students can react differently to the same factor – in this case, infection by Toxoplasma gondii.
The necessity to work with very diverse subject groups may complicate our research, but in my eyes, it also gives us a great advantage. Our results aren’t as easily reproducible, but they’re obtained in a natural system, so they can be more safely extrapolated to the general population than results obtained in more “exact” fields. If we used a company-produced, genetically identical (but obviously non-existent) group of humans, we could never be sure that the results obtained from these experiments were applicable to the general human population. And scientists are definitely more interested in how toxoplasmosis, or any other factor, influences the real, genetically and phenotypically diverse, human population, than in how it influences an individual with a specific (and possibly rare) genotype (combination of genes) and a unique life experience. With a group of genetically identical humans, we could prove (or disprove) the effect of toxoplasmosis much more easily than with a representative group of people, and our results would be simple to reproduce. But a large part of these results would apply only to that strain of humans, and we couldn’t extrapolate them to the general population (Box 77 Pseudoreplications, and why statisticians fear them like the plague).
Another peculiarity of working with people, which causes many methodological complications, is that our test subjects participate in our studies of their own will. If people don’t wish to participate, there’s no way to force them. This means we are never certain about how representative our subject group is – and to what extent their results are applicable to the population they came from.
Box 77 Pseudoreplications, and why statisticians fear them like the plague
If you measured the height of one Toxo positive and one Toxo negative man, and found that the Toxo positive man was 3 inches taller, then you wouldn’t conclude that Toxoplasma has a positive effect on body height (at least I hope you wouldn’t; if I’m wrong, then I’m afraid that my previous chapters probably fell short). But if you measured a hundred Toxo negative and a hundred Toxo positive men (Where’d you find them! Could you give them some questionnaires? And do you happen to know their Rh factor? Sorry, I got a bit carried away...), then even if the average height of the two groups were different by only a third of an inch, you could almost be certain that toxoplasmosis and height are related. It should be obvious that measuring 50 men cannot be replaced by measuring one man 50 times. But it’s less obvious that a group of identical octuplets in our subject group would cause a similar error. (True, it’s not a common problem among humans, but if we were studying armadillos, then it wouldn’t be unusual.) Eight identical siblings are genetically the same, so they do not represent eight independent observations – eight unrelated individuals randomly selected from the population. Including them in the test group would be almost like measuring the same person eight times. When planning any study, one must consider the risk of pseudoreplication. Let’s say that we wish to compare the surface area of leaves from oaks found on the southern and northern sides of a hill. We cannot measure 100 leaves from an oak on the southern side, and 100 leaves from an oak on the northern side, and then use Student’s t test to compare the average area of a leaf for each tree. We wouldn’t actually be comparing 200 objects, as the computer would assume, but only 2 – we’d be comparing just two trees, so any differences between them could be attributed to chance.
To correctly compare the tree leaves on each side of the hill, we would walk between 100 trees on the southern side, and 100 trees on the northern side – and measure one leaf at each tree. Using Student’s t test, we’d compare the average area of a hundred leaves taken from each side of the hill – each leaf taken from an individual tree. We would take a similar approach to determine whether Toxo positives or negatives contributed more to the common pool during the experimental game Public goods (see xxx). We wouldn’t use all the sums which each Toxo positive student contributed to the pool over 6 rounds of the game and compare them to those of Toxo negative students. First we’d calculate the average contribution of each student to the pool, and then compare the average contribution of all the Toxo positives to that of all the Toxo negatives using Student’s t test (there’s actually a better and more sensitive method, called GLMM, but this would do). If we didn’t worry about pseudoreplication and took the first, wrong approach, we might unwittingly choose an unusually generous or penny-pinching student for one of the groups, and his six aberrant contributions would skew the comparison with the other group.
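The Public goods example from Box 77 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the student labels, the contribution values and the helper names `per_subject_means` and `t_statistic` are invented for the sketch and do not come from any of our studies. The code collapses each student's six rounds into one average before computing Student's t, and compares that with the wrong approach of treating every round as an independent observation.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def per_subject_means(records):
    """Collapse repeated measurements to one mean per subject.

    records: iterable of (subject_id, group, value) tuples.
    Returns {group: [mean value of each subject in that group]}.
    """
    by_subject = defaultdict(list)
    group_of = {}
    for subject_id, group, value in records:
        by_subject[subject_id].append(value)
        group_of[subject_id] = group
    by_group = defaultdict(list)
    for subject_id, values in by_subject.items():
        by_group[group_of[subject_id]].append(mean(values))
    return dict(by_group)


def t_statistic(a, b):
    """Student's t statistic for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical contributions over 6 rounds; student "p1" is unusually generous.
contributions = {
    ("p1", "positive"): [9, 9, 10, 9, 10, 9],
    ("p2", "positive"): [4, 5, 4, 5, 4, 5],
    ("p3", "positive"): [5, 4, 5, 4, 5, 4],
    ("n1", "negative"): [5, 5, 4, 5, 4, 5],
    ("n2", "negative"): [4, 5, 5, 4, 5, 4],
    ("n3", "negative"): [5, 4, 4, 5, 5, 4],
}
records = [
    (subject_id, group, value)
    for (subject_id, group), values in contributions.items()
    for value in values
]

# Wrong: every round counted as an independent observation (18 per group).
naive_pos = [v for _, g, v in records if g == "positive"]
naive_neg = [v for _, g, v in records if g == "negative"]
t_naive = t_statistic(naive_pos, naive_neg)

# Right: one average per student (3 per group).
means = per_subject_means(records)
t_correct = t_statistic(means["positive"], means["negative"])

print(f"naive t = {t_naive:.2f}, per-student t = {t_correct:.2f}")
```

With these made-up numbers, the six contributions of the one generous student inflate the naive t statistic, while the per-student averages leave only three honest observations per group. A GLMM, the more sensitive method mentioned in the box, would typically keep all the rounds and treat the student as a random effect; that is not shown here.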
The third peculiarity (and difficulty) of working with people is that we often can’t study a problem using experiments. Instead, we must turn to observational studies – even when the problem better fits an experimental study. Let’s say we want to determine whether infection by Toxoplasma causes a lower tendency toward novelty seeking in mice. To this end, we take two identical groups of mice, infect one of them with Toxoplasma, then wait and observe if the groups begin to exhibit differences in novelty seeking. If so, it’ll be obvious that the infection caused the change, as opposed to the possibility that differences in novelty seeking influenced which mouse was infected. But as I’ve mentioned several times (with thinly veiled regret), humans cannot be experimentally infected with Toxoplasma. We must use appropriate (and not always completely reliable) diagnostic techniques to separate individuals into those who were already naturally infected, and those not yet infected. Then we test their tendency towards novelty seeking and hope that individuals with an extremely high or extremely low tendency aren’t trying to hide their unusual characteristic, consciously or subconsciously – or at least, that they won’t succeed. And even if we find a difference between the Toxo positives and negatives, we can’t necessarily conclude that infection by Toxoplasma causes lower novelty seeking. Lower novelty seeking might increase the risk of catching Toxoplasma, or a third factor, such as the size of a person’s hometown, might influence both these factors.
Working with such a complicated population of test subjects, it’s understandable that we can’t rely on simple methods to evaluate our data. In molecular biology, researchers usually make do without statistics. They look at the results of an electrophoresis and determine, for example, that after the addition of soluble iron salt, the band which marks the location of a certain protein has been enhanced, grown fainter or disappeared entirely. A visible change in the levels of a certain protein on the electrophoretic gel is an unambiguous sign that needs no statistical evaluation. In molecular biology, we often need just natural intelligence to draw conclusions from our experiments. Fields like evolutionary psychology usually call for an additional, complicated step of statistical evaluation. We may be working with humans or animals caught in the wild, individuals who differ in age, gender and a number of other traits. Because this creates a very heterogeneous subject group, we must control for the effect of factors that are not the topic of our research, but may influence the trait we are studying (for example, reaction time) more than the factor that we study (e.g. toxoplasmosis). These confounding factors (e.g. age, health) must be filtered out, so that we can see the relationship between the studied factor (toxoplasmosis) and the dependent factor (subject reaction time). To minimize the effect of confounding variables, we must often employ complicated statistical techniques (see Box 78 How to deal with confounding variables). Such techniques are readily available today. Often, we even have relatively easy-to-use computer programs that take on the complicated and time-consuming work. Of course, without knowledge of the statistical techniques, the operations we carry out with our original data may seem like New Age shamanism.
In reality, our complex world leaves us with little choice. Sophisticated methods are often the only way to understand complex phenomena and systems. This means that data analysis takes up a significant chunk of time. In many cases, collecting the data is much less difficult and time-consuming than it is to evaluate them.
Box 78 How to deal with confounding variables
Confounding variables, among other things, increase the variability of our results, and so lower our chances of discovering an existing effect in our data. To stop this from happening, we cannot ignore them, but rather must deal with them appropriately. The most effective technique is elimination. We include only individuals who have the same values for all confounding variables (for example, 24 year-old men, from Prague, non-smokers). That gets rid of much of the variability that would have existed in our data, and increases our chances of discovering the effect of the studied factor. But we also risk that this specific group of people will not be affected by the studied factor. The factor might affect only older people, or smokers, or young women. And even if we find that the studied factor has an effect on the specific group, we won’t know that the effect applies to the general population – perhaps the effect is only true for our 24 year-old, male, Prague non-smokers. Another tactic is to block variables. If we need to block a confounding variable, we estimate the effect of the studied factor on the studied (dependent) variable using a paired test – for each pair, the individuals have the same value for the confounding variable (see Box 67 What is the difference between the paired and unpaired t tests, and why is the paired one better?).
If we have several confounding variables, the situation is much more difficult. In this case, we should at least try to ensure that the groups which we’re comparing don’t differ in the representation of values for each confounding variable. Let’s say we’re interested in the effect of toxoplasmosis on reaction time. We try to include the same percentage of: male smokers and non-smokers aged 21-25, 26-30...years; female smokers and non-smokers aged 21-25, 26-30...years etc., in the Toxo positive and the Toxo negative group. Blocking variables is an efficient approach in experimental studies, but less so in observational studies. Moreover, it’s clear that blocking a large number of confounding variables is not only difficult, but often impossible. Facing such a scenario, we should randomize the data in terms of the confounding variables. This means that individuals with any combination of confounding variables should be equally likely to be in the group exposed to the studied factor as in the control group. For example, it’s obvious that we shouldn’t expose women to the studied factor and use the men as the control group. The same principle means that we can’t give the people who came to the morning appointment the active substance, and give the placebo to those who came in the afternoon (see also Box 15 Popular mistakes when making a control group). The health of the individual might influence the time of day he arrives. Furthermore, we must ensure that neither the test subjects nor those administering the experiment influence what group each test subject is placed in. For example, it’s useful to flip a coin for each test subject to determine whether he goes in the experimental or control group. In observational studies, we are generally forced to filter out the effect of the confounding variables. For every individual, we carefully take down the value of confounding variables, and then include them as other independent variables in our analysis. It does no harm to do this even if we’ve already blocked or randomized these variables.
And that brings me to another objection voiced against our research by colleagues who study so-called white biology (see pp. xxx) – the molecular biologists and biochemists. Often they complain that our laboratory does too little manual work. According to them, a student of biology should start pipetting solutions before (or rather instead of) breakfast, then switch to running from one complicated instrument to the other, harvesting cells and collecting them with a centrifuge, homogenizing tissue, sequencing or loading samples on chromatography columns – so that in the evening, completely worn out, yet another pair (the third this month) of scuffed-out lab sandals on his feet, he falls asleep on the living room couch, for his sore feet won’t carry him all the way to bed. In our lab, we spend most of the time sitting at the computer, entering data from paper forms, checking that we entered it correctly; at best we might be conducting an analysis, but of course we’re never sure whether we understand it in detail, and whether it’s an analysis which really fits our data (fortunately, a random passerby can’t tell). Furthermore, the methods we use to collect data are not similar to the usual methods carried out in biological laboratories. We test many hypotheses using a questionnaire. A lot more of our studies are observational than experimental. I usually heard remonstrances against my work in connection with the complaint that my lab students did too little hands-on work and learned few methods. Apparently, it wouldn’t have been that bad if only I worked in this manner – but students in the college of natural sciences should learn modern methods, which they will use in future research. So the main objection against my research style was that I am in fact corrupting the youth. In the end, I preferred to go to another department. My less fortunate colleague Socrates, may he rest in peace, had it much worse.
Even in this case, I think that the complaint isn’t justified (which probably doesn’t surprise you). Today, science develops at such a rate that the instruments students used in their undergraduate or graduate work will undoubtedly be different from the ones they’ll be working with when they graduate and begin their own research. And this applies not only to instruments, but to all experimental techniques. I remember, back in the day, when we’d walk reverently by the door of the laboratory where they knew how to sequence DNA. Today we can fly our samples to be sequenced in Korea and get it done cheaper than if we did it ourselves with the lab’s expensive machine. And the kind and clever Koreans (I hope it’s clear that I’m talking about South Korea here) will thank the person who sends them the most samples to be sequenced in a year, by mailing him a digital camera for Christmas. Today we pass reverently by the laboratories which can sequence an entire genome or proteome. I’m sure that in another couple of years no one will bother with something so laughably routine, and will send genomes or protein mixtures they want sequenced straight to a specialized company in Korea or maybe China. And the amiable and skilled Chinese will give whoever sent them the most genomes a digital ping-pong table with a chocolate fountain for Christmas.
In my opinion, our undergraduate and graduate students should leave our laboratories with the ability to set up a scientific study (be it observational or experimental) in order to answer a question, as well as the ability to analyze data, and particularly to interpret their results. This last thing is perhaps the most important – a student must learn to look at his results realistically, whether they agree with his hypothesis or not. In short, during his undergraduate or at least during his graduate work, he should learn to think scientifically and deal honestly with the data he obtains. I believe that the work carried out in workplaces like mine is much more apposite for teaching this, than the kind of work carried out in many labs focused on experimental science. In laboratories that are using the most sophisticated techniques of the day, students usually master one or two complicated techniques, and learn to operate a certain machine or several of them. Often the measured data needs little evaluation; the student only has to believe his mentor and the producer of the machine, that the measurements given really mean what the manual says they mean. In our laboratory, most students soon realize that data are very treacherous, and that their analysis must be handled with kid gloves. They discover (sometimes through bitter experience) all the problems that must be tackled when collecting data – all the things that can skew or completely invalidate results. From the perspective of training future scientists, the experience that students gain, for example, in our laboratory of evolutionary psychology is more valuable than that gained in a typical modernly equipped molecular biology lab.
But frankly, the most valuable thing a student can learn from a good laboratory (and such labs fortunately began popping up in the past 10 years in the college of natural sciences) – whether he spends his undergraduate years working with a single dilapidated machine, handling ten cutting-edge instruments, or studying in our lab, where the most complicated machine he’ll meet is a computer or a scanner to transfer questionnaires from paper onto the computer – the basic principle he should learn, is that anything he does in science should be done honestly and thoroughly.