V. Casual contemplations about scientific luck, mistakes, and an experiment I am truly embarrassed of

So the start of our study on the effect of toxoplasmosis on human behavior looked something like this. In some things I started off with bad luck, in some things with good, but I must objectively conclude that I had more good luck than bad. I did a couple of clever things and a couple of foolish ones, but I guess that’s how it goes in anything, not just in science.

My great fortune was that when I returned to the faculty and began looking around for a fitting parasitological subject, I came across toxoplasmosis, and that this happened precisely in the period when I was musing over my own psychological characteristics – by that I mean those strange elements of behavior and feeling which I was unable to explain to myself. When these two circumstances came together, the result was the idea that I was later able to develop.

But the greatest fortune came to me right at the beginning, when I was given room to grow. It wasn’t just me – when the communist regime collapsed unexpectedly, crumbling like a house of cards, many young people just stepping into science got the opportunity to begin researching completely independently. This usually doesn’t happen in an established system, where continuity of research teams exists. There a young person generally starts his career working for a long time on someone else’s projects – planning and picking his own projects usually begins only when he’s past the prime of his creative (and sometimes also physical) strength. People of my generation and often even younger were very lucky, that they lived in the Czech Republic (at that time still Czechoslovakia) and were starting their scientific career in the period when science was undergoing a resurrection, when new functional scientific teams were being built, often nearly from scratch. Most scientific teams under the communist regime were not functional, because they were led by entirely incompetent people whose primary and often only premise for carrying out a leadership function was being a member of the Communist Party. The competent people were usually “weeded out” of prestigious scientific institutions and definitely couldn’t lead their own team. Of course there were exceptions – in some cases a communist could even be a skilled scientist – but that happened only rarely. The leader of a research team was almost never chosen based on scientific ability, but generally according to his servility, his ability to brownnose and say out loud that black is white and two plus two makes five. All that, and a sufficiently thick skin, formed the main requisites for the person in question to carry out a career in any field, and hence also in science. 
By career I mean obtaining a leadership position, money for a study, a laboratory and so on; the scientists who held informal prestige were often entirely different people. I do not claim that all this happened only then, and only in communist countries, but in hardly any other time and place did possessing professional qualities actually lower one’s chance of a successful scientific career. It was so in communist Czechoslovakia – the head of the team basically had no interest in producing quality research, and so when he was free to choose, and when he could tell the difference, he picked of two job applicants the one who was less capable and posed less of a threat of outgrowing him. Interestingly, for the very same reason he’d rather pick a nonmember than a member of the Communist Party for his team. Unless he held a very high position in the party hierarchy himself, however, other comrades usually ordered him to take on a party member.

All this ended in the year 1990, or soon after. Communists were stripped of their long-standing leadership functions, a few individuals were even forced to leave the faculty, and new research teams began to form in workplaces. Only rarely was some continuity maintained. I don’t know how it was in other institutions, but in the biological section of the Faculty of Natural Sciences in the year 1989 there were only two or three functional research teams that were operating truly on a top level. I was very lucky that I grew up professionally in one of these teams. In the Laboratory of Physiology and Biochemistry of Parasitic Protozoa, informally led by Jaroslav Kulda and Jiří Čerkasov, I worked on my undergraduate thesis and later even my doctoral dissertation.

After the revolution, Jaroslav Kulda became the head of the department of parasitology. He took several young people whom he expected to be professionally capable into his department and gave them the opportunity to build their own teams and, above all, to devote themselves to topics that they themselves chose. I was also given this freedom, and for this reason I was able to begin dedicating myself to the emerging field of the manipulation hypothesis. That certainly would not have been possible if I had already been part of a functional research team that annually produces several high-quality publications. There it is expected that each member of the team devote himself to projects already underway, which regularly bring publishable results – he won’t be allowed onto uncharted ground, where there is a serious risk that nothing interesting will come of the study, and where, even if something does come of it, the first results suitable for publication will be obtained only after many years.

My third great fortune has to do with the risk I was just talking about. The probability that Toxoplasma influences human behavior, and that its effect is strong enough for us to detect it using such simple methods as a psychological questionnaire, was in reality quite small. When I was starting up my project, I rather expected to find no effect of toxoplasmosis on human behavior. The chance of “winning” was very low, but, on the other hand, the prize seemed very great. The idea that I’d be the first on earth to prove that the manipulation hypothesis doesn’t just apply to animals, but that a parasite can also influence the behavior of a human, was tempting enough that I took the risk and dedicated my time to this uncertain project. Another important factor in my decision was that work on the project was much more exciting than the usual work going on in a molecular biology laboratory. And why not admit it: in the laboratory I always try to do, first and foremost, what I enjoy. Of course, I had it easy in that I enjoy whatever promises to bring interesting results (see Box 26 What is to be done?).

Box 26 What is to be done?

I don’t know what is contained in Chernyshevsky’s original What is to be done?, nor in Lenin’s treatise of the same name. Not that I’d wish to brag about it, or even to build my career on it (I’m a veteran opponent of communism – I fearlessly paid no attention in classes of Russian Literature and Marxism-Leninism!). In any case, after a meeting last year of the College Academic Senate, at which a certain formerly high-ranking comrade, now a properly elected member of the Academic Senate, lectured me about morals, I even got the impression that such bragging would bring me harm. The truth is that aside from the contents of the Marxist-Leninist treatises, I unfortunately also forgot most of what was poured into my head in high school and later in college. In any case, I am almost certain that the contents of my box are entirely different from the writings of my more famous predecessors. Here I’d like to consider the criteria according to which it is appropriate to choose a topic for scientific research. Be warned that this is my purely subjective opinion, which I definitely don’t wish to impose on anyone (although, in years to come, I will be diligently checking knowledge of said opinion in the exams for my course Practical methodology of science!). The main criterion, which I have always applied before anything else, is whether the research question interests me. Our life is too short and (intending no offense to Buddhists) we have only one, so it is the height of folly to waste it searching for answers to questions that don’t really interest us. Sure, it might be nice to tell ourselves and others that we’re able to find the answer to a given question before our competitors, but one can also compete in solving interesting questions (and one attracts more spectators that way). Secondly, I applied the criterion of the probability that I could solve the given problem. 
That was related to the difficulty of the project, as well as (first and foremost) to whether I had the necessary materials and prior knowledge to solve the given problem. I’d be really interested in researching the validity of string theory, but when I see an integral (I’m ashamed to admit it, but even a derivative is enough), I cannot help but shudder. Nor do we currently have an appropriate particle accelerator in the department. The third criterion is practical. I don’t think that I exactly need to save or at least nourish a “suffering mankind,” but it would definitely bring me joy if something useful came of my discovery. The fourth and, admittedly, at the same time the most important criterion is that I must enjoy researching the given topic. I definitely enjoy it more when I’m inventing ways to find out, using an ethological experiment, whether Toxoplasma-infected people are more suspicious, than I did when I was inventing, or rather finding by trial and error, how best to separate virions from homogenized trichomonad cells.

The cleverest thing I did was to realize that testing my hypothesis was methodologically easy, and that it could be carried out on material I had at my disposal – that is to say on patients we were testing in the department for toxoplasmosis, or on people whom we could ask to let themselves be examined for the purposes of the experiment. In addition, this was a low-cost study. At the start of the 90’s we had very little money, and this was one of the cheapest studies, perhaps with the exception of modeling, that could be carried out in the circumstances of the College of Natural Sciences (see Box 27 Why to model in science, and what can and can’t be modeled). In retrospect it became clear that the inexpensiveness of the study was something of a two-edged sword. On one hand it made our research easier; on the other hand it made it harder to publish the results of this research and discouraged established laboratories from verifying our study, and perhaps picking it up. I’ll explain right away.

The reason it’s difficult to publish the results of low cost studies is all in all obvious – expensive studies can’t be done by just anyone, so expensiveness is a certain guarantee of the seriousness of the study. Why today it’s hard to continue inexpensive studies is more difficult to explain. When one leads

Box 27 Why to model in science, and what can and can’t be modeled

In engineering we model primarily to replace a difficult or expensive study of the behavior of certain systems with a much cheaper and easier study of the behavior of their models. Usually we try to make it so that the behavior of our model corresponds as closely as possible with the behavior of the modeled system, and we’re glad when we succeed in this. In science, however, we usually model for a completely different reason. With the help of a model we generally try to show what the mechanism of a certain process isn’t like, and so we’re happy when the behavior of the given model is substantially different from the behavior of the real system. The model is actually the realization of our hypothesis as to what the mechanism of a certain process is like – what elements the given system contains, and what relationships hold between these elements. When our model behaves differently than we imagined, it means that our hypothesis likely wasn’t true, which means that we have to discard our hypothesis or, more often, modify it. We’ll embody the new hypothesis in another model, which we’ll again subject to our test. When we discover a model whose behavior corresponds very well with the behavior of the real system, meaning that we observe in it processes analogous to those in the real world, we should actually feel disappointed (usually we aren’t, because men are vain creatures and dislike their proposals being rejected). When the model’s behavior corresponds with that of the system, it signifies that we couldn’t disprove the given hypothesis, and therefore that the hypothesis could be true. But we won’t be certain, because many very different models can produce the same behavior.

Modeling is a tremendously powerful tool. It enables us to quickly and cheaply refute (more accurately, to make improbable) a large number of hypotheses, and thus to concentrate on testing those more likely to succeed. Additionally, creating models allows us to specify and elaborate our often somewhat vague hypotheses. An intuitive visualization, that something should work like this and this, achieves a defined outline only after conversion into a mathematical or numerical model; and often we still have to alter it during the very creation of the model (frequently we must decide on very significant details, which we didn’t tackle during the verbal construction of our hypothesis). For example, in my book Frozen Evolution: Or, that’s not the way it is, Mr. Darwin, I expressed the hypothesis that sexually reproducing species with an inability to fit the model of natural selection may have a paradoxical advantage in quickly changing environments. Evolutionarily plastic asexually-reproducing species are in danger of opportunistically adapting to a short-term variation in environmental conditions, and after a return of the conditions to normal, they won’t manage to adapt to the original conditions quickly enough, thus dying out. Precisely this could be the reason why most known species reproduce sexually. Only during the modeling, which was conducted by Petr Ponížil, did it become apparent that I must decide whether plastic and non-plastic species compete directly for a common resource, or whether they just coexist independently. The results consequently showed that apparently only in the second case could the evolutionarily non-plastic species win over the plastic species. 
They also showed that it is more advantageous for the success of the non-plastic species when the changes in environment are cyclical (this I did not expect); and that jump changes are more favorable than continuous changes for the success of the non-plastic species (this I expected, but it pleased me anyway).

We can model any process (phenomenon, property). One must realize, however, that we never model an object or system, but always only a certain specific behavior of the given object or system. The command “Model the watch!” is nonsensical – it isn’t clear whether we’re supposed to model how the watch will behave after a crash from a speed of 200 km/h into a brick wall, or after being placed into 70% sulfuric acid, or whether we just want to know why its hands tick regularly. All of the mentioned processes would require completely different models. Technically, we can study the behavior of mathematical models, either analytical or numerical, or perhaps even mechanical models (for example in aerodynamic or hydrodynamic testing). In science the mathematical models are most applicable, and for a biologist, whose qualification prerequisites almost inherently include mathematical illiteracy, numerical modeling is the most useful of all. Modern computers and modern programming tools are immensely powerful and allow even modeling amateurs to model. Nevertheless, to this end I’d recommend cooperating with a professional. Modeling on your own harbors one serious hazard – it’s addictive.

a research team, he needs money not only for the material necessary for the experiment and the general running of the laboratory, but also to pay the salaries (or rather stipends, in our case) of his coworkers. Today this money is raised by applying for scientific grants. The researcher will write up a project: “I want to study such and such a topic, which could lead to such and such a finding, and I need this amount of money to carry it out.” And when we’re talking about an inexpensive study, he can’t write that he needs a couple of cents for postage, writing paper and envelopes, in addition to half a million for the salaries and health insurance of his coworkers. To put it more accurately, he can write it, but the evaluators will see the budget as dubiously imbalanced and reject the project. So it is more favorable to enter the grant arena with a more expensive project, which requires costly equipment and chemicals, because then he needs half a million for the actual material and equipment and a similar amount for the pay of coworkers, lab technicians and students. And that is enough to ensure the functioning of a smaller-sized laboratory. Since our project didn’t require anything aside from “pencils and paper,” which today actually means a computer and some statistical software, for many years no one else followed up on our experiments. Colleagues from abroad assured me that our results were interesting and that they understood the necessity of someone else repeating our study. But for a long time no one else tested it, most likely for the reasons I just mentioned.

To a certain extent, that was an advantage. Thanks to it we had enough time at our disposal to discover most of the interesting things on our own. If several other research teams had been working on the same project simultaneously, then we probably wouldn’t have discovered as much; at best we’d often have come in second. Not because many of the world’s top laboratories have better equipment and more money for research – the problem, as I already mentioned, wasn’t in that. The problem lies in something else. Established laboratories in the world’s top universities, among which Charles University, unfortunately, has long not belonged, can publish their results more easily. So even if we discovered something before our competitors, it is very likely that the researchers of another laboratory would overtake us in publishing the results of their study, which might not be as thorough, but would bear the glorious name of a famous institution.

If a certain university or even laboratory has produced a number of quality studies, then editors and reviewers look upon its manuscripts much more favorably than when similar results are sent in by a researcher from a practically unheard-of laboratory and, what’s more, from a university which, after the forty years during which the Communists cut us off from the world, certainly isn’t among the world’s top (see Box 28 How to measure the quality of science).

Box 28 How to measure the quality of science

It’s hard. The value of a scientific discovery is usually seen only in retrospect, and even for an individual discovery it’s often difficult to determine who contributed and to what extent. The quality of science produced by certain researchers, by a certain research team or institution, can be weighed based on the examination of individual scientific papers, the response to these papers and the scientific reputation of the researchers (according to awards received, invitations to be a plenary speaker at conferences, and so on). But such individual evaluation is pretty demanding and, in the conditions of a medium-sized country, would require the formation of an international team of evaluators. Otherwise, this approach would hardly be objective. For example, in our small Czech pond we all know each other too well, and personal ties would likely have a greater effect on the evaluation than the objective quality of the study. In most cases we must therefore rely on auxiliary, indirect criteria, which we take to at least approximately reflect the quality and quantity of the research. A simple criterion is the number of publications which the evaluated scientist (team) produced in a certain period. Another indirect measure reflecting the quality of research is the number of references to the articles of the given author in the articles of other authors. This value reflects, above all, the author’s age (this can be avoided by counting only the references to articles published in a particular time period); as well as the extent of his social network (this cannot be helped, and it is questionable whether it would even be desirable); and finally the significance of the published papers for other authors – nobody would reference unimportant works (unfortunately, they would reference erroneous ones, albeit in a negative context, but we wouldn’t differentiate this during the evaluation). 
When evaluating the number of publications it is possible and fitting to take into account also the quality of the journals in which the papers were published. A simple criterion of the quality of a journal is its JIF, or Journal Impact Factor, which is the average number of references to one article in the first two years after its publication. Of course, individual fields have different citation practices, so one basically can’t compare the quality of journals, and certainly not of researchers, from different fields; but that isn’t necessary very often. Today the most popular indirect measure of the quality of a researcher is likely the Hirsch index (h-index). We obtain it by ordering the papers a scientist published in a given time period from most referenced to least referenced. The h-index is the largest number h such that h of these papers were each referenced at least h times. An h-index of 20 is achieved by a researcher who, in a given time period, published at least 20 papers that were each referenced at least twenty times. An advantage of the Hirsch index is that it is little affected when an author references his own publications in his other papers (self-citation) or when, among many low-quality papers, he somehow manages to publish one high-quality, or rather highly referenced, paper. Nevertheless, a highly referenced paper and a scientifically valuable paper don’t have to be one and the same. For example, my most referenced article is a study describing a computer program for molecular phylogenetics. The program is definitely useful – for this reason others use it and reference the article – but the scientific value of the article is relatively low. Indirect measures for evaluating science, offered to us by today’s scientometrics (a field which pursues the measuring of scientific quality), are imperfect, but we have no better ones, so thank goodness we have at least these.
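The ordering-and-counting procedure for the Hirsch index described above can be sketched in a few lines of Python. This is only an illustration: the function name h_index and the citation counts in the example are made up for the purpose and do not describe any real researcher.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h references each."""
    # Order the papers from most referenced to least referenced.
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the first `rank` papers all have at least `rank` references
        else:
            break         # every later paper has even fewer references
    return h


if __name__ == "__main__":
    # Hypothetical authors, invented purely for illustration.
    steady_author = [50, 18, 7, 6, 5, 2, 0]
    one_hit_author = [1000, 1, 1, 0]
    print(h_index(steady_author))
    print(h_index(one_hit_author))
```

The second hypothetical author shows why a single heavily referenced paper among many rarely referenced ones barely moves the index: the count stops at the first paper whose references fall below its rank.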

And the mistakes that I made? For example, twenty years ago, because I wasn’t very skillful in statistics, I overlooked an interesting result regarding my ten-question toxo-questionnaire. As a consequence of this stupid mistake, for fifteen years I didn’t know that the clearest proof of the effect of toxoplasmosis on the human psyche had been provided by the very first study – the one which came out of my own introspection. (I probably shouldn’t mention this too much. When a former PhD student of mine recently found out about my schoolboy mistake, I think he was considerably wounded.) Maybe it’s just as well that I don’t know how many similar, hitherto undetected mistakes I’ve made during my scientific career.

What I’m really ashamed of is my first ethological study on animals, which I conducted sometime in the first half of the 90’s. I attempted to verify whether infected animals have a greater tendency to give up a yet undecided fight. I had asked this directly of people in my toxo-questionnaire, and, as it turned out, toxo-positive men truly reported this tendency. Of the animals I had to ask the question using an experiment. I conducted my experiment on bank voles (Myodes glareolus, a larger-sized vole often found in the forest), captured at the Ruda research field station. From previous studies we knew that about half of the bank voles at this field station were infected by the protozoan Frenkelia, a parasite which subsequently must get to its final host, a bird of prey. Frenkelia is related to Toxoplasma, but, unlike those of Toxoplasma, its cysts can be recognized on brain tissue slides prepared using the squash technique. To find out whether a vole is infected, we need only two slides, between which we gradually squash the brain of the freshly killed animal, and then look under the microscope for round cysts. Therefore we don’t need to conduct any laboratory tests, which in the case of some wild animal species may have a lower specificity and sensitivity (see Box 7 The specificity and sensitivity of a diagnostic test).

I brought about 15 voles from the Ruda field station. Around my laboratory I laid out fifteen aquariums filled with water, placed a vole in each one, and watched how long they would swim before they gave up fighting for their lives and let their heads sink into the water. At that moment I planned to take them out of the water, kill them and examine their brains for the presence of cysts. The reason I am so ashamed of this experiment today isn’t even so much that I harmed the voles – that I let them swim to utter exhaustion, though that bothered me greatly about the experiment. Yet if exhaustion was the object of my study, then I didn’t have much of a choice. I am most ashamed that I prepared my experiment badly, and so tormented the animals needlessly (see Box 29 Planning and preparing experiments and Box 30 Animal testing).

Box 29 Planning and preparing experiments

The most important part of a research project is its preparation. The researcher must first resolve what he actually wants to determine in his study, or what question he wants to answer. This is perhaps the most important phase of project preparation, during which the researcher must decide whether the given question is even worth pursuing, whether its answer isn’t already known and whether he is the right person to pursue it. In this phase it’s always necessary to conduct a thorough literature review, to find out what’s already been written about the given topic. In the next preparatory phase one must choose an appropriate method for answering the given question and plan out the project, making clear when and where the study will be carried out, as well as who will work on it and how. The project plan must address not only how the data will be collected, but also how they will subsequently be pre-processed (what will be done, for example, about missing or outlying values) and how the pre-processed data will be assessed. It is an unforgivable error, which can definitely come back to bite you, if during the project preparation you don’t decide ahead of time how specifically you’ll evaluate the data. For it can easily happen that we neglect to record some important information and in the end won’t be able to evaluate the data correctly. Next is the phase of the project’s technical preparation, which involves procuring all the necessary material, learning to work with the equipment and preparing the protocols. The protocols contain tables used to record data during the experiment, and are prepared ahead of time by the researcher; afterwards he records the actual progress of the research in his laboratory notebook, whether it’s a paper notebook or an electronic one, e.g. an Excel or Open Office spreadsheet. 
The latter is easier to navigate, particularly when one is working on multiple projects – using an automatic filter he can easily check all the recorded data related to a particular project. For a more complicated project it is often appropriate to first carry out a pilot study and test whether it’s possible to carry out the project as originally planned, or whether it’s necessary to modify or entirely scrap our plans – as would happen, for example, if we found using a power analysis that the studied effect is so weak that to prove it we’d require too many experimental animals. What follows is carrying out the actual study, pre-evaluating and checking the obtained data and finally evaluating the data. For more complicated studies the pre-evaluation and evaluation of the data often involves as much work as the experiment itself, or even more. Usually it’s also the phase that places the greatest demands on the expertise of the respective scientists.

Above all, I could have realized beforehand that fifteen voles are not enough to detect a difference between the infected and uninfected individuals. Considering that it was a group of animals of various ages and both sexes, there was definitely a large variability in the studied value, i.e. the amount of time they tried to swim. Therefore the probability that, using fifteen animals, I would successfully demonstrate a statistically significant difference between the infected and uninfected individuals was very low, even in the optimal case that this difference was very large and that half of my sample of animals was infected and the other half uninfected. Such an optimal case could not be expected; it was much more likely that in a small sample of fifteen animals there were, for example, five infected and ten uninfected individuals, or vice versa.
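How low such a probability can be is easy to estimate by simulation. The following Python sketch is a rough illustration only, not a reconstruction of any analysis actually done on the voles: it assumes normally distributed swimming times, an effect of infection equal to half a standard deviation (a number chosen purely for illustration), and a simple permutation test on the difference of group means. All function names and parameter values are hypothetical.

```python
import random
import statistics


def permutation_p_value(a, b, rng, n_perm=200):
    """Two-sided Monte Carlo permutation test on the difference of group means."""
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign animals to the two groups
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimated_power(n_infected, n_control, effect_sd, rng, n_sim=200, alpha=0.05):
    """Fraction of simulated experiments in which the test reaches p <= alpha.

    Assumption: swim times are normal with SD 1, and infected animals
    give up earlier by `effect_sd` standard deviations on average.
    """
    significant = 0
    for _ in range(n_sim):
        infected = [rng.gauss(-effect_sd, 1.0) for _ in range(n_infected)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_control)]
        if permutation_p_value(infected, control, rng) <= alpha:
            significant += 1
    return significant / n_sim


if __name__ == "__main__":
    rng = random.Random(2024)  # fixed seed so that the run is repeatable
    for n_inf, n_ctl in [(7, 8), (5, 10), (40, 40)]:
        power = estimated_power(n_inf, n_ctl, effect_sd=0.5, rng=rng)
        print(f"{n_inf} infected vs {n_ctl} control: estimated power {power:.2f}")
```

Running such a simulation before the experiment, with a realistic guess of the variability, indicates roughly how many animals would be needed, or that the experiment is not worth doing at all.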

Another problem was that I underestimated the preparation of the experiment. That became clear, for example, in that the animals in aquariums closer to the door gave up their struggle before the animals nearer the center of the room. Apparently there was some sort of gradient in that room, most likely a temperature gradient, and as a result the voles who gave up first were those in aquariums with the worst conditions for survival – where, for example, the water was coldest.

The last problem was that for subjective and objective reasons I was unable to remain focused for the entirety of the experiment; so some of the voles drowned, and in some cases I didn’t even record the exact time at which a certain individual gave up its struggle. It would have been better for two observers to take turns every half hour during the experiment.

I originally intended the experiment as a pilot study to verify whether such means could be used to find out if infected individuals give up more easily. In this respect the study was actually successful, for it gave me a fairly unambiguous answer. Technically, after certain modifications, the given topic could be addressed using this method. However, I would have to use a more genetically and physiologically homogeneous group of infected and control animals, and carry out the experiment under better controlled conditions – for example, in thermostat-equipped water baths. Moreover, the course of this experiment convinced me that I don’t have the “proper” disposition for this kind of study, and that it would be better that similar experiments never again be performed in my laboratory. That promise I was able to keep, so perhaps those fifteen voles didn’t die entirely in vain (see Box 30 Animal Testing).

Box 30 Animal Testing

Unfortunately, biomedical research can never entirely do without animal experiments. Moreover, it often cannot even do without testing on humans. If the effectiveness of a new drug or medical procedure is to be verified, it is clear that eventually there must come the phase of testing on human volunteers. Some animal testing can be replaced by testing on tissue cultures (if we want to know how a certain substance or physical factor influences the viability of individual cells). In most cases we have to conduct the experiment on animals in the end anyway, because there exist many factors which have an entirely different effect on the functioning of cells and the functioning of multicellular organisms. In the same way, it is not possible to replace an animal experiment with a mathematical model. The model can only serve a didactic purpose – using the model we can “painlessly,” inexpensively and quickly demonstrate to students the course of certain processes. The model can also sometimes reveal that conducting a certain experiment is useless, that the pursued effect is so weak that we basically cannot prove it in the laboratory. In some cases it is possible to replace an experiment with observation in the wild. Then it’s not necessary to infect reindeer with Echinococcus tapeworms and observe whether the infected individuals become the prey of wolves; it’s enough, using a suitable serological technique, to find out what percent of the reindeer in a normal population and what percent of the reindeer that become wolf prey are infected. During animal testing, even with the best intent, we cannot avoid harming the animals to a greater or lesser degree. That we’re doing it in the interest of science or mankind, and that it was approved by the respective Institutional Animal Care and Use Committee, may be an extenuating circumstance, but it doesn’t diminish the guilt. 
As my colleague Jan Zrzavý says, science cannot do without animal testing, but those who do that testing should perhaps know that they’ll end up in the fiery pits of hell. Before any experiment, one must deeply consider whether its possible outcome is important enough to justify our carrying it out; whether it really enables us to obtain the expected results; whether there exists another way to obtain them; what is the smallest number of animals we must include; and how to minimize the animals’ suffering.

Frozen Evolution. Or, that’s not the way it is, Mr. Darwin. A Farewell to Selfish Gene.