II. How I accidentally discovered that Toxoplasma changes human personality

I got very lucky in selecting my research method, although in truth I simply chose the first method at hand. At the time, we had practically no equipment for studying animal behavior. But we did have a constant inflow of human test subjects: female patients had the toxoplasmosis test administered one day and returned two days later so that we could check the test results. I told myself that I could simply hand them a questionnaire asking about the strange behavior patterns I had observed in myself. Then I would compare the answers of the women who turned out to be Toxo positive (infected) with those of the Toxo negative (uninfected) women, and see whether there was a significant difference. I prepared a set of about ten questions that I thought I would have answered differently before and after my infection. Besides questions about an unwillingness to defend oneself against a swindler, I included questions about behavior that I thought could be advantageous to a parasite. For example, I asked how quickly and in what way the respondent reacted to immediate danger: whether they startled easily and jumped away quickly, or remained calm and reacted slowly or not at all. I had noticed that I was now rarely startled. If I heard a rustling above my head, instead of jumping to the side, I was more likely to look up for the source of the sound. This sort of behavior could be advantageous for a parasite trying to make its way into a felid’s stomach. Felids usually ambush their prey, taking advantage of the element of surprise. If, at that moment, the prey acts effectively and quickly enough, the felid will miss its meal. But if the prey hesitates to deliberate, even briefly, instead of reacting reflexively, it will find itself in the felid’s jaws.

By the way, there are some situations of immediate danger in which an unnaturally cold-blooded reaction can actually be an advantage. For example, I was surprised by my own emotionless and entirely rational reaction to a machine-gun attack by Kurdish units on a small town near Diyarbakir in eastern Turkey. Four students and I were spending the night in the bed of a truck among bags of cement (that evening, our Kurdish hosts had emphasized that this night we really could not sleep on the flat roof of the house, as we had grown accustomed to doing in Turkey), and for about 20 minutes, machine-gun bullets blazed a few meters above our heads. I checked that everyone was huddled in the spaces between the bags, safe from ricocheting bullets, and then fell to contemplating what the angry Turkish soldiers would do after the attack. Fortunately, they were content with simply blowing out the window panes of all the Kurdish shops in the town, and left our cement truck in peace.

Another question asked how startled, frightened, repulsed, or uncomfortable the respondent felt on seeing a spider or a snake. At the time, I created the questions intuitively, and I’m not exactly sure why I included this one. It might have been related to a situation I heard about from one of my colleagues at the University, Jan Buchar, when I was recruiting him as a guinea pig for this testing and told him about my hypothesis. He told me that, in the wild, he had once seen a frog hop straight into the jaws of a grass snake. All the while, the frog was wailing in fear; nevertheless, it hopped all the way to the snake, which promptly snapped it up. So the ability of a snake to hypnotize its prey could be more than a myth. Perhaps the snake is aided in its hunt by a parasite waiting in the brain of its prey, a parasite that needs to get into the snake.

A further question asked respondents whether they would fight to the end if physically attacked. Twice when I found myself in such a situation (thankfully, I haven’t been physically attacked more often than that), I gave up the fight prematurely. And this was despite the fact that I had practiced karate for a number of years after college, so it wasn’t a technical problem for me to incapacitate my attacker. But for reasons unknown, it was a problem psychologically.

Some of my questions sounded pretty strange, and if they had been posed on their own, the respondents would no doubt have wondered what on earth I was actually trying to find out. For this reason, I judged it better to disguise my ten questions among a larger number of inquiries, preferably taken from a standard psychological questionnaire. Entirely by chance, I selected Cattell’s 16-factor personality questionnaire, because the mother of one of our students was working with it at the time. I copied the questionnaire onto the computer, mixed my ten questions into its 187, and then handed it out to patients at the screening. But it soon became apparent that this method wasn’t very effective. Although we got three to five patients every week, only some of them were willing to spend an hour answering nearly 200 questions. Many of the patients were either in a rush or had no sympathy for my science, so the data accumulated slowly. I decided to focus my efforts on my colleagues. During each week of testing, I dashed from lab to lab, asking every scientist and student I could find whether they would volunteer. I explained to them the concept behind the testing, which stirred their interest, and about 80 to 90% took part. I also recruited friends and acquaintances, and the friends and acquaintances of my friends and acquaintances. So every Tuesday I made my rounds, convincing employees and students to get themselves tested for toxoplasmosis and then complete that obnoxious 200-question questionnaire. Over the course of six months, I gathered data from about 200 people. In the end, I didn’t include the data from the couple dozen patients I had tested at the very beginning; in my statistical evaluation, I used only the data from the University students and employees, in order to keep the study population as homogeneous as possible (see Box 10 Statistical evaluation of data).

Of course, I was most interested in the answers to my ten questions, but I incorporated the answers from Cattell’s questionnaire into the analysis as well. This questionnaire primarily explores 16 psychological factors, such as Sociability, Warmth, Emotional stability, Dominance, and Intelligence. I expected that none of these factors would be related to toxoplasmosis, though some of the nearly 200 individual questions in Cattell’s questionnaire could be. They might ask something similar to what my questions asked, or about something that hadn’t occurred to me but could be related to Toxoplasma infection. But the situation was a bit risky, because two hundred questions meant two hundred statistical tests, and when analyzing so many tests, one must use a correction for multiple testing (see Box 11 Bonferroni correction for multiple testing). Without this correction, we risk false positive results; with it, we risk false negative ones.

Box 10 Statistical evaluation of data

Statistics is a set of methods that allow us to detect relationships in imperfect data that have been “sullied” by the effects of chance. In the real world, random factors affect almost all the data a scientist works with. Exploratory statistical methods, such as factor, cluster, and discriminant analysis, allow us to reveal relationships partially hidden by the effects of various factors that are random in the present context, i.e., factors other than the one(s) we are currently studying. Confirmatory statistical methods allow us to estimate the probability that the null hypothesis is true – in other words, the probability that the observed phenomenon, such as a greater average body mass measured for 30 Toxo positive people than for 30 Toxo negatives, is due only to chance. If we create two entirely random groups – for example, by tossing a coin for each person – and then measure the body mass of the individuals, the average body masses of the two groups will almost certainly differ. If the two groups are very large, and their members are truly selected by chance, then the difference between the group averages should be small. But if either of the groups is small, or was affected by an outside factor (for example, if we weighed the people in one group first, and in the meantime some people in the other group lost weight, perhaps simply by going to the bathroom), then the difference in average body mass can be quite large. To decide whether the observed difference is large for the given group sizes and the given variability of the measured variable – that is, whether or not an outside factor played a role – we use an appropriate statistical method. In this case, it would be a t-test (Student’s t-test). The result of the statistical analysis is the P value, which reflects the probability that the observed phenomenon (in the example above, the difference in the average body mass of the two groups) is just due to chance.
If this P value is lower than 0.05, or 5%, then the difference is statistically significant. This means that the probability of the null hypothesis is so low that we are entitled to lean towards the opposite conclusion: that the observed difference is caused not just by chance, but also by an outside factor. (In the framework of classical statistics, the P value is not the probability that the null hypothesis is true. Rather, it is the probability of obtaining results like yours (or more extreme ones) if the null hypothesis applies – in other words, of getting “false positive data.” But don’t worry about this – even among researchers who regularly use statistics in their work, only a few understand this difference (11).)
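To make the t-test logic of Box 10 concrete, here is a minimal Python sketch using only the standard library. It computes Student’s t statistic for two independent groups and then, because the standard library has no t distribution, estimates the P value with a permutation test: the pooled measurements are repeatedly re-split into two random groups of the original sizes, much like the coin-toss groups described in the box, and we count how often chance alone produces a difference at least as large as the observed one. The permutation step is a swapped-in technique, not the classical t-test P value; the function names `student_t` and `permutation_p` are hypothetical, and the body masses are invented for illustration. In practice, a library routine such as SciPy’s `scipy.stats.ttest_ind` returns both t and the P value.

```python
import math
import random
import statistics


def student_t(a, b):
    """Student's t statistic for two independent groups (pooled variance)."""
    na, nb = len(a), len(b)
    # statistics.variance uses the n - 1 denominator (sample variance)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


def permutation_p(a, b, n_perm=10_000, seed=0):
    """Two-sided P value estimated by re-splitting the pooled data at random.

    Each shuffle is a new "coin-toss" division into groups of the original
    sizes; the P value is the share of shuffles whose |t| is at least as
    large as the observed |t| (with the usual +1 correction).
    """
    rng = random.Random(seed)
    observed = abs(student_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(student_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical body masses in kg; not data from any study.
positives = [82, 75, 90, 78, 85, 88, 79, 91]
negatives = [70, 74, 68, 81, 72, 77, 69, 73]
print(round(student_t(positives, negatives), 2))
print(permutation_p(positives, negatives))
```

A P value below 0.05 from `permutation_p` would be read exactly as the box describes: a difference this large rarely arises from random regrouping alone.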

Box 11 Bonferroni correction for multiple testing

When we pose 200 different questions to students divided randomly into two groups, then, on average, for every twentieth question the answers of the two groups will differ significantly at the 5% level. This is because of the way statistical tests work. A statistical test primarily gives us the P value, which reflects the probability that an observed phenomenon, such as the difference in the average answer of infected versus uninfected people to a certain question, is only due to chance (see also Box 10 Statistical evaluation of data). But if we are asking whether the two groups differ in their average answer not to one specific question, but to at least one of 200 questions, then the probability that the difference is only due to chance is much greater than the one given by the statistical test. Whenever I’m explaining this concept to students, I use this analogy: when I toss a piece of chalk into a trash can across the room, the chance that I’ll make it in is about 1%. If I throw 200 pieces of chalk, the expected number of successful throws is 200 times greater: on average, two pieces of chalk will make it in, and the chance that at least one of them does is roughly 87%. So if we’re conducting several unrelated statistical tests, we must use the Bonferroni correction, which involves multiplying the obtained P value by the number of tests. A more precise (and less strict) correction for multiple tests, known as the Šidák correction, substitutes the P value into the following formula: P′ = 1 – (1 – P)ⁿ, where n is the number of tests. But after applying the Bonferroni correction, many originally statistically significant results are no longer statistically significant; some of them rightly, but others wrongly. This means that we risk overlooking an interesting result. But of course, the whole matter is a bit more complicated (see also Box 85 When and when not to use a Bonferroni correction).
Today, statisticians usually try to replace a set of individual tests with a single test that evaluates all the hypotheses simultaneously. So if we’re comparing the average temperatures in five localities, we don’t have to run ten separate tests comparing all ten pairs of localities and then adjust the resulting P values using the Bonferroni correction (by multiplying them by ten). Instead, we use ANOVA (analysis of variance), and only if the ANOVA gives us a statistically significant result do we check each pair of localities for a difference. This is a better approach, because it involves a much lower risk of false negative results. Therefore, today multiple tests followed by a Bonferroni correction should only be conducted if no appropriate test for several independent variables (e.g., multiple regression or ANOVA) is available – for example, if we have data whose analysis requires a nonparametric test (see Box 25 Parametric and nonparametric tests).
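The two corrections in Box 11, and the chalk analogy, can be checked numerically. The sketch below is illustrative only: the helper names `bonferroni` and `sidak` are hypothetical, and the simulation assumes that, when there is no real difference between the groups, the P values of independent tests are uniformly distributed between 0 and 1.

```python
import random


def bonferroni(p, n_tests):
    """Bonferroni: multiply the P value by the number of tests (capped at 1)."""
    return min(1.0, p * n_tests)


def sidak(p, n_tests):
    """Šidák: P' = 1 - (1 - P)^n, the chance of at least one P <= p
    among n independent tests when the null hypothesis holds for all."""
    return 1.0 - (1.0 - p) ** n_tests


# The chalk analogy: 200 throws, each with a 1% chance of landing in the can.
print(round(sidak(0.01, 200), 3))   # 0.866, chance of at least one success
print(bonferroni(0.01, 200))        # 1.0, Bonferroni simply caps at 1

# Simulate 200 questions with no true group difference: under the null
# hypothesis each P value is uniform on [0, 1].
rng = random.Random(42)
p_values = [rng.random() for _ in range(200)]
raw_hits = sum(p < 0.05 for p in p_values)                      # expected about 10
corrected_hits = sum(bonferroni(p, 200) < 0.05 for p in p_values)
print(raw_hits, corrected_hits)
```

The raw count of “significant” results will typically be close to 200 × 0.05 = 10, all of them false positives, while the Bonferroni-corrected count will usually be zero; the price, as the box says, is that a real but modest effect could be missed in the same way.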

To my pleasant surprise, the questionnaire study was successful, although in a somewhat different way than I had expected. For some of my ten questions, there was a difference in the answers of infected and uninfected people, but these differences weren’t very big and weren’t statistically significant after a Bonferroni correction. Much more interesting were the results for Cattell’s psychological questionnaire, which I had included only to keep the test subjects from suspecting the study’s true purpose. When looking at the individual questions, none of the differences were statistically significant, certainly not after the Bonferroni correction, so there was nothing interesting there. But of the 16 psychological factors determined by Cattell’s questionnaire, several differed between infected and uninfected men, and one of these differences was statistically significant even after the Bonferroni correction. In women, the differences in the same factors tended to go in the opposite direction (Toxo positives versus Toxo negatives) and weren’t statistically significant. However, the differences in two other factors were almost statistically significant. The differences in women may have missed statistical significance simply because there were far fewer women than men in our test group. Back then, substantially more men than women attended the department of natural sciences; over time, the ratio reversed, and today we have almost three times as many female as male students.

Toxo positive and negative men differed most in psychological factor L (Protension), or suspiciousness (Toxo positive men were more suspicious), and then in factor G (Superego strength), which tells us how willing people are to follow social norms. Toxo positive men had significantly lower Superego strength than Toxo negative (uninfected) men. So one can say that my Toxo positive colleagues and students were more suspicious and less willing to respect social norms (Fig. 8). A weaker, not statistically significant effect of toxoplasmosis manifested in a lower factor A (Affectothymia: sociability, warmth, and openness), as well as in a lower factor Q₃ (Self-sentiment integration), meaning that infected men had less self-control. In contrast, infected women had a slightly greater willingness to respect social norms (factor G), higher affectothymia (factor A), and slightly greater intelligence (factor B).

I happily presented my results at a conference of Czech and Slovak protozoologists, and, as could be expected, it brought some much-needed excitement. By that I mean that most of those present welcomed the diversion in an otherwise fairly boring program; I sincerely doubt that they believed our results. I myself wasn’t really sure what to think about the data. If there had been unambiguous differences between the infected and uninfected subjects in their average answers to my ten questions, I would have said: yes, I have clearly confirmed my hypothesis; Toxoplasma probably manipulates the behavior of its host. But I had no clue how Suspiciousness or Superego strength could be related to toxoplasmosis. I think that my presentation, called “Show me your parasites and I’ll tell you who you are,” concluded that the observed differences between infected and uninfected men might be a side effect of Toxoplasma’s manipulatory activity: the parasite tries to do the same thing in humans that works in mice, and in humans it manifests in this bizarre manner. But I myself wasn’t too sure about this conclusion. I knew only too well that when one obtains an unexpected positive result in a study, one must, regardless of its statistical significance, be very careful in drawing conclusions, and certainly verify the result on new, independent data (see Box 12 Why to be wary of unexpected results).

Fig. 8 A bar graph showing the differences between Toxo positive and Toxo negative male students and teachers of our college in Cattell’s factor G (Rule-consciousness), which measures Superego strength, i.e., the tendency to follow social norms, and factor L (Suspiciousness). Since older people are more likely to be Toxo positive, the difference between Toxo negatives (white columns) and positives (gray columns) could be the result of their age. Therefore, we had to statistically filter out the effect of age, either by including age in the analysis as a covariate (an “unimportant” variable that influences the studied variable but is outside our interest) or by analyzing individual age groups separately (see x-axis). The graph shows that the effect of toxoplasmosis is statistically significant in all analyzed age groups.
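The caption of Fig. 8 mentions two ways of removing the effect of age. The sketch below illustrates only the second one, comparing Toxo positives and negatives separately within age groups, on a tiny invented data set in which the score depends only on age while infection is more common among older people. The function names and the 10-year age bands are assumptions for illustration; including age as a covariate (for example, in an analysis of covariance) is the other route and is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Each record: (age in years, Toxo positive?, questionnaire score).


def crude_difference(people):
    """Mean score of positives minus mean score of negatives, ignoring age."""
    pos = [score for _, infected, score in people if infected]
    neg = [score for _, infected, score in people if not infected]
    return mean(pos) - mean(neg)


def stratified_differences(people, band_width=10):
    """Positive-minus-negative difference computed separately in each age band."""
    bands = defaultdict(lambda: ([], []))  # band -> (positive scores, negative scores)
    for age, infected, score in people:
        bands[age // band_width][0 if infected else 1].append(score)
    return {
        band * band_width: mean(pos) - mean(neg)
        for band, (pos, neg) in sorted(bands.items())
        if pos and neg  # skip bands lacking one of the two groups
    }


# Invented toy data: the score depends only on age,
# but older people are more often infected.
people = [
    (25, False, 1.0), (25, False, 1.0), (25, True, 1.0),
    (55, True, 5.0), (55, True, 5.0), (55, False, 5.0),
]
print(round(crude_difference(people), 3))   # 1.333, a spurious "effect"
print(stratified_differences(people))       # {20: 0.0, 50: 0.0}
```

The crude comparison suggests an effect of infection, but within each age band the difference disappears; this is the kind of confounding by age that the analysis behind Fig. 8 had to rule out.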

Box 12 Why to be wary of unexpected results


Unexpected results are treacherous, especially when one can’t come up with a convincing explanation after obtaining them. If, before starting the study, we decide to check whether infected and uninfected people differ on average in any of the 16 psychological factors, then we can use the Bonferroni correction to determine the probability that the observed difference in a given factor is due to chance as opposed to toxoplasmosis (see Box 11 Bonferroni correction for multiple testing and Box 85 When and when not to use a Bonferroni correction). But if we first collect the data and discover a relationship during the subsequent analysis – for example, that people whose last name starts with a vowel differ in a certain factor from those whose last name starts with a consonant – then we certainly need to confirm the result on a different group of subjects before publishing it, regardless of how statistically significant the correlation is. There are infinitely many nonsense hypotheses that we could test in our data; therefore it’s not surprising that one of them would be supported by a statistical test. As I already mentioned in relation to the Bonferroni correction, the obtained P value, which indirectly reflects the probability that a certain phenomenon is only the result of chance, is only correct when you are performing a single test. If we’re examining data after the experiment, then we are, unwittingly or even wittingly, testing an enormous number of (mostly nonsensical) hypotheses. If something in the data catches our interest to the extent that we decide to formally test the existence of the observed phenomenon (for example, a relationship between the last letter of one’s first name, last name, or place of residence, or the sum of the digits in one’s birth date, etc., and one’s psyche, body height, or blood group), and the test gives us a highly statistically significant result, then it doesn’t really mean anything (Fig. 9).
And no Bonferroni correction can help us: there are many nonsensical hypotheses we could think of, and we have no way of knowing how many of them we tested in our minds when looking at the data. Yes, you should carefully examine the data, because it might contain unexpected results that could be much more interesting than the original reason for the study. (In my opinion, the ability to notice such details is what sets apart a good researcher.) But unexpected results should be approached very carefully and always verified in an independent study.
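The danger described in Box 12 can be reproduced with a small simulation. In this hypothetical sketch (the data, function names, and sample sizes are all invented for illustration), each person gets a random surname initial and a questionnaire score that depends on nothing at all. We first “dredge” one data set by testing all 26 hypotheses of the form “people whose surname starts with letter X differ from the rest” and keep the most striking one, and then test only that pre-selected hypothesis on a fresh, independent sample. For simplicity, the P values come from a large-sample normal approximation (a z-test), not from the t-test mentioned in Box 10.

```python
import math
import random
import statistics
import string


def z_test_p(a, b):
    """Two-sided P value for a difference in means (normal approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def make_people(n, rng):
    """Invented data: a random surname initial and a score unrelated to it."""
    return [(rng.choice(string.ascii_uppercase), rng.gauss(0, 1)) for _ in range(n)]


def letter_p(people, letter):
    """Test 'initial == letter' versus everyone else."""
    inside = [score for initial, score in people if initial == letter]
    outside = [score for initial, score in people if initial != letter]
    return z_test_p(inside, outside)


rng = random.Random(1)
explore = make_people(2000, rng)
# Dredge the exploration data: try all 26 letters and keep the most striking one.
best = min(string.ascii_uppercase, key=lambda letter: letter_p(explore, letter))
p_explore = letter_p(explore, best)
# Replicate: test only that single, pre-selected hypothesis on independent data.
confirm = make_people(2000, rng)
p_confirm = letter_p(confirm, best)
print(best, round(p_explore, 4), round(p_confirm, 4))
```

Because the dredged P value is the smallest of 26 tests, it will often fall below 0.05 even though no effect exists; the replication P value has no such selection built in and, with no real effect, is equally likely to land anywhere between 0 and 1.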

Fig. 9 An example of an unexpected, and therefore probably nonsensical, correlation revealed post hoc in the data. The X-Y scatter plot shows the number of errors made on a written exam in evolutionary biology against the alphabetical order of the students’ last names. The relationship between this order and the test result might not be as nonsensical as it seems at first glance. Many elementary and middle school teachers call on students in the alphabetical order of the gradebook, so students at the beginning of the alphabet are often questioned more frequently. Annie Adams is therefore (maybe) better practiced at preparing for examinations than Zachary Zuko. Unfortunately, I was not able to reproduce this result in the following years, so it might really be due to chance.

So why did our results seem suspicious to me, and why, when starting the study, did I not expect any of Cattell’s 16 factors to be influenced by toxoplasmosis? The thing is, most of Cattell’s psychological factors are primarily influenced by a person’s value system rather than by his natural, spontaneous tendencies. This, at least in my opinion, is what distinguishes Cattell’s questionnaire from others, such as Cloninger’s 7-factor Temperament and Character Inventory (TCI), which we began using in later years. Cloninger’s factors, such as novelty seeking, reward dependence, and harm avoidance, can be (and apparently are) related to the concentration of a certain neurotransmitter (see Box 13 Neurotransmitters and Box 56 What does Cloninger’s TCI measure?). So by changing the concentration of the neurotransmitter, the parasite can easily affect these factors. In contrast, Cattell’s factors, such as Sociability, Superego strength, and Suspiciousness, are related to the structure of one’s inner values. For example, if someone thinks that it’s good not to obey social norms, then he likely won’t obey them, and Cattell’s questionnaire will show his low Superego strength. If someone has had his trust broken several times in his life, then he’s probably suspicious; but if it hasn’t happened to him yet (in which case he must have been shipwrecked on a desert island early in life), then he’s more trusting, and Cattell’s questionnaire will show his low Protension. Since one’s system of values is established early in life and isn’t easy to change, it seemed to me quite unlikely that Cattell’s personality factors could be significantly affected by toxoplasmosis.

Box 13 Neurotransmitters

The cells of the nervous system communicate with each other and with other cells through electrical and chemical signals. The chemical substances that transfer information between nerve cells are called neurotransmitters. There are a number of neurotransmitters, which differ not only in molecular structure, but also in where they are synthesized, in their effect, and in their biological function. Some neurotransmitters act only in the area in which they are secreted, binding to the membrane receptors of the surrounding cells. The leftover neurotransmitter molecules are usually reabsorbed into the original cell to be reused, or are broken down by specialized enzymes. But many neurotransmitters act at a greater distance. In this case, they remain in the nervous tissue for a longer period of time at a fairly high concentration. As a result of this higher concentration, the number of corresponding receptors on surrounding cells gradually changes (usually decreases). The levels of individual neurotransmitters, as well as the number and type of the corresponding receptors, affect how an individual reacts to a certain type of stimulus, or how eagerly he seeks it out (see also Box 56 What does Cloninger’s TCI measure?). Temporary differences in neurotransmitter and receptor concentrations are reflected in mood swings; long-term differences can explain the variety of human temperaments. Aside from the stimuli a person encounters in his lifetime (i.e., experiences), his temperament also depends on genetic predispositions. For example, it’s known that individuals carry different variants of receptors for the neurotransmitter dopamine. People with a certain variant are much more likely to become drug addicts and risk-takers.
From the perspective of a population and the entire species, it’s clearly advantageous for individuals to have different temperaments, and therefore give precedence to different activities, because it facilitates division of both labor and resources. Studies conducted on birds also reveal genetically-determined differences in temperament.

After nature exposed my error and showed that a number of Cattell’s factors, including some that I definitely wouldn’t have expected, were affected by Toxo infection, I began casting around for a possible mechanism. Finally, I reached the conclusion that there is an explanation, but it requires a fundamental reevaluation of the generally accepted relationship between one’s system of values and one’s behavior. Psychologists usually believe that people act in agreement with their system of values because they modify their behavior to fit their values. Based on our results from Toxoplasma studies, today I picture the relationship the other way around. Each individual systematically, even if subconsciously, observes himself, noting how he reacts in various situations. And so that his behavior doesn’t conflict with his value system, he gradually adjusts not his behavior (which would achieve only short-term harmony), but rather his system of values. So today I picture the relationship between Cattell’s personality factors and toxoplasmosis as follows: a person has some system of values, a large part of which was constructed during his childhood; then he becomes infected with Toxoplasma and, in certain situations, behaves differently than he would have before. After some time, he notices this, just as I did myself, and formulates a rational explanation. Then he reorganizes his system of values to align with his new behavior, which in reality is caused by the parasite.

I am but a self-made amateur in psychology (though my wife would say that this is more of a flattering euphemism). I don’t know whether the hypothesis I’ve cooked up is correct; nor do I know whether it’s new, or whether there are psychologists who look at the relationship between behavior and the values system in this opposite manner. When I asked my colleagues, or the psychologists I worked with, they weren’t too sure about it. Perhaps one of my readers will tell me which author I should cite in the future in relation to the above model of building one’s value system (see Box 14 Citations in scientific literature).

Box 14 Citations in scientific literature

In scientific literature (but not, for example, in scientific textbooks), any nontrivial statement must be supported by an appropriate citation, that is, a reference to the original author and the source in which he published his discovery or hypothesis. There are two reasons for this practice. First, it enables the reader to look up the source and see the statement in the context of the data and methods. The second reason is social: citations are part of scientific etiquette, and really of etiquette in general, because they acknowledge that the cited author was the first to make that discovery. This is why you must cite the first author of a hypothesis, even if you formulated the hypothesis independently. And strictly speaking, it’s sometimes quite difficult to know whether you created a hypothesis independently, or whether it somehow (perhaps indirectly) reached you from its original source. Back in the 80s, while reading an immunology textbook (Fundamental Immunology), I was overjoyed when I uncovered the role of MHC proteins in antigen presentation. I thought of a model that elegantly explained almost all the peculiarities of MHC proteins known at the time, including the reason they play such an important role in most immunological processes. Several years later, I realized that, maybe a year before making my “discovery,” I had read about the hypothesis in a 1986 article by Jacques Ninio in the journal Immunology Today (12); only I hadn’t quite understood or appreciated it. Meanwhile, that article apparently fell into obscurity; over those several years, it was cited only once (and that was by the author himself). And actually, a year before Ninio’s hypothesis, a nearly identical (but clearer) model had been published in Nature by Antonio Lanzavecchia, this time with strong supporting data (13). Lanzavecchia’s model was referenced in an article in Immunology Today, and by 2010 it had been cited 1027 times by various authors.
Nevertheless, even now, 25 years after what is probably the greatest discovery in modern immunology, Lanzavecchia is still waiting for his Nobel Prize. And that is despite having a Hirsch index of 95 (see Box 28 How to measure the quality of science).

Cattell’s and Cloninger’s weren’t the only psychological questionnaires we used to torture our test subjects. Starting in 2007, we began using the currently popular questionnaire known as the Big Five; before that, we used a Czech questionnaire known as the N-70. Let’s start with the latter. The N-70 was created by Czech psychologist Karel Vacíř as a shorter alternative to the better-known SCL-90 questionnaire. Unfortunately, the N-70 is not used worldwide, so results obtained with it are very difficult to publish in international journals. For this reason, we tried to replace it with the SCL-90, but it turned out that the results obtained with the two questionnaires are not comparable, at least regarding the effect of latent toxoplasmosis on the human psyche. Phenomena we repeatedly observed with the N-70 could not be verified with the internationally accepted SCL-90. In a normal, healthy population, the N-70 measures tendencies towards certain psychopathologies, such as hysteria, neurasthenia, vegetative lability, and phobia. When testing several large groups of soldiers doing compulsory military service, we always saw differences between Toxo positive and negative people in several factors measured by the questionnaire. We weren’t surprised to find differences, but we were surprised by their nature: Toxo positives were apparently psychologically healthier and more resilient. We repeated the studies on several groups of professional soldiers, but the results were not as clear as those for the soldiers doing compulsory service.

Fig. 10 Differences in two of the Big Five personality traits, extroversion (a) and conscientiousness (b), between Toxo positive and negative male and female students. The groups consisted of 181 uninfected and 30 infected female students and 95 uninfected and 21 infected male students. Toxo positive persons have greater extroversion and lower conscientiousness. The graphs show the mean for each group with a 95% confidence interval.

The Big Five, or rather its most popular implementation, the NEO-PI-R, is a widely used psychological questionnaire. It primarily differs from Cattell’s 16PF questionnaire in the number of factors it determines. As the name suggests, it distinguishes only five main factors (extroversion, neuroticism, agreeableness, conscientiousness, and openness to experience), whereas Cattell’s 16-factor questionnaire, shockingly enough, determines sixteen. Nevertheless, this particular difference is not very substantial. Each of the main Big Five factors has several sub-factors (which is also true for six of Cloninger’s seven factors). Furthermore, the sixteen Cattell’s factors can be used to calculate five factors quite similar to those of the Big Five (extroversion, emotional stability, self-control, self-reliance, and tension). The main advantage of the Big Five becomes apparent in practice: in comparison with most earlier psychological questionnaires, its results are less affected by population type (male, female, nationality). And this was confirmed by our results. When we tested our students with this questionnaire, infected men and infected women had similar results (in contrast to the time when we used Cattell’s questionnaire, and infected men and women were affected in opposite ways). Infected persons had greater extroversion and lower conscientiousness (Fig. 10). Today, we use the Big Five questionnaire more and more in our new studies. It’s not because we consider it a better questionnaire than Cattell’s, but rather because most psychologists know the Big Five and can more easily understand our results. Consequently, our papers have a better chance of being published in good psychological journals.

Frozen Evolution. Or, that’s not the way it is, Mr. Darwin. A Farewell to Selfish Gene.