Today I attended a Basic Epidemiology class meant for undergraduate students, as I thought it would be good to brush up on my basic knowledge. The topics for the day were Hypothesis Testing and An Introduction to Randomized Controlled Trials, both pretty important ones no matter which level you are studying at. What struck me was the amount of detail the students were taught. To be entirely honest, in my undergrad days we did not get such detailed statistical teaching, and I was borderline jealous of the lucky 7th semester students. However, this post is not about the class or its importance (or the lack thereof), but about what stood out for me: a moment from the history of medicine. This post is the result of some online meandering following up on that momentary whim.
We have all used the Student’s t-test at one point or another in our lives, but I wonder how many of us ever wondered who this “Student” fellow was. Well, to be entirely honest, till today I had dismissed him as a brooding statistician with a long white beard, a heavy monocle and an intent look… you know, the ones you find on Wikipedia. While I was quite correct in stereotyping the look, digging beneath the surface revealed a much fancier and more colorful story behind the apparently benign (and somewhat boring) name of Student.
Born to Agnes Sealy Vidal and Col. Frederic Gosset on June 13, 1876, in Canterbury, William Sealy Gosset went to New College, Oxford, where he studied Chemistry and Mathematics. He was awarded First Class degrees in both subjects, obtaining his Mathematics degree in 1897 and his Chemistry degree in 1899. The same year, he went to work for Arthur Guinness and Son as a chemist. Yeah, you read that right: he went to work for THE Guinness company and was posted in Dublin.
Digressing for a while here, I must say that the history of the Guinness company is in itself an interesting study, especially since in its long 253-year history it has withstood the ravages of two world wars and three major economic meltdowns (no mean feat, that). But what probably makes the brand instantly recognizable for the aficionado of historical trivia is its association with the Battle of Waterloo. Apparently it was so famous in 1815 that the wounded soldiers at Waterloo were asking for Guinness by name, and were getting magically revived by partaking of the wonder-drink. The company cashed in on the legend by publishing a series of advertisements on this theme in the 1930s and 1940s, when the print media was just taking off. They say it went viral…
Anyway, coming back to the story of Student. Guinness had a policy of employing the brightest minds coming out of Cambridge and Oxford in order to bolster the statistical and biochemical work of the company.(3) Being a brilliant student, Gosset was naturally picked up by the company. It was thus a stroke of great luck for future statisticians and researchers that, owing to his poor eyesight, he did not go on to become an engineer like his father.(4) Now, for those who are wondering why I am obsessing over the Guinness company (besides the obvious reason), just hold on to your hats: this company also had a major role to play in Gosset getting his moniker of Student. While this is pure speculation, I must say that if Gosset were alive today and had seen the popularity his discovery has found, he would rather have had it called Gosset’s t-test than Student’s t-test! But then again, he was a very unassuming and humble person, so…
When Gosset joined Guinness in Dublin, his task was to perfect the process of brewing beer.(5) The principle was that one had to add an exact amount of yeast colonies to a certain amount of fermenting barley to turn it into beer. Too few colonies and the brew would be incompletely fermented; too many, and it would become bitter. So the challenge was to count the colonies and add just the right number of them. Gosset innovated around this problem by using the newly developed hemocytometer to count the yeast colonies. However, the challenge was to extrapolate the findings from a small sample of the yeast extract to entire jars of the sludge! This is akin to the problem medical or social scientists face when they draw a small sample from a huge universe to study some factors! It was in this setting that the mathematical and statistical training Gosset had acquired came into the picture.
It was the use of the hemocytometer that resulted in Gosset’s first publication and the assumption of his pseudonym, Student. A researcher at Guinness had previously published his work, leading to loss of trade secrets of the Guinness brewery and hence the company had put a blanket ban on all publication efforts by their employees. While in today’s “publish or perish” world this would seem like a counterproductive policy that would drive away the best brains from the company, those were rosier and better times, where the weight of one’s achievements was not measured by the length of their publishography.
Gosset had to plead with the brewery that the paper he proposed to publish was a purely philosophical and mathematical assertion that would have no dealings with the secret workings of the Guinness factories and hence would be of no practical importance to the competition. The authorities gave in, but added the rather practical rider that he was better off publishing it under a pseudonym (he chose “Student”) in order to avoid conflicts with other staff members with publication ambitions.(6) At this juncture, Gosset’s friendship with Karl Pearson came in handy. Pearson agreed to hide Gosset’s personal information and allowed him to publish under a pseudonym in Biometrika, the statistics journal he had founded in October 1901. In this article (7) Gosset discussed “how the scatters of the yeast colony counts using the hemocytometer was similar to the exponential limits of the binomial distribution”.(5) Thus, with this publication, the transformation of Gosset into Student began!
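For the numerically curious: the “exponential limit of the binomial” in that quote is, as far as I can tell, what we now usually call the Poisson distribution. Here is a minimal Python sketch of the idea, not anything taken from Student's paper. The mean count per square (`lam = 4.0`) and the helper names are my own assumptions, chosen only for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical mean number of yeast cells per hemocytometer square.
lam = 4.0

def max_gap(n):
    """Largest pointwise gap between Binomial(n, lam/n) and Poisson(lam) for k = 0..20."""
    return max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in range(21)
    )

# As n grows (and p = lam/n shrinks), the binomial approaches the Poisson.
gaps = {n: max_gap(n) for n in (20, 200, 2000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}  largest gap = {gap:.5f}")
```

The gap shrinks as n increases, which is the sense in which counts of rare, independent events (like cells landing in one small square) follow the limiting distribution rather than a binomial with some fixed n.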
Pearson, whom Gosset first met in 1905, was a giant in his field and one of the people who built up the fundamentals of modern statistics. Gosset worked under him for two terms in 1906-1907, working on the Poisson distribution and helping Pearson with the statistical work for his papers. In 1908, when Gosset was working on the theory for the t-test, Pearson helped him, but apparently did not recognize the importance of his work.
Pearson also believed that the only way to assess population parameters was by using large samples. Gosset set out to formulate a formal method by which small samples could be used to generate representative statistics. He conducted some empirical experiments, like the following:(5)
In 1 experiment, Gosset prepared 3000 pieces of cardboard, on each of which he wrote 2 sets of data on 3000 “criminals.” One set of values were heights, and the other values were the lengths of the left middle fingers. Gosset shuffled the cards, drew at random 750 samples of 4 cards each, and computed means and standard deviations of each. Then he obtained the difference between each sample mean and the population mean (n=3000) and divided the difference by the sample standard deviation to obtain 750 z scores. He plotted the scores as probability functions and discovered that even without any of 4 parameters of Pearson, one could estimate the population mean and the associated error with a degree of certainty.
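To get a feel for the card-shuffling experiment described above, here is a minimal Python sketch. It is not Gosset's data or his exact procedure: the 3000 heights are hypothetical values drawn from a normal distribution (the mean of 166 cm and SD of 6.5 cm are my assumptions), and the function name `gosset_card_experiment` is made up for illustration. The standard deviation uses the divisor n, as I understand the 1908 paper did, and the last step rescales z to what is now called t, with n - 1 = 3 degrees of freedom.

```python
import math
import random
import statistics

def gosset_card_experiment(seed=1908, population_size=3000, sample_size=4):
    """Shuffle a population of 'cards' and return one z score per sample of 4."""
    rng = random.Random(seed)
    # Hypothetical stand-in for the 3000 recorded heights (in cm).
    population = [rng.gauss(166.0, 6.5) for _ in range(population_size)]
    population_mean = statistics.fmean(population)

    rng.shuffle(population)  # shuffle the cards
    z_scores = []
    # Deal the shuffled cards into consecutive, non-overlapping samples.
    for start in range(0, population_size, sample_size):
        sample = population[start:start + sample_size]
        sample_mean = statistics.fmean(sample)
        sample_sd = statistics.pstdev(sample)  # divisor n, not n - 1
        z_scores.append((sample_mean - population_mean) / sample_sd)
    return z_scores

n = 4
z_scores = gosset_card_experiment(sample_size=n)

# Rescale Student's z to the modern t statistic: t = z * sqrt(n - 1).
t_values = [z * math.sqrt(n - 1) for z in z_scores]

# Share of samples whose t lies beyond the large-sample cut-off of 1.96.
tail_fraction = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
print(len(z_scores), "samples; share beyond 1.96:", round(tail_fraction, 3))
```

If small samples behaved like large ones, only about 5% of the rescaled scores would fall beyond ±1.96. In runs of this sketch the share tends to be well above that, which is the heavier-tailed behaviour the t-distribution captures.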
The four parameters Pearson had suggested were:
1. Mean
2. Standard Deviation
3. Symmetry (skewness)
4. Kurtosis
Pearson contended that if one knew the four parameters for a set of variables, then one could locate the position of one observation in the entire spectrum of observations. In order to describe the scatter of the observations, he introduced a set of skewed curves as well.(5)
These empirical experiments led to the publication of the second paper.(8) This was a long algebraic discourse, but later readers have described it as surprisingly lucid and jargon-free.(5)
Image is from the authors of (5). If you are so inclined, you can check out the cleaned-up version of the real paper as a PDF on the University of York, Department of Mathematics page here. If you can cross the paywall, then take a look at the real deal on the Biometrika page here. Although this is a debate for another time, I find it very irksome that an article published over a century ago is still under copyright wraps. This just points to one of the many things that ail the scientific publication world. I’ll save the rant about open access and copyrights for another day…
As is often the case with concepts that are ahead of their time, the work of Gosset (who was now known as Student to the publishing intelligentsia) did not find much appreciation from the statistical world. It was not until Ronald Fisher had found a formal proof and listed practical applications of the t-test that people started to sit up and take notice. Apparently Gosset had written to Fisher informing him about his paper, saying: “I am sending you a copy of Student’s Tables as you are the only man that’s ever likely to use them!”(9) Fisher modified the t-test (don’t ask me how or why, I am statistically too impaired to go into the workings of that answer) to suit his theory of the degrees of freedom. Fisher was also responsible for the introduction of Gosset’s t-distribution in regression.
Gosset worked with a lot of the major statisticians of the day. Besides maintaining an active friendship with both Pearson and Fisher, two of the leading lights in the world of statistics at that time, he also maintained fruitful liaisons with others like Neyman. Karl Pearson’s son, Egon, himself a master number-wizard, pieced together a lot of information about the life and work of Gosset from the vast epistolary evidence Gosset left behind, over-zealous letter writer that he was. Wikipedia claims that maintaining friendships with both Pearson and Fisher simultaneously was no mean feat because both had huge egos and a massive loathing for each other. It would take a man with a special amount of resilience and equanimity to be friends with both the vitriolic rivals. And Gosset was just that kind of a person. McMullen, a personal friend of Gosset’s, wrote: (10)
… he was very kindly and tolerant and absolutely devoid of malice. He rarely spoke about personal matters but when he did his opinion was well worth listening to and not in the least superficial.
He was a humble man despite the heights of his achievements, and what struck me was the way he would interrupt his admirers. He would cut them short, saying: “Fisher would have discovered it all anyway.”(9)
In 1934 he met with an accident and was confined to a sedentary life for a while. This time saw an explosion in the production of statistical work by Gosset. He was bed-ridden for three months and took almost a year to recover. However, the accident left him with a limp that he carried for the rest of his life. Although he was transferred to London in 1935 to take charge of the new Guinness brewery opening there, the move did not hamper his statistical work and he kept producing papers under his assumed identity of Student. He also branched out into working on theories of resistant strains of barley that would grow in adverse situations. Thus, his contributions cut across the borders of different disciplines (statistics, botany, business); he was truly a man of multiple interests.
He succumbed to a heart attack in 1937 at the age of 61 years. There were multiple obituaries in Biometrika, which had been the major publisher of his life’s works. Even the usually secretive Guinness company relented and allowed his friends to posthumously publish a selection of his works in 1942.(11)
The appearance of articles written by Student was surrounded by an aura of mystery and romanticism, as very few people outside of the closely knit statistical group knew the actual identity of Student. And although the obituary in The Times finally lifted the shroud on the question of who Student really was, it was still quite some time before he was credited directly for his work.
All but one of Gosset’s papers were published under his assumed pseudonym. The t-test has now become a routine tool in the repertoire of pretty much anyone who has dabbled in research, irrespective of the field. I think there could have been few more apt eulogies for a person of such caliber than the one proposed by Ronald Fisher: Gosset was the “Faraday of statistics”.(12)
1. Image of William Sealy Gosset from Wikipedia: Now in public domain in the EU and Australia and some other countries 70 years after the death of the individual.
4. Plackett RL. ‘Student’: A Statistical Biography of William Sealy Gosset. Oxford, United Kingdom: Clarendon Press; 1990
5. Raju TN (2005). William Sealy Gosset and William A. Silverman: two “students” of science. Pediatrics, 116 (3), 732-735. DOI: 10.1542/peds.2005-1134
7. Student. “On the error of counting with a hæmacytometer”. Biometrika 5 (3): 351–360. February 1907.
8. Student. The probable error of a mean. Biometrika. 1908;6:1–25
9. Wikipedia: William Sealy Gosset. Accessed on 22nd September, 2012
10. William Sealy Gosset, 1876-1937, in E S Pearson and M G Kendall, Studies in the History of Statistics and Probability (London, 1970), 355-404.
11. Gosset WS. “Student”’s Collected Papers. Pearson ES, Wishart J, eds. Cambridge, United Kingdom: Cambridge University Press; 1942
12. H. Kohler: Life of Gosset