Preface
“Beyond essential truths is but a following.”
This book provides a general introduction and overview of univariate and multivariate statistical modeling techniques typically used in the social and behavioral sciences. Students reading this book will come from a variety of fields, including psychology, sociology, education, and political science, as well as possibly biology and economics. Spanning several statistical methods, the focus of the book is naturally one of breadth rather than of depth in any one particular technique. These are topics usually encountered by upper-division undergraduate or beginning graduate students in the aforementioned fields.
A wide selection of applied statistics and methodology texts exists, from books that are relatively deep theoretically to texts that are essentially computer software manuals with a modest attempt to include at least some elements of statistical theory. All of these texts serve their intended purpose so long as the user has an appreciation of their strengths and limitations. Theoretical texts usually cover topics in sufficient depth, but often do not provide enough guidance on how to actually run these models using software. Software manuals, on the other hand, typically instruct one on how to obtain output, but too often assume the reader comes to these manuals already armed with a basic understanding of statistical theory and research methodology.
The author of this book did not intend to write a software manual, yet at the same time was not inclined to write something wholly abstract, theoretical, and of little pragmatic utility. The book you hold in your hands attempts a more or less “middle of the road” approach between these two extremes. Good data analysis only happens when one has at least some grounding in both the technical and philosophical aspects of quantitative science. For example, it is well known that the “machinery” of multivariate methodology is grounded primarily in relatively elementary linear and matrix algebra. Knowing when and why to use these procedures, however, is not. The how of doing something can always be dug up. The why of doing something is where teaching and instruction are needed. Indeed, one can obtain a solution to an equation, but if one does not know how to use or interpret that solution, it is of little use in the applied sense.
Hence, a balance of sorts was attempted between theory and application. Whether the optimum balance has been achieved will, of course, be left to the reader (or instructor) to ultimately decide. Undoubtedly, the theoretician will find the coverage somewhat trivial, while the application-focused researcher will yearn for more illustrations and data examples. It is hoped, however, that the student new to these methods will find the mix to his or her liking, and will find the book a relatively gentle introduction to these techniques.
As merely a survey and overview of statistical methodologies, the book is devoid of proofs or other technical justification of the sort one would find in a more theoretical book. This does not imply, however, that the book is one of recipes. Attention was given to explaining how formulas work and what they mean, as I see this as the first step to facilitating an understanding of the more technical arguments required for proofs and the like. The emphasis is on communicating what the equations and formulas are actually telling you, instead of focusing on how they are rigorously and timelessly justified. Readers interested in a more advanced and theoretical treatment should consult any of the excellent books on mathematical and theoretical statistics, such as that by Casella and Berger (2002).
In my view, the current textbook trend to provide “readable” data analysis texts to students outside of the mathematically dense sciences has reached its limit. Books now exist on statistical topics that attempt to use virtually no formulae or symbols whatsoever. I find this to be unfortunate, if not somewhat ridiculous, just as I equally find the abuse of mathematical complexity for its own sake rather distasteful. Indeed, being technical and complex for its own sake does little for the student attempting to grasp difficult concepts rather than simply memorizing equations. As Kline (1977) noted with regard to teaching calculus, rigor, while ultimately required, can too often obscure that thing we call understanding. Stewart (1995) said the same thing: “The psychological is more important than the logical…Intuition should take precedence; it can be backed up by formal proof later” (p. 5).
What has always intrigued me, however, is how little social science students, aside perhaps from those in economics, are exposed to even elementary mathematics in their coursework. Even courses in statistics for social scientists generally de-emphasize the use of mathematics. I believe there are two reasons for this trend. First, mathematical representation in these disciplines has a reputation for being either “mysterious” or otherwise “beyond the grasp” of students. Students shy away from equations, and for good reason: they can be difficult to understand and difficult to manipulate. Except for the gifted few, we all struggle. But to think of them as mysterious or beyond anyone's grasp is simply wrong. Second, when the teaching of mathematics is attempted, its communication and writing generally lack clarity and that philosophical “touch.” Nobody likes to see one equation followed by another without understanding what “happened” in between, and even more importantly, why it happened. The proverbial sigh of outward, seemingly innate, and unforgiving disappointment displayed toward any student who dares to ask why is, of course, no service to the student either.
I do not believe most students dislike mathematics. I do believe, however, that most students dislike mathematics that is unclear, poorly communicated, or otherwise purposely cryptic. In this book, I go to somewhat painstaking efforts to explain technical information as clearly and in as expository a fashion as possible. In this spirit, I was largely inspired by A.E. Labarre's text Elementary Mathematical Analysis, published in 1961. It is as exceptionally clear an elementary-to-moderate level mathematics text as you will ever find, and it is a perfect demonstration of how technical information can be communicated in a clear, yet still technically efficient, manner. Once more, the reader will be the final judge of whether or not this clarity of exposition has been achieved.
Importance of History
I have always found learning new statistical techniques without consulting the earliest historical sources on those techniques to be a rather shallow and hollow experience. Yes, one could read a book on the how and why of factor analysis, for instance, but it is only through consulting the earliest papers and derivations that one begins to experience a deeper understanding. Nothing compares to studying the starting points, the original manuscripts. It has also always intrigued me that one can claim to understand regression, for instance, yet never have heard of Francis Galton. How can one understand regression without even a cursory study of its historical roots? Of course, one can, but I believe a study of its history leaves a deeper impression of the technique than is possible otherwise. A study of the early plots that featured the technique (see Figure 1), and of the context in which the tool came about, can, I believe, only promote a deeper understanding of concepts in students.
Figure 1 One of Galton's early graphical illustrations of regression. Circles in the plot are average heights for subgroups of children as a function of mid-parent height. The lines AB and CD are regression lines. (Galton, 1886).
A priority of the book is to introduce students to these methods by often providing a glimpse into their historical beginnings, or at minimum, some discussion of their origination. Historically relevant data are also used in places, whereas other parts of the book feature hypothetical and very “easy” data. And though demonstrating techniques by referring to substantive applications is always a good idea, it is equally useful to demonstrate methods using “generic” variables (e.g., x1, x2, …) to encourage an understanding of what the technique is actually doing as distinct from the substantive goals of the investigation. Researchers in applied fields can sometimes get so “immersed” in their theories that, next to their significant other, their theory is their greatest love. Over-focus on applications can prevent the student from realizing, for instance, that factor analysis does not “discover” anything; it merely models correlation. It is often useful in this regard to retreat from substantive considerations and simply focus on the mechanics, lest we conclude more from the software output than is warranted by the quantitative analysis. A course in statistical methods should be just as much about what statistics cannot do as about what they can do. Many students significantly overestimate the power of the tool.
There is another reason for the somewhat strong focus on historical papers. Though the history of statistics by no...