Like so many people out there, I have been hacking and spitting my way through R. I’ve made some awesome stuff, made the stats work, made some graphs, and written R Markdown notebooks that take 30 minutes to render (no, not because of for loops). I feel comfortable saying that I am capable in R, but I’m still in the “incantation” phase of language understanding: I don’t really know why I’m doing [thing] but I know that [thing] will work because Stack Overflow told me so.
I remember this phase in Python, but after attending a week-long PyCamp and hanging out with the extraordinary people of Py-CU, I feel completely capable in Python. I don’t know everything, but I understand every piece of syntax that I use and I’m comfortable diving into new topics.
The challenge of R is that so many of the materials and documentation are written for statisticians. R is a statistical language, so this isn’t a bad thing, but it is a piece of context that seems to be lost on many of the R experts. Please stop telling me “everything is a vector” because my soul dies a little more each time someone earnestly tells me that, as if it is helpful to the general public. No. It isn’t.
I don’t care that everything is a vector and no, I don’t want to explore the philosophical implications of that. I need to run some statistics and make a few charts. I understand data types, variable names, and data processing. I’ve got my data and I know my research question. I just need to smash that into a script and I need to know how to do it in R. In short, I needed an R book written for a developer from another language, or at least something good for the angry cynical crowd.
Cue a timely recommendation for R for Everyone (2014) by Jared P. Lander. At this point in my gum-and-spit-based R career I’m pretty desperate for help. The R Cookbook helped a little, but lacking much of the foundational R know-how means that even clear explanations of advanced concepts are still opaque. I loaded up an ebook version from my library, skimmed the chapter on the apply() family, and ordered it from Amazon with my fingers crossed.
The book strikes a great balance between knowing the language incredibly well and not giving us the hard sell on why R is the savior of our data souls. The examples are short, simple, and don’t try to clean up the messy output you’re used to in the interpreter.
Chapters 4 & 5 are the missing pages of my R life. These cover the absolute basics of working within R, including data types and containers. These chapters need to be standard reading for everyone who complains about R. The writing perspective highlights the variety of syntax oddities with acknowledgment of them rather than apology.
Some chapters are perhaps overly detailed and would suit someone newer to programming (chapters 8-10 cover functions, control statements, and looping), while others attempt to cover such broad topics that they are more of a look book (chapter 7 on ggplot2). I was particularly happy with the pace until I hit chapter 11, where the plyr section went a little nuts. Some syntax and packages are not explained, and a peek at some of the incorrect index page numbers makes me suspect that some editing and reorganizing happened without picking up the pieces. But that doesn’t take away from the ultimate value of this book.
The book seems to have three basic sections: basic R programming, statistical tools, and advanced R programming topics. The range of topics covered is ridiculously broad, and I think the book does a decent job of balancing the pace and level of detail. Some chapters read more like a vocabulary lesson than instruction, but this is a hallmark of a book where the chapters are meant to stand alone from the whole. Those chapters tend to be the topic areas where further instruction would turn the book’s content into maths instruction rather than R instruction. So I understand.
This book is not for basic statistics instruction, for teaching core programming fundamentals, or to serve as a singleton resource on R.
This book is a valuable supplement for a statistics course in R, an intermediate R user wanting to sample some advanced techniques, or a self-taught R user to fill in some blank spots.
Overall I would classify this book as exceptional for reference and supplement, but not as a textbook or something with problems for students to work through.
- The narrative doesn’t clarify which packages are standard library versus external, and it often pulls in packages without noting which functions come from them. Much of this has to do with R’s profoundly annoying namespace issues, and being overly explicit about where functions come from is often necessary for R instruction.
- The author names many people who work to teach and create R packages, which provides a nice peek into the development community, but sometimes it feels like unnecessary name dropping. Again, though, this is a nitpick.
- Coverage of the subsetting method doesn’t seem to appear in the book. It is used, but never thoroughly discussed. I would have traded some of the longer sections on basic programming concepts for more discussion about subsetting data.
- Additional tables with summary information would be extremely valuable, particularly in the chapters where specific tasks are covered. Examples: cheatsheets on selecting columns for data frames, the apply family, and aggregation.
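
To make that last wish concrete, here is a rough sketch of the kind of cheatsheet I have in mind. None of it comes from the book: it is my own assumption of what such a summary could look like, using only base R and the built-in mtcars data set. The variable names are made up for illustration, and the package-qualified calls (datasets::, stats::) are only there to show where things come from.

```r
# Hypothetical cheatsheet sketch (not from the book); base R only.
df <- datasets::mtcars  # built-in example data, package named explicitly

# --- Selecting columns from a data frame ---
mpg_vec   <- df$mpg                 # one column, returned as a plain vector
mpg_vec2  <- df[["mpg"]]            # same result, column name given as a string
mpg_df    <- df["mpg"]              # one column, but still a data frame
two_cols  <- df[, c("mpg", "cyl")]  # several columns by name
first_two <- df[, 1:2]              # several columns by position

# --- The apply family ---
col_means  <- sapply(df, mean)              # over columns, simplified to a named vector
col_ranges <- lapply(df, range)             # over columns, always returns a list
row_sums   <- apply(df[, 1:2], 1, sum)      # over rows (MARGIN = 1)
mpg_by_cyl <- tapply(df$mpg, df$cyl, mean)  # over groups defined by a second vector

# --- Aggregation ---
# Mean mpg per cylinder count, returned as a data frame
agg <- stats::aggregate(mpg ~ cyl, data = df, FUN = mean)
```

The tapply() and aggregate() lines compute the same group means; the difference is only in the shape of what comes back (a named array versus a data frame), which is exactly the sort of thing a one-page summary table would make obvious.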