How I use github in 500 words

My feelings about github are somewhere at the nexus of aggravated and grateful.  Git is not easy because it is composed of a series of incantations based out of somewhat identifiable English.  Close enough that you think they have meaning but far enough that you want to bludgeon yourself in the hopes of forgetting semantics.  And yet, version control and backups are important to save your butt.

I shall confess (without apology) that I do not use command line git(hub). Can I? Sure. Do I want to try and remember those commands while I’m trying to keep my brain on my code? No. So the github app is my sweet spot: the power of version control without the anguish.

Here’s my use case for git: I write code, I want to track it, github works well for that, and I usually work alone.  All I need to do is create, commit, sync, and get back to my freaking job. I don’t do pull requests or contribute to shared projects. GET OFF MY LAWN.

Sorry, let’s get back to how to do commits and restore deleted files.

Step 1: Make a github account. Already have one? Go apply for the student developer pack if you haven’t already.

Step 2: Download the Github Desktop application:

Step 3: Go into the application, find preferences. Remember password. Sign in.

Step 4: Find the plus button thing. Click. Add will let you point the app at an existing folder and create a repo from that, Create makes a new one, and Clone is an incantation for the sociable. Click Create, give it a name, and click Create repository.

Step 5: That repository name is now a folder. Go find it. Or make a new one somewhere you can remember.

Step 6: Add some crap to that folder.

Step 7: Go back to the app and pour your eyeballs on your newly tracked file. The stuff on the right shows you the contents of the file. Green shows additions, red deletions. The little icon next to the repository name on the sidebar should be a computer monitor looking doodad, meaning that it is a local repo.

Step 8: Add a commit message (short & sweet) and description (maybe longer, with punctuation). Click Commit to Master. To send it off to github, click Publish on the upper right. You can stack up a bunch of commits before publishing, if you want.  The first time through it’ll ask you some stuff. Just click Publish again.

Step 9: Go change some crap in your file and check back in the app. Write a less crappy commit message and click Commit to Master. This time, click Sync when you’re ready to send it to github.

Step 10: Delete that file and check the app. GONE.

Click the Repository menu item and select Discard Changes to Selected Files.

Ya, the pending change goes away in the app because THE FILE IS BACK.

Step 11: Go back to your research.

490 words.
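
Postscript for the curious: here is a rough sketch of what those button clicks are probably doing underneath, written in Python so nobody has to memorize incantations. The file name is made up, the throwaway name and email are placeholders, and mapping Discard Changes to `git checkout --` is my assumption about the app rather than something it documents here. It needs git installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside repo and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo_restore_deleted_file():
    """Create a throwaway repo, commit a file, delete it, and bring it back."""
    repo = Path(tempfile.mkdtemp())
    notes = repo / "notes.txt"  # hypothetical file name

    git(repo, "init")  # roughly the Create button
    # Local-only settings so the sketch runs on an unconfigured machine.
    git(repo, "config", "user.name", "Example Person")
    git(repo, "config", "user.email", "person@example.com")
    git(repo, "config", "commit.gpgsign", "false")

    notes.write_text("some crap\n")
    git(repo, "add", "notes.txt")
    git(repo, "commit", "-m", "Add notes")  # roughly Commit to Master

    notes.unlink()  # Delete that file. GONE.
    deleted = not notes.exists()
    git(repo, "checkout", "--", "notes.txt")  # assumed equivalent of Discard Changes
    restored = notes.read_text()

    shutil.rmtree(repo, ignore_errors=True)
    return deleted, restored
```

Calling `demo_restore_deleted_file()` never touches github itself; Publish and Sync would be a push to a remote, which is left out here because it needs an account and a network.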

Review of: R for Everyone

Like so many people out there, I have been hacking and spitting my way through R.  I’ve made some awesome stuff, made the stats work, made some graphs, and written R Markdown notebooks that take 30 minutes to render (no, not because of for loops).  I feel comfortable saying that I am capable in R, but I’m still in the “incantation” phase of language understanding: I don’t really know why I’m doing [thing] but I know that [thing] will work because Stack Overflow told me so.

I remember this phase in Python, but after attending a week-long PyCamp and hanging out with the extraordinary people of Py-CU, I feel completely capable in Python.  I don’t know everything, but I understand every piece of syntax that I use and I’m comfortable diving into new topics.

The challenge of R is that so many of the materials and documentation are written for statisticians.  R is a statistical language, so this isn’t a bad thing, but it is a piece of context that seems to be lost on many of the R experts.  Please stop telling me “everything is a vector” because my soul dies a little more each time someone earnestly tells me that, as if it is helpful to the general public.  No.  It isn’t.

I don’t care that everything is a vector and no, I don’t want to explore the philosophical implications of that. I need to run some statistics and make a few charts.  I understand data types, variable names, and data processing.  I’ve got my data and I know my research question.  I just need to smash that into a script and I need to know how to do it in R.  In short, I needed an R book written for a developer from another language, or at least something good for the angry cynical crowd.

Cue a timely recommendation for R for Everyone (2014) by Jared P. Lander.  At this point in my gum-and-spit-based R career I’m pretty desperate for help.  The R Cookbook helped a little, but lacking much of the foundational R know-how means that even clear explanations of advanced concepts are still opaque.  I loaded up an ebook version from my library, skimmed the chapter on the apply() family and ordered it from Amazon with my fingers crossed.

The book strikes a great balance between knowing the language incredibly well and not giving us the hard sell on why R is the savior of our data souls.  The examples are short, simple, and don’t try to clean up the messy output you’re used to in the interpreter.

Chapters 4 & 5 are the missing pages of my R life.  These cover the absolute basics of working within R, including data types and containers. These chapters need to be standard reading for everyone who complains about R.  The writing perspective highlights the variety of syntax oddities with acknowledgment of them rather than apology.

[Screenshot: Why I can’t use library books on R]

Some chapters are perhaps overly detailed and would suit someone newer to programming (chapters 8-10 cover functions, control statements, and looping), while others attempt to cover such broad topics that they are more of a look book (chapter 7 on ggplot2). I was particularly happy with the pace until I hit chapter 11, where the plyr section went a little nuts.  Some syntax and packages are not explained, and a peek at some of the incorrect index page numbers makes me suspect that some editing and reorganizing happened without picking up the pieces.  But that doesn’t take away the ultimate value of this book.

The book seems to have three basic sections:  basic R programming, statistical tools, and advanced R programming topics.  The covered range of topics is ridiculously broad, and I think the book does a decent job of balancing the pace and level of detail.  Some chapters can be a bit on the side of just a vocabulary lesson rather than instructive, but this is a hallmark of a book where the chapters are meant to stand alone from the whole.  Those chapters tend to be the topic areas where further instruction would put the book’s content into maths instruction rather than R instruction.  So I understand.

This book is not for basic statistics instruction, for teaching core programming fundamentals, or to serve as a singleton resource on R.

This book is a valuable supplement for a statistics course in R, an intermediate R user wanting to sample some advanced techniques, or a self-taught R user to fill in some blank spots.

Overall I would classify this book as exceptional for reference and supplement, but not as a textbook or something with problems for students to work through.


  • The narrative doesn’t clarify which packages are standard library versus external, and it often pulls in packages without noting which functions come from them.  Much of this has to do with R’s profoundly annoying namespace issues, and being overly explicit about where functions are coming in from is often necessary for R instruction.
  • The author names many people who work to teach and create R packages, which provides a nice peek into the development community, but sometimes the mentions feel like unnecessary name dropping. Again, though, this is a nitpick.
  • Coverage of the [] subsetting method doesn’t seem to appear in the book. It is used, but never thoroughly explained.  I would have traded some of the longer sections on basic programming concepts for more discussion about subsetting data.
  • Additional tables with summary information would be extremely valuable.  Particularly in the chapters where specific tasks are covered. Examples:  cheatsheets on selecting columns for data frames, the apply family, and aggregation.

What I’m working on right now

By request of Julia Evans (tweet) to the world, I am writing about what I’m working on right now.

As the tweets were posted I was in the middle of facilitating Py-CU‘s weekly open hours. The discussion threads were:

  1. Attempting to help someone get Python 2.7 going within Anaconda3 so he could use computer vision packages in Jupyter Notebooks.  Windows. Pain. Unresolved.
  2. Chatting with a programming newbie about resources and common problems humanities students have when first learning how to code.
  3. Awesome nerd out over text and code editors.
  4. Listening to a BB-8 build group yell at a robot to test voice commands.
  5. Listening to a 3D printer choke to death on a Darth Vader print.
  6. Me ranting about the crazy train that chapter 11 of R for Everyone went on.

I was working on:

  1. Writing a tool to auto-generate a bunch of CSVs with fake data.
  2. In order to have test data to build an auto-documentation tool.
    1. And attempting to figure out how to slam this JSON file into a sqlite3 database.
    2. I just wrote data['files'][data['files'].keys()[0]].keys().
    3. Rethinking life choices.
  3. Musing over my book review notes for R for Everyone.
  4. Resisting the urge to get R for Everyone out of my bag because I need to finish this class project.
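
For anyone wondering what that pile of work looks like, here is a minimal sketch, not my actual tool. The column names, the JSON layout, and the table schema are all invented for illustration, since the real files aren’t shown here. It also includes a Python 3 take on that line I wrote, because `.keys()[0]` only works in Python 2.

```python
import csv
import io
import json
import random
import sqlite3


def fake_csv(n_rows, seed=0):
    """Return CSV text with a header and n_rows of made-up data."""
    rng = random.Random(seed)  # seeded so the fake data is repeatable
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "score"])  # hypothetical column names
    for i in range(n_rows):
        writer.writerow([i, rng.randint(0, 100)])
    return buffer.getvalue()


def first_file_fields(data):
    """Python 3 take on data['files'][data['files'].keys()[0]].keys()."""
    first_name = next(iter(data["files"]))  # dict views can't be indexed in Python 3
    return list(data["files"][first_name])


def load_into_sqlite(data):
    """Slam each entry under 'files' into an in-memory sqlite3 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (name TEXT, n_rows INTEGER, n_columns INTEGER)")
    rows = [
        (name, info["rows"], info["columns"])
        for name, info in data["files"].items()
    ]
    conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


# Hypothetical JSON; the real file's layout is not shown in this post.
data = json.loads(
    '{"files": {"a.csv": {"rows": 3, "columns": 2},'
    ' "b.csv": {"rows": 5, "columns": 4}}}'
)
```

With this invented layout, `first_file_fields(data)` gives the field names of the first file entry, and `load_into_sqlite(data)` hands back a connection you can query with plain SQL.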

Things near the top of my stack:

  1. Finishing an XPath tutorial.
  2. Finishing R for Everyone.
  3. Planning how I would code up some data for an analytics project for work.
  4. Thinking about my summer learning stack once I GRADUATE in May and am FREE.

Why it barely matters where you start

There is no one true anything in life. Expanded out to the programming world, there is no one true IDE, book, language, package, etc. Anyone trying to sell you on that is a liar. A more refined statement might be: any hybrid tool can rarely ever be as good as a specific tool.


Many newcomers interested in data analysis ask the following completely reasonable questions:

  • Should I learn Python or R?
  • Which IDE should I use?
  • Is there a book or workshop that I need?

An appropriate answer is to scratch your head, hedge a bit, and then try to list off some stuff you think is recent, doesn’t involve too much of a headache to attain or install, and hopefully won’t terrify this person back to Excel. Imagine trying to explain childbirth to a young girl going through puberty and looking forward to adulthood. Stating the reality of “At least you probably won’t die” isn’t likely to make you feel great as a mentor and certainly won’t make her excited for future parenthood.

Many communities use the word stack to describe a pile of stuff that likely has some form of internal hierarchy or workflow. We can see this in software, networking, libraries, math, architecture, and many others. English has plenty of idioms implying that tool selection is a fluid process almost as important as the use of the tool itself. These include “right tool for the right job,” “bring out the big guns,” and “don’t bring a knife to a gunfight.”

Each of these implies that there are multiple tools and the core joke is that the selection process should be determined by the job to be done. Let’s look at chess for a moment. The Queen piece may be more powerful than the Knight, but the Knight can move in ways the Queen can’t. This means there may be problems for which the most powerful piece is utterly useless. The language that our community uses is describing something that we understand implicitly once we have enough experience, yet we often let students have this perspective of reverence for a single specific tool.

Given this truth, and given how open-source-obsessed this research domain is, it isn’t shocking that there is an overabundance of tools in certain areas. At the time of writing, PyPI is approaching 80,000 packages. So how to choose? Which are worth investing your mental energy into getting good at? In the end, does it matter which package you use? These are very serious questions, and sadly, the more I study these things the more I come to the conclusion of “Who knows, but not a big deal one way or another.”


So my best advice for the newcomer is to just pick something. Start somewhere. Anywhere. Really, it barely matters. Because in the end you’ll likely need to know a little bit about everything on the list in front of you. Even if that tool or platform turns out to be a bust for the problem you were working on, that experience adds to your knowledge about what is available in the analysis stack and how to approach problems. You’ll run into it again if you stay in the analytics world.

Now, it doesn’t have to be a complete free-for-all. Some informed selection is always beneficial. Just keep in mind that you are selecting which thing to learn first, not the only thing you will ever learn. Also, be open to accepting that you’ve gone down the wrong path.

You, the learner, have power over how you learn. I want to keep stressing that. You have the absolute power to accept or reject suggestions and strategies. Strict adherence to ‘expert’ recommendations doesn’t reflect your unique needs and puts too much value on those recommendations. I know some of the tools I use aren’t the best, but I use them because they have worked for me up until now. I’ve also run into problems with some of the gold standard tools out there and I end up dealing with pearl clutching coders when I mention that I don’t use them.

I will provide some recommendations below, but you should plan on trying a few things out. See what sticks. While I may recommend something, that is not to the exclusion of something else. Recommending Python does not imply that R will be useless.

When you know a little bit about what you’ll need to do

Search around online for similar projects. See what they’ve done. Pay attention to any chat about specific packages designed for these tasks. Go with whichever platform has the tools designed for your task. For example, I was asked about visualizations for Likert scale data from a survey. R happens to have a nifty 3rd party package for Likert charts. However, if the online survey tool you’re using has mangled your data, you may need to break out some Python to whack it into something compatible with math.
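
Here is a minimal sketch of the whacking, assuming a five-point scale with these exact labels. Real survey exports vary in wording and scale length, so treat the `LIKERT` mapping and the sample responses as made up.

```python
# Hypothetical label set; real surveys vary in wording and scale length.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def clean_likert(raw):
    """Map one messy survey cell to 1-5, or None when it isn't recognizable."""
    if raw is None:
        return None
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse spaces
    return LIKERT.get(key)


responses = ["Agree", " strongly  agree ", "NEUTRAL", "", "n/a"]
scores = [clean_likert(r) for r in responses]
valid = [s for s in scores if s is not None]
mean_score = sum(valid) / len(valid)
```

Unrecognized cells become `None` instead of a guessed number, so you decide how to handle missing answers before any math happens.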

When you just need to start somewhere

Unless you think you may need to do a lot of stats, start with Python. It’ll be straightforward, and you can apply the basic concepts for other tasks, like making games, front-end web development, etc. You’re likely to move on from Python, and that’s fine. It’s a great starter language.

In conclusion

Don’t let decision fatigue prevent you from getting started on your path.  Nearly every programming area requires a stack of things to know, so all experience is good experience to have.  Investigate a little or ask colleagues, but at the end of the day just flip a coin and go with something.

“Programming as an information-centric activity” talk at the Python Education Summit

After developing/teaching several types of programming workshops and spending a lot of time listening to my peers at GSLIS talk about learning how to code, it is fair to say that I have a lot of opinions on the state of teaching programming for those coming from a non-STEM past and going into a non-STEM future. Additionally, being part of a graduate program in library and information science has biased me to see a lot of activities in our daily lives as information problems.

Much of my experience with programming books has revealed a concept-formula-drill presentation model, but this doesn’t encapsulate the real work of problem solving with code. Yes, students absolutely need to drill and practice the core concepts, but the activity of programming goes far beyond just that need. Many experienced coders criticize new students begging for help for not immediately searching for their problem on Google and solving it on their own. This is such a common response that the programming instruction community should listen and take note. Why are there so many Stack Overflow posts closed as duplicates? Sure, there are certainly searchers who are too lazy to actually read through other posts, but I believe that this group is in the minority. I’d pin the problem on searchers being unable to either a) correctly form a useful search query, or b) recognize an appropriate solution as useful for their problem.

Indeed, many instructors will encourage students to search online for their answers, but even a Google layperson understands that there is skill required to construct a useful search string. There are quirks and tricks to solving code problems via search engines. In unpacking this problem, we can see that students need to be able to identify the actual problem in their code, find the relevant line or section of code, understand the words to describe the problem, and recognize how to apply a potential fix to their own code. There are a lot of essential skills here but do any introductory textbooks talk about this? (Seriously, let me know if you find one).
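
One slice of that skill can be shown in code. This is a sketch of a habit I would teach, not a tool from my talk: pull the error type, the message, and the line number out of a Python exception, then use the first two as the search string. The function names and the "language + error type + message" recipe are my own illustration.

```python
import traceback


def search_query_for(exc, language="python"):
    """Build a search string from an exception: language, error type, message."""
    return f"{language} {type(exc).__name__} {exc}"


def error_line_number(exc):
    """Return the line number of the innermost frame where exc was raised."""
    last_frame = traceback.extract_tb(exc.__traceback__)[-1]
    return last_frame.lineno


try:
    count = "3"
    total = count + 4  # deliberate bug: adding a number to a string
except TypeError as err:
    caught = err  # keep a reference; `err` is cleared when the block ends

query = search_query_for(caught)
line_number = error_line_number(caught)
```

The query keeps the parts other people will also have seen (the error type and message) and drops the parts unique to your machine, like file paths, which is most of the trick to getting useful search results.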

I collected many of my thoughts about this into a talk I presented at the Python Education Summit titled “Programming as an information-centric activity.” The core argument: the normal activity of programming involves a lot of information skills and these skills should be incorporated very explicitly into the classroom or other instruction environment. Instructors should not only use documentation and reference materials within lectures and demos but they should also take the time to talk about the common reference materials within programming communities. For example, answer the question: What is a programming cookbook and when should a student reference one? This is a reference document somewhat unique to the programming community but often discovered by accident by novices.

Slides are up on FigShare:
A pre-recorded screencast version is up on YouTube:

The great Python Mashup lesson plan

I’m often asked by new students where they should go to learn Python.  That isn’t always an easy answer, because I haven’t found the one perfect resource yet.  However, there are some really strong ones out there.  My goal was to construct a lesson plan that was a mashup of my favorite resources into a coherent plan of readings and homework.  Readings are important to have a foundation to build on and reference back to, but so is having a solid queue of content to whack on until you understand the how and why of things.

I’ve mashed up content from Python for Informatics,, Codecademy, and Python Batting Practice together into one course book.

I believe that these materials are some of the best out there, and I reject the notion that students need to learn from a single source.  Each has benefits, and I feel like these sources are very complementary.  Recall learning how to spell or learning another (human) language.  We always had workbooks or some material that required us to act on the content we had just studied.  Learning how to write code is a skill-based activity that requires a ton of practice to refine your understanding of the concepts and syntax.  Additionally, learning from multiple sources allows the student to experience how things are referred to by different people from more perspectives.

That being said, a streamlined and supportive course in programming designed to minimize frustration and difficulties does not mean that either of those will disappear.  An important part of the learning process is the fight to learn.  The harder we have to work for something the better we remember it, but that doesn’t mean that learning how to program needs to be the worst thing ever.  I have aimed to keep a good balance inside the “productively difficult” zone.

So, I am happy to announce a new page at the top: the Guided Self-Study Lesson Plan.  I will be leading another introductory workshop soon with this structure as the basis, and I plan to document and publish that as well.

Interview with Trinket

I had the great pleasure of meeting Elliott Hauser and the awesome development team of Trinket at PyCon 2014.  Trinket taps into the web-based power of Python by creating a framework to host interactive Python sessions as part of lessons or embedded in a blog.

Trinket’s blog has been featuring a series of interviews with programming educators, showcasing a wide assortment of approaches and tools.  I was recently interviewed by them about the Python group that I co-organize and the various outreach we’ve taken up.

Trinket offers a platform to document your workshop or instructional notes in such a way that your students can take the link home for reference or sharing.  I’ll be posting about a workshop I recently documented on their platform shortly.