What I’m working on right now

Following Julia Evans’s call to the world (tweet), I am writing about what I’m working on right now.

When the tweets were posted, I was in the middle of facilitating Py-CU’s weekly open hours. The discussion threads were:

  1. Attempting to help someone get Python 2.7 going within Anaconda3 so he could use computer vision packages in Jupyter Notebooks. Windows. Pain. Unresolved.
  2. Chatting with a programming newbie about resources and common problems humanities students have when first learning how to code.
  3. An awesome nerd-out over text and code editors.
  4. Listening to a BB-8 build group yell at a robot to test voice commands.
  5. Listening to a 3D printer choke to death on a Darth Vader print.
  6. Me ranting about the crazy train that chapter 11 of R for Everyone went on.

I was working on:

  1. Writing a tool to auto-generate a bunch of CSVs with fake data.
  2. Using that test data to build an auto-documentation tool.
    1. And attempting to figure out how to slam this JSON file into a sqlite3 database.
    2. I just wrote data['files'][data['files'].keys()[0]].keys().
    3. Rethinking life choices.
  3. Musing over my book review notes for R for Everyone.
  4. Resisting the urge to get R for Everyone out of my bag because I need to finish this class project.
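For the JSON-to-sqlite3 step, a minimal sketch using only the standard library’s json and sqlite3 modules might look like the following. The shape of the data is an assumption for illustration, not the real file. It assumes a top-level "files" object mapping each file name to a flat record of fields. The table name files, the helper load_files_into_sqlite, and the field names are also hypothetical. The sketch also shows one Python 3 spelling of the expression above. In Python 3, dict.keys() returns a view that can’t be indexed, so next(iter(...)) grabs the first record instead.

```python
import json
import sqlite3

# Hypothetical JSON shape (an assumption, not the real file):
# {"files": {"a.csv": {"row_count": 10, "delimiter": ","}, ...}}
SAMPLE = (
    '{"files": {"a.csv": {"row_count": 10, "delimiter": ","},'
    ' "b.csv": {"row_count": 3, "delimiter": ";"}}}'
)


def load_files_into_sqlite(raw_json, conn):
    """Flatten data['files'] into a hypothetical 'files' table, one row per file."""
    data = json.loads(raw_json)
    files = data["files"]

    # Python 3 equivalent of data['files'][data['files'].keys()[0]].keys():
    # keys() views can't be indexed, so take the first record with next(iter(...)).
    columns = list(next(iter(files.values())).keys())

    # One column for the file name plus one per record field. Column names come
    # straight from the JSON, so only do this with input you trust.
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS files ("name", {col_defs})')

    # Values go through ? placeholders; missing fields become NULL via .get().
    placeholders = ", ".join("?" for _ in range(len(columns) + 1))
    rows = [(name, *(rec.get(c) for c in columns)) for name, rec in files.items()]
    conn.executemany(f"INSERT INTO files VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_files_into_sqlite(SAMPLE, conn))  # 2
    print(conn.execute("SELECT name, row_count FROM files ORDER BY name").fetchall())
```

Taking the column list from the first record keeps the table definition short. It assumes every record has the same fields, which may not hold for messier files.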

Things near the top of my stack:

  1. Finishing an XPath tutorial.
  2. Finishing R for Everyone.
  3. Planning how I would code up some data for an analytics project for work.
  4. Thinking about my summer learning stack once I GRADUATE in May and am FREE.

Why it barely matters where you start

There is no one true anything in life. Extended to the programming world, there is no one true IDE, book, language, package, etc. Anyone trying to sell you on that is a liar. A more refined statement might be: a hybrid tool is rarely as good as a specialized one.


Many newcomers interested in data analysis ask the following completely reasonable questions:

  • Should I learn Python or R?
  • Which IDE should I use?
  • Is there a book or workshop that I need?

An appropriate answer is to scratch your head, hedge a bit, and then try to list off some stuff you think is recent, doesn’t involve too much of a headache to obtain or install, and hopefully won’t terrify this person back to Excel. Imagine trying to explain childbirth to a young girl going through puberty and looking forward to adulthood. Stating the reality of “At least you probably won’t die” isn’t likely to make you feel great as a mentor and certainly won’t make her excited for future parenthood.

Many communities use the word stack to describe a pile of stuff that likely has some form of internal hierarchy or workflow. We can see this in software, networking, libraries, math, architecture, and many others. English has plenty of idioms implying that tool selection is a fluid process almost as important as the use of the tool itself. These include “the right tool for the right job,” “bring out the big guns,” and “don’t bring a knife to a gunfight.”

Each of these implies that there are multiple tools, and the core point is that the selection should be determined by the job to be done. Consider chess for a moment. The Queen may be more powerful than the Knight, but the Knight can move in ways the Queen can’t. This means there may be problems for which the most powerful piece is utterly useless. The language our community uses describes something we understand implicitly once we have enough experience, yet we often let students revere a single specific tool.

Given this truth, and given how open-source-obsessed this research domain is, it isn’t shocking that there is an overabundance of tools in certain areas. At the time of writing, PyPI is approaching 80,000 packages. So how do you choose? Which are worth investing your mental energy into getting good at? In the end, does it matter which package you use? These are very serious questions, and sadly, the more I study these things the more I come to the conclusion of “Who knows, but not a big deal one way or another.”


So my best advice for the newcomer is to just pick something. Start somewhere. Anywhere. Really, it barely matters. Because in the end you’ll likely need to know a little bit about everything on the list in front of you. Even if that tool or platform turns out to be a bust for the problem you were working on, that experience adds to your knowledge about what is available in the analysis stack and how to approach problems. You’ll run into it again if you stay in the analytics world.

Now, it doesn’t have to be a complete free-for-all. Some informed selection is always beneficial. Just keep in mind that you are choosing what to learn first, not the only thing you will ever learn. Also, be open to accepting that you’ve gone down the wrong path.

You, the learner, have power over how you learn. I want to keep stressing that. You have the absolute power to accept or reject suggestions and strategies. Strict adherence to ‘expert’ recommendations doesn’t reflect your unique needs and puts too much value on those recommendations. I know some of the tools I use aren’t the best, but I use them because they have worked for me up until now. I’ve also run into problems with some of the gold-standard tools out there, and I end up dealing with pearl-clutching coders when I mention that I don’t use them.

I will provide some recommendations below, but you should plan on trying a few things out. See what sticks. While I may recommend something, that is not to the exclusion of something else. Recommending Python does not imply that R will be useless.

When you know a little bit about what you’ll need to do

Search around online for similar projects. See what they’ve done. Pay attention to any chat about specific packages designed for these tasks. Go with whichever platform has the tools designed for your task. For example, I was asked about visualizations for Likert scale data from a survey. R happens to have a nifty third-party package for Likert charts. However, if the online survey tool you’re using has mangled your data, you may need to break out some Python to whack it into something compatible with math.
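As a rough illustration of that cleanup step, here is a small Python sketch that maps text survey answers onto 1 to 5 integers. Several pieces are assumptions about what a mangled export looks like, so check what your survey tool actually produces. These include the five labels, the LIKERT_SCALE mapping, the helper names, and the normalization rules (collapsing stray whitespace and ignoring case). Unrecognized or blank answers become None rather than a guessed value.

```python
# Hypothetical five-point scale; adjust the labels to match your survey export.
LIKERT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_to_int(answer):
    """Map one text response to its scale value, or None if unrecognized."""
    if answer is None:
        return None
    # Normalize common export mangling: stray or doubled whitespace, mixed case.
    key = " ".join(str(answer).split()).lower()
    return LIKERT_SCALE.get(key)


def clean_column(answers):
    """Convert a whole column of responses, keeping blanks and unknowns as None."""
    return [likert_to_int(a) for a in answers]


if __name__ == "__main__":
    raw = ["Agree", "  strongly  DISAGREE ", "N/A", "Neutral", None]
    print(clean_column(raw))  # [4, 1, None, 3, None]
```

Keeping unknowns as None makes it easy to count or drop them explicitly before charting, instead of silently mixing them into the averages.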

When you just need to start somewhere

Unless you think you may need to do a lot of stats, start with Python. It’ll be straightforward, and you can apply the basic concepts for other tasks, like making games, front-end web development, etc. You’re likely to move on from Python, and that’s fine. It’s a great starter language.

In conclusion

Don’t let decision fatigue prevent you from getting started on your path. Nearly every programming area requires a stack of things to know, so all experience is good experience to have. Investigate a little or ask colleagues, but at the end of the day just flip a coin and go with something.