Projects/Portfolio

My personal research revolves around technical education for non-STEM student populations and professionals. More specifically, I am interested in programming education for humanists. To this end, I am developing technical skills so that I can work as a programmer on digital humanities projects, and instructional skills so that teaching can be part of my career.

This page highlights what I am able to share about various instructional and technical projects. I cannot include all of my technical projects, because some involve protected information.

Instruction

Technical

INSTRUCTION

Computer programming information resources for librarians

Important! This LibGuide is not live yet and will be updated as it is completed over the course of the Fall semester. A link to the under-construction page is here.

This LibGuide will serve as a primer about and directory of computer programming resources for librarians. The guide will include information on what programming is, how to identify the usefulness of programming information resources, and a directory of exemplars of these materials.

ALA/LITA Midwinter workshop on practical programming

I will be teaching a full-day workshop on practical programming as part of the LITA Midwinter Institutes. As always, I plan to make as much of that content available online as possible.

Abstract from the program:

This workshop will introduce foundational programming skills using the Python programming language. There will be three sections to this workshop: a brief historical review of computing and programming languages (with a focus on where Python fits in), hands on practice with installation and the basics of the language, followed by a review of information resources essential for computing education and reference. This workshop will prepare participants to write their own programs, jump into programming education materials, and provide essential experience and background for the evaluation of computing reference materials and library program development. Participants from all backgrounds with no programming experience are encouraged to attend.

Python mashup lesson plan

I published this lesson plan on a page within this blog: Python Mashup Lesson Plan. I created it after telling interested students to “Use pythonlearn.com and Codecademy as your workbook” and not having much luck with that advice. I was also interested in developing something students could work through that would combine several resources. I firmly believe that programming students need to be exposed to many types of instruction, and I have yet to find a single learning resource that strikes a great balance between instruction and practice.

This lesson plan brings together four of my favorite Python learning resources. Not only are they all complementary, but each provides an important educational voice for students to experience. I reviewed the content within each source and attempted to map all appropriate topics together. Except for the concepts of looping and lists, the content areas all fit together neatly.

Reviewing how the content of each resource builds is a valuable stage of resource evaluation for instructors and authors. Many resources have a bad habit of using syntax or techniques before explaining them. For example, Codecademy lessons began using the def syntax before explaining what it was, and Python for Informatics began using lists within looping before explaining lists in any real detail. I rearranged much of the content into (what I would argue is) a better order of exposure.

Introduction to Programming with Python: Guided self-study version

This 8-session course is in progress at the time of this writing. I am archiving the class presentations and homework on a page of this blog, here: Classroom Lesson Plan. The content is based on the Python Mashup Lesson Plan described above.

The basic design behind this course is to have students do more of the studying at home and then attend a short lecture to work through questions and examples together. The first run of this class is a pilot to develop the lesson plan, with the hope that the class can be run multiple times a year.

Introduction to Programming with Python

This was my first Python teaching experience. I worked with other members of Py-CU to develop a 6-session workshop designed to take students through the introductory concepts of programming using the Python language.

I chose Python for Informatics as the basis for our class structure, but we developed our own presentations and examples for the classroom portion. Students were encouraged to follow along with the book for additional exposure.

Much of the content has been archived on the class Tumblr: http://py-curious.tumblr.com/. I also documented other aspects of the class in a poster presented at PyCon 2014, archived on figshare: http://dx.doi.org/10.6084/m9.figshare.988625.

TECHNICAL PORTFOLIO

Project Runway data mining

This was a toy project designed to give me something neat to play with as I explored machine learning topics. After running into several walls with my Jeopardy project, I decided to collect a simpler data set to work with. After watching nearly all the seasons, I thought I had noticed a trend: the number of challenge wins a contestant had didn’t seem to reflect their likelihood of winning the competition.

The question seemed to be, what are these two things actually testing? Individual challenges give contestants a specific goal, while the final challenge has the open-ended goal of creating a collection. It is interesting that they would set it up this way, because they are two very different kinds of challenges.

I curated the data set to measure several things, but mainly to look at which quantitative factors could predict a winner. I was able to load the data into Weka without much cleanup or processing and got some interesting results. Unsurprisingly, those who were on the bottom many times did not make it far, but those who won the most challenges also do not often win the competition. Of course, all these results are weakened by the fact that there have only been 12 seasons to test on.

J!Archive Scraping

This was one of the first projects I started after working through an initial set of Python resources. I went on a Jeopardy! kick and found that I was quite curious about the backgrounds of the participants. Luckily I found J! Archive as a data source, which has reasonably structured pages to scrape.

Initial research questions:

  • Are players with certain occupations likely to be more successful compared to the average player?
  • Are there certain “hot spots” among reported player hometowns? Have these hot spots changed over time?
  • Are players who defeat long running champions more likely to be long running champions as well?

Points of interest:

I originally had code set up to work through the pages via regular expressions.  On the one hand, I learned a lot about using regex in Python.  On the other hand, it was unwieldy and absolutely bad practice.  I’m currently extracting the raw data from the HTML using XPath in Scrapy, and then cleaning it up and organizing it inside of Python.  Here’s a sample chart from some of the data that I’m getting in:

[Chart: kenjennings]

This was made using Raw. The data shows the winnings of Ken Jennings during his original 2004 run.  The size of the bubble indicates how much was won during the Final Jeopardy! round.
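As a rough illustration of the "XPath first, cleanup in Python second" workflow described above, here is a minimal sketch. It is not my actual scraper: to keep it runnable without extra installs it uses the limited XPath support in the standard library's ElementTree in place of Scrapy's selectors, and the sample markup, the "contestants" class name, the contestant names, and the helper functions are all hypothetical stand-ins rather than J! Archive's real page structure.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a scraped page.
# The class name and both contestants are made up for illustration.
SAMPLE_PAGE = """
<html><body>
  <p class="contestants"><a href="#">Pat Example</a>, a librarian from Urbana, Illinois</p>
  <p class="contestants"><a href="#">Sam Placeholder</a>, an accountant originally from Boise, Idaho</p>
</body></html>
"""


def clean_blurb(text):
    """Split an assumed 'Name, a(n) occupation from hometown' blurb into fields."""
    name, _, rest = text.partition(", ")
    description, _, hometown = rest.partition(" from ")
    words = description.split()
    # Drop the leading article and a trailing "originally", if present.
    if words and words[0] in ("a", "an"):
        words = words[1:]
    if words and words[-1] == "originally":
        words = words[:-1]
    return {"name": name, "occupation": " ".join(words), "hometown": hometown}


def extract_contestants(page_source):
    """Pull the raw blurbs out with XPath, then clean them up in plain Python."""
    root = ET.fromstring(page_source.strip())
    rows = []
    for node in root.findall(".//p[@class='contestants']"):
        # Join the text of the paragraph and its child tags, collapsing whitespace.
        raw_text = " ".join("".join(node.itertext()).split())
        rows.append(clean_blurb(raw_text))
    return rows


if __name__ == "__main__":
    for row in extract_contestants(SAMPLE_PAGE):
        print(row)
```

Keeping the XPath step limited to "find the right nodes" and doing the string cleanup separately is what made this approach easier to maintain than one large regular expression over the whole page.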

This project is currently active as part of my work within the socio-technical data analytics program at GSLIS. I have extracted all the occupational text from the players and am currently aligning it with the SOC codes from the US Department of Labor. Initial testing using only exact value equality matched about 30% of the entries. Subsequent refinements to the matching and additional curation of the SOC codes increased the match rate to 77%. The first step of this project will be to increase this percentage as much as possible. Once that data is prepared, I will bring in historical US Department of Labor data and compare the prevalence of occupations within the Jeopardy contestant sample to the US population over time.
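To give a sense of what moving from exact equality to a more forgiving match can look like, here is a small sketch. It is only one possible approach, not the refinement I actually used: it normalizes the text, tries an exact lookup, and then falls back to the standard library's difflib for close matches. The function names, the 0.8 cutoff, the sample occupations, and the "00-000x" codes are all placeholders, not real SOC data.

```python
import difflib


def normalize(title):
    """Lowercase, replace hyphens with spaces, and collapse whitespace."""
    return " ".join(title.lower().replace("-", " ").split())


def match_occupations(reported, soc_titles, cutoff=0.8):
    """Map each reported occupation to a code, or None if nothing is close.

    soc_titles maps a title to its code. Exact (normalized) equality is
    tried first; difflib's similarity ratio is the fallback.
    """
    lookup = {normalize(title): code for title, code in soc_titles.items()}
    matches = {}
    for raw in reported:
        key = normalize(raw)
        if key in lookup:
            matches[raw] = lookup[key]
            continue
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        matches[raw] = lookup[close[0]] if close else None
    return matches


def match_rate(matches):
    """Fraction of reported occupations that received a code."""
    if not matches:
        return 0.0
    return sum(code is not None for code in matches.values()) / len(matches)


if __name__ == "__main__":
    # Placeholder titles and codes, invented for this example only.
    soc = {"Librarians": "00-0001", "Accountants": "00-0002"}
    found = match_occupations(["librarian", "Accountants", "astronaut"], soc)
    print(found)
    print(match_rate(found))
```

A lower cutoff raises the match rate but also lets in more false matches, so any fuzzy matches would still need a manual review pass before being used for analysis.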
