stylized extract from a New York Times visualization of social mobility data drawn from The Opportunity Atlas

Analytics with Python
[sqlite3, numpy, pandas, matplotlib]

Textbook sections

Deitel provides step-by-step examples with numpy and some pandas in Chapter 7, "Array-Oriented Programming with NumPy".

Core documentation

database: sqlite3

plotting: matplotlib

manipulation: pandas

Video tutorial resources

Misc resources

crunching base: numpy

Sample code

Data sets for exercises

Lesson Sequence

  1. Library building blocks: numpy and pandas
  2. Hello world
  3. Exploring health code violations with pandas

Core objects: Index, Series, and DataFrame

The screen clippings of the API documentation below link to the overview subpage for each object.

pandas Index, pandas Series, pandas DataFrame
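
To make the relationship between the three objects concrete, here is a minimal sketch. The table contents and the row labels are made up for illustration; only the pandas classes themselves come from the library.

```python
import pandas as pd

# A DataFrame is a labeled table. These values are invented for illustration.
df = pd.DataFrame(
    {
        "city": ["Pittsburgh", "Etna", "Pittsburgh"],
        "rating": ["high", "low", "medium"],
    },
    index=["a", "b", "c"],
)

row_labels = df.index      # Index: the row labels of the table
city_column = df["city"]   # Series: one column, sharing the DataFrame's Index

print(type(row_labels).__name__)
print(type(city_column).__name__)
print(city_column.loc["b"])  # look up a value by its row label
```

Selecting one column from a DataFrame returns a Series, and both objects carry the same Index, which is what lets pandas align rows by label.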

Work specification

program objective

Write a script that answers basic data-based questions concerning health violations in Allegheny County.

questions to pursue in Allegheny County health code violations

  1. What types of health violations are most common in three municipal areas of your choosing?
  2. Which types of violations are rated mostly "high" severity rather than "medium" or "low" severity?
  3. Which classification of restaurant (use the column 'description') has the most "high" severity violations? Does this vary across the municipalities of your choosing?
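
One possible way to approach the three questions is sketched below on a tiny made-up table. Apart from 'description', every column name here ('municipal', 'violation_type', 'severity') is an assumption for illustration and may not match the real dataset, so check the actual column headers before adapting it.

```python
import pandas as pd

# Tiny invented sample. Column names other than 'description' are hypothetical.
violations = pd.DataFrame(
    {
        "municipal": ["Pittsburgh", "Pittsburgh", "Etna", "Etna", "Pittsburgh"],
        "violation_type": [
            "Handwashing", "Pest Management", "Handwashing",
            "Handwashing", "Handwashing",
        ],
        "severity": ["high", "low", "high", "medium", "high"],
        "description": ["Restaurant", "Restaurant", "Bakery", "Bakery", "Restaurant"],
    }
)

# Question 1: most common violation types within the chosen municipalities.
chosen = ["Pittsburgh", "Etna"]
in_chosen = violations[violations["municipal"].isin(chosen)]
common = in_chosen.groupby("municipal")["violation_type"].value_counts()

# Question 2: share of each severity level per violation type (each row sums to 1).
severity_share = pd.crosstab(
    violations["violation_type"], violations["severity"], normalize="index"
)

# Question 3: number of "high" severity violations per municipality and classification.
high = violations[violations["severity"] == "high"]
high_by_class = high.groupby(["municipal", "description"]).size()

print(common)
print(severity_share)
print(high_by_class)
```

The cross-tabulation is one design choice among several: normalizing by row makes violation types with very different totals comparable, at the cost of hiding the raw counts.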

choose your own dataset

Choose a dataset of your own or one listed on the Western PA Regional Data Center. It will be easiest to choose a dataset published as a CSV file, which pandas can ingest directly.

Write a script that uses pandas to describe the value counts of each key variable.
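
A minimal sketch of that step follows. The inline CSV text stands in for a downloaded file, and the column names ('neighborhood', 'status') and the choice of key variables are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a downloaded CSV. With a real file, pass its path to pd.read_csv.
csv_text = """neighborhood,status
North,open
South,closed
North,open
"""
data = pd.read_csv(io.StringIO(csv_text))

# Hypothetical choice of key variables; pick the columns that matter in your data.
key_columns = ["neighborhood", "status"]

# dropna=False also counts missing values, which is often worth knowing.
counts = {col: data[col].value_counts(dropna=False) for col in key_columns}

for col, col_counts in counts.items():
    print(col_counts)
    print()
```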

Then slice your dataset on a value in a column of your choosing and compare a pattern of your choosing across the resulting subsets.
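
The slicing and comparison might look something like the sketch below, which splits an invented table on one column value and compares the distribution of another column across the two subsets. The data and column names are assumptions for illustration only.

```python
import pandas as pd

# Invented data; 'neighborhood' and 'status' are hypothetical column names.
data = pd.DataFrame(
    {
        "neighborhood": ["North", "South", "North", "South", "North"],
        "status": ["open", "closed", "open", "open", "closed"],
    }
)

# Slice with a boolean mask on the chosen column value.
is_north = data["neighborhood"] == "North"
north = data[is_north]
south = data[~is_north]

# Compare the share of each status within each subset, side by side.
comparison = pd.DataFrame(
    {
        "North": north["status"].value_counts(normalize=True),
        "South": south["status"].value_counts(normalize=True),
    }
).fillna(0)

print(comparison)
```

Using proportions rather than raw counts keeps the comparison fair when the subsets differ in size.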