Analytics with Python
[sqlite3, numpy, pandas, matplotlib]

Textbook sections

Deitel provides step-by-step examples with NumPy and some pandas in Chapter 7, "Array-Oriented Programming with NumPy".

Core documentation

Sample code

Sample code on GitHub

Data sets for exercises

Allegheny County restaurant inspection dataset from the Western PA Regional Data Center (WPRDC)

Lesson Sequence

Library building blocks: numpy and pandas
Hello world
Exploring health code violations with pandas
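As a rough idea of what a "hello world" for these two building blocks might look like, the sketch below creates a NumPy array, applies element-wise arithmetic, and wraps the result in a pandas DataFrame. The names and numbers (`scores`, `curved`, `frame`) are invented for illustration and are not taken from the course's sample code.

```python
import numpy as np
import pandas as pd

# NumPy: a homogeneous array with vectorized (loop-free) arithmetic.
scores = np.array([88, 92, 79])   # made-up values for illustration
curved = scores + 5               # adds 5 to every element at once

# pandas: labeled columns built on top of NumPy arrays.
frame = pd.DataFrame({"student": ["ana", "ben", "cy"], "score": scores})
frame["curved"] = curved          # new column aligned row by row

print("Hello, arrays:", curved)
print(frame)
```

The same arithmetic on a plain Python list would need a loop or comprehension; the array form is what pandas builds on.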

Core objects: Index, Series, and DataFrame

The screen clippings of the API documentation below link to the object-specific overview subpages within.
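A minimal sketch of how the three core objects relate, assuming nothing beyond pandas itself: an Index holds row labels, a Series pairs one column of values with an Index, and a DataFrame is a set of Series sharing the same Index. The labels and values here are hypothetical.

```python
import pandas as pd

# Index: an immutable sequence of row labels.
labels = pd.Index(["a", "b", "c"], name="key")

# Series: one-dimensional values aligned to an Index.
counts = pd.Series([3, 1, 2], index=labels, name="count")

# DataFrame: a table whose columns are Series sharing one row Index.
table = pd.DataFrame({"count": counts, "doubled": counts * 2})

print(counts["b"])                  # label-based lookup in a Series
print(table.loc["c"])               # one row, returned as a Series
print(table.index.equals(labels))   # the DataFrame reuses the same labels
```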

Work specification

Code to the specification below. Then upload your Python files and any related documents to your GitHub account, organized sensibly and with informative commit messages.



program objective

Write a script that answers basic data-based questions concerning health violations in Allegheny County.

questions to pursue in Allegheny county health code violations

What types of health violations are most common in three municipal areas of your choosing?
Which types of violations are rated mostly "high" severity rather than "medium" or "low"?
Which classification of restaurant (use the 'description' column) has the most "high" severity violations? Does this vary across the municipalities of your choosing?
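One possible shape for these three questions in pandas is sketched below on a handful of toy rows. Only the 'description' column name comes from the specification; `municipality`, `violation`, and `severity` are hypothetical stand-ins, so check the actual CSV header and adjust before applying this to the real data.

```python
import pandas as pd

# Toy rows standing in for the real CSV. Column names other than
# 'description' are hypothetical and must be matched to the real file.
rows = [
    ("Pittsburgh", "Restaurant", "Handwashing", "high"),
    ("Pittsburgh", "Restaurant", "Handwashing", "high"),
    ("Pittsburgh", "Bakery", "Floors", "low"),
    ("Monroeville", "Restaurant", "Floors", "low"),
    ("Monroeville", "Restaurant", "Floors", "medium"),
    ("Monroeville", "Bakery", "Handwashing", "high"),
    ("Monroeville", "Bakery", "Pest management", "medium"),
]
df = pd.DataFrame(
    rows, columns=["municipality", "description", "violation", "severity"]
)

# Question 1: the most frequent violation type within each municipality.
top_by_muni = df.groupby("municipality")["violation"].agg(
    lambda s: s.value_counts().idxmax()
)

# Question 2: the share of each violation type's rows rated "high".
severity_share = pd.crosstab(df["violation"], df["severity"], normalize="index")
high_share = severity_share["high"]
mostly_high = high_share[high_share > 0.5].index.tolist()

# Question 3: "high" counts per classification, split by municipality.
high = df[df["severity"] == "high"]
high_by_class = (
    high.groupby(["municipality", "description"]).size().unstack(fill_value=0)
)
top_class = high_by_class.idxmax(axis=1)

print(top_by_muni)
print(mostly_high)
print(high_by_class)
```

The 0.5 threshold for "mostly high" is an arbitrary choice made for this sketch; the specification leaves that judgment to you.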

choose your own dataset

Choose a dataset of your own or one listed on the Western PA Regional Data Center. A dataset published as a CSV file will be the easiest to load with pandas.

Write a script that uses pandas to describe the value counts of each key variable.

Then slice your dataset on a value in a column of your choosing and compare a pattern of your choosing between the resulting subsets.
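The two steps above could be sketched roughly as follows. The inline CSV text and its columns (`region`, `category`, `amount`) are invented placeholders so the example runs on its own; with a real dataset you would pass the file path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Placeholder CSV so the sketch is self-contained; not a real dataset.
csv_text = """region,category,amount
north,a,10
north,b,20
south,a,30
south,a,40
south,b,50
"""
df = pd.read_csv(io.StringIO(csv_text))

# Step 1: value counts for each key variable.
key_columns = ["region", "category"]
counts = {col: df[col].value_counts() for col in key_columns}

# Step 2: slice on one column's values, then compare a pattern
# (here, the proportion of each category) between the slices.
north = df[df["region"] == "north"]
south = df[df["region"] == "south"]
comparison = pd.DataFrame({
    "north": north["category"].value_counts(normalize=True),
    "south": south["category"].value_counts(normalize=True),
})

for col, vc in counts.items():
    print(col, vc.to_dict())
print(comparison)
```

Using `normalize=True` compares proportions rather than raw counts, which keeps the comparison fair when the slices differ in size.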
