
Dictionaries and files

resources

The Python Software Foundation maintains the most comprehensive and authoritative documentation on all built-in aspects of the Python language. Start here:

Warm-up 1: Looping to a file

Study the following file contents. Write a program in Python to reproduce this structure. NOTE: You'll need to use two nested for loops. Try printing the pattern to the console first, then replace the calls to print() with calls to yourfile.write().
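As a rough sketch of the nested-loop idea: the outer loop produces one line per pass and the inner loop produces the items on that line. The "row-col" pattern, the function name write_pattern, and its parameters are made up for illustration and are not the target file contents.

```python
def write_pattern(path, rows=3, cols=4):
    """Write a grid of 'row-col' labels, one row per line (hypothetical pattern)."""
    with open(path, "w") as outfile:
        for row in range(1, rows + 1):        # outer loop: one pass per line
            for col in range(1, cols + 1):    # inner loop: one pass per item
                outfile.write(f"{row}-{col} ")
            outfile.write("\n")               # end the line after the inner loop
```

Calling write_pattern("pattern.txt") creates the file in the working directory. Note that write() does not add a newline the way print() does, so the line ending is written explicitly.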

file example

Warm-up 2: String formatting program specification

  1. Download the following text file of names: file link
  2. Your desired program output is a formal greeting for individuals whose names exist in a text file in the order: first last. The output should look like this:
  3. output
  4. The process of generating these greetings should take place in two methods: one should read in a file given a file name (assumed to be in the working directory) and extract each name as a single line. The second method should receive a line of text (i.e., a name in the form first last) and generate the greeting. Remember, you'll need to invert the order of the names using string formatting, not string concatenation.
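One possible shape for the two methods is sketched below. The function names read_names and make_greeting and the greeting wording are assumptions for illustration; the required output format is the one shown in the assignment, not this one.

```python
def read_names(filename):
    """Return the non-blank lines of a names file, one 'first last' per entry."""
    with open(filename) as infile:
        return [line.strip() for line in infile if line.strip()]


def make_greeting(name_line):
    """Build a greeting from a 'first last' line, putting the last name first."""
    first, last = name_line.split(maxsplit=1)
    # Placeholder wording: swap in the greeting text the assignment asks for.
    return "Greetings, {}, {}!".format(last, first)
```

A main loop could then call make_greeting on each entry returned by read_names. Passing last before first to format() is what inverts the order without any concatenation.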

Group code along with CSV files

What can we infer about the state of criminal justice in Allegheny County from their publicly released jail census?

  1. Download this CSV of jail census data by right-clicking the link and selecting something like "save link as..."
  2. Visit the WPRDC dataset page

Jail population statistics calculation example: recreate this program without using the CSV module at all! You'll need to use the split() function to break down lines of text from the file, and you'll want to use zip() to combine each row after the header with the values in the header row.
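The split() and zip() approach can be sketched as follows. The column names and sample rows are invented for illustration and are not taken from the jail census file.

```python
def parse_csv_text(text):
    """Turn CSV text into a list of dicts using only split() and zip()."""
    lines = text.strip().split("\n")
    header = lines[0].split(",")              # column names from the first line
    rows = []
    for line in lines[1:]:
        values = line.split(",")
        rows.append(dict(zip(header, values)))  # pair each header with its value
    return rows


# Hypothetical sample data, standing in for lines read from the real file.
sample = "date,gender,race,age\n2019-01-01,M,B,34\n2019-01-01,F,W,27\n"
records = parse_csv_text(sample)
ages = [int(record["age"]) for record in records]
print(sum(ages) / len(ages))  # prints 30.5
```

A plain split(",") breaks on fields that contain quoted commas, which is one of the cases the CSV module exists to handle.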


JSON encoding

Guiding questions

Learning Objectives

Resources

Lesson sequence

  1. Course organization system: GitHub and upload index, folders, attendance
  2. Review work from last week: programming CSV parsers without CSV module
  3. JSON, XML, and serial binary format notes
  4. JSON parsing simple examples: opening and printing the Capital projects in PGH
  5. Mini task 1: Use a for loop to list all project info in a neatly formatted set, as shown in the screenshot below:
  6. Mini task 2: Write a method called logMalformedProject that is called each time a project is visited by the main loop that does not contain a value for the key 'area'. This method should write the project id to a single line in a log file with an appropriate name.
  7. Mini task 3: Create a method that assembles a list of unique values for the project area. If you are feeling ambitious, do the same for 'status' and 'asset type' since this will come in handy during the search specification.
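The mini tasks above could be approached along these lines. Only the 'area' key and the method name logMalformedProject come from the task descriptions; the 'id' and 'name' keys, the sample records, the log file name, and the helpers unique_values and print_projects are assumptions for illustration.

```python
import json

# Hypothetical records standing in for the parsed capital projects file.
sample = """[
  {"id": "p1", "name": "Bridge repair", "area": "Engineering and Construction"},
  {"id": "p2", "name": "Park lighting", "area": ""},
  {"id": "p3", "name": "Fleet upgrade", "area": "Vehicles and Equipment"},
  {"id": "p4", "name": "Street paving", "area": "Engineering and Construction"}
]"""
projects = json.loads(sample)


def logMalformedProject(project_id, log_path="malformed_projects.log"):
    """Append one project id per line to the log file."""
    with open(log_path, "a") as log:
        log.write(f"{project_id}\n")


def unique_values(projects, key):
    """Return the sorted, de-duplicated non-blank values stored under key."""
    return sorted({p[key] for p in projects if p.get(key)})


def print_projects(projects, log_path="malformed_projects.log"):
    """Print each project in aligned columns, logging those with no 'area'."""
    for project in projects:
        if not project.get("area"):
            logMalformedProject(project.get("id"), log_path)
            continue
        print(f"{project['id']:<6}{project['name']:<20}{project['area']}")
```

Building a set first removes duplicates, and sorting the result gives a stable order for display or for later use in a search specification.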

Screen shot of capital project print formatting

capital project analysis output

JSON Search Criteria project specs

Write Python code that conforms to the following diagram and specs. Note that this project was originally intended to require all students to process the same data set: the PGH capital projects. For FA21, you are using your own data set and adapting the specs. If you want to line up with the specs, use the PGH projects data set on the WPRDC.

Diagram

project spec diagram

purpose

Implement search criteria defined in the JSON format for selecting records from your chosen CSV dataset, then write the selected records to an output file, also in JSON.

Unified JSON-encoded search criteria:

For example, the WPRDC has a capital projects dataset. We might want to select only a subset of these records. Which subset we select needs to be encoded in a nice, Python-friendly format, namely JSON. So we could allow the searcher to say: "I only want records in 2019 that started in January and that are in the area of 'infrastructure'"

{"fiscal_year": [-1], "start_date": [""], "area": [""], "asset_type": [""], "planning_status": [""]}

Sample Search Notes:

  • For dates: We will throw out malformed dates that are not YYYY-MM-DD
  • A blank value in any specified query for a column/field will disqualify that record from inclusion in the results
  • Empty string: do not limit results by this criterion at all
  • Note: the "planning_status" key in the search JSON corresponds to the field named "status" in the csv
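The notes above could translate into a record-matching function roughly like the one below. Several choices here are assumptions, not part of the spec: treating -1 as "no limit" for fiscal_year, matching start_date by prefix (so "2019-01" means January 2019), and the names KEY_TO_COLUMN, is_valid_date, and matches.

```python
import re
from datetime import datetime

# Search keys whose CSV column has a different name (from the notes above).
KEY_TO_COLUMN = {"planning_status": "status"}


def is_valid_date(text):
    """True only for real calendar dates written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def matches(record, criteria):
    """Return True if the record satisfies every active criterion."""
    for key, wanted in criteria.items():
        # Assumption: "" and -1 both mean "do not limit by this key".
        active = [w for w in wanted if w not in ("", -1)]
        if not active:
            continue
        value = record.get(KEY_TO_COLUMN.get(key, key), "")
        if value == "":
            return False                      # blank value disqualifies
        if key == "start_date":
            # Assumption: date criteria are prefixes such as "2019-01".
            if not is_valid_date(value):
                return False                  # malformed dates are thrown out
            if not any(value.startswith(w) for w in active):
                return False
        elif str(value) not in [str(w) for w in active]:
            return False
    return True
```

Comparing values as strings avoids a mismatch between a fiscal year read from CSV text ("2019") and one decoded from JSON as a number (2019).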

program requirement 1: searching

Write code that can read in a search criteria JSON file of your specification (meaning, you determine what the keys/values should be that will generate selected records). You'll need to be prepared to share this specification with others in the class.

Allow the user to specify search criteria for the records in your data set. In the capital projects example, this included project fiscal year, start date, area, asset_type, and planning status.
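The file handling around the search could be sketched as follows: load the criteria file, read the CSV records, and write the selected records out as JSON. The function names, the choice to reject unknown keys, and the use of the CSV module for reading are all assumptions; the template mirrors the capital projects example.

```python
import csv
import json

# "No limit" template taken from the capital projects example.
TEMPLATE = {"fiscal_year": [-1], "start_date": [""], "area": [""],
            "asset_type": [""], "planning_status": [""]}


def load_criteria(path, template=TEMPLATE):
    """Read a criteria file, filling any missing keys from the template."""
    with open(path) as infile:
        raw = json.load(infile)
    unknown = set(raw) - set(template)
    if unknown:
        raise ValueError(f"unknown search keys: {sorted(unknown)}")
    return {**template, **raw}


def read_records(csv_path):
    """Read every CSV row as a dict keyed by the header row."""
    with open(csv_path, newline="") as infile:
        return list(csv.DictReader(infile))


def write_records(records, json_path):
    """Write the selected records to a JSON file."""
    with open(json_path, "w") as outfile:
        json.dump(records, outfile, indent=2)
```

Rejecting unknown keys early makes a typo in a shared criteria file fail loudly instead of silently matching everything.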

Extension exercises

We can easily turn Python objects into JSON files that can be digested by servers all over the world. We can also turn Python objects into files that can only be read by other Python interpreters. The library for this is called pickle. Try serializing your project objects (turning a Python object into data inside a file, instead of in RAM) and resurrecting them using the pickle library.
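A minimal round trip with pickle might look like this. The function names save_projects and load_projects and the file name in the usage note are placeholders.

```python
import pickle


def save_projects(projects, path):
    """Serialize a Python object to a binary file."""
    with open(path, "wb") as outfile:      # pickle files are binary: note "wb"
        pickle.dump(projects, outfile)


def load_projects(path):
    """Rebuild the object from a pickle file."""
    with open(path, "rb") as infile:
        return pickle.load(infile)
```

For example, save_projects(projects, "projects.pickle") followed by load_projects("projects.pickle") should return an object equal to the original. Only unpickle files you trust, since loading a pickle can execute arbitrary code.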

311 Data Parsing challenge:

  1. Visit the 311 data home page at the WPRDC. Study the data dictionary. Learn about the 311 program if you don't already know about their system.
  2. Extract a research question about the data that can be answered by processing the 300,000+ entries in their central data. Examples include:
  3. Write a program in python to answer these questions by ingesting the entire set of 311 data posted to the WPRDC. HINT: Start with a small subset of the data, like this random extraction of a few dozen rows.
  4. Come prepared to share your results!
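A starting point for a question such as "which kinds of requests are most common?" is a tally over one column. The function name count_by_column, the column name REQUEST_TYPE, and the file name in the usage note are assumptions; check the data dictionary for the actual field names.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column):
    """Tally how often each value appears in one column of a CSV file."""
    counts = Counter()
    with open(csv_path, newline="") as infile:
        for row in csv.DictReader(infile):   # streams rows one at a time
            counts[row[column] or "(blank)"] += 1
    return counts
```

Something like count_by_column("sample_311.csv", "REQUEST_TYPE").most_common(10) would then list the ten most frequent values. Because rows are read one at a time, the same function works on the small sample and on the full data set.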



Page created in 2019 by Eric Xander Darsow. Original content can be freely reproduced without any permission or attribution according to the site's content use agreement. Any content accessed by links to external sites or content with specific rights notices is governed by its respective use agreements.