Mighty PostgreSQL

Databases store organized information. Early databases arranged records pertaining to a specific idea into rows in tables, interlinked to avoid data duplication. Modern databases build on relational systems by flexibly storing arbitrary key:value pairs that are nicely digested by ECMAScript and its related web technologies.

Database ecosystems

Modern databases emerged in the 1970s with the advent of the relational database embodied in Multics --the first commecially available relational database. Today, database technology spans much more than table-based storage schemas to include so-called NOSQL ("not only SQL") databases such as MongoDB which are based on insertion and retrieval of key:value pairs--which integrates extremely well with web and javascript-based platforms.

Reverse engineering databases from query building forms

National Electronic Injury Surveillance System (NEISS)

Exercise Steps

Review the sample database above of national electronic injuries. On that front page, select a User Affiliation and click "Build Query". Then Run a query or two and see that once you get a national estimate, you can download the flat file showing the actual recorded incidents
This is an exercise in re-creating possible database tables and their relationships based on the query options provided and the output the query generates. With a partner, run a few queries about topics of interest to you related to electronic accidents.
Study the ouput carefully and begin sketching a possible table structure that might exist inside the relational database for generating the results that you observed. Notice that we are interested in what kind of "utility" or "lookup" tables that might provide fixed categories, like a Product Group table that provides entries for the drop-down box called Product Group.
Compare your tables with those in the class and discuss how they might be keyed, what data types might be used, and why this structure allows for normalization (i.e. not duplicating data anywhere).

External resource links

PostgreSQL home page and download link

PostgreSQL on the Wikipedia (excellent article!)

Official PostgreSQL documentation

The COPY command for writing to and from files

Software install and workspace setup

Carry out the following tasks

Install PostgreSQL database on a computer on which you have admin rights
Test your install by navigating to a command prompt and testing by issuing an empty command: psql which is the name of the postgres daemon on windows and *.nix systems (Linux, OSX, and Unix)

Configuring your PostgreSQL install

The postgreSQL access system is based on roles which are granted access to various database objects, such as a database itself, or an individual table or view. A user in the postgres system is synonymous with a role in that users and roles are both granted powers to carry out various database related tasks.

Postgres is built in with a super user called postgres. You can login as postgres with the following command on Windows
psql -U postgres
Once logged into the postgres prompt as the postgres user, you can run the following commands to create a database in which you can create tables, etc.

CREATE USER xxxxx WITH PASSWORD 'xxxxxx';

For Linux users"

CREATE DATABASE xxxx ENCODING 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';

For windows users, just create the database without any of the above parameters:

CREATE DATABASE xxxx;

GRANT ALL PRIVILEGES ON DATABASE xxx to xxx;

With this framework in place, you can then logout of your postgres prompt with \q

Now, login to the newly created database with your newly created user and begin running data definition commands, such as CREATE TABLE, ETC.:

psql -d xxx -U xxxxx

Create your first tables

Creating tables in PostgreSQL involves declaring the columns and their respective data types. The documentation on PostgreSQL for creating tables is straightforward. You can also consult the complete reference on the CREATE TABLE command in SQL.

Complete table creation and ID sequence

creating a sample table with sequence and foreign keys

This SQL code creates a squence for generating the primary key on the table. We access that primary key with the nextval() method. The ALTER TABLE commands add foreign key relationships to the table. You can create foreign keys inline with the table creation or after the table creation with these ALTER TABLE commands.

Sample Postgres DB Creation Scripts

Copying into postgreSQL tables from text files

The COPY FROM command in PostgreSQL allows you to systematically read data into database tables from flat files. The tool documentation is dense, but worth a close read.

The COPY command for writing to and from files

Navigate to the Western PA Regional Data Center's Allegheny County Jail Census page and download one month of data for testing.
Save the flat file into a known location on your hard drive. Navigate your command prompt to this location and use dir to ensure that the flat file is in the same location as your command prompt. This will make reading in the flat file into your postgreSQL table much easier since file locations will not be tricky.
Carefully read the syntax for the COPY ... FROM command.
Create a table for holding the census data as shown below.

Creating a table to hold file from a CSV

creating a table to hold allegheny county jail cenus data

This SQL code creates a table to store data read in from a flat file acquired from the WPRDC. Note that based on the reading for the COPY FROM command, the column order in the table specified at the beginning of the COPY command will be mached with the number of columns in the flat file. Types must match.

Naviate into your database via the command prompt so you are in the interactive terminal for postgres. This will be something like psql -d xxxx.
Compose the following command in a separate text file in case you need to try it a few times.
\copy jailcensus (datecount, gender, race, agebook, agecurrent) FROM '/home/sylvia/teaching/data/jail.csv' WITH (FORMAT CSV, DELIMITER ',', HEADER TRUE);

Note that we used the \copy command instead of COPY to allow us to avoid issues with super user rights. \copy functions just like COPY except we don't need to be the postgres user.
When complete, you should be able to query the table and see the data property read into the system.
Use an aggregate function to compute the average age of inmates at the time of booking on each day of the census.

Successful reading in of data from the flat file of jail census data

A successful execution of the import of the data will read out the number of rows copied into the database, in this case, over 73 thousand rows were copied in!

Data copied into the jailcensus table from the \copy command

A successful execution of the import of the data will read out the number of rows copied into the database, in this case, over 73 thousand rows were copied in!

Query practice

The SQL Language

Attempt to write SQL queries to answer the following questions about the Allegheny County Jail using the jail census data you imported into your first postgreSQL database in previous sections

How many days of data are in your dataset?

How many inmates were counted each day of record in the database?

How many inmates were counted each day of record whose agebooked was less than agecurrent?

What is the ratio of Black to White each day of record in the database?

What is the avergae age of inmates on each day counted in the census?

Postgres project outline steps

We are going to work through the process of creating and exploring multi-table data by first working with flat files of US census and ACS data. Then, once we feel good about the census data detached from its spatial components, we'll enable the PostGIS extension in PostgreSQL and being querying and viewing the data spatially. Use the following steps as a guide for assembling multiple tables of data from the US Census to create background data for a spatial inquiry of an area of your choosing.

Create a new file called your work process log where you can document the steps you take in this process--this is for you in a year, and for others who might want to reproduce or validate or build on your work.
Choose a region of the country, such as a half of a US state, and develop an inquiry question related to human life in this area that the American Community Survey can shed light on.
Develop a social inquiry question to investigate related to how one or a set of indicators have changed through time. We can pull ACS data from many different years, some of which we can go back as far as 20 years.
Locate 2-4 tables in american factfinder related to your pattern of interest. Download that table data at the census tract level
Organize and trim these files in a spreadsheet so that they can be read into your database with the \copy command nicely. This will require writing scripts to create tables to match the columns of interest in your mini-project.
- Boil down the two header rows into a single one
- All column names should be in lowercase characters, no spaces, and do not start with a number
- Remove all unwanted columns, focusing in on about 6-10 per us census table
- Ensure that you preserve both (or all) of the identifer columns, called geoid, id, etc.
Conduct a basic analysis of the change in indicator over time using formulas inputted into your SQL queries.
With flat file analysis done, you're ready to visualize the data spatially: follow instructions in the next section to enable postgis extensions in your database in postgres and read in your spatial data

sub-module h: PostGIS and QGIS!

With the postgreSQL database up and running, and the qgis modules under the belt, we can enable some schnazzy extensions to posgreSQL which allows for storage and manipulation of spatial data right alongside regular tabular data! Once we plug qgis into our fancy database tables, we can drop layers into onto a qgis whose data resides not in ordinary, disconnected operating system files, but rather in structured tables in our database!


objective	Add spatial data to a map project in qgis whose data source is a postgis-enabled postgreSQL database created by reading in an ESRI® shapefile with the shp2pgsql command line tool
contents	Section 0: Software and external resources Section 0.5: Pre-reqs Section 1: Enabling the postgis extension Section 2: Procuring ESRI® shapefile data Section 3: Reading in spatial data with shp2pgsql Section 4: Connecting postgreSQL and qgis Section 5: Packaging analytic queries in database views
software and resources	PostGIS extension install and docs shp2pgsql man page gnu project's BASH reference manual Official PostgreSQL documentation
system pre-reqs	sudo powers on a linux machine; this tutorial is built on ubuntu 18.04 A database that your postgres role owns or has owner-like privileges on qgis 2.x or 3.x (3.x is desired universally but is a huge pain to build on ubuntu 16.04. all is well with 18.04 LTS)
skill pre-reqs	Undertaking this sub-module is intended for users who are able to semi-fluently communicate with a postgresql database in SQL via bash using psql -d {database_name} (with heavy reliance on documentation) semi-knowledgeably populate a connection information GUI box switch into postgres user mode in unix-like systems and make basic changes to databases with heavy consultation of documentation create and manipulate basic project elements in qgis: layers, basemaps, etc.

Section 1: Enabling the postgis extension

postgis is the name of an extension module to the postgresql database which is itself a collection of tools which together allow users to:

Store spatial data--both vector and raster types--in database table columns and manipulate their values as we would any other "normal" postgres data type like TEXT or INTEGER.
Transfer, package, and convert spatial data as it makes its way into and out of the database tables

The following sequence assumes you have a database whose name you know and you can connect to in pgadmin

Open a terminal window in Ubuntu linux and update your packages, carefully reading any output and responding thoughtfully:

sudo apt update
Install the postgis extension package and its dependencies by saying:

sudo apt-get install postgis
If all goes well, no error messages will be on the screen. We're now able to enable this extension in our desired database(s). This change requires administrative rights on your database, which your linux username role probably doesn't have. But the all-powerful postgres user does, and this user is automatically created when you authorized the install of the postgres database.

To become the postgres user, we use the su tool which allows us to interact with the operating system as if we were a user that is not our current login user. Such identity-shifting powers are only allowed to so-called super users, so we must preface our request with sudo

sudo su postgres

You'll notice that if this command is successful, the prompt on our terminal has changed to read postgres@computer: {some dir path}$

Then you can talk to the postgres database through the psql program which needs to know which database we wish to tinker with. We tell it which database with the -d flag.

psql -d {your_database_name}

The command worked if we are in an interactive exchange with postgres shown by a new terminal prompt that is just your database name and the # symbol.
Ask postgres to wire up our newly installed postgis extension with

CREATE EXTENSION postgis;

If all is well, postgres will say CREATE EXTENSION back to you. Exit out of the interactive terminal with a \q and stop impersonating postgres with a simple exit command. You are now operating the terminal as your original login user. See image h1 below for a screen snip.

image h1: shell commands for enabling postgis

Purple text are annotations by the author, who is logged into an Ubuntu machine as user sylvia. This shell prompt tells us that the user is currently attached to the file tree in /home/sylvia/teaching/data/school. The author becomes postgres in order to create the postgis extension on the database named data201

top

Section 2: Procuring ESRI® shapefile data

Gather up spatial data that we can send into Postgres!

Procure spatial data of your choosing to read into the database. The author is reading in all of the US Census tracts in Allegheny County, Pennsylvania retrieved from the TIGER spatial database hosted by the US Census.
ESRI Shapefile data spans many individual files and will be extracted from archives (i.e. "zipped" directories) so stop and create a sensible place to put the uncompressed files, preferably within a master project directory and either in dedicated directories for each ESRI Shapefile layer or a directory for all ESRI Shapefile data that's separate from your work log and qgis files, etc.
Extract all zipped directories such that you can see five or more individual files that together make up the entire ESRI Shapefile. See image f2 below.
Use your bash terminal skills and the cd command to navigate into the directory containing the expanded ESRI shapefile data such that when you run the command

ls -al

you can see a listing of a set of files with strange extensions, one of which must end in .shp
Use the shp2pgsql tool which knows how to patch together the 5+ files in ESRI's proprietary spatial data format and extract all encoded geometrical features and their associated attributes. It will generate SQL code which inserts the features it finds into a database table.

We use the magical character called a pipe (on most keyboards, it lives in the shift position on the key above Enter) to direct the SQL code that shp2pgsql generates into the database on which we enabled the postgis extension:

shp2pgsql -c tl_2010_42_unsd10.shp | psql -d data201

Or more generally:

shp2pgsql -c {.shp_file_name} | psql -d {name_of_database_with_postgis}

If you get an error, try running the command on either side of the | independently and seeing if it behaves as expected. shp2pgsql -c tl_2010_42_unsd10.shp by itself on the author's computer outputs gobs and gobs of pages of hexadecimal digits representing spatial data.

Correct output will look like the author's output in image f3 belw: inserts will be successfully made into the database table that

image f2: ESRI Shapefiles

A directory containing both the compressed .zip file and a tree view of the directory into which the files are extracted.

image f3: shp2pgsql working!

The database is responding with feedback from each successful insert of spatial data.

top

sudo apt install postgis sudo su postgres psql -d [your database name} ENABLE EXTENSION postgis;

If all is well, you should see this lovely awesome delightful sucess feedback from postgres:

Reading in spatial data into postgres

PostGIS documentation

Start by securing an ESRI shapefile of spatial data (any kind) and save it to a known location on your drive.
For many, this will be through TIGER shapefile repo maintained by the US Census Bureau
With your shapefile downloaded and extracted, we can build our command to read in the entire shapefile into a table which

shp2pgsql -c tl_2010_42003_tract10.shp dat102db
With QGIS open, navigate to layers then data source manager then

[the great and mighty] PostgreSQL database

Database ecosystems

Reverse engineering databases from query building forms

Exercise Steps

saveExternal resource links

wb_incandescentSoftware install and workspace setup