DAT-102: Introduction to Data Analytics

The following table maps course session dates, lesson topics, references, and content links for DAT-102, Introduction to Data Analytics.

course date wk no. session links learning objectives out-of-class work
DAT-102 Sat
6-FEB'21
1

Introduction to data analytics

Further optional reading

Familiarize yourself with the range of data types provided by Python environments.

  • TR.102.DS.3.A - Decompose the data analytics field
  • TR.102.DS.1.A - Data Tables - Creating: Create a data table with logically assigned types for each column and a unique identifier for each row
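
As a rough illustration of the data table objective above, here is a minimal Python sketch of a table whose columns have logically assigned types and whose rows each carry a unique identifier. The column names (`response_id`, `favorite_genre`, `rating`) and the sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    response_id: int     # unique identifier for the row (hypothetical column)
    favorite_genre: str  # categorical variable (hypothetical column)
    rating: float        # continuous variable (hypothetical column)


def build_table(rows):
    """Build a list of typed records, rejecting duplicate identifiers."""
    seen = set()
    table = []
    for response_id, genre, rating in rows:
        if response_id in seen:
            raise ValueError(f"duplicate id: {response_id}")
        seen.add(response_id)
        # Coerce each value to the type assigned to its column.
        table.append(SurveyResponse(int(response_id), str(genre), float(rating)))
    return table


table = build_table([(1, "jazz", 3.5), (2, "rock", 4.0), (3, "jazz", 2.25)])
```

Checking for duplicate identifiers while the table is built is one simple way to keep every row traceable back to its source record.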

Please develop a "strip survey" containing a categorical question and an opinion/spectrum question. Compose the tiny survey in a text document and upload it to a folder named with your public ID in our shared drive.

DAT-102 Sat
13-FEB'21
2

Recording


Recommended pre-reading from textbook for next week

Lock-5 Stats book, ed. 1, Ch. 2, sections 1-4 only (sections 5-6 explore two quant variables which we'll cover later)

  • Broadly Classify data analytic artifacts/products/displays (Quant/qual/categorical/textual)
  • TR.102.DS.3.C - Continuous & categorical variables
  • TR.102.DS.3.D - Data structures (list, set, stream, table, graph, tree)
  • TR.102.DS.3.E - Analytic modes: describing, modeling, predicting
  • TR.102.DS.1.B - Data Tables - Converting: Export and import data tables in .xlsx, .ods, .csv formats
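
The export/import objective above can be sketched for the .csv case with Python's standard `csv` module. The .xlsx and .ods formats need third-party libraries and are not shown. The column names and values are hypothetical, and the sketch writes to an in-memory text buffer rather than a real file.

```python
import csv
import io

# Hypothetical data table; CSV stores every value as text.
rows = [
    {"id": "1", "color": "blue", "height_cm": "172.5"},
    {"id": "2", "color": "red", "height_cm": "160.0"},
]


def export_csv(records, fieldnames):
    """Write records to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def import_csv(text):
    """Read CSV text back into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = export_csv(rows, ["id", "color", "height_cm"])
round_trip = import_csv(csv_text)
```

Because a CSV file carries no type information, numeric columns come back as strings and need to be converted again after import.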

TODO for Spring 2021

Completion target: before class, 20-FEB

  1. Create a graph-based representation of a data set of your interest. Create a scheme for coding nodes and edges of appropriate types. Check out the work of your peers for ideas. You can visualize your graph data using any tools you choose: paper/pen, the diagrams.net Google Drive app, or Gephi. Include a key for your symbols (e.g., connect musician nodes who featured in one another's songs in blue, connect technical consultation in red).
  2. Capture an image of your graph with a camera, scanner, screenshot, or export.
  3. Upload it to OneDrive in the Spring 2021 directory.
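
A scheme for coding typed nodes and edges, like the one described in step 1, might look like the following Python sketch. The node names, the node types, and the edge-type labels (`featured`, `consulted`) are hypothetical. The color key mirrors the blue/red example in the task.

```python
# Hypothetical nodes, each coded with a type.
nodes = {
    "A": {"type": "musician"},
    "B": {"type": "musician"},
    "C": {"type": "producer"},
}

# Each edge is (source, target, edge_type); the graph is treated as undirected.
edges = [
    ("A", "B", "featured"),
    ("B", "C", "consulted"),
    ("A", "C", "featured"),
]

# Key for drawing the graph: edge type -> color.
EDGE_COLORS = {"featured": "blue", "consulted": "red"}


def neighbors(node, edge_type=None):
    """Return the nodes connected to `node`, optionally filtered by edge type."""
    found = set()
    for source, target, kind in edges:
        if edge_type is not None and kind != edge_type:
            continue
        if source == node:
            found.add(target)
        elif target == node:
            found.add(source)
    return sorted(found)
```

For example, `neighbors("C", "consulted")` lists only the nodes linked to C by a red (consultation) edge.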
DAT-102 Sat
20-FEB'21
3

Recordings

Graph Data Structures

This week we continued our work from last week's data structures module. In Part 1 of the session, we encoded our graphical representations of a graph data structure in tabular format for easy exporting to a text file.

In Part 2, we built a survey instrument for administering a question of interest to our peers in the course.

Phase 1: Encode a peer's graph in tabular format

  1. Access our cloud drive, where the graphs were uploaded to OneDrive in the Spring 2021 directory.
  2. Choose a peer's graph that you're interested in.
  3. Create a tabular representation of the nodes and their connecting edges in a spreadsheet tool of your choosing. Review video for help.
  4. Export the spreadsheet as a CSV (comma-separated values) file and upload this text file to your peer's graph directory for our next sequence, which we'll do together in class later.
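
The tabular encoding in Phase 1 can be sketched as an edge list: one row per edge, with columns for the two connected nodes and the edge type. The column names (`source`, `target`, `edge_type`) and the rows are hypothetical. The sketch also reads the CSV text back to confirm that the graph survives the export.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge table: header row first, then one row per edge.
edge_table = [
    ["source", "target", "edge_type"],
    ["A", "B", "featured"],
    ["B", "C", "consulted"],
]


def to_csv_text(table):
    """Serialize the table rows as CSV text."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


def adjacency_from_csv(text):
    """Rebuild an undirected adjacency mapping from edge-list CSV text."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text, newline="")):
        adjacency[row["source"]].add(row["target"])
        adjacency[row["target"]].add(row["source"])
    return {node: sorted(links) for node, links in adjacency.items()}


csv_text = to_csv_text(edge_table)
graph = adjacency_from_csv(csv_text)
```

A second table with one row per node (identifier and node type) would be needed to keep node attributes, since an edge list only records connections.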

Complete ALL peer surveys by Tue, 23-FEB at Midnight

  1. Log into your google account registered with Eric. Navigate to drive.google.com
  2. Click: shared with me and find the DAT102_sp21_masterShared directory
  3. Then enter spring21_stripSurveys
  4. Make a new folder whose name is your firstName_surveyTopic
  5. Inside your survey subdir: Click New >> More >> Google Drawings
  6. Use the drawing tools to create your categorical question and your spectrum question. For the spectrum question, create a horizontal line and label the end points with extreme values.
  7. By Tuesday Feb-23 @ midnight please have submitted responses for each of your peers' strip surveys in their respective directories. ALSO: take your own survey!!!
  8. Starting Wednesday morning until class starts next week, please create a spreadsheet in your strip survey folder on google drive, with each survey response getting its own row/record in the table. Give each survey a unique identification number, which you can use to check your data in the spreadsheet.
  9. Measure the distance each respondent marked along your spectrum by adding a vertical guide bar to the drawing. First, turn on rulers by going to View >> Show Ruler. Then click the ruler along the left edge of the screen and drag the vertical guide until it reaches the tick on the spectrum. The drawings tool will automatically display the distance in inches to two decimal places.
DAT-102 Sat
27-FEB'21
4

SP'21 Recording

Strip survey analysis

Summary-based descriptive stats: mean and standard deviation

Extra

SP21: TODO

  1. Record student responses to your strip survey in a google sheet inside your google drive directory
  2. Measure your total line length. Enter this value in a dedicated special cell in your spreadsheet to use for scaling.
  3. Compute a scaled score for your slicer in the spreadsheet as a Percent of total line length. Do this by adding a new column to the right of your raw measured value.
  4. Use formula master skills to generate a percent of total line distance. Don't forget an absolute reference to your total line length
  5. With scaled values, compute your quant profile for your aggregate responses (not sliced)
  6. Create new tabs in your spreadsheet, one for each of your possible slicer responses. Name the tabs logically, without spaces or weird characters
  7. Copy your aggregate data from your first sheet into each of your slicer tabs
  8. Select all your data and sort the data by slicer question response. Delete the rows of the responses whose slicer answer is NOT the focus of that tab
  9. With your responses trimmed by slicer, compute your summary values for each of your data sub-sets (N, min, median, max, and mean)
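
The spreadsheet steps above can also be sketched in Python as a cross-check. The total line length, the slicer answers, and the measured distances below are hypothetical. The constant `TOTAL_LINE_LENGTH_IN` plays the role of the dedicated cell that the absolute reference points to.

```python
from statistics import mean, median

TOTAL_LINE_LENGTH_IN = 6.0  # hypothetical total line length, in inches

# Hypothetical responses: unique id, slicer answer, measured distance in inches.
responses = [
    {"id": 1, "slicer": "yes", "distance_in": 1.5},
    {"id": 2, "slicer": "no", "distance_in": 4.5},
    {"id": 3, "slicer": "yes", "distance_in": 3.0},
    {"id": 4, "slicer": "no", "distance_in": 6.0},
]


def scaled_percent(distance, total=TOTAL_LINE_LENGTH_IN):
    """Express a measured distance as a percent of the total line length."""
    return 100.0 * distance / total


def profile(values):
    """Summary values for one set of scaled scores."""
    return {
        "N": len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "mean": mean(values),
    }


# Add the scaled column next to the raw measured value.
for response in responses:
    response["percent"] = scaled_percent(response["distance_in"])

# Aggregate profile (not sliced), then one profile per slicer answer.
aggregate = profile([r["percent"] for r in responses])
by_slicer = {}
for answer in sorted({r["slicer"] for r in responses}):
    subset = [r["percent"] for r in responses if r["slicer"] == answer]
    by_slicer[answer] = profile(subset)
```

Filtering by slicer answer here does the same job as copying the data into a tab and deleting the rows that do not match.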
DAT-102 Sat
6-MAR'21
5

SP'21 Recordings

Lock-5 Pre-Reading

Edition 1, Sections 2.3 (Spread) and 2.4 (Box Plots)

Lock^5 Book sections

Chapter 2, Sections 1-4

Draw conclusions about a data set based on box plots

Compute the standard deviation of a data set, interpret the results, and make inferences using Z-scores
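
As a small worked sketch of this objective, the following Python uses the standard `statistics` module to compute a standard deviation and a Z-score. The data values are hypothetical. `stdev` is the sample standard deviation (dividing by n - 1), so swap in `pstdev` if you are describing a whole population.

```python
from statistics import mean, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set


def z_score(value, values):
    """Number of sample standard deviations that `value` lies from the mean."""
    return (value - mean(values)) / stdev(values)


center = mean(data)
spread = stdev(data)
z_for_nine = z_score(9, data)
```

A Z-score near 0 means the value sits close to the mean, while a Z-score beyond roughly 2 in either direction marks a value that is unusual for a bell-shaped distribution.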

  • If you didn't get a chance to finish your section of the strip survey analysis or analyze a peer's data, please do so this week.
  • Complete activities in Chapter 1 of Statistics Notes handout
DAT-102 Sat
13-MAR'21
6

Applying mean, median, and standard deviation

Match up the Distribution, stats blocks, box plot, and data source in this file

  • TR.102.DS.6.A - Surveys - Designing:
  • TR.102.DS.6.B - Surveys - Sampling & Administering:
  • TR.102.DS.6.C - Surveys - Analyzing:

Task 1: Strip Survey Analysis

Please populate columns B-U in the 'Strip_Survey' tab of our master tracker, including, most importantly, Column U, which asks you to describe the relationship between your box plots

Task 2: Stdev, Z-score practice packet

NOTE: Several pages are in inverted order! (9 before 8, etc.)

Key will be posted next week

DAT-102 Sat
20-MAR'21
7

Session Recording

ida mae farlow darsow

Begin library section sampling, to be continued next week.

Sampling!

    1. Implement a sampling procedure that reduces selection bias by employing a random number generator to select population members for data collection
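
One way to let a random number generator pick the population members, sketched in Python: number the books on the shelf from 1 to N and draw positions without replacement. The shelf size of 400 and the seed are hypothetical. The sample size of 30 matches the library task.

```python
import random


def choose_sample(population_size, sample_size, seed=None):
    """Pick distinct positions (1-based) at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# Hypothetical section of 400 books; select 30 of them for data collection.
positions = choose_sample(population_size=400, sample_size=30, seed=2021)
```

Because the generator chooses the positions, thick or eye-catching books are no more likely to be picked than any others, which is the selection bias this procedure is meant to reduce.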

Step 1: Please sample 30 books from each of your two library sections: record the call number, number of pages, and some creative variable for each book in each section. Please create a data analysis home in our shared google drive sp21_librarySampling subdirectory. Also, share your chosen Library of Congress sections in our class master tracker on the library_sampling tab

Step 2: Begin populating our analysis guide, either in your spreadsheet itself or in the editable document

Step 3: Preview the Learning Resources in the library module page so your brain is ready to apply confidence intervals to our estimates.

DAT-102 Sat
27-MAR'21
8

Session Recording

Population inference from sampling

Library samples continued

Use the bootstrap sampling procedure to make an estimate of a population parameter from sample data.
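
The bootstrap procedure can be sketched in Python as follows: resample the original sample with replacement many times, compute the statistic for each resample, and use the spread of those statistics as an estimate of the standard error. The page counts, the number of resamples, and the seed are hypothetical.

```python
import random
from statistics import mean, stdev


def bootstrap_means(sample, n_resamples=2000, seed=None):
    """Means of resamples drawn with replacement, each the size of the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


# Hypothetical page counts from one library section.
page_counts = [212, 348, 190, 420, 275, 301, 156, 388, 264, 330]

boot = bootstrap_means(page_counts, seed=1)
point_estimate = mean(page_counts)  # estimate of the population mean
standard_error = stdev(boot)        # spread of the bootstrap distribution
```

This is the same idea a package such as StatKey automates. The code only makes the resampling step explicit.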

Spring break TODO: library analysis

NOTE: Skip hypothesis testing questions/sections

Dedicate a few hours to carefully responding to the analysis questions from your library sample. See our sampling module, and choose the library sampling mini-project. Upload all your work to our shared google drive for formal submission (log in to your google.com account, navigate to drive.google.com, select "shared with me" in the left sidebar, then locate our shared directory for SP'21 DAT-102). Be sure to name your files with your public first name and your library section prefixes.

DAT-102 Sat
3-APR'21
- Spring break; No class all week
DAT-102
Thu
15-APR'21
from 6:00 - 9:00pm
9 & 10

This is a combined class session held from 6pm-9pm, which makes up for the cancelled 10-APR'21 session and the rescheduled 17-APR'21 session due to a funeral in the instructor's family. Since this is not our normal meeting time, please attend if you can, but don't sweat if you can't. I'll post the recording straight away.

Session Recording

Review of CI Fundamentals Socrative Quiz

Review Library Sample Findings

Review of ENDS article confidence intervals

Log our final project ideas

    • Sampling 1: Implement the process of making an inference about a population parameter from a sample.
    • Sampling 2: Use a statistical package--such as StatKey--to experimentally estimate the standard error of the sampling distribution

Wrap a bow on library sampling

STEP 1: Complete as much as feasible of the library analysis questions and data sheets and upload them to our shared google drive.

STEP 2: Transfer essential attributes of your page count sampling to our master SP21 tracker Library_Sampling tab: specifically the population point estimate, the estimated standard deviation of the sampling distribution (called the standard error), and your 95% confidence interval bounds.
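
The three tracker values relate to one another as sketched below in Python. This assumes the common rule of thumb that a 95% confidence interval is roughly the point estimate plus or minus two standard errors. The numbers are hypothetical, and a percentile-based bootstrap interval is an equally valid alternative.

```python
def confidence_interval_95(point_estimate, standard_error):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    margin = 2 * standard_error  # assumption: the 2*SE rule of thumb
    return (point_estimate - margin, point_estimate + margin)


# Hypothetical values for a page-count sample.
lower, upper = confidence_interval_95(point_estimate=300.0, standard_error=25.0)
```

With these hypothetical inputs the margin of error is 50 pages, so the interval runs from 250 to 350 pages.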

Conf. Interval article study

Please study the two American Journal of Public Health articles distributed in class. Prepare to dig into their confidence intervals for each sub-population:

  1. Law Enforcement Agencies' Perceptions of the Benefits of and Barriers to Temporary Firearm Storage to Prevent Suicide (Feb-2019, Am J. Pub Health) by Brooks-Russell, Ashley; Runyan, Carol; Betz, Marian E.; Tung, Greg; Brandspigel, Sara; Novins, Douglas K.
  2. Sociodemographic Correlates of Electronic Nicotine Delivery Systems (ENDS) Use in the US (Sep-2019, Am J. Pub Health), by Spears, Claire Adams; Jones, Dina M.; Weaver, Scott R.; Huang, Jidong; Yang, Bo; Pechacek, Terry F.; Eriksen, Michael P. (2016-2017)
DAT-102 Sat
24-APR'21
11

Session Recording

Introduce BiVariate analysis

See Lock5 Stats, Section 2.5: two quantitative variables with scatter plots

US Census and ACS

The longest-running and most comprehensive sample-based data set is the US Census American Community Survey (ACS), the data from which is publicly accessible and incredibly rich.

  • TR.102.DS.7.A - Experiments - Designing:
  • TR.102.DS.7.B - Experiments - Treatment assignment & Implementing:
  • TR.102.DS.7.C - Experiments - Analyzing:
  • TR.102.Q.10 - Standard errors
  • TR.102.Q.11 - Student's T-tests - Setup
  • TR.102.Q.12 - Student's T-tests - Interpretation
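
For the treatment assignment objective above, here is a minimal Python sketch of random assignment: shuffle the participants with a random number generator, then split the shuffled list into a treatment group and a control group. The participant labels and the seed are hypothetical.

```python
import random


def assign_groups(participants, seed=None):
    """Randomly split participants into treatment and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant labels.
people = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
groups = assign_groups(people, seed=7)
```

Letting chance decide who receives the treatment keeps the experimenter's preferences from lining up with the outcome being measured.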

Step 1: Study the AJPH articles from last week's homework and prepare for the Socrative quiz on confidence intervals

Step 2: Populate our master tracker Library_sampling sheet with your library data

Step 3: Familiarize yourself with American FactFinder for next week's worktime and propose some variables of interest to you

Start thinking about your final project

DAT-102 Sat
1-MAY'21
12

Recordings

Census Groupwork

Final project practice and design

1

Begin final project

OPTIONAL Out of class:

Digest PGH Inequality report

Due to COVID-19 reorganization, we will be unable to discuss the data and the sociology behind Pittsburgh's Inequality Across Gender and Race Report issued by the Pittsburgh Gender Equity Commission. As you desire, please engage with the report on your own and with others in your various circles. These discussion questions may guide your discussion:

  1. Review the study's aggregation of smaller racial subcategories into the "AMLON" category. What are the advantages of this statistical approach? Its limitations? Would there be other ways to aggregate races into smaller categories?
  2. Review the Report's focus areas in the section called "Cultivating Livability." Which of these priorities do you believe are most salient at this time in Pittsburgh? Most data-based? Least data-based?
  3. Carefully study the comparison methodology in Appendix A. Develop a thoughtful opinion of the author's assertion on page 72, third paragraph, which starts: "When outcomes, like grade retention rates, are similar across cities they are likely to be driven more by national policies and factors...". Can you think of any indicator patterns which do not exhibit this behavior?
DAT-102 Sat
8-MAY'21
13

Session Recording

Experiments: Mind Food

Randomized Controlled Trials

Final project concept development

Identify experimental design components in several novel experiments.

Undertake and document your final project for sharing next week. Create a sub-folder with your first name and your topic of study inside the directory in our shared drive called dat102_sp21_finalProjects

DAT-102 Sat
15-MAY'21
14

FINAL EXAM PERIOD from 10:00am-12:00noon