Syllabus

The goal of this course is to provide you with the tools to build data-driven interactive systems and explore the new opportunities enabled by this data through a combination of guest lectures, discussion of current literature, and practical skills development. Over the course of the semester, you will learn about the entire data pipeline from collecting and analyzing to interacting with data.

This course requires comfort with programming, as required projects make use of (at a minimum) Python, SQL, CSS, and JavaScript (including D3). A series of "project bytes" helps lay the groundwork for larger group projects.

The learning goals of the course are as follows:

  • To introduce basic concepts in data collection including data formats, parsing and sources of data
  • To introduce common problems with data such as structural problems, outliers, incomplete data, and dirty data
  • To introduce basic concepts in data interpretation including feature generation, statistical analysis and classification
  • To introduce basic concepts in data visualization including what makes a good visualization and the use of interaction in visualization
  • To provide practical applied examples of the data pipeline through an examination of current literature
  • To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications

Prerequisites:

The class will involve programming and debugging. If required by your background, it is possible to minimize the programming you do for projects (in which case you will be expected to spend more time on other factors such as beautiful visual designs). However, you should not take the course if you find programming or debugging extremely difficult, because you will have to master several very different programming languages and concepts in very short order (projects make use of web programming frameworks including Flask, Bootstrap, Ajax, jQuery, D3, Google Appspot; and multiple languages including Python, JavaScript and SQL).

Projects:

The course is project oriented. It includes 1-2 self-defined projects along with 4-6 smaller "project bytes" designed to provide the stepping stones needed to complete the larger projects. Your work will be evaluated relative to your background and level of effort. This is a graduate class, and the assumption is that you are a mature and motivated student, and that you will define your work so that you learn and grow, given your background. Students who are taking this course as a part of a technical requirement (such as the computer science course requirement in the HCI PhD) will need to do more advanced or ambitious projects, and should consult with the instructor to make sure they are meeting this bar.

All bytes are to be done as individual work. It is expected that students may assist each other with conceptual issues, but not provide code. If you use example code, you must explicitly acknowledge this. If you are unsure about these boundaries, ask. The larger projects are to be done in groups of two or more.

Some of the specific skills that will be covered in projects include:

  • Display data from an API (such as the Twitter API) on a website you create
  • Create a mashup of data from multiple web APIs
  • Create an interactive visualization of a data set

Exams:

There will be regular in-class quizzes. There will be a take-home final exam but no midterm. Of the in-class quizzes, you may drop your two lowest scores.

Course Materials:

This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TA, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza. If you have any problems or feedback for the developers, email team@piazza.com.

Find our class page at: https://piazza.com/cmu/spring2016/05839/home/home

Readings will be made available on the CMU Blackboard. The following books are recommended:

Interactive Data Visualization for the Web (free online version)

Doing Data Science (Schutt & O'Neil) -- based on the very successful Columbia course on data science taught by Schutt (uses R and Python)

These books may also be useful:

Visualize This (Nathan Yau) (uses R and Python)

Programming Google App Engine, Charles Severance (uses Python, plus add-ons like JavaScript)

Python for Data Analysis, Wes McKinney (Python) 

Brief (and Tentative) List of Topics Covered:

Concepts

  • Structured vs unstructured data
  • Dealing with heterogeneous data
  • Sampling and Bias in Data Collection
  • Sensed Data
  • Mobile Data
  • Data transformation and analysis
  • Information Visualization
  • Current research in information driven interfaces

Skills

  • Getting Web data
  • Dealing with APIs and OAuth
  • Getting access to mobile data
  • Common data formats
  • Data parsing
  • Common problems with data
  • Tools for analyzing data
  • Tools for visualizing data

Readings and Discussion:

You will be expected to read assigned readings before the lecture they pertain to. These may include chapters drawn from textbooks about data, or readings from the research literature. To incentivize this, each student will be required to make at least two relevant postings to the discussion group before the class for which each reading is due.

Grades:

The tentative breakdown for grading is below. The course will make use of peer grading (details will be provided in class). As a reminder, here is the university policy on academic integrity.

10% Class Participation
40% Project Bytes (Peer Graded)
40% Full Projects (midterm & final)
10% Final Exam & Quizzes