Visualizing your data: Byte 4

  • Description: Your final product will be a visualization of an animal adoption data set ([uniqueid]-viz.appspot.com)
  • Hand In: Fill out the peer grading form on Blackboard.

Overview

In this project, you will create a visualization of the data from Louisville Metro Animal Services (or a data set of your choice). To do this, you will make use of the D3 visualization toolkit. This Byte covers:
  • Using a JavaScript visualization library (D3)
  • Passing data from your Python code to JavaScript running in the user's browser
  • Designing a visualization of your data

Detailed Instructions for Visualization Byte

This Byte should ideally build on the data set you used in your data exploration Byte (Byte 2), or Byte 3 if you prefer. An example can be found at jmankoff-viz.appspot.com. You should start by setting up a second Google App Engine application, called [uniqueid]-byte4.appspot.com. You may use either a Fusion Table or a database. We walk you through the use of Google Charts and D3 in this tutorial. However, there are many other platforms you may want to investigate in the future. D3 has derivatives such as NVD3 and Vida; other options include Highcharts and gRaphaël. You can see a comparison of the full set of options at socialcompare.com, and this 2012 article from DataMarket is also helpful.

Creating a Custom Visualization in Google Charts

For the first phase of this tutorial, we will start with one of the simplest visualization options available: Google's Charts API.

To use Google Charts, we need to get data from your database (or Fusion Table) in the Python code in 'main.py' and send it through Jinja all the way to Google Charts (which is embedded in the web page as JavaScript).

There are three important pieces to visualizing the data: gathering the data, setting up the plumbing to display it in a chart, and adding the chart itself to 'index.html'. To have something to show as soon as possible (which helps with debugging), we will do these in reverse order.

Adding a custom visualization to your web page

We will first create a chart in 'index.html' that shows fake data. To do this, you can copy the JavaScript code for a column chart from Google's Charts API documentation directly into the <head></head> section of 'index.html'. To display the chart, you also need to add <div id="chart_div"></div> somewhere in the body of 'index.html'. When you load the page, you should see a column chart of the sample data.


Notice that the data for this chart is defined directly in the javascript we copied over, in the lines that say:

var data = google.visualization.arrayToDataTable([
    ['Year', 'Sales', 'Expenses'],
    ['2004',  1000,      400],
    ['2005',  1170,      460],
    ['2006',  660,       1120],
    ['2007',  1030,      540]
 ]);

In addition, the title and axis specifications are found in the javascript (and can be customized further):

var options = {
    title: 'Company Performance',
    hAxis: {title: 'Year', titleTextStyle: {color: 'red'}}
    };

We will want to replace this with our own chart. Since we know something about what our data will look like, let's first create fake data that is more realistic: 

data = [
  ['Age', 'Adopted', 'Euthanized'],
  ['< 6 months',  1000,      400],
  ['6-12 months',  1170,      460],
  ['1-5 years',  660,       1120],
  ['>5 years',  1030,      540]
]

You may find that the labels on the horizontal axis are cut off with this data. I updated my options as follows: 

var options = {
title: 'Animal Outcomes based on Age at Arrival',
width: 400, height: 200,
chartArea: {height: '50%'},
hAxis: {title: 'Age', titleTextStyle: {color: 'red'}}
};

Setting up the plumbing for passing data from 'main.py' to the visualization

Our next goal is to move the fake data to Python and pass it to the JavaScript we just added to 'index.html'.

1) Place the data into a list in 'main.py'. For a list of lists of strings and numbers like this one, the literal syntax is the same in Python and JavaScript, so we can copy the data = [... definition above directly into 'main.py'.

2) Next, we need to JSON-encode the data (this turns it into a string) and store it in the context passed to Jinja, which will pass it on to 'index.html'.

Taking these two steps together, we get:

import json

@app.route('/')
def index():
    data = [...]  # the fake data list defined above
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    # JSON-encode the data so it reaches the template as a string
    return template.render({'data': json.dumps(data)})

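3) Finally, we need to update the JavaScript in 'index.html' to retrieve the data. This simply requires writing {{data|safe}} wherever we want to access the data. The |safe filter tells Jinja not to HTML-escape the string, so the quotes in the JSON reach the browser intact. For example: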

var data = google.visualization.arrayToDataTable({{data|safe}});
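To see what {{data|safe}} actually expands to, here is a small standalone sketch using only Python's standard library. It is not how Jinja works internally: html.escape stands in for Jinja's autoescaping (which may use different entity names), assuming your JINJA_ENVIRONMENT has autoescaping turned on. It shows why the JSON string must be inserted verbatim.

```python
import html
import json

# A shortened copy of the fake data from main.py.
data = [
    ['Age', 'Adopted', 'Euthanized'],
    ['< 6 months', 1000, 400],
    ['6-12 months', 1170, 460],
]

# What main.py puts in the template context: a JSON string.
encoded = json.dumps(data)

# With |safe, the string is inserted as-is, producing valid JavaScript.
with_safe = 'var data = google.visualization.arrayToDataTable(%s);' % encoded

# Without |safe, autoescaping would turn quotes and '<' into HTML entities,
# which is no longer valid JavaScript. html.escape approximates that here.
without_safe = ('var data = google.visualization.arrayToDataTable(%s);'
                % html.escape(encoded))

print(with_safe)
print(without_safe)
```

The first line printed is a valid JavaScript array literal; the second contains &quot; and &lt; entities that the browser's JavaScript parser would reject.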

When you are passing information from your Python code through Jinja to the JavaScript for the visualization, it is important to understand what information is available on each end. Use your browser's console to debug this (along with the console.log function in JavaScript). In Chrome, you open the console with an operating-system-specific key combination (Ctrl+Shift+J on Windows and Linux, Cmd+Option+J on Mac).

Debugging Hints

The flow of information in this code has several stages. You are (hopefully) loading data from somewhere in Python and packaging it up to send to JavaScript. Inside JavaScript, you may do further processing and then visualize the data, which creates DOM elements. Because of this, you need to trace errors across several possible locations. An error in your Python code is most easily caught by looking at the App Engine log, where you can print things out using the familiar logging.info(). Stack traces from crashes in your Python code also show up in the same log.

Assuming your code doesn't crash somewhere in Python, you may also need to debug on the JavaScript side. For this, use the JavaScript console, which you can write to from your scripts using console.log(). Crashes in your JavaScript code also show up in the console. As discussed in class, you can also inspect the DOM using the Elements tab in the developer tools (the same tools that contain the console). You may have to go back and forth between debugging in your browser and in your Python logs.

Using real data to show the relationship between age and outcome

Although we have now created a custom visualization, it only works with the fake data we gave it. Our next step is to hook it up to the real data.

We can use the same code as in Byte 2 to load the full data set directly from Google Fusion Tables. However, I have found the speed of Fusion Tables to be variable, to say the least. An alternative is to use the same mechanism as 'explore.py' from Byte 2 to load the data from a file. You will need to download the data set into a file (such as 'data.json') ahead of time using the code from explore.py, because a Google App Engine application is not allowed to write to disk (it can write to the Datastore, but we will not be covering that in this class).

Once you have a file with data in it (you could just use the one from Byte 2), it needs to be placed in a static directory. Create a directory ('data/') inside [uniqueid]-byte4 and place 'data.json' in that directory. We'll also need to update app.yaml to tell Google about the directory and make it application-readable. NOTE: If you choose to do this, GOOGLE WILL CHARGE YOU A SMALL FEE FOR THE SPACE on an ongoing basis.

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico
- url: /data
  static_dir: data
  application_readable: true
- url: .*
  script: main.app
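Once the file is deployed, main.py can read it with the standard json module (application_readable: true is what lets the app open files in the static directory). The sketch below assumes 'data.json' holds the Fusion Tables query response saved in Byte 2, that is, an object with a "columns" list and a "rows" list of lists; if your file is shaped differently, adjust the key names. The helper name load_rows and the sample values are made up for illustration.

```python
import json
import os
import tempfile


def load_rows(path='data/data.json'):
    """Read a saved query response and return (columns, rows).

    Assumes the file looks like {"columns": [...], "rows": [[...], ...]}.
    """
    with open(path) as f:
        response = json.load(f)
    return response['columns'], response['rows']


# Demonstrate with a tiny temporary file instead of the real data set.
sample = {'columns': ['Age', 'Outcome'],
          'rows': [['2 months', 'Adopted'], ['10 years', 'Euthanized']]}
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(sample, f)
columns, rows = load_rows(f.name)
os.remove(f.name)
print(columns, len(rows))
```

In the deployed app you would simply call load_rows() with the default path, relative to the application directory.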

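Next, we'll use Python to collect the parts of the data we care about (without issuing a series of SQL queries). For example, to map ages to outcomes, we first initialize a dictionary that contains an entry for each age, something like this:
    # ages and outcomes are lists of the categories we want to count,
    # e.g. ages = ['<6mo', '6mo-1yr', '1yr-6yr', '>7yr', 'Unspecified']
    age_by_outcome = {}
    for age in ages:
        # one zeroed counter per outcome, plus the age label itself
        outcome_vals = {'Age': age}
        for outcome in outcomes:
            outcome_vals[outcome] = 0
        age_by_outcome[age] = outcome_vals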

and then fill it with data by simply looping through all of the rows and counting up what we find:

    # columns and rows come from the loaded data set
    # find the column id for ages
    ageid = columns.index(u'Age')

    # find the column id for outcomes
    outcomeid = columns.index(u'Outcome')

    # loop through each row
    for row in rows:
        # map the raw age of the animal in that row to an age bucket;
        # age_mapping (not shown) maps raw age values to the buckets in ages,
        # and unknown raw values fall back to 'Unspecified' instead of
        # raising a KeyError
        age = age_mapping.get(row[ageid], 'Unspecified')
        # get the outcome for the animal in that row
        outcome = row[outcomeid]

        if age not in ages: age = 'Unspecified'
        if outcome not in outcomes: outcome = 'Other'

        # now record what we found
        age_by_outcome[age][outcome] += 1
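Putting the initialization and counting steps together, here is a self-contained sketch that runs outside App Engine. The age buckets and outcome names are the ones that appear in the log output later in this Byte; the age_mapping dictionary and the sample rows are made up for illustration, since the real mapping depends on the raw values in your data set. The last step converts the dictionary into a list of dictionaries in bucket order, which is the shape the D3 code expects.

```python
# Age buckets and outcomes, as they appear in the log output for this Byte.
ages = ['<6mo', '6mo-1yr', '1yr-6yr', '>7yr', 'Unspecified']
outcomes = ['Adopted', 'Euthanized', 'Foster', 'Returned to Owner',
            'Transferred to Rescue Group', 'Other']

# Hypothetical mapping from raw age values to buckets.
age_mapping = {'2 months': '<6mo', '8 months': '6mo-1yr',
               '3 years': '1yr-6yr', '10 years': '>7yr'}


def count_outcomes_by_age(columns, rows):
    # One dictionary of zeroed counters per age bucket.
    age_by_outcome = {}
    for age in ages:
        outcome_vals = {'Age': age}
        for outcome in outcomes:
            outcome_vals[outcome] = 0
        age_by_outcome[age] = outcome_vals

    ageid = columns.index('Age')
    outcomeid = columns.index('Outcome')
    for row in rows:
        # Unknown raw ages fall into 'Unspecified' instead of raising KeyError.
        age = age_mapping.get(row[ageid], 'Unspecified')
        outcome = row[outcomeid]
        if outcome not in outcomes:
            outcome = 'Other'
        age_by_outcome[age][outcome] += 1

    # A list in bucket order, ready to be JSON-encoded for the D3 code.
    return [age_by_outcome[age] for age in ages]


columns = ['Age', 'Outcome']
rows = [['2 months', 'Adopted'], ['2 months', 'Foster'],
        ['10 years', 'Euthanized'], ['unknown', 'Stolen']]
result = count_outcomes_by_age(columns, rows)
print(result[0])
```

Here the row with an unrecognized age and an unrecognized outcome is counted under 'Unspecified' and 'Other' rather than crashing the request.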

Moving to D3

To use D3 (much more sophisticated than Google Charts), first download D3 from the D3 website and unzip it into the [uniqueid]-byte4 directory. The code in this Byte uses the version 3 API (it loads 'd3.v3.js' and calls functions such as x.rangeBand(), which later versions renamed), so use a 3.x release. Next, update your 'app.yaml' file so that your application knows where to find D3. We'll also use CSS stylesheets with D3, so we'll add a directory for stylesheets to 'app.yaml' as well. Change the handlers section to look like this:

handlers:
- url: /favicon\.ico
  static_files: favicon.ico
  upload: favicon\.ico
- url: /data
  static_dir: data
  application_readable: true
- url: /d3
  static_dir: d3

- url: /stylesheets
  static_dir: stylesheets

- url: .*
  script: main.app

Scott Murray's D3 Fundamentals tutorial (or his free online book) will acquaint you with the basics of D3 (you may also find D3 Tips and Tricks useful). At a minimum, you'll want to produce a bar chart similar to the one we made above with Google Charts. However, D3 can do much more! This section of the tutorial walks you through how I created the stacked bar chart at jmankoff-byte3.appspot.com (which I based on mbostock's example stacked bar chart).

First, I organized the data in main.py into a list of dictionaries, containing the number of animals in each outcome. Here is what the final output looks like in the log:
[{'Foster': 0, 'Returned to Owner': 0, 'Age': '<6mo', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0},
 {'Foster': 0, 'Returned to Owner': 0, 'Age': '6mo-1yr', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0},
 {'Foster': 0, 'Returned to Owner': 0, 'Age': '1yr-6yr', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0},
 {'Foster': 0, 'Returned to Owner': 0, 'Age': '>7yr', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0},
 {'Foster': 0, 'Returned to Owner': 0, 'Age': 'Unspecified', 'Adopted': 0, 'Euthanized': 0, 'Other': 0, 'Transferred to Rescue Group': 0}]
This is created using about 30 lines of code in 'main.py'. The key section of that code is listed above; the remainder simply sets up the structure it needs. Once we have done this, we pass the list to 'index.html' as context:
    # add it to the context being passed to jinja; the D3 code calls
    # data.map, so pass the per-age dictionaries as a list in age order
    # (json here is Python's standard json module)
    variables = {'data': json.dumps([age_by_outcome[age] for age in ages])}

    # and render the response
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    return template.render(variables)

Now we need to set up index.html. First we need to tell it about d3:
  <script type="text/javascript" src="d3/d3.v3.js"></script>

Next, we start on the script that displays the data. First, we move the data into variables accessible to JavaScript:

  <script>
       // ----------- EVERY CHART NEEDS DATA --------------
       // this is the data we passed from main.py
       // the format for data is:
       // [{outcome1: amount1, ..., outcomen: amountn,
       // Age:'<6mo'}, ..., {outcome1: amount1, ... , Age: '>7yr'}]
       var data = {{data|safe}};
       // now collect the possible ages for use in axes of the chart
       var ages = data.map(function (d) { return d.Age; });
       console.log(ages);
       // and outcomes, for the same reason
       var outcomes = d3.keys(data[0]).filter(function(key) { return key !== "Age"; });
       console.log(outcomes);


Now we can loop through the data to calculate information we will need later to build the graph. We want a graph that stacks the rectangle for each outcome on top of the previous one. This means that only the first outcome starts at y=0; each of the others starts where the previous one ends, so its position depends on the counts of all the outcomes below it. Looping is done in JavaScript with forEach:
       data.forEach(function(d) {

For each age, we calculate a y0 and y1 (bottom and top) position for each rectangle. We also calculate the total height of the stacked bar (from the bottom of the bottom rectangle, 0, to the top of the top rectangle).
 // the y0 position (lowest position) for the first stacked bar will be 0
 var y0 = 0;
 // we'll store everything in a list of dictionaries, d.outcomes
 d.outcomes = outcomes.map(function(name) {
   // each outcome has a name, a y0 position (its bottom),
   // and a y1 position (its top).
   var res = {name: name, y0: y0, y1: y0 + d[name]};
   // and we also have to update y0 for the next rectangle.
   y0 = y0 + d[name];
   return res;
 });
 // we also store the total height for this stacked bar
 d.total = d.outcomes[d.outcomes.length - 1].y1;
});  // end of data.forEach
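The stacking arithmetic is easy to check outside the browser. This Python sketch mirrors the JavaScript above for a single age group: each outcome's bottom (y0) is the running total of the outcomes before it, and its top (y1) adds its own count. The function name stack and the sample counts are hypothetical.

```python
def stack(d, outcomes):
    """Return (segments, total) for one age group d, mirroring d.outcomes/d.total."""
    y0 = 0
    segments = []
    for name in outcomes:
        segments.append({'name': name, 'y0': y0, 'y1': y0 + d[name]})
        y0 += d[name]
    total = segments[-1]['y1'] if segments else 0
    return segments, total


group = {'Age': '<6mo', 'Adopted': 30, 'Euthanized': 10, 'Foster': 5}
segments, total = stack(group, ['Adopted', 'Euthanized', 'Foster'])
print(segments, total)
```

For this group the segments run 0 to 30, 30 to 40, and 40 to 45, so the total (the height of the whole bar) is 45.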

The next section of the D3 code, labeled
       // ----------- EVERY CHART NEEDS SOME SETUP --------------
sets up the axes and color scales. Check the D3 documentation to understand more about what is going on here. For picking colors, it can be helpful to use a site such as colorbrewer2.org.
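The setup code is not reproduced here, but in mbostock's stacked bar example, which this chart is based on, the y scale maps counts from 0 up to the largest total onto pixel positions running from the bottom of the chart (height) to the top (0). Assuming the same setup, the Python function below imitates that linear mapping by hand (it is not D3's API, and unlike D3's rangeRound it does not round) to show why the rectangle code uses y(d.y1) for the y attribute and y(d.y0) - y(d.y1) for the height: SVG's y axis points down, so larger counts get smaller y values. The chart height and maximum total are made-up numbers.

```python
def make_linear_scale(domain, range_):
    """Return a function mapping [d0, d1] linearly onto [r0, r1]."""
    (d0, d1), (r0, r1) = domain, range_

    def scale(value):
        return r0 + (value - d0) * (r1 - r0) / (d1 - d0)

    return scale


height = 200      # hypothetical chart height in pixels
max_total = 100   # hypothetical largest stacked total
y = make_linear_scale((0, max_total), (height, 0))

# A segment stacked from 30 to 40 animals:
top = y(40)                  # value for the rect's y attribute
bar_height = y(30) - y(40)   # value for the rect's height attribute
print(top, bar_height)
```

The segment's top edge lands at 120 pixels from the top of the chart and it is 20 pixels tall, which is exactly 10 animals' worth of the 200-pixel range.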

The meat of any D3 visualization is DOM manipulation. D3 draws into an SVG element, which in this case we place inside the element with id "viz" (so the body of 'index.html' needs a <div id="viz"></div>). In D3, a series of commands can be written as chained method calls; for example, we set up the svg using:
 // the svg element is for drawing. We set its size based 
 // on the margins defined earlier
 var svg = d3.select("#viz").append("svg")
     .attr("width", width + margin.left + margin.right)
     .attr("height", height + margin.top + margin.bottom)
       // and add a group that is inside the margins
   .append("g")
     .attr("transform", "translate(" + margin.left + "," + margin.top + ")");
D3 also has an unusual way of looping through data: you simply add another call to the chain, such as .data(data). In the sample code we first create a group (g) element for each bar by looping through the ages. Note that we select all '.Age' elements before we have created them (that happens in .append("g")). The selection starts out empty; .data(data) joins it to the data, and .enter() returns a placeholder for every data item that has no element yet, so .append("g") creates one group per age.
   // Create a group for each age
   var age = svg.selectAll(".Age")
      .data(data)
      .enter().append("g")
        .attr("class", "g")
        .attr("x_position", function (d) {return x(d.Age);})
        .attr("transform", function(d) {return "translate(" + x(d.Age) + ",0)"; });

Next, we create a rectangle for each outcome. Again, we select all rects before we actually append them to the visualization. This is non-intuitive, but it allows D3 code to be written without explicit loops. Here the data function returns d.outcomes, so each age group gets one rectangle per outcome.

       // create a rectangle for each outcome (for each age)
       age.selectAll("rect")
            // bind the outcome data for that age to that rectangle
           .data(function(d) { return d.outcomes; })
         .enter().append("rect")
             .attr("width", x.rangeBand())
             // use the outcome data to determine y position and height
             .attr("y", function(d) { return y(d.y1); })
             .attr("height", function(d) { return y(d.y0) - y(d.y1); })
             // use the color scale to determine the fill color
             .attr("fill", function(d) { return color(d.name); })
             
At this point, you should be able to display a stacked bar chart in your browser using the code we just went through. It is also easy to add a little interactivity. First, create a stylesheet ('d3.css') in the stylesheets directory and reference it in 'index.html' as:
  <link href="stylesheets/d3.css" rel="stylesheet" type="text/css">

 Next we can make our bars respond to hovering:
rect {
        -moz-transition: all 0.3s;
        -o-transition: all 0.3s;
        -webkit-transition: all 0.3s;
        transition: all 0.3s;
}

rect:hover {
        fill: orange;
}

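This is nice, but what if we want tooltips as well? A simple way to do this is to create a hidden div that we position and show based on mouseover events. To do this, add the following to the stylesheet. The tooltip needs absolute positioning, or the left and top values set later will have no effect:
#tooltip {
        position: absolute;
        /* let the mouse pass through to the bars underneath */
        pointer-events: none;
}

#tooltip.hidden {
        display: none;
}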

and add the div to our HTML inside the <body>:
   <div id="tooltip" class="hidden">
       <p><strong>Number of Animals:</strong></p>
       <p><span id="value">100</span></p>
    </div>
 
Finally, we need to add two more function calls to how we define our "rects":
   .on("mouseover", function(d) {
       // Get this bar's x/y values, then augment for the tooltip
       var xPosition = parseFloat(d3.select(this.parentNode).attr("x_position")) +
           x.rangeBand() / 2;
       var yPosition = parseFloat(d3.select(this).attr("y")) + 14;

       // Update the tooltip position and value
       d3.select("#tooltip")
           .style("left", xPosition + "px")
           .style("top", yPosition + "px")
           .select("#value")
           .text((d.y1 - d.y0) + " animals were " + d.name + ".");

       // Show the tooltip (it's a div that is otherwise always hidden)
       d3.select("#tooltip").classed("hidden", false);
   })
   // and cause it to disappear when the mouse exits
   .on("mouseout", function(d) {
       d3.select("#tooltip").classed("hidden", true);
   });

When you create your visualization, be sure to give it some sort of interactive aspect. To make it easier to borrow from other examples, the data structure introduced above was chosen to match the structure of data loaded from a multi-column CSV file, which is typical of other D3 tutorials. Here is an explanation of how D3 loads CSV files and what the data looks like. This means you can compare our approach to other D3 tutorials that make charts from data with rows and columns; examples are mbostock's grouped bar chart and delimited.io's multi-series charts. You will also find this data structure similar to single-series data tutorials (except that those have only two columns, the labels and the values, rather than many). An example is this excellent D3 tutorial. You can even load CSV files in your App Engine application if you want to.
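To see the correspondence concretely: d3.csv turns each row of a CSV file into an object keyed by the header row, and Python's csv.DictReader does the same with dictionaries. The made-up CSV below has one row per age group and one column per outcome, so each row comes out in the same shape as the dictionaries built in main.py. Both libraries return the values as strings, so counts must be converted to numbers before stacking; in this sketch that happens in Python.

```python
import csv
import io

csv_text = """Age,Adopted,Euthanized
<6mo,30,10
>7yr,5,20
"""

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    # Convert every column except the label to an integer count.
    rows.append({key: (value if key == 'Age' else int(value))
                 for key, value in record.items()})
print(rows)
```

Each element of rows matches the {'Age': ..., outcome: count, ...} format that the D3 code in this Byte consumes.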

Hand In Expectations

You will be asked three things about your hand-in.

First, does the code work, and does it support interaction with the visual narrative in some fashion?

Second, what narrative story does your visualization tell? The best answers would talk about the relationship between the question the visualization answers, the design choices that are made about the visualization, and how those choices help the user to get the most from the visualization. 

Third, what did you do to improve the D3 chart in particular? A minimal change would be to make the existing visualization more meaningful. As it stands, the data you are displaying is hard to interpret because it consists of raw counts. For example, it's hard to compare the percentage of animals with each outcome across age groups, because the shelter simply dealt with fewer puppies than older animals. You could address this in the visualization by changing the type of numbers displayed, for example by showing percentages instead of counts. You could also experiment with alternative visualizations (other than bar charts), visualizations of other aspects of the data set, and other types of interactivity. For example, the D3 book we pointed you to walks through how to update the visualization when the user clicks on something like a radio button. You could allow the user to change the style or content of the visualization.
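One way to address the raw-counts problem is to show each outcome as a percentage of all animals in that age group. The sketch below is in Python, so it could run in main.py before the data is JSON-encoded; it rescales each age group's counts so they sum to 100, after which the stacking code works unchanged and the largest total becomes 100 (you would also want to update the tooltip text). The function name to_percentages and the sample counts are hypothetical, and groups with no animals are left at zero to avoid dividing by zero.

```python
def to_percentages(groups, outcomes):
    """Return copies of the age-group dicts with counts converted to percentages."""
    result = []
    for group in groups:
        total = sum(group[name] for name in outcomes)
        converted = {'Age': group['Age']}
        for name in outcomes:
            converted[name] = 100.0 * group[name] / total if total else 0.0
        result.append(converted)
    return result


groups = [{'Age': '<6mo', 'Adopted': 30, 'Euthanized': 10},
          {'Age': '>7yr', 'Adopted': 0, 'Euthanized': 0}]
percentages = to_percentages(groups, ['Adopted', 'Euthanized'])
print(percentages)
```

The first group becomes 75% adopted and 25% euthanized, which can be compared directly with other age groups regardless of how many animals each contained.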