Byte 3: Mobile Data

Description: Your final product will be a web page that tells a story about your mobile data (e.g., common locations you visit and the activities that you perform at those locations) using visualizations.

Sample Source code: jmankoff-mobile
Learning Goals: 
  • Collecting your own data
  • Working with a data about a familiar topic
  • Working with mobile data
  • Visualizing Data

Set up for Mobile Byte

Setting up for Byte 3 is a multi step process that depends on what platform you are using (iphone versus android). You will need to do the following tasks:

Set up a sample byte, named [uniqueid]
  1. Install AWARE on your phone (required)
  2. Set up a cloud database for your byte's data (optional)
  3. Make sure you have the data. There are two possibilities here:
    • Connect AWARE to your cloud database and test your set-up (data will be automatically moved there for you)
    • Upload your data to your google Cloud Storage (not until Mobile byte is assigned)

Installing AWARE on your phone (required)

AWARE is open source and you might find it interesting to expand the set of sensors and capabilities of this application as part of a final project. Many off the shelf sensors are included that require no programming. Additional things can be done with programming. However since mobile application development is not a major focus of this class, our tutorial will not include further information about AWARE development.
If you are an Android user, you should download and install the default aware android client. Note that the client may use battery, but it will not cost you any of your data plan because it will only upload data when you have Wi-Fi access, by default. 
If you are an iPhone user, you can download and install AWARE using iTunes.

Setting up a Cloud Database for your Byte's Data (optional)

You will need to create a Cloud SQL database for your application in order to collect data from the aware framework in a format that you can compute over. Also note that once the data collection is over, you can just change the study status to closed and your client will stop uploading/collecting information. The data stays in the database unless you delete it. The Google documentation describes the steps you have to do. Use the menu at the top left of your cloud console for your byte ([uniqueid]-mobile) to  create a Cloud SQL database  (select SQL, shown at bottom left, and then create instance, using the First Generation option).

You will need to keep track of the name of the database you created in Step 3 of  create a Cloud SQL database for your application. You might, for example, call it 'mobile-data.' For most other options, you can use the defaults. At the very bottom, you will need to specify 'Authorized Networks.' Add as the authorized IP. You should also add the IP of a machine you have access to so that you can access the database directly from your laptop for setup and debugging purposes. In my case, I used a CMU server (anything that is not just dhcp will probably be fine). 

Finally note down the IP the instance has obtained, which will be needed in the aware server. You can see that on the listing of sql cloud instances associated with your byte (for example, I can see mine at You should also note down the instance connection name (mine is visible below as jmankoff-mobile:use-central1:mobile)

For testing, you will also need to set up a root password (which you can do on the access control page, just below the 'request an ip' button). As you can see in this image, you can do so by clicking on 'Access Control', then 'Users' and then 'change root password'

Once you create an instance, you will also create a database. Go to the Databases tab and click on "New database":

Then enter your database name:

Test your connection from your local machine. You can use MySQLClient, or at the command prompt (assuming you have mysql installed), you can type:
mysql --host=instance-IP --user=root --password

The result should look something like this:

Once you are at the mysql prompt, you can set up a new user if you want (instead of root) to use for remote access. 

CREATE USER 'newuser'@'' IDENTIFIED BY 'password';
You will need to give this new user (or root) access to the database you created:

GRANT ALL PRIVILEGES ON data.* TO 'newuser'@'';
You may also want to do this for your own IP address so that you can test everything by logging out and logging in again as 'newuser'. 

Connecting Aware to your cloud database (ONLY if you choose to setup your Cloud database)

Once you have AWARE installed and you've set up a database, it is time to connect the two of them together. To do so, will need to create a 'study' (of yourself) and specifies what you wish AWARE to sense. To do this, you need to go to the website for study creation. To access this website, you will need to accept a security certificate. In my mac, this caused the browser to try to stop me from proceeding:
I clicked on advanced and chose to proceed. This led me to a 'sign in with google' button. Once you do that you will be able to see the dashboard. 
Click on 'Create Study' at right. Give it whatever name and description you would like. 

Then select 'Remote Server'. You will need to provide the details of your google data store in order to complete this task (as described above, hostname, port, database, username and password). The hostname can be the IP address for your cloud database (which you used to test things out above). The port is always 3306 (the default for MySQL). The database will be whatever you named your database (as mentioned above) and the username and password are specified under access control as described above. The username is 'root' (or [newuser] if you created a new user) and the password is what you set. 

The next page that you see summarizes the study and also gives you the opportunity to set parameters.

You should click the pencil to the far right of 'Sensors' and customize what data you would like to collect. A minimum requirement for this byte is 'google activity recognition' and location (GPS or google fused location). Be sure to check the box for what you want and click on the green check mark at the far right to save. 

Joining the Study

Once you have finished creating the study, you will need to join it. Open the aware client on your phone and press the menu button. Choose 'Studies'. This will launch a QR code reader. 

Meanwhile, on your screen, you should press 'Show QRcode' (it's just above edit, as shown in the screen shot above). Note that you can get to this screen anytime by logging into the website for study creation and clicking on your study.  

Scan the code with your phone to add it to the study. Once you scan the QRCode on your phone, follow any instructions it may provide: install a plugin, activate the accessibility services (See Increasing Availability). If you are connected over Wi-Fi, you should see your phone on the list of Devices.

Finding a phone's ID [optional]

If you end up collecting data from multiple people (say as a final project) you may need to separate them from each other. The ID for a phone is found as follows:

Open the AWARE Sensors icon from the mobile phone’s launcher and you should be able to see your device ID as AWARE Device ID, a UUID (Unique Universal ID) alphanumeric value that is assigned to their client when they installed the AARE Client.

Increasing Availability (Android Users Only)

Once you install the application, it should start logging data to the server. However, if you have an Android phone, be aware that Android kills "non-essential" services that are running in the background when there is an issue with memory, battery, and so on. If you want to ensure that your application is running at all times, you must activate the AWARE framework in the Accessibility services (in your phone settings). This is not required, but will increase the reliability of your data. Note this will cause your phone to talk to you on Samsung devices only due to an accessibility bug. 

Samsung Bug (Android Users Only)

- Go to Settings -> Accessibility -> Talkback « ON »
- Select « Settings » from the bottom of the page
- Uncheck all Talkback options
- Switch Talkback « Off » again
- Go back to Settings -> Application manager -> All
- Disable Google TTS (text to speech)
- Disable Samsung TTS (text to speech)

If you’re experiencing the same issue on another phone, try this :

- Go to Settings -> App Manager -> Select GoogleTTS -> Disable
- Go to Settings -> App Manager -> Select SamsungTTS (on a Samsung phone) -> Disable

Testing your setup

To test if your phone is correctly working and hooked up, you can use the same website used to set up the study to send your phone an ESM message. This is only enabled once you have a phone connected to the study, and can be found at the bottom of the same screen where you edit the sensors. Here's what it looks like:

If everything goes well (i.e., your device is currently connected to the server, the server is up and running), you should see a question on your phone’s screen.

You can also test the database link by selecting the Broadcasts tab, and sending a request to sync database. The image below shows how this is done, with the user's phone that should sync selected. 
Finally, you should double check that there is actually data in your google cloud database. You can do this by logging into the mysql server remotely mysql --host=instance-IP --user=root --password or by checking the cloud console's log of read write operations, as here:

Main Mobile Byte Tasks and Handin Expectations

Data from your phone will be in the google Cloud Storage database you created. When developing on the Google Cloud Shell you will always need to 'deploy' your code and then test it online at [uniqueid] When testing online, you can take advantage of Google's cloud logging service to see a trace of what happens, which is very useful for debugging. Access it using the same left hand menu that got you to SQL

There are two paths through this assignment, described below under 'Non Programmer Path' and 'Advanced User Path'
  • For those with minimal coding skills, we encourage you to export your data as a csv, and then load it into Lyra (documentation here). From Lyra you can export static images (less preferable) or vega-lite specifications (with your data embedded) which you can then load into your application as interactive visualizations (More preferable). We will update this with more details and demo in class on Thursday.
  • For others, you can work directly with Vega-lite. Some details are below.
  • Only for advanced users: Note that you could access your SQL database locally from your laptop if you have a static IP that you set up as an 'Authorized Network' when you set up your cloud instance. You can decide which approach to take.  

Non Programmer Path:

To export your data, you will need to use your google cloud console. You'll need to know the name of the database you used to collect AWARE data as well as the tables you wish to export to do this. Your first step will be to make sure that you know those names. Here are some commands that you can type into cloud shell to accomplish this. Note that you will need to use the same username you used when you set up the link between aware and the cloud shell, and you'll need to have set up permissions so that you're allowed to access that database from your cloud shell. Neither of those tasks are covered here but most likely you already set up a username and shell access when you set up aware.

gcloud beta sql connect [instance name] --user=[username]
Next, you will need to check the table names and pick one. This is standard mysql, and here is a way to do that (with images showing applied examples). words capitalized in my examples are SQL commands. Sometimes you will see these in lowercase in the images, SQL doesn't really care. However when you name a database or table, you have to always keep the case exactly the same, so SQL does care for those.

In this student's case, mobiledata contains the aware data. We'll call that [database] in example text instead so you know to use your database name there.
USE [database]
SHOW tables;
Once you have listed your tables, you should choose one to export, which we will call [table] (an example might be locations or battery).

Finally, you need to go to your cloud console and export the data, as described here
Here is a picture of what that would look like once set up properly to export the locations table as a csv file. Note that you have to click on 'advanced options' to enter the database name. Also note that you can write other more complex SQL queries if you wish.

Once the export is complete, you will get a browser notification. You have to go to a completely different part of the cloud console to download the exported file (sorry, folks!). To get to it you have to go to the storage area, and select the bucket for your database. 

You'll see either one file (a log), if your export failed, or two files (a log and a csv file) if your export succeeded. You can download the log file to find out why the export failed, or share it with us on piazza, if you run into trouble.

Once you have downloaded your CSV file, you can import it into wrangler to clean it, to Excel if you want to play around with different ways to visualize it, or directly into lyra to begin designing your final interactive visualization. 

Note that to import the data into Lyra, you either need to past raw values in, our use a URL. If you want to use a URL, in the interface shown one image previously in this tutorial, you can check the checkbox on the right (after the date) and it will set the data file to be publicly accessible. You can right click on the link provided and give that to Lyra. This image shows the part of Lyra used to upload data. 

Once your chart is complete, Lyra will let you export it using the share icon in the middle bottom of the screen. You can export it as a PNG, which you can embedd in your handin website on appspot as a static visualization. 
If you click 'with inlined data values' instead, you will get a json file, which you can then paste into your index.html template instead of the sample data shown in the sample code. To do this, you should find the beginning of the script tag up through the semi-colan:
 // Assign the specification to a local variable vlSpec.
 var vlSpec = {
   "data": {
     "values": [
{% for row in queries[0].results %}
          {"date": "{{row[0]}}", "count": {{row[1]}}},
        {% endfor %}
   "mark": "bar",
   "encoding": {
     "x": {
      "field": "date", "type": "nominal",
   "axis": {
       "title": "Date"
     "y": {
       "field": "count", "type": "quantitative",
       "axis": {
         "title": "Record Count"

You will want to delete everything inside the outermost {}; so first delete it:
  // Assign the specification to a local variable vlSpec.
  var vlSpec = {


and then paste your stuff in the blank space. Now when you deploy your app, you should see the interactive visualization! Not only that, if you are doing it this way you don't even have to deploy to the cloud to debug -- it should work on the dev server too! Have fun.

Advanced User Path:

The library we will be using to access the data is MySQLdb (documentation). Note: You must have the MySQL-python package installed on your system (even on your Google Shell), but not in your appspot application directory. This can be done with something like sudo apt-get install python-mysqldb or using pip to install it). You will also need to add it to 'app.yaml' (or copy it over from my github version jmankoff-mobile).

- name: MySQLdb
  version: "latest"

and (3) in '': 

import MySQLdb

Some other useful resources for working with MySQL databases from Python and Google Cloud Platform:
Now we can access the database. To access it you will need your DB instance connection name. You can get the name by going to the Overview tab for your connection in the Cloud Console or by executing gcloud sql instances describe [your-db-instance] in the Cloud Shell. The following code will show the names of all the tables in the database. :
_INSTANCE_NAME = '[your-instance-name]'
_DB_NAME = 'test' # or whatever name you choose
_PASSWD = 'your-password'

if (os.getenv('SERVER_SOFTWARE') and
    os.getenv('SERVER_SOFTWARE').startswith('Google App Engine/')):
    _DB = MySQLdb.connect(unix_socket='/cloudsql/' + _INSTANCE_NAME, db=_DB_NAME, user=_USER, passwd=_PASSWD, charset='utf8')
    _DB = MySQLdb.connect(host='', port=3306, db=_DB_NAME, user=_USER, charset='utf8')

def index():
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')

    cursor = _DB.cursor()
    cursor.execute('SHOW TABLES')

    return template.render()

Yours may differ from mine, which were:
('applications_crashes',) ('applications_foreground',) ('applications_history',) ('applications_notifications',) ('aware_device',) ('esms',) ('locations',) ('mqtt_messages',) ('mqtt_subscriptions',) ('plugin_google_activity_recognition',) ('plugin_mode_of_transportation',)

If you have any trouble accessing your DB instance from your application, please make sure that your application has access to your DB. This is usually automatic, but you need to make sure. If you do not see the image below in the Access Control tab of your DB instance, you need to add your Application ID in your ”Access Control”:

Here are the details about some example tables:

Table: plugin_google_activity_recognition -- contains activities recognized by Google [possibly may not include all these varieties]
  • _id: integer - primary key, auto incremented, just numbers each row
  • timestamp: double - unixtime in milliseconds of when we collected this data point
  • device_id: varchar - the unique device who sent this data.
  • activity_type: integer - a constant number assigned by Google for each activity_name (from here: Values are:
    • in_vehicle: 0
    • on_bicycle: 1
    • on_foot: 2
    • still: 3
    • tilting: 5
    • unknown: 4
  • activity_name: varchar - a human-readable description of the physical activity. Possible values are: in_vehicle, on_bicycle, on_foot, still, unknown, tilting.
    • in_vehicle: the user is in a car/bus/train
    • on_bicycle: the user is biking
    • on_foot: the user is walking/running
    • still: the user is stationary somewhere
    • unknown: Google’s algorithm has no idea of what you are doing
    • tilting: the user has the phone on his hands/desk and its moving somehow minimally.
  • confidence: integer - Returns a value from 0 to 100 indicating the likelihood that the user is performing this activity. The larger the value, the more consistent the data used to perform the classification is with the detected activity.
  • activities: JSON Array with JSON objects with other potential activities the user might be doing. Each JSON object has two values: activity (which contains the activity_name) and confidence (as before). The sum of the confidences of all detected activities for a classification will be <= 100. This means that larger values such as a confidence of >= 75 indicate that it's very likely that the detected activity is correct, while a value of <= 50 indicates that there may be another activity that is just as or more likely.

Table: locations - the data from Google Fused Locations [possibly may not include all these columns]
  • _id: integer - primary key, auto incremented, just numbers each row
  • timestamp: double - unixtime in milliseconds of when we collected this data point
  • device_id: varchar - the unique device who sent this data
  • double_latitude: double - the latitude of coordinates
  • double_longitude: double - the longitude of coordinates
  • double_bearing: double - to where (in degrees) is the user heading (used to know in which direction is the user moving).
  • double_speed: double - how fast is the user moving in meters/second
  • double_altitude: double - how high above sea level in meters is the user
  • provider: varchar - how was this location fix established. One of: fused, network, gps.
  • accuracy: double - how accurate is the location fix, in meters.
  • label: varchar - a label for this location, if it exists. Some plugins can use it to label a specific location (e.g., home, work, etc).

Exploring the data

We will illustrate how to explore the data in the context of location data which many of you probably collected. For exploring the data, you will want to always select for only data that matches the device ID you care about. For those of you doing this assignment who do not have your own android devices, my device id is '785434b8-ce03-46a2-b003-b7be938d5ff4'. This is also the default device ID used in the code. You can see all of the device IDs currently registered by using the query:

SELECT DISTINCT device_id FROM locations

The word DISTINCT in this query simply ensures that only unique values for the column device_id are returned.

There are a few queries you may find useful to use for exploring the data initially. For example, 

SELECT FROM_UNIXTIME(timestamp/1000,'%Y-%m-%d') AS day_with_data, COUNT(*) AS records FROM locations WHERE device_id = '670b950e-f821-4e40-bb6c-547e4499c9c5' GROUP by day_with_data; 

will show the number of days for which location data was collected. This query creates a string for each row (record) in the table 'location' which specifies the year, month and day that row was recorded. The rows are grouped based on this string, and then counted. In more detail:

  • We've already introduced SELECT in Byte 2. Normally, SELECT takes the names of columns (or *) as arguments, which specifies which columns of the table should be selected. 
  • The command FROM_UNIXTIME(...) is used instead of the column name. It takes as input a unix timestamp (number of seconds since January 1, 1970) which we need to convert it to a format that will be useful in our query -- namely Year-Month-Day format. The 'timestamp' column of the locations table is stored in milliseconds since 1/1/1970 so we first divide by1000. 
  • The command AS simply provides a name for what we just did (in this case the name 'day_with_data' for the formatted timestamps we are producing)
  • COUNT(*) will count the number of records in a table. However, because we end by saying GROUP BY, the count will be divided up into one count per group (the number of rows in each group). Groups are defined using 'day_with_data', which was the name we gave our timestamps. 
  • FROM name specifies which table this stuff will all be found in
  • WHERE device_id = '...' specifies that we should only look at records with that specific device_id (i.e. records recorded by my phone, or yours if you change the id)

The result, for my data, looked something like this at the end of january (of course, this will change over time).

#Days with data from location data
('2013-12-17', 415L)
('2013-12-18', 1216L)
('2013-12-19', 241L)
('2013-12-20', 81L)
('2013-12-21', 820L)
('2014-01-08', 371L)
('2014-01-09', 1110L)

If you simply deploy the code to appspot at this point, you should see results similar to this:

It is also possible to trace the duration of events. Because of the complexities of these queries, I'm going to show the python code used to construct them. This code helps me have more clarity with respect to the queries because I can use variables to represent the unix time formatting and other key things. The key thing to realize about this code is that we are simply putting together a string that will then be used as a query. The variables _ID and _ACTIVITY are global variables defined at the top of ''. _ID holds the device id for my phone, _ACTIVITY is the name of the table in which activity information is stored.

# turns a unix timestamp into Year-month-day format
day = "FROM_UNIXTIME(timestamp/1000,'%Y-%m-%d')"
# turns a unix timestamp into Hour:minute format
time_of_day = "FROM_UNIXTIME(timestamp/1000,'%H:%i')"
# calculates the difference between two timestamps in seconds
elapsed_seconds = "(max(timestamp)-min(timestamp))/1000"
# the name of the table our query should run on
table = _ACTIVITY
# turns a unix timestamp into Year-month-day Hour:minute format
day_and_time_of_day = "FROM_UNIXTIME(timestamp/100, '%Y-%m-%d %H:%i')"
# Groups the rows of a table by day and activity (so there will be one 
# group of rows for each activity that occurred each day.  
# For each group of rows, the day, time of day, activity name, and 
# elapsed seconds (difference between maximum and minimum) is calculated, 
query = "SELECT {0} AS day, {1} AS time_of_day, activity_name, {2} AS time_elapsed_seconds FROM {3} WHERE device_id='{4}'  GROUP BY day, activity_name, {5}".format(day, time_of_day, elapsed_seconds, table, _ID, day_and_time_of_day)

will show in order each activity that occurred during each day that data was recorded, along with the duration of that activity.  

Identifying Common Locations (Optional)

Here is an example of what you can do with your own data. One of the first things we will want to do with mobile data is identify common locations. This is most easily done by combining work in SQL with work in Python. In particular, we will first run a query             
query = "SELECT double_latitude, double_longitude FROM {0} WHERE device_id = '{1}'".format(_LOCATIONS, _ID)

To make life easier, I have written a helper function in '' that runs a query. The function also checks if the query failed and logs as much information as possible using on failure. It returns the query results in a list, and is called make_query (you can find it in the source code on github). Once we have the locations in a list, we need to identify unique locations. For this, I have again written a helper function, called bin_locations, which takes a list of locations as input and compares them pairwise. The algorithm used is fairly simple: Store the first location in a dictionary under the bin name 1. For each new location, check its distance from each stored location already in the dictionary. If it is within epsilon of any of the stored locations, ignore it (it is not 'new'). If it is further than epsilon from all stored locations, add it to the dictionary under a new bin name -- it is a new location. Distance is calculated by the helper function distance_on_unit_sphere(..) which is based on John Cook's implementation and explanation. 

The results when I ran bin_locations on my data from a few years ago were: 

 latitude          longitude  hand calculated address  My explanation of this
 40.4520251    -79.943495499999997  436 Morewood Avenue, Pittsburgh, PA          Near CMU / my home
 40.435558200000003  -79.863789999999995  400-503 Sherwood Road, Wilkinsburg, PA    On way to Hidden Valley
 40.442977900000002    -79.781972100000004  Penn Lincoln Parkway, Monroeville, PA   On way to Hidden Valley
 40.355172099999997  -79.683622200000002  Pennsylvania Turnpike, Irwin, PA  On way to Hidden Valley
 40.264337300000001  -79.655496499999998  1937 Main Street, New Stanton  On way to Hidden Valley
 ...  ...  ...  ... several more of these

This summary immediately highlights several issues. The first is required to solve for this assignment. The remainder are optional (although would be very important in working further with this data than this byte takes you). 
  • First, the granularity here is to gross. In particular, everything happening within Pittsburgh is grouped at a single address (436 Morewood Avenue). This means that epsilon is probably not correct and needs to be tweaked. 
  • Second, we cannot differentiate between places where I spend a lot of time and places where I am just traveling through based on the data given. I have looked up the associated address by hand using and interpreted it for you (on the right), but nothing automatic is taking place. A very simple way to fix this would simply be to keep track of the number of matches for each location. Places where I spend a lot of time should have a lot more values logged. This would be relatively easy to add to the information stored in each bin. 
  • The third problem is that that the 'label' for a location is whatever shows up first in the log in the general area of that popular spot. This is not as big an issue as the other two, but could be fixed by keeping a count for all of the lat/lon pairs within epsilon of the first location found, and then using the most popular one as the label. 
  • Lastly, investigating this data will be hard if we are always calculating addresses by hand (since lat/lon is essentially meaningless to the reader). It might make your life easier if you could do this automatically, and the google geocoding api is one way to do so.

Associating Activities with Locations (Optional)

At this point, after working with my code and adding to it, you should have a reasonably descriptive list of locations (i.e. a reasonable epsilon) and be able to divide them into locations, possibly grouped into those that are visited a lot (common locations) and locations that are visited only occasionally. Your next step is to explore what activities take place in and around these locations.  To do this, you first need to make a query that will get locations and activities, both organized by day and hour. You can also collect the elapsed time 

time_of_day = "FROM_UNIXTIME(timestamp/1000,'%H:%i')"
day = "FROM_UNIXTIME(timestamp/1000,'%Y-%m-%d')"
query = "SELECT {0} as day, {1} as time_of_day, double_latitude, double_longitude FROM {2} WHERE device_id = '{3}' GROUP BY day, time_of_day".format(day, time_of_day, _LOCATIONS, _ID)
locations = make_query(cursor, query)

day_and_time_of_day = "FROM_UNIXTIME(timestamp/100, '%Y-%m-%d %H')"
elapsed_seconds = "(max(timestamp)-min(timestamp))/1000"
query = "SELECT {0} as day, {1} as time_of_day, activity_name, {2} as time_elapsed_seconds FROM  {3} WHERE device_id = '{4}' GROUP BY day, activity_name, {5}".format(day, time_of_day, elapsed_seconds, _ACTIVITY, _ID, day_and_time_of_day)
activities = make_query(cursor, query)

# now we want to associate activities with locations. This will update the
# bins list with activities.
group_activities_by_location(bins, locations, activities, _EPSILON)

Once you have locations, activities, and bins, you can look up the location for each activity (using the day and hour that the activity occurred at as an index into the list of locations) and then look up the bin for the activity (using its location). A method for doing this is in the group_activities_by_location function in your code.
The result looks something like this. Each item has a lat, a lon, and then a series of activities and durations. Note that some locations don't seem to have an activity, and the very last set of things are activities for which we could not find a location that matched. Determining the cause of this is left as an exercise to the curious (we are no longer in the 'quality checking' phase of the semester, though that is certainly a legitimate and necessary thing to do with mobile code.

[40.4520251, -79.943495499999997, 'in_vehicle', 39392.071000000004, 'on_foot', 39727.334000000003, 'still', 70414.203999999998, 'tilting', 70699.740000000005, 'unknown', 68273.095000000001, 'in_vehicle', 81884.464000000007, 'on_foot', 40043.336000000003, 'still', 84955.171000000002, 'tilting', 79536.949999999997, 'unknown', 80886.019, 'still', 86292.365999999995, 'in_vehicle', 26267.181, 'on_foot', 69574.019, 'still', 86338.226999999999, 'tilting', 72176.975999999995, 'unknown', 69511.421000000002, 'in_vehicle', 77076.141000000003, 'on_bicycle', 47379.989999999998, 'on_foot', 76726.036999999997, 'still', 80446.138999999996, 'tilting', 77615.462, 'unknown', 77532.361000000004]
[40.435558200000003, -79.863789999999995]
[40.442977900000002, -79.781972100000004]
[40.355172099999997, -79.683622200000002]
[40.264337300000001, -79.655496499999998, 'in_vehicle', 61206.953000000001, 'on_foot', 59838.290000000001, 'still', 85463.236000000004, 'tilting', 79628.938999999998, 'unknown', 80889.201000000001]
[40.159720700000001, -79.479924699999998]
[40.128560100000001, -79.404786000000001, 'on_foot', 253.934]
[40.042347599999999, -79.228534199999999, 'on_bicycle', 8648.982, 'in_vehicle', 5743.6329999999998, 'still', 86355.557000000001, 'tilting', 82030.725000000006, 'unknown', 81165.418000000005]
[40.121983999999998, -79.303508800000003]
[40.210303699999997, -79.579402999999999]
[40.409026599999997, -79.719704500000006]
[['in_vehicle', 4357.4719999999998], ['on_foot', 10664.239], ['still', 28324.585999999999], ['tilting', 26078.298999999999], ['unknown', 28031.917000000001], ['still', 38368.474999999999], ['tilting', 0.0], ['unknown', 0.0]]

Creating a Dashboard for Visual Data Exploration using Vega-Lite

You will now develop a narrative story about your mobile data. Visual representation of data allows you to inspect the data quicker than looking at table printouts. Also, you will go beyond simple visualizations that you made using Google Fusion Tables. To do this, you will use Vega-Lite, a language for defining and creating visualizations that builds on D3 (a JavaScript-based visualization framework).

To "install" Vega-Lite, you need to add links to the Vega-Lite scripts to your template files. For example, you could add the following to the header of your index.html template:
<!-- Vega-lite visualization libraries -->
<script src="//"></script>
<script src="//"></script>
<script src="//"></script>
<script src="//" charset="utf-8"></script>
<style media="screen">
   /* Add space between vega-embed links  */
    .vega-actions a {
      margin-right: 5px;

Then in your template body you need to add an element which you will use to display the visualization in:
<div id="vis"></div>

And finally, you need to add another script element below that specifies the data and how to visualize it on your page:

 // Assign the specification to a local variable vlSpec.
 var vlSpec = {
   "data": {
     "values": [
            {"date": "2017-02-01", "count": 3},
            {"date": "2017-02-02", "count": 2},
            {"date": "2017-02-03", "count": 5}, 
            {"date": "2017-02-04", "count": 6}, 
            {"date": "2017-02-05", "count": 2}, 
            {"date": "2017-02-06", "count": 3}, 
            {"date": "2017-02-07", "count": 3}, 
            {"date": "2017-02-08", "count": 4}, 
            {"date": "2017-02-09", "count": 1} 
   "mark": "bar",
   "encoding": {
     "x": {
      "field": "date", "type": "nominal",
   "axis": {
       "title": "Date"
     "y": {
       "field": "count", "type": "quantitative",
       "axis": {
         "title": "Record Count"
 var embedSpec = {
   mode: "vega-lite",  // Instruct Vega-Embed to use the Vega-Lite compiler
   spec: vlSpec
   // You can add more vega-embed configuration properties here.
   // See for more information.
 // Embed the visualization in the container with id `vis`
 vg.embed("#vis", embedSpec, function(error, result) {
   // Callback receiving the View instance and parsed Vega spec
   // result.view is the View, which resides under the '#vis' element

In the above code vlSpec is a variable that contains all of the information about your graph (including the data). In fact, you could copy the contents of it and paste it into Vega-Lite editor to view it:

The Editor provides you with example charts that you could use to kick off your visualization and to view it with dummy data. Here is an example chart that you can select from the drop down:

Now we want to replace this hard-coded, dummy data with your actual data from the database. To do this, we use Jinja2 to include the data from our database into the page. You will replace the data code above with the code that takes the data from your first query and adds it to the page:
"data": {
    "values": [
    {% for row in queries[0].results %}
        {"date": "{{row[0]}}", "count": {{row[1]}}},
    {% endfor %}

Once you create and display your visualization, you will have an option to modify the chart in the editor again. You can use this to speed up your debugging and fixing process.

OPTIONAL READING: Ideally, you might consider displaying the data back in a mobile setting using an application development tool such as Titanium or PhoneGap, however we will not cover any of that in this tutorial. However, our emphasis is not on mobile app development in this class.

Hand In Expectations

 Your assignment is to provide a visual narrative of your mobile data. You will use visualizations to explore the relationship between different aspects of your data (e.g., activities and time of day, locations and time of day). For this Byte you should:
  • Deploy a working version of your code that guide the user through your narrative.
  • As always you will need to write some reasonable text 'about this data set' and comment on 'data quality' and different decisions you had to make when preparing the data set (e.g., what epsilon you picked for binning locations and why, how did you impute the data, how did you remove outliers).
  • You will need to be able to explain in general terms what you learned about the relationships in your data. The best answers would talk about the relationship between the question the visualization answers, the design choices that are made about the visualization, and how those choices help the user to get the most from the visualization.
  • When you create your visualizations, be sure to give them some sort of interactive aspect.