Byte 1: Answering a Question With Fusion Table Data

  • Description: Your final product will be a web-enabled application that uses data from Google Fusion Tables to answer a question (the URL should be something like [yourpseudonym]-fusion.appspot.com)
  • Source Code: See https://github.com/jmankoff/data, in Assignments
  • For some assignments we will provide complete or partial source code that you can look at. It is recommended that you try to construct your own source code using the tutorial and only refer to the provided code as needed. This is especially important since we build up the source code iteratively in the tutorial, gradually replacing portions of it, and the provided source code only shows a single view (the final version). In addition, it will rarely be the case that you can use that source code entirely unmodified to complete an assignment. 

Overview

In this project, you will create a small application that displays data from Google Fusion Tables. The work you do in this project byte is something you will build on throughout this class. This assignment has the following learning goals:

  • Setting up your environment
  • A first experience with Python
  • A first experience with programmable HTML
  • A first experience setting up a question and deciding what data helps to answer the question.

Detailed instructions for Byte 1

This project requires you to use Python 2.7 (please note the version number) and some additional libraries that are available for Python. To learn more about Python, you may want to explore www.pythontutor.com. The textbook Introduction to Computing and Programming in Python is an excellent introductory book aimed at non-programmers.

Setting Up Python using Google Cloud Platform/Google App Engine

Google Cloud Platform is a development environment that will let you place your code on the web with relative ease. An excellent "Getting Started" tutorial will walk you through the initial creation of a simple application that displays plain text on the web. 
  • You will need to select your language (select Python).
  • You will also need to supply a project name (I used 'jmankoff-fusion', but you can use [your_name]-byte1). When you create a Google web application, it needs a unique identifier that no one else on the web has used. A good idea for the assignments in this class is to prefix them with a unique id you choose (you can use your username, but then the students grading you may know who you are; an anonymous id is fine too).
  • You will need to select where you would like to serve your application from (e.g., us-east1).
The tutorial is quite detailed and helpful. Be sure to follow it all the way through until you can load your website on the web.
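
If you want to sanity-check a candidate project name before typing it into the console, a small sketch like the one below may help. It assumes the format rule Google Cloud documents for project IDs (6 to 30 characters; lowercase letters, digits, and hyphens; starting with a letter and not ending with a hyphen). The helper names make_project_id and is_valid_project_id are hypothetical, and the check cannot tell you whether the ID is already taken; only the console can do that.

```python
import re

# Assumed format rule for project IDs: 6-30 characters, lowercase letters,
# digits and hyphens, starting with a letter and not ending with a hyphen.
PROJECT_ID_PATTERN = re.compile(r'^[a-z][a-z0-9-]{4,28}[a-z0-9]$')


def make_project_id(pseudonym, suffix='byte1'):
    """Build a candidate ID such as 'quietotter-byte1' (hypothetical helper)."""
    return '{0}-{1}'.format(pseudonym.strip().lower(), suffix)


def is_valid_project_id(candidate):
    """Return True if the candidate fits the assumed format rule."""
    return PROJECT_ID_PATTERN.match(candidate) is not None
```

For example, make_project_id('QuietOtter') gives 'quietotter-byte1', which passes the check, while a name containing an underscore or capital letters does not.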

Here is what you will have accomplished after you complete the tutorial:
  • You will have your Google Cloud Platform configured.
  • You will have your first Python project and a sample application (note there can only be one app in one project) on Google App Engine.
  • You will have your Cloud Shell activated. The machine comes pre-installed with the Google Cloud SDK and other popular developer tools. Your 5GB home directory will persist across sessions, but the VM is ephemeral and will be reset approximately 20 minutes after your session ends. If you would like to develop locally, you will have to download Google Cloud SDK (https://cloud.google.com/sdk/), but the same instructions apply from your local machine.
  • You will have used a local development app server where you can test your application before deploying it. For more details about the development server and how to use it for common development tasks (e.g., debugging) see this document.
  • You will have starter code that you will use for Byte 1 below.

Using Github

It is a requirement for this project that you use github to manage your source code. This has several benefits, including providing a way to turn in your source code. You can get an account for free at https://github.com/. Note that unless you pay, everything you put on github is public. Once you create your account, you will need to create a repository, which you should name [yourbytename], which you can do following this tutorial: https://help.github.com/articles/create-a-repo/

You can see, for example, that I have created the repository jmankoff-fusion2017 in my public github account.

You should create a repository for your project. You can then connect it to your app by clicking on the 'create repository' button shown below. It will ask you for a name -- use something like [yourbytename]-github

Once you create the repository, you have to specify what to link it to:

Click on 'automatically mirror from github or bitbucket' and select github from the 'Select hosting service' menu and select the repository you just created.


Once you have done this, you should check out the base code for this assignment, using the following process. In using this tutorial, be sure to type one line at a time and hit enter after each line.

First (this is a one-time step) you need to set up github on Google Appspot's cloud terminal. Run the following commands, using your own email address and name:

git config --global push.default simple
git config --global user.email "jmankoff@cs.cmu.edu"
git config --global user.name "Jen Mankoff"

In addition, you need to make sure you are in the correct directory (should be ~/src/[your byte name]).

To change directories, use the command cd 

To look at the contents of a directory (such as to find the name of a file), use the command ls

You can google both of these commands for more information. Be aware that you can use a period to refer to the current directory, and double period to refer to the directory above the current one. So

ls .

produces a listing of the current directory (as does a plain ls) and

cd ..

moves you to the directory above the current directory. Once you have cd'd to ~/src/[your byte name], you will need to check out the code from your repository. That will look something like this (you can find the github repository URL by going to your repository on github, clicking on the 'clone or download' button, and copying the URL, which will look something like 'https://github.com/jmankoff/jmankoff-fusion2017.git'):

git clone [your github repository]


... output ...

Cloning into '[your repository]'...
remote: Counting objects: 382, done.
remote: Compressing objects: 100% (276/276), done.
remote: Total 382 (delta 50), reused 382 (delta 50), pack-reused 0
Receiving objects: 100% (382/382), 776.03 KiB | 0 bytes/s, done.
Resolving deltas: 100% (50/50), done.
Checking connectivity... done.

Next, you will need to check out the base code, if you have not already done so as part of the tutorial:

gcloud source repos clone python-gae-quickstart --project=[your byte name]
ls

(this will show the name of the directory that was created)


cd python_gae_quickstart[fill in the directory name based on what you saw in ls]
git remote remove origin

Finally, you will need to copy the base code into your repository, which we do by moving into your github project directory and then copying over the files from the python_gae_quickstart directory using the cp command.

cd ..
cd [your github project name]

cp -r ../python_gae_quickstart[fill in the directory name]/* .

The last step is to push the changes back to github so that your github repository mirrors what is on your google cloud drive. This can be done any time you make a major change to your project, and is the first step in making use of the benefits of version control, which we strongly recommend you become familiar with. Note that the words "base code" below can and should be replaced with any string you think is descriptive of a change you make (but always in quotes).

git add *
git commit -m "base code"
git status

Once you run git status, do not be alarmed if you see red text. Red entries are files with changes that are not yet staged for commit; they mean that you are at an intermediate stage, not that there is an error. The output will look something like this.

... output ...
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
(use "git push" to publish your local commits)
nothing to commit, working directory clean

git push

You'll need to enter a user name and password for your github account at this point. Once that is done, look at your repository. You should see all your files:

You can now add/commit and push your changes as you work on your assignment, giving you a backup of the assignment, a way to edit code on your local machine (if you prefer) and so on.

Starting Code Editor

For this assignment we will be using the Code Editor, which allows you to develop your applications online in your Cloud Shell instance. To start your Code Editor click on the File icon in The Development tab.


Once the Code Editor starts, locate your project in the src folder in the directory tree.

Installing Libraries

Before we start, let's make sure we've included the correct set of libraries in our application. For this project we will be using:

  • Bootstrap. Head over to http://getbootstrap.com/getting-started/ to download it. Import it into your Code Editor (right click your project folder and select Import->File or Zip Archive). Then move the subdirectories (css, fonts, js) into the main directory of your quick-start application.

  • JQuery. Download the compressed production version at https://jquery.com/download/, and import it in the 'js' directory that you just added to your application. 
  • Jinja2 (the Python Template Engine). Jinja is already installed, but you do need to tell App Engine you are using it (see below)

Don't forget at this point to add all those files using

git add .
git commit -m "added libraries"

git push


Note that any time you commit, you can use the message of your choice to describe those changes (inside the quotes).

Now, set up your app.yaml with the correct information:
runtime: python27
api_version: 1
threadsafe: yes

# Handlers define how to route requests to your application.
handlers:
- url: /js
  static_dir: js
  application_readable: true
- url: /fonts
  static_dir: fonts
  application_readable: true
- url: /css
  static_dir: css
  application_readable: true
- url: /templates
  static_dir: templates
  application_readable: true
- url: .*
  script: main.app


libraries:
- name: jinja2
  version: latest
- name: webapp2
  version: latest

Once again, remember to commit to github!
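
The order of the handlers in app.yaml matters: App Engine tries them from top to bottom and uses the first one that matches, which is why the catch-all '.*' entry comes last. The sketch below is a simplified model of that behavior, not App Engine's actual implementation. The HANDLERS list and the route function are hypothetical names, and the matching rules (prefix match for static_dir, whole-path match for script) are simplifying assumptions.

```python
import re

# (url pattern, kind, target), in the same order as the handlers in app.yaml.
HANDLERS = [
    ('/js', 'static_dir', 'js'),
    ('/fonts', 'static_dir', 'fonts'),
    ('/css', 'static_dir', 'css'),
    ('/templates', 'static_dir', 'templates'),
    ('.*', 'script', 'main.app'),
]


def route(path, handlers=HANDLERS):
    """Return (kind, target) for the first handler that matches the path."""
    for pattern, kind, target in handlers:
        if kind == 'static_dir':
            # Assumption: static_dir patterns behave like URL prefixes.
            matched = re.match(pattern + r'(/|$)', path) is not None
        else:
            # Assumption: script patterns must match the whole path.
            matched = re.match('(?:' + pattern + ')$', path) is not None
        if matched:
            return kind, target
    return None
```

With this model, route('/js/jquery.min.js') is served from the js directory, while route('/') falls through to main.app. If the '.*' entry were listed first, every request, including those for static files, would go to main.app.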

Customizing your Application

To begin serving HTML pages, follow the Jinja2 tutorial. We've already updated the app.yaml file. Now we have to update the main.py file to follow through. Add the following lines at the top of the file:
# Imports
import os
import jinja2
import webapp2
import logging

JINJA_ENVIRONMENT = jinja2.Environment(
    loader=jinja2.FileSystemLoader(os.path.dirname(__file__)),
    extensions=['jinja2.ext.autoescape'],
    autoescape=True)

And change the hello() route as follows:
@app.route('/')
def hello():
    template = JINJA_ENVIRONMENT.get_template('templates/index.html')
    return template.render()

Create a directory inside your application folder named 'templates' and create a file named 'index.html' inside.


The 'index.html' file should contain the following html.
<!DOCTYPE html>
<html>
<head>
<title>Byte 1 Tutorial</title>
</head>
<body>
<h1>Data Pipeline Project Byte Example</h1>
</body>
</html>
Start your test instance in the console the same way you did in the tutorial, and start your application in your Web Preview. The result, when you load it, should look like this:

Data Pipeline Project Byte Example

Now we want to add some bootstrap styling. The sample index.html and about.html files provided with your byte source code are based on a bootstrap theme. You can view more themes on the getting started page at http://getbootstrap.com/getting-started/ and download source code for example themes. Just be aware that you will need to modify these themes to reflect the directory structure of your Google App Engine application. Specifically, you should use 'css/...' to refer to css files, and 'js/...' to refer to javascript. You can also use a program to lay out bootstrap pages, such as layitout.com or x-editable.

Remember that you have been working with the development app server. To make your application available to the public you need to deploy it (at this point you would have already created the application in the tutorial):

gcloud app deploy

Debugging your Application

You can quickly and easily test your scripts as you go using the development server. You can change the logging level using the command line when you start the development app server (e.g., set the logging level to "debug"):

dev_appserver.py --log_level=debug $PWD

You will then be able to see log entries on your command line. You can output your own debugging text there by using the Python function logging.info() (or another log level, such as logging.debug(), depending on your needs). Note that the module name is lowercase, and Python is case sensitive. Thus, a very good debugging and editing cycle is [Edit main.py] [reload local web preview] [check results and log to make sure your code is doing what you think it is] [rinse and repeat].

Once you deploy your application you can keep track of your logs using the Google Cloud Logging interface. For more information see the logging documentation page.

Using Google Fusion Tables

You will first need to identify a data set that is of interest to you. Here is some advice from Google Fusion Tables' website on where and how to find interesting data. Here is another source of interesting data: 

If you use an existing Google Fusion Table, be aware that you will not be able to access all of the features of Google Fusion Tables. For this reason, you should make a copy (under the File menu) before proceeding. Click "View Copy" and proceed with the new table for the remainder of this assignment.

Exploring your data in Google Fusion Tables

Start by following the tutorial for making a map (note that we can skip the first half of the tutorial since the data is "imported" as soon as you make your copy). For example, when I did this with the table of animal outcomes I copied from Louisville Animal Metro Services, I had the following display on my map:



Of course you can and should go further. For example, I customized my map to mark animals that were euthanized with a different marker than those that were not, using the method described in this tutorial. After following the tutorial, my map looked like this (you can find lots of interesting icon types at this change placemark icon tutorial).



Maps are only the beginning -- you can also explore the data using other types of charts:

Showing your visualizations in your google application

The visualizations you create can be embedded in the [yourname-explore] application. Following the tutorial on embedding, you will need to configure the Fusion table to be publicly accessible (mine is accessible only to those with the correct URL). Once you have that set up, you can paste the iframe code provided with your new map or chart into your index.html file.
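
In practice you simply paste the iframe code that Fusion Tables gives you. If you are curious what that code consists of, or want to generate it from Python later, the sketch below builds a comparable tag. The build_iframe helper, the default sizes, and the example.com URL are all hypothetical placeholders, not what Fusion Tables itself produces.

```python
from xml.sax.saxutils import quoteattr


def build_iframe(src, width=500, height=300):
    """Return an <iframe> tag for the given embed URL (hypothetical helper)."""
    # quoteattr escapes &, < and > and wraps the value in quotes, so a URL
    # with query parameters stays valid inside the HTML attribute.
    return '<iframe width="{0}" height="{1}" src={2}></iframe>'.format(
        int(width), int(height), quoteattr(src))
```

The escaping matters because embed URLs usually contain several query parameters joined by '&', which must appear as '&amp;' inside an HTML attribute.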

Setting up a question and answering it

At this point you should have a working version of code something like the reference application http://jmankoff-fusion.appspot.com/. The deeper thinking in this assignment requires that you select a Fusion table and display its contents in a way that corresponds to a question you have designed and answered. Note that the example code does not demonstrate this. This is the first step (and of course will eventually be much more iterative) in any data pipeline: figuring out what data you need to answer your question. You should:
  1. Identify a question and ensure that it is clear to the person viewing your assignment what that question is (this will probably involve modifying index.html to show the question). 
  2. Identify data that you think can help answer that question (you are limited by what was introduced in this byte).
  3. Display the results on the web page.
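
One way to keep the question visible next to the results is to pass it into the page as a value rather than hard-coding it. The sketch below illustrates the idea using Python's built-in string.Template as a stand-in, so that it runs without extra libraries; in your application you would more likely pass the same values to your Jinja2 template. The PAGE template, the render_page helper, and the sample question are hypothetical.

```python
from string import Template
from xml.sax.saxutils import escape

# Stand-in for a Jinja2 template: $question and $embed are filled in later.
PAGE = Template(
    '<h1>$question</h1>\n'
    '<div class="answer">$embed</div>\n'
)


def render_page(question, embed_html):
    """Fill in the question (escaped) and the embed markup (inserted as-is)."""
    # The embed markup is assumed to be trusted HTML, such as your own iframe.
    return PAGE.substitute(question=escape(question), embed=embed_html)
```

For example, render_page('Are cats & dogs adopted equally?', '<iframe src="https://example.com/embed"></iframe>') produces a heading with the question followed by the embedded chart.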

Questions you should be able to answer when you hand this in

What question does your application help to answer, and how does it let the end user answer that question? 

What is the URL for the working version of your assignment?
