Teaching Syllabus

2018 Summer MSIS 2629 Syllabus

Goals of the Class

  • Learn about careers in data visualization
  • Learn how data visualization is used to tell stories, often better than words can, through these topics:
    • Dashboards in industry
    • Data viz in news
    • Storytelling through data visualization
    • Hard math through data viz
  • Build a portfolio of 2 significant projects hosted on GitHub (and maybe your personal website) to highlight your skills and passions.
    • That’s your proof of data visualization skills
  • Make friends with your peers & collaborate with others on idea generation, data analysis, data visualizations, and presentations.
  • Have fun!

How and Why This Course is Different Than Other Courses

  • The previous class had lots of lab exercises with mandated visualizations to create and datasets to work off of. It seemed like a lot of context switching between types of visualizations, stories and datasets. This time around, there are no lab assignments. I think the class will be more fun and enriching if you spend more time on data you find interesting.
  • Most classes involve lots of lectures. You often sit idly for 1 hour or more every class. However, what if there’s something you don’t understand in that lecture? Or you want to stop and stare at something? Or google a concept? It’s often tough to learn at the same pace as your peers. It’s something I struggled with in every class. I want students to learn at their own pace. Instead of long lectures, we’ll have short ones, often just 15-30 min, during which I’ll expect more discussion from the audience. There’s fabulous content online and I’ll direct you to great content for your learning interests.
  • In school, I rarely chatted with professors 1-on-1 and got individual attention. I want to help all of you succeed at what interests you in the realm of data visualizations. Let’s do that together! I’ll likely be able to meet with all of you 1 on 1 at least every other class or through office hours online. I can provide feedback on your project ideas, code, visualizations and online portfolio for showcasing your work.
  • In most classes, you rarely leave with something to show potential employers. Most times there are simply exams and homework assignments. This class is different. My goal is to help everyone build a portfolio of data investigation and visualization work they’re proud of! That’s not something you’ll get online in a MOOC, sitting at home trying to learn on your own or reading a book. With this class, you’ll get constant feedback from your peers and me to build an excellent portfolio.
  • With most classes, students cram last minute to get it done. The panic monster sets in a couple days before the deadline...that won’t happen here. You’ll be expected to show progress each week to me 1-on-1 and through your GitHub activity. Don’t let the panic monster out!
  • Most teams settle on a project idea and run with it, whether it works out or not. In investigating data sources, that isn’t usually the case. You’ll likely uncover the data is messy, insufficient for the visualizations you have in mind, or maybe isn’t interesting to you anymore. Don’t worry. There’s ample time for you to change ideas. I want you to complete only something you’re proud of.
  • Some classes cover pros and cons of lots of technologies. This class is less about the technical details of a technology. Rather, can you use just one or two technologies to tell a good story through visualizations? I’d rather you leave a master of one technology and of making visualizations in general than a jack of all trades.
  • Great data visualizations are often subjective; there’s no clear cut answer. In fact, I could provide a million tips on how to make great visualizations. Therefore, you’ll need a lot more feedback from your peers and the instructor to fine tune your designs and ensure they’re easy to understand, aesthetically pleasing and relevant for your message.

Assignments

Project Name | Due Date
Group data investigation + presentation | Beginning of class on July 21
Individual data investigation + presentation | Beginning of class on August 25
  1. Group project (2-3 people): The final deliverable must be a .ipynb file uploaded on GitHub documenting all your work and visualizations & a 6-minute max presentation. Code must be written in Python and use a Python visualization library. In your notebook, link to relevant inspiration/learning materials used in your process to publish your work.

  2. Individual project: same details as the group project, but this time solo & just a 4-minute max presentation. If you can't attend class, you can film yourself doing a 4-minute video presentation.

  3. 3 in-class learning exercises that can take place in any class from week 3 to week 10.

  4. 2 optional take-home learning exercises to analyze a dataset and produce visualizations.

Of the 5 total learning exercises, your top 3 scores will count toward your final grade for this class. So, you don't have to do the 2 take-home learning exercises if you are comfortable with your 3 in-class grades.

Project Submission Details

Published project on GitHub should include all of the following:

  • Readme.md with summary and insights. It should include 2-5 paragraphs, each 2-5 sentences long, covering why you did this project, your tech stack, 2-3 inline visualizations with a 1-2 sentence interpretation of each, and a link to your dataset. No need to mention it was a class project.
  • Jupyter Notebook with all your exploratory analysis, inline visualizations and descriptions of visualizations. Please have it be neatly formatted with headers and a table of contents.
  • 1 folder with image files for all your visualizations
  • Dataset, provided it's not too large

Ideas on Datasets

Research datasets you may be interested in exploring. There’s gold in learning something new from data that nobody has uncovered yet. There’s also gold in finding patterns between multiple datasets that people/media are less apt to cover together. It’s also OK to simply explore a common dataset if it’s a topic you’re very interested in.

Be forewarned: it often takes a lot of work to analyze the fields, values and quality of the data, and to decide whether it’s something of interest to you! So, we’re researching early on to allow you time to pivot and work on a different dataset later if needed.

Some dataset suggestions:

Recommended Visualization Libraries

Recommended Approach to Building Projects

This is an extension of the Scientific Method.

  1. Find a classmate interested in a topic/idea. Hypothesize questions you want answered.
  2. Search for interesting data sources.
  3. Create a GitHub repo to store your code.
  4. Plan to meet in person or through a screen-share application to regularly work on your project together.
  5. Download the data source to your local machines using Pandas, code in a Jupyter Notebook and document your steps in Markdown.
  6. Understand the fields, values, and number of observations.
  7. Construct a hypothesis for one of your initial questions.
  8. Clean only what's needed to answer your question.
  9. Test your hypothesis by extracting metrics and visualizing results. Try prototyping simple visualizations in Pandas Plot.
  10. Repeat steps 7-9 as many times as needed to answer your questions.
  11. Incorporate a similar dataset for comparison to tell a more interesting story.
  12. Improve upon your initial visualizations to include the additional dataset and/or tell a more interesting story. Ideally use a level of benchmarking and/or interaction in your visualizations.
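
As a rough illustration of steps 5-9 above, here is a minimal sketch in Pandas. The data is made up and embedded as text so the example runs anywhere; the column names (`duration_sec`, `subscriber_type`) and the hypothesis are assumptions for illustration only. In a real project you would pass a file path or URL to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded CSV file (made-up bike share trips).
RAW_CSV = """trip_id,duration_sec,start_station,subscriber_type
1,300,Market St,Subscriber
2,1200,Market St,Customer
3,,Ferry Building,Subscriber
4,450,Ferry Building,Subscriber
5,2400,Ferry Building,Customer
6,360,Market St,Subscriber
"""

# Step 5: load the data with Pandas.
trips = pd.read_csv(io.StringIO(RAW_CSV))

# Step 6: understand the fields, values and number of observations.
n_rows, n_cols = trips.shape
missing_per_column = trips.isna().sum()

# Step 7 (example hypothesis): "Customers take longer trips than Subscribers."
# Step 8: clean only what the question needs, here rows with no duration.
clean = trips.dropna(subset=["duration_sec"])

# Step 9: extract a metric that tests the hypothesis.
median_minutes = (
    clean.groupby("subscriber_type")["duration_sec"].median() / 60
).round(1)
print(median_minutes)
```

From here, something like `median_minutes.plot(kind="bar")` in a notebook would give a quick Pandas Plot prototype of the comparison.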

Grading Rubric on Projects

Note: assessment criteria will be updated a bit to be more quantitative.

  • Technical:

    • Fair: Visualizations and/or code may be difficult to understand in certain sections. Mediocre storytelling through visualizations.
    • Good: Uses mainly slightly modified off-the-shelf examples. May focus on simple visualizations with simple bar, pie, line, etc. Visualizations incorporate some benchmarking or comparison efforts.
    • Excellent: 3 or more significant and interesting visualizations presented. Utilized advanced features of viz library. New viz type shown above simple bar/pie/line chart. May utilize strong stats/math. Possibly incorporates new dataset for additional insights. Visualizations incorporate benchmarking or comparison efforts.
  • Presentation:

    • Fair: Recites simply what’s on slides. May be hard to follow logic, speaks at great lengths without making a point, or dry.
    • Good: Presents interesting content, but communication is similar to the norm of people studied/worked with previously.
    • Excellent: Expresses ideas clearly; engages/captivates audience. Would be really excited to see this person present again.
  • Documentation:

    • Fair: Work published online, but minimal details or difficult to understand. Or perhaps work not public on GitHub at all. Minimal or weak interpretations of visualizations.
    • Good: Work online but perhaps at times wordy or difficult to understand flow. Good interpretations of visualizations.
    • Excellent: Code, visualizations and text storyline documented in detail on GitHub. Great interpretation of visualizations. Significant effort in detail and would be worthy of citing in an established online publication. Bonus points if published through an additional website or blog post.

In-Class Learning Exercises - Material to Know

Topics you must know:

  • All functionality in Pandas Cheat Sheet
  • Pivot tables
  • Long to wide format
  • How to plot in code (Pandas Plot, Matplotlib and/or Seaborn) and interpret:
    • Line graphs
    • Vertical bar plots
    • Horizontal bar plots
    • Pie charts
    • Vertical stacked bar chart
    • Vertical grouped bar plots
    • Histograms
    • Box plots
    • Scatter plots
  • With plots:
    • Format x and y tick values
    • Limit range of values on x or y-axis
    • Add a plot title
    • Label axes with custom text
    • Increase the font size of the x-axis label, y-axis label, ticks and title
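
For the pivot table and long-to-wide topics in the list above, a small sketch on made-up sales data might look like the following. The column names are hypothetical; the Pandas calls (`pivot_table`, `pivot`, `melt`) are the standard ones.

```python
import pandas as pd

# Made-up long-format data: one row per (month, product) observation.
long_df = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb", "Feb"],
    "product": ["A", "B", "A", "B", "B"],
    "sales": [10, 20, 15, 5, 25],
})

# Pivot table: Feb/B appears twice, so the duplicates are aggregated with a sum.
table = long_df.pivot_table(
    index="month", columns="product", values="sales", aggfunc="sum"
)

# Long to wide with pivot(): it needs unique (index, columns) pairs,
# so aggregate the duplicates first.
totals = long_df.groupby(["month", "product"], as_index=False)["sales"].sum()
wide = totals.pivot(index="month", columns="product", values="sales")

# Wide back to long with melt().
back_to_long = wide.reset_index().melt(
    id_vars="month", var_name="product", value_name="sales"
)

print(table)
```

The practical difference: `pivot_table` aggregates duplicate rows for you, while `pivot` only reshapes and raises an error if a (month, product) pair repeats.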

I'd recommend you utilize the datasets in Seaborn to practice prototyping the types of visualizations above.
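
The plot formatting items listed under "With plots" can be practiced on any small table. Below is one possible sketch with Matplotlib on made-up revenue numbers (Seaborn's bundled datasets need a download, so a hand-made table stands in here). The data and labels are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line in a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Made-up data for practice.
revenue = pd.DataFrame({
    "year": [2014, 2015, 2016, 2017],
    "revenue": [12000, 18500, 26000, 31000],
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.plot(revenue["year"], revenue["revenue"], marker="o")

# Format x and y tick values: whole years on x, thousands separators on y.
ax.set_xticks(revenue["year"])
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, _pos: f"{value:,.0f}"))

# Limit the range of values on the y-axis.
ax.set_ylim(0, 35000)

# Add a plot title and label axes with custom text, with larger fonts.
ax.set_title("Annual revenue (made-up data)", fontsize=18)
ax.set_xlabel("Year", fontsize=14)
ax.set_ylabel("Revenue (USD)", fontsize=14)

# Larger font for tick labels on both axes.
ax.tick_params(axis="both", labelsize=12)
```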

What you Need to Bring Every Class

  • Laptop
  • Laptop charger
  • Headphones (optional)
  • Lunch, snacks and/or drinks (optional - very long class)

Expectations for You

It’ll be a lot of work and a lot of new material! Be prepared to become very familiar with ingesting data in Python, cleaning data and prototyping visualizations, and to become very skilled in a new visualization framework.

You should spend 7+ hours per week outside of class time in order to succeed.

Class Meetings

I don’t think people learn best by being just talked to for hours on end. You learn through building, collaboration and feedback. Class is long so we’ll have intermittent short breaks so you don’t have to sit down for too long.

I'll also pack more lectures/discussions in the first few weeks so we can focus more on group work in the second half of the term.

Likely structure of classes:

  • ~40 min on discussion of a visualization topic or inspiration
  • ~2 hours coding + data viz design
  • ~15 min 1-on-1 with me for feedback

Additional (Optional) Resources

Class Schedule

Week 1 (June 23, 2018)

To start learning in class and continue at home (if needed):

  • Recite goals of class; make sure everyone is on same page.
  • Poll: what technologies do you know?
  • Poll: what would you like to learn?
  • Discuss careers in data viz
    • Business Intelligence
    • Data Analyst
    • Data Scientist
    • Frontend Engineer
    • Machine Learning Engineer
  • Why put your data viz and portfolio work online?
    • Get feedback from others
    • Self-reflective to learn the material
    • Get noticed by someone and potentially get a job, a mentor or an invitation to a conference/event
  • Are there any dashboards/visualizations you use frequently?
    • Fitbit
    • Apple Watch
    • Google Analytics
  • Create a GitHub profile
  • Everyone join class Slack group with this link.
    • If you download Slack from its website (not Mac App Store), then you can use Slack to video chat, screen share, and share a cursor on someone's computer.
  • Download Anaconda locally to use Python, Python libraries and Jupyter Notebook/Lab
  • Show Jupyter Lab
    • Browser-based interactive data analysis tool to show code, graphics and HTML elements in a single executable document
  • Value of quick data analysis and prototyping when creating data visualizations
  • Previously, there were labs teaching different languages and frameworks such as R, Python, Tableau, etc. These are now entirely optional. The previous answers are online here.
  • Explanation of Bay Area Bike Share program with 1 minute video
  • Live walkthrough of Bay Area Bike Share data-driven project
  • Brainstorm project ideas on whiteboard
  • Form teams for group data viz project
  • Meet with interested students 1-on-1
    • What’s your background?
    • What are your interests?
    • Are you working now or what job do you want to have in the future?
    • What technologies do you enjoy most? What do you want to learn?
    • Help pick the best track for students to move forward with
  • OKCupid data investigation
  • Alice Zhao's talk on YouTube
  • See examples of the most popular types of data visualizations. Read all articles under the Subcategory Best Practices on my site.
  • How to Select a Data Visualization via Chartio

Week 2 (June 30, 2018)

  • No class due to Independence Day holiday

Week 3 (July 7, 2018)

  • Overview of Python visualization libraries: Pandas Plot, Matplotlib, Seaborn, Plotly and Altair
    • "In recent years, however, the interface and style of Matplotlib have begun to show their age. Newer tools like ggplot and ggvis in the R language, along with web visualization toolkits based on D3js and HTML5 canvas, often make Matplotlib feel clunky and old-fashioned. Still, I'm of the opinion that we cannot ignore Matplotlib's strength as a well-tested, cross-platform graphics engine." - via Python Data Science Handbook
  • Clear labeling of charts
  • Benchmarking and comparisons
  • Graded in-class learning exercise
  • Review teams' project ideas with the class
    • Anyone want to switch teams or ideas?
    • Suggestions for teams on interesting data sources to look into?
  • Project work with team
  • Meet 1-on-1 with Dan
  • Read Chapters 1-3 of Data Viz Reader

Week 4 (July 14, 2018)

  • Explanation of project submission framework
  • Discussion of poor visualizations at viz.wtf
  • Graded in-class learning exercise
  • Project work with team
  • Meet 1-on-1 with Dan

Week 5 (July 21, 2018)

Week 6 (July 28, 2018)

  • Best practices in dashboards:
    • Most important insights at the top
    • Easy-to-read visualizations understandable by your audience
    • Context of values/charts - labels, changes over time, etc
  • Where are dashboards useful?
    • Executives to see high-level progress for teams on revenue and costs over time
    • Customer service teams to track # of emails answered and average response time (minutes)
    • Marketing teams to see website visits and visits by lead source
    • Devops/developers to see stats on product performance and monitor for issues
  • Popular dashboard tools in industry:
  • 30-min live demo of prototyping a dashboard for Bay Area Bike Share data
  • Graded in-class learning exercise
  • Project work
  • Meet 1-on-1 with Dan

Week 7 (August 4, 2018)

Week 8 (August 11, 2018)

  • Graded in-class learning exercise
  • Project work
  • Meet 1-on-1 with Dan

Week 9 (August 18, 2018)

  • Teaching statistical concepts through data visualizations
    • K-Means clustering example (link to be posted later)
  • Project work
  • Meet 1-on-1 with Dan

Week 10 (August 25, 2018)

  • Project presentations
  • Class reflection