2018-summer-msis-2629-syllabus
- title: 2018 Summer MSIS 2629 Syllabus
- slug: 2018-summer-msis-2629-syllabus
- summary: 2018 summer MSIS 2629 syllabus for the data visualization course at Santa Clara University
- date: 2018-07-29 12:20
- category: Teaching
- posttype: Syllabus
- tags: santa clara university
- keywords: msis 2629 syllabus
- authors: Dan Friedman
Goals of the Class¶
- Learn about careers in data visualization
- Learn how data visualization is used to tell stories, often better than words can, through these topics:
- Dashboards in industry
- Data viz in news
- Storytelling through data visualization
- Hard math through data viz
- Build a portfolio of 2 significant projects hosted on GitHub (and maybe your personal website) to highlight your skills and passions.
- This portfolio is your proof of data visualization skills
- Make friends with your peers & collaborate with others on idea generation, data analysis, data visualizations, and presentations.
- Have fun!
How and Why This Course is Different Than Other Courses¶
- The previous class had lots of lab exercises with mandated visualizations to create and datasets to work off of. It seemed like a lot of context switching between types of visualizations, stories and datasets. This time around, there are no lab assignments. I think the class will be more fun and enriching if you spend more time on data you find interesting.
- Most classes involve lots of lectures. You often sit idly for 1 hour or more every class. However, what if there’s something you don’t understand in that lecture? Or you want to stop and stare at something? Or google a concept? It’s often tough to learn at the same pace as your peers. It’s something I struggled with in every class. I want students to learn at their own pace. Instead of long lectures, we’ll have short ones, often just 15 - 30 min, in which I’ll expect more discussion from the audience. There’s fabulous content online and I’ll direct you to great content for your learning interests.
- In school, I rarely chatted with professors 1-on-1 and got individual attention. I want to help all of you succeed at what interests you in the realm of data visualizations. Let’s do that together! I’ll likely be able to meet with all of you 1 on 1 at least every other class or through office hours online. I can provide feedback on your project ideas, code, visualizations and online portfolio for showcasing your work.
- In most classes, you rarely leave with something to show potential employers. Most times there are simply exams and homework assignments. This class is different. My goal is to help everyone build a portfolio of data investigation and visualization work they’re proud of! That’s not something you’ll get online in a MOOC, sitting at home trying to learn on your own or reading a book. With this class, you’ll get constant feedback from your peers and me to build an excellent portfolio.
- With most classes, students cram last minute to get it done. The panic monster sets in a couple days before the deadline...that won’t happen here. You’ll be expected to show progress each week to me 1-on-1 and through your GitHub activity. Don’t let the panic monster out!
- Most teams settle on a project idea and run with it, whether it works out or not. In investigating data sources, that isn’t usually the case. You’ll likely uncover the data is messy, insufficient for the visualizations you have in mind, or maybe isn’t interesting to you anymore. Don’t worry. There’s ample time for you to change ideas. I want you to complete only something you’re proud of.
- Some classes cover pros and cons of lots of technologies. This class focuses less on the technical details of a technology and more on whether you can use just one or two technologies to tell a good story through visualizations. I’d rather you leave as a master of one technology, and of making visualizations in general, than as a jack of all trades.
- Great data visualizations are often subjective; there’s no clear cut answer. In fact, I could provide a million tips on how to make great visualizations. Therefore, you’ll need a lot more feedback from your peers and the instructor to fine tune your designs and ensure they’re easy to understand, aesthetically pleasing and relevant for your message.
Assignments¶
Project Name | Due Date |
---|---|
Group data investigation + presentation project | Class on July 28 |
Individual data investigation + presentation project | End of day on September 1 |
Take home learning exercise 0 | End of day on September 1 |
Take home learning exercise 1 | End of day on September 1 |
Group project (2-3 people): The final deliverable must be on GitHub and meet the Project Submission Details below. In addition to your work on GitHub, you must do a 6-minute max presentation in front of the class. Code must use Python and a Python visualization library.
Individual project: same details as the group project, but this time solo & just a 4-minute max presentation. You will present in either week 10 of class or week 11.
3 in-class learning exercises that can take place anytime in classes from week 3 to week 10. Each learning exercise will be taken in-class. Submissions will be on Camino.
2 optional take-home learning exercises to analyze a dataset and produce visualizations. Submissions will be on Camino.
Regarding the 5 total learning exercises, your top 3 scores will count toward your final grade for this class. So, you don't have to do the 2 take-home learning exercises if you are comfortable with your 3 in-class grades.
Project Submission Details¶
Published project on GitHub should include all of the following:
- 1-sentence description at the top of the Code section.
- 3 or more tags/topics that appear just below the description section.
- README.md file with the following sections:
- Summary of your project (3 - 10 sentences) that explains why you chose this dataset, important fields/columns utilized and your tech stack utilized (ex - Python, Pandas, Numpy)
- 2-4 inline visualizations each with a 1-3 sentence interpretation of a visualization
- Link to the original source of your dataset (if applicable)
- Jupyter Notebook with all your exploratory analysis, inline visualizations and descriptions of visualizations. Please have it be neatly formatted with headers and a table of contents. You can include 2 Notebooks if needed too.
- 1 folder with image files for all your visualizations
- Dataset hosted on GitHub as a CSV or similar file type (provided it's not too large)
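One way to fill the image folder is to save each chart from code as you build it. The sketch below is only an illustration: the `save_chart` helper, the `images` folder name, the file name and the sample data are all made up for this example, and a temporary directory stands in for your cloned repo.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend for plain scripts; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd


def save_chart(df: pd.DataFrame, images_dir: Path, filename: str) -> Path:
    """Plot rides per day as a bar chart and save it as a PNG in images_dir."""
    images_dir.mkdir(parents=True, exist_ok=True)
    ax = df.plot(kind="bar", x="day", y="rides", legend=False, title="Rides per day")
    ax.set_ylabel("Rides")
    out_path = images_dir / filename
    ax.figure.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(ax.figure)
    return out_path


# Hypothetical sample data, not from a real dataset
rides = pd.DataFrame({"day": ["Mon", "Tue", "Wed"], "rides": [120, 135, 98]})

repo_root = Path(tempfile.mkdtemp())  # stand-in for your project folder
saved = save_chart(rides, repo_root / "images", "rides_per_day.png")
```

The saved PNG can then be referenced from README.md so the visualization shows inline on GitHub.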
Ideas on Datasets¶
Research datasets you may be interested in exploring. There’s gold in learning something new from data that nobody has uncovered yet. There’s also gold in finding patterns between multiple datasets that people/media are less apt to cover together. It’s also OK to simply explore a common dataset if it’s a topic you’re very interested in.
Forewarning, there’s often a lot of work to analyze the fields, values, quality of the data and if it’s something of interest to you! So, we’re researching early on to allow you time to pivot and work on a different dataset later if needed.
Some dataset suggestions:
Recommended Visualization Libraries¶
Recommended Approach to Building Projects¶
This is an extension of the Scientific Method.
- Find a classmate interested in a topic/idea. Hypothesize questions you want answered.
- Search for interesting data sources.
- Create a GitHub repo to store your code.
- Plan to meet in person or through a screen-share application to regularly work on your project together.
- Download the data source to your local machines using Pandas, code in a Jupyter Notebook and document your steps in Markdown.
- Understand the fields, values, and number of observations.
- Construct a hypothesis for one of your initial questions.
- Clean only what's needed to answer your question.
- Test your hypothesis by extracting metrics and visualizing results. Try prototyping simple visualizations in Pandas Plot.
- Repeat steps 7-9 as many times as needed to answer your questions.
- Incorporate a similar dataset for comparison to tell a more interesting story.
- Improve upon your initial visualizations to include the additional dataset and/or tell a more interesting story. Ideally use a level of benchmarking and/or interaction in your visualizations.
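The middle steps of this approach (load, understand, hypothesize, clean, test) can be sketched in a few lines of Pandas. Everything specific here is an assumption for illustration: the column names, the tiny inline CSV that stands in for a downloaded file, and the example hypothesis about rider types.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend for plain scripts; drop this line in Jupyter
import pandas as pd

# Hypothetical stand-in for a downloaded CSV; with real data you would
# pass a file path or URL to pd.read_csv instead.
raw_csv = io.StringIO(
    "trip_id,start_date,duration_sec,subscriber_type\n"
    "1,2015-08-03 08:05,540,Subscriber\n"
    "2,2015-08-03 17:40,,Customer\n"
    "3,2015-08-04 08:15,610,Subscriber\n"
    "4,2015-08-08 13:00,2400,Customer\n"
    "5,2015-08-09 11:30,3100,Customer\n"
)
trips = pd.read_csv(raw_csv, parse_dates=["start_date"])

# Understand the fields, values, and number of observations
n_rows, n_cols = trips.shape
missing_per_column = trips.isna().sum()

# Example hypothesis: casual customers take longer rides than subscribers.
# Clean only what's needed: drop rows missing the duration being compared.
clean = trips.dropna(subset=["duration_sec"]).copy()
clean["duration_min"] = clean["duration_sec"] / 60

# Test the hypothesis by extracting a metric...
median_minutes = clean.groupby("subscriber_type")["duration_min"].median()

# ...and prototype a simple visualization with Pandas Plot
ax = median_minutes.plot(kind="barh", title="Median ride length by rider type")
ax.set_xlabel("Minutes")
```

If the metric or the chart doesn't answer the question, adjust the hypothesis or the cleaning step and run it again.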
Grading Rubric on Projects¶
Note: assessment criteria will be updated a bit to be more quantitative.
Technical:
- Fair: Visualizations and/or code may be difficult to understand in certain sections. Mediocre storytelling through visualizations.
- Good: Uses mainly slightly modified off-the-shelf examples. May focus on simple visualizations with simple bar, pie, line, etc. Visualizations incorporate some benchmarking or comparison efforts.
- Excellent: 3 or more significant and interesting visualizations presented. Utilized advanced features of viz library. New viz type shown above simple bar/pie/line chart. May utilize strong stats/math. Possibly incorporates new dataset for additional insights. Visualizations incorporate benchmarking or comparison efforts. Appropriate visualizations chosen that help emphasize key takeaways.
Presentation:
- Fair: Recites simply what’s on slides. May be hard to follow logic, speaks at great lengths without making a point, or dry. Visualizations difficult to understand in limited presentation time.
- Good: Presents interesting content, but communication is about on par with the norm of people I've studied/worked with previously. Visualization has parts that are difficult to read.
- Excellent: Expresses ideas clearly; engages/captivates audience. Visualizations are easy to read and annotated if needed to highlight key points. Would be really excited to see this person present again.
Documentation:
- Fair: Work published online, but minimal details or difficult to understand. Or perhaps work not public on GitHub at all. Minimal or weak interpretations of visualizations. Possibly no table of contents included.
- Good: Work online but perhaps at times wordy or difficult to understand flow. Good interpretations of visualizations.
- Excellent: Code, visualizations and text storyline documented in detail on GitHub. Great interpretation of visualizations. Table of contents included in Notebook. Significant effort in detail and would be worthy of citing in an established online publication. Bonus points if published through an additional website or blog post.
In-Class Learning Exercises - Material to Know¶
Topics you must know:
- All functionality in Pandas Cheat Sheet
- Pivot tables
- Long to wide format
- How to plot in code (Pandas Plot, Matplotlib and/or Seaborn) and interpret:
- Line graphs
- Vertical bar plots
- Horizontal bar plots
- Pie charts
- Vertical stacked bar chart
- Vertical grouped bar plots
- Histograms
- Box plots
- Scatter plots
- With plots:
- Format x and y tick values
- Limit range of values on x or y-axis
- Add a plot title
- Label axes with custom text
- Make larger font for plots on x-axis, y-axis, ticks and title
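For the pivot table and long-to-wide topics, here is a minimal practice sketch. The data and column names are invented for this example; `pivot_table`, `pivot` and `melt` are the standard Pandas methods.

```python
import pandas as pd

# Hypothetical long-format data: one row per (day, rider type) observation
long_df = pd.DataFrame(
    {
        "day": ["Mon", "Mon", "Tue", "Tue", "Tue"],
        "rider": ["Subscriber", "Customer", "Subscriber", "Customer", "Customer"],
        "rides": [100, 20, 110, 15, 10],
    }
)

# Pivot table: duplicate (day, rider) pairs are combined with aggfunc
rides_pivot = long_df.pivot_table(
    index="day", columns="rider", values="rides", aggfunc="sum"
)

# Long to wide with pivot: it needs unique (index, columns) pairs,
# so aggregate the duplicates first
totals = long_df.groupby(["day", "rider"], as_index=False)["rides"].sum()
wide_df = totals.pivot(index="day", columns="rider", values="rides")

# Wide back to long with melt
back_to_long = wide_df.reset_index().melt(
    id_vars="day", var_name="rider", value_name="rides"
)
```

The main difference to remember: `pivot` only reshapes, while `pivot_table` reshapes and aggregates.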
I'd recommend you utilize the datasets in Seaborn to practice prototyping the types of visualizations above.
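As a starting point for the plot formatting topics (tick formats, axis limits, title, axis labels and font sizes), here is one possible sketch using Pandas Plot on top of Matplotlib. The revenue numbers are made up, and the font sizes are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for plain scripts; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical monthly revenue figures
monthly = pd.DataFrame(
    {
        "month": [1, 2, 3, 4, 5, 6],
        "revenue": [12000, 15000, 14000, 18000, 21000, 25000],
    }
)

fig, ax = plt.subplots(figsize=(8, 5))
monthly.plot(kind="line", x="month", y="revenue", ax=ax, legend=False, marker="o")

# Add a plot title and label axes with custom text, using larger fonts
ax.set_title("Monthly revenue, first half of the year", fontsize=18)
ax.set_xlabel("Month", fontsize=14)
ax.set_ylabel("Revenue (USD)", fontsize=14)

# Limit the range of values on the y-axis
ax.set_ylim(0, 30000)

# Format y tick values as dollars with thousands separators
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"${value:,.0f}"))

# Format x tick values as month names
ax.set_xticks(monthly["month"])
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun"])

# Make the tick labels larger on both axes
ax.tick_params(axis="both", labelsize=12)
```

The same `ax` calls work for bar plots, histograms, box plots and scatter plots, so only the `kind` argument needs to change while practicing.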
What you Need to Bring Every Class¶
- Laptop
- Laptop charger
- Headphones (optional)
- Lunch, snacks and/or drinks (optional - very long class)
Expectations for You¶
It’ll be a lot of work and a lot of new material! Be prepared to become very familiar with ingesting data in Python, cleaning data and prototyping visualizations, and to become very skilled in a new visualization framework.
You should spend 7+ hours per week outside of class time in order to succeed.
Class Meetings¶
I don’t think people learn best by being just talked to for hours on end. You learn through building, collaboration and feedback. Class is long so we’ll have intermittent short breaks so you don’t have to sit down for too long.
I'll also pack more lectures/discussions in the first few weeks so we can focus more on group work in the second half of the term.
In-Class Resources¶
Additional External Resources¶
Learn Python:
- Introductory Python 3 book introducing you to data structures and algorithms: Think Python book by Allen Downey.
- Online introductory Python course: Py Course.
Learn Pandas for Data Analysis:
- Mode Analytics tutorials
- Intro to Pandas official documentation.
- Cheat Sheet
- Python for Data Analysis (book)
- Chris Albon Data Wrangling in Pandas
Pandas Plot (visualizations):
Learn command line
- Intro to Mac OS command line:
- Intro to Windows command line:
Learn Git & GitHub
Career advice on data-driven jobs:
- Why you should start a blog
- Advice for Applying to Data Science Jobs
- Building Your Data Science Network
- Advice from Reshama
- Lyft engineering blog post on data science titles
- Indeed post on data science titles
- Data Scientist vs. Analyst perspective
- Difference between data science, machine learning and AI
Data visualization inspiration:
- Alice Zhao's blog
- Shirley Wu’s D3.js art site
- FiveThirtyEight's best visualizations of 2016
- Visualize NBA shots in R
- Bubble Map
- Tesla supercharger prices
- Tableau workbook of survey responses
- Wait But Why post on the story of Tesla and its importance to the world
- Data visualization principles
- Black Panther box office visualizations
- Vega Lite - A Grammar of Interactive Graphics (YouTube video).
Data visualizations in industry best practices:
Class Schedule¶
Week 1 (June 23, 2018)¶
To start learning in class and continue at home (if needed):
- Recite goals of class; make sure everyone is on same page.
- Poll: what technologies do you know?
- Poll: what would you like to learn?
- Discuss careers in data viz
- Business Intelligence
- Data Analyst
- Data Scientist
- Frontend Engineer
- Machine Learning Engineer
- Why put your data viz and portfolio work online?
- Get feedback from others
- Reflect on your own work to learn the material
- Get noticed by someone, which could lead to a job, a mentor or an invitation to a conference/event
- Are there any dashboards/visualizations you use frequently?
- Fitbit
- Apple Watch
- Google Analytics
- Create a GitHub profile
- Everyone join class Slack group with this link.
- If you download Slack from its website (not Mac App Store), then you can use Slack to video chat, screen share, and share a cursor on someone's computer.
- Download Anaconda locally to use Python, Python libraries and Jupyter Notebook/Lab
- Show Jupyter Lab
- Browser-based interactive data analysis tool to show code, graphics and HTML elements in a single executable document
- Value of quick data analysis and prototyping when creating data visualizations
- Previously, there were labs teaching different languages and frameworks such as R, Python, Tableau, etc. These are now entirely optional. The previous answers are available online here.
- Explanation of Bay Area Bike Share program with 1 minute video
- Live walkthrough of Bay Area Bike Share data-driven project
- Brainstorm project ideas on whiteboard
- Form teams for group data viz project
- Meet with interested students 1-on-1
- What’s your background?
- What are your interests?
- Are you working now or what job do you want to have in the future?
- What technologies do you enjoy most? What do you want to learn?
- Help pick the best track for students to move forward with
- OKCupid data investigation
- Alice Zhao's talk on YouTube
- See examples of the most popular types of data visualizations. Read all articles under the Subcategory Best Practices on my site.
- How to Select a Data Visualization via Chartio
Week 2 (June 30, 2018)¶
- No class due to Independence Day holiday
Week 3 (July 7, 2018)¶
- Overview of Python visualization libraries: Pandas Plot, Matplotlib, Seaborn, Plotly and Altair
- "In recent years, however, the interface and style of Matplotlib have begun to show their age. Newer tools like ggplot and ggvis in the R language, along with web visualization toolkits based on D3js and HTML5 canvas, often make Matplotlib feel clunky and old-fashioned. Still, I'm of the opinion that we cannot ignore Matplotlib's strength as a well-tested, cross-platform graphics engine." - via Python Data Science Handbook
- Clear labeling of charts
- Benchmarking and comparisons
- What is the value of a data point? A data point only becomes valuable information when it can be well understood - often referenced to another data point or feeling/reaction from the outside world.
- Fitbit: daily goal and conditional bar colors in steps per day
- Google Analytics: this week versus last week site visits
- Heatmap of count of bike rides by day and hour of day for Bay Area Bike Share
- Grouped boxplots using Seaborn
- Distributions of subsets of data using Seaborn
- Bar chart with highlight using Altair
- Wikipedia donations over the years using Altair
- Graded in-class learning exercise
- Review team's project ideas with the class
- Anyone want to switch teams or ideas?
- Suggestions for teams on interesting data sources to look into?
- Project work with team
- Meet 1-on-1 with Dan
- Read Chapters 1-3 of Data Viz Reader
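To make the benchmarking idea from this week concrete, here is a rough sketch in the spirit of the Fitbit example: steps per day drawn against a daily goal, with bar colors that depend on whether the goal was met. The step counts, the goal of 10,000 and the colors are all assumptions for illustration, and Matplotlib is used here simply as one of the libraries covered this week.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for plain scripts; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

DAILY_GOAL = 10000  # hypothetical step goal used as the benchmark

# Hypothetical steps per day for one week
steps = pd.Series(
    [8200, 11500, 9900, 12300, 7600, 10400, 13100],
    index=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
)

# Conditional bar colors: green when the goal is met, gray otherwise
bar_colors = ["seagreen" if value >= DAILY_GOAL else "lightgray" for value in steps]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(steps.index, steps.values, color=bar_colors)

# The benchmark: a dashed reference line at the daily goal, with a label
ax.axhline(DAILY_GOAL, color="black", linestyle="--", linewidth=1)
ax.text(-0.4, DAILY_GOAL + 200, "Daily goal")

ax.set_title("Steps per day versus daily goal")
ax.set_ylabel("Steps")
```

Without the goal line, 9,900 steps is just a number. With it, the reader sees right away that Wednesday fell just short.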
Week 4 (July 14, 2018)¶
- Explanation of project submission framework
- Discussion of poor visualizations at viz.wtf
- Project work with team
- Meet 1-on-1 with Dan
Week 5 (July 21, 2018)¶
- Practice presentation with peer groups and code reviews
Week 6 (July 28, 2018)¶
- Practice presentation with peer groups
- Group project presentations
- Reading homework:
- Understand the business behind Bay Area's bike share program
- Dashboard articles/insights:
- Popular dashboard tools in industry:
- Best practices in dashboards:
- Most important insights at the top
- Easy-to-read visualizations understandable by your audience
- Context of values/charts - labels, changes over time, etc
- Where are dashboards useful?
- Executives to see high-level progress for teams on revenue and costs over time
- Customer service teams to track # of emails answered and average response time (minutes)
- Marketing teams to see website visits and visits by lead source
- Devops/developers to see stats on product performance and monitor for issues
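Before a metric lands on a dashboard tile, someone has to compute it. As a small sketch of the customer service example, the code below derives emails answered and average response time per week, plus the change versus the previous week for context. The support log and its column names are invented for this example.

```python
import pandas as pd

# Hypothetical support log: one row per answered email
emails = pd.DataFrame(
    {
        "answered_at": pd.to_datetime(
            ["2018-07-16", "2018-07-17", "2018-07-18", "2018-07-23", "2018-07-24"]
        ),
        "response_minutes": [30, 50, 40, 20, 30],
    }
)

# Weekly metrics; with freq="W" each bin ends on a Sunday
weekly = emails.groupby(pd.Grouper(key="answered_at", freq="W"))[
    "response_minutes"
].agg(emails_answered="count", avg_response_minutes="mean")

# Context for the value: change versus the previous week
weekly["avg_response_change"] = weekly["avg_response_minutes"].diff()
```

A single-value tile could show the latest `avg_response_minutes`, with `avg_response_change` next to it so the audience can tell whether the number is getting better or worse.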
Week 7 (August 4, 2018)¶
- Live coding session of Bay Area bike share dataset and begin dashboard design
- Graded in-class learning exercise
- Recommended reading/homework:
- What is the Difference Between KPIs and Metrics?
- How to Create a Scorecards Dashboard
- Create an account with Chartio and build your first dashboard. You can copy the steps outlined in my Chartio getting started video on our class Google Drive.
- Using Google Drawings, create an Entity Relationship Diagram based on the dataset from UFO Sightings.
- As a goal, see if you can build at least 4 different types of simple visualizations from the UFO Sightings dataset on your first dashboard: table, bar graph, line graph and a single value tile. Chartio offers comprehensive learning resources.
Week 8 (August 11, 2018)¶
- Graded in-class learning exercise
- Practice using Chartio
- Recommended reading/homework:
- See my UFO dashboard mockup as reference. Create at least one section and utilize 4 different types of visualizations in Chartio using the UFO dashboard dataset. Chartio offers comprehensive learning resources too.
Week 9 (August 18, 2018)¶
- Practice using Chartio with UFO dataset
- Guest presentation by Sean Blake on how to make your data meaningful and beautiful
Week 10 (August 25, 2018)¶
- How to best present data visualizations in slide decks
- Presentation of bad data visualization examples in slide decks
- Peer reviews of individual projects
- ~6 project presentations for those unable to attend Week 11
Week 11 (September 1, 2018)¶
- ~24 project presentations
- What it takes to be a successful Data Scientist
- Class reflection