APIs

The Internet

If you've browsed websites, you've already encountered HTTP. It stands for HyperText Transfer Protocol.

To visit my website, you may have entered http://dfrieds.com into your browser. Your browser, also known as the client, made a request over HTTP to get the contents of my website. Once the request is complete, the browser renders my site's code and you see the homepage of my website!

The contents of my web page live on another computer on the Internet called a server. I host my site through GitHub Pages, so GitHub stores my site's code on its servers.

I view the Internet as a large network of connected servers. Every page on the Internet is stored on a server.
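To make the client and server roles concrete, here is a minimal sketch that runs both on your own machine using only Python's standard library. The handler class, the function name, and the page contents are made up for illustration; a real site such as one hosted on GitHub Pages works at a far larger scale, but the request and response exchange follows the same idea.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HomepageHandler(BaseHTTPRequestHandler):
    """Toy server (hypothetical): answers every GET request with a tiny HTML page."""

    def do_GET(self):
        body = b"<h1>Hello from the server</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def fetch_from_local_server():
    """Start the toy server, act as a client once, then shut the server down."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HomepageHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client side: send a GET request and read the response.
        connection = HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        html = response.read().decode("utf-8")
        connection.close()
        return response.status, html
    finally:
        server.shutdown()
        server.server_close()


status, html = fetch_from_local_server()
print(status, html)
```

A browser does the same thing as the client half of this sketch, then renders the returned HTML instead of printing it.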

APIs

API stands for Application Programming Interface. An API is the part of a server that receives requests and sends responses.

Think of an API like a coding contract; it specifies the ways your program can interact with the application.

Four HTTP methods are most commonly used to interact with an API. They are:

HTTP Method    Description
GET            retrieve information from a source
POST           send new information to a source
PUT            update existing information of a source
DELETE         remove existing information from a source
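As a rough sketch of what these methods mean, the toy function below applies each one to a Python dictionary standing in for the "source". The names posts and handle are hypothetical, and no real HTTP traffic is involved; an actual API would do this work on a server, usually in front of a database.

```python
# Hypothetical in-memory "source"; a real API would store this on a server.
posts = {}


def handle(method, post_id, data=None):
    """Apply one HTTP-style method to the in-memory posts dictionary."""
    if method == "GET":
        # Retrieve existing information (None if nothing is stored).
        return posts.get(post_id)
    if method == "POST":
        # Send new information.
        posts[post_id] = data
        return data
    if method == "PUT":
        # Update information that must already exist.
        if post_id not in posts:
            raise KeyError(f"No post with id {post_id}")
        posts[post_id] = data
        return data
    if method == "DELETE":
        # Remove existing information.
        return posts.pop(post_id, None)
    raise ValueError(f"Unsupported method: {method}")


handle("POST", 1, {"title": "Hello"})
handle("PUT", 1, {"title": "Hello, world"})
current = handle("GET", 1)
handle("DELETE", 1)
after_delete = handle("GET", 1)
print(current, after_delete)
```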

Popular APIs you may have interacted with are:

  • Twilio to send and receive text messages. In the past, Uber used Twilio's API to send you a text message alerting you that your driver was nearby.
  • Weather Underground to retrieve weather data. If you use a mobile app to view today's weather, that app is likely using an API from Weather Underground or a similar company to retrieve the proprietary weather data that company collects.

Typically, when a company such as Twilio or Weather Underground offers an API to customers, it means they have designated a number of URLs that return only data. So the responses often don't contain the HTML and CSS styling you'd see on a typical webpage.

In the following example, I'll cover only GET requests since GET is the method most commonly used to retrieve data for data science.

Getting Started with the Hacker News API

In this lesson, we'll use Python to request data on Hacker News posts.

Hacker News is a social news website where anyone can submit a link to a story, pose a question, or comment on any of these items. It's similar to Reddit, though Hacker News is geared more towards technology and entrepreneurship.

Import module

from requests import get

API documentation

The API documentation on GitHub details how to retrieve and interpret data from Hacker News.

Hacker News calls each piece of content posted on its site, whether a story, comment, job, or poll, an item. Each item is identified by a unique integer ID.

The URLs start with https://hacker-news.firebaseio.com/v0/.

Make a Simple Request

We'll use the get method from the requests library in order to get a response object from the Hacker News API.

The response object allows us to retrieve the data returned for a Hacker News item.

I'm curious about post/item number 8863 on Hacker News. We can append item/8863.json to the base API URL. In doing so, we're hitting an API endpoint: a URL defined by the API at which particular data is available.

Hacker News has additional endpoints, such as one for user profile details, for which you'd append user/{name_of_user}.json instead of an item path.
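The endpoint patterns described above can be sketched as two small helpers. The names BASE_URL, item_url, and user_url are my own for illustration and are not part of the Hacker News API or the requests library.

```python
# Base URL from the Hacker News API documentation.
BASE_URL = "https://hacker-news.firebaseio.com/v0/"


def item_url(item_id):
    """Build the endpoint URL for a single item (story, comment, job, or poll)."""
    return f"{BASE_URL}item/{item_id}.json"


def user_url(username):
    """Build the endpoint URL for a user profile."""
    return f"{BASE_URL}user/{username}.json"


print(item_url(8863))
print(user_url("dhouston"))
```

Building URLs in one place like this makes it harder to mistype the base URL or forget the .json suffix when you request many items.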

Below, we assign the returned response object to a variable named hacker_news_response.

hacker_news_response = get('https://hacker-news.firebaseio.com/v0/item/8863.json')

Below, we confirm that this object is a Response object from the requests library.

type(hacker_news_response)
requests.models.Response
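Before reading the body, it can help to confirm the request succeeded. The helper below is a minimal sketch, not part of requests: item_from_response is a hypothetical name, and it relies only on the status_code attribute and json() method that a requests Response object provides. Treating an empty (None) body as a missing item is a defensive assumption on my part. So the sketch runs without a network connection, it is exercised with a stand-in object rather than a real response.

```python
from types import SimpleNamespace


def item_from_response(response):
    """Return the decoded JSON body, or raise if the request did not succeed."""
    if response.status_code != 200:
        raise RuntimeError(f"Request failed with status {response.status_code}")
    item = response.json()
    if item is None:
        # Assumption: an empty body means no such item exists.
        raise LookupError("The API returned no data for this item")
    return item


# Stand-in for a real response, used only to try the helper offline.
# With the requests library, you would pass hacker_news_response instead.
fake_response = SimpleNamespace(
    status_code=200,
    json=lambda: {"id": 8863, "type": "story"},
)
item = item_from_response(fake_response)
print(item["type"])
```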

We can access the text attribute on hacker_news_response to get the body of the response. The body is a string.

type(hacker_news_response.text)
str

Below we see the body of our response. We could convert this string to a Python dictionary to more easily parse the details.

hacker_news_response.text
'{"by":"dhouston","descendants":71,"id":8863,"kids":[9224,8952,8917,8884,8887,8943,8869,8940,8908,8958,9005,8873,9671,9067,9055,8865,8881,8872,8955,10403,8903,8928,9125,8998,8901,8902,8907,8894,8870,8878,8980,8934,8876],"score":104,"time":1175714200,"title":"My YC app: Dropbox - Throw away your USB drive","type":"story","url":"http://www.getdropbox.com/u/2/screencast.html"}'
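One way to do that conversion is with json.loads from Python's standard library, sketched below on the response body shown above. The variable names body and item are my own. With the requests library, calling hacker_news_response.json() should give the same dictionary in one step.

```python
import json

# The response body shown above, copied here as a string.
body = (
    '{"by":"dhouston","descendants":71,"id":8863,'
    '"kids":[9224,8952,8917,8884,8887,8943,8869,8940,8908,8958,9005,'
    '8873,9671,9067,9055,8865,8881,8872,8955,10403,8903,8928,9125,'
    '8998,8901,8902,8907,8894,8870,8878,8980,8934,8876],'
    '"score":104,"time":1175714200,'
    '"title":"My YC app: Dropbox - Throw away your USB drive",'
    '"type":"story","url":"http://www.getdropbox.com/u/2/screencast.html"}'
)

# Parse the JSON string into a Python dictionary.
item = json.loads(body)

# Individual fields are now available by key.
print(item["by"])
print(item["title"])
print(item["score"])
```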

This item was posted by the user dhouston at Unix epoch time 1175714200, which is equivalent to Wednesday, April 4, 2007 at 7:16 PM UTC. The title of the item was My YC app: Dropbox - Throw away your USB drive.
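The conversion from Unix epoch time to a calendar date can be done with the standard library's datetime module. This is a small sketch; posted_at is my own variable name, and I interpret the timestamp in UTC.

```python
from datetime import datetime, timezone

# Unix epoch time counts seconds since January 1, 1970 (UTC).
posted_at = datetime.fromtimestamp(1175714200, tz=timezone.utc)

# Print the timestamp in ISO 8601 format and in a friendlier format.
print(posted_at.isoformat())
print(posted_at.strftime("%A, %B %d, %Y at %I:%M %p"))
```

Passing tz=timezone.utc matters: without it, fromtimestamp uses your computer's local time zone, so the printed time would vary from machine to machine.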

We used Python to programmatically retrieve the contents of a post on Hacker News. You could enter this same URL in your browser's address bar and see the same results. Try it!

Application to data science

Retrieving data from sites is valuable in data science because it gives you data to analyze later.

For example, you could retrieve posts over time on Hacker News and see popular topics broken down by day or week.
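As a rough sketch of that kind of breakdown, the code below counts items per day from their time fields. The records are illustrative only: the first uses item 8863's real timestamp, while the other two IDs and timestamps are made up. The function name posts_per_day is also my own.

```python
from collections import Counter
from datetime import datetime, timezone

# Illustrative records only: the first timestamp belongs to item 8863,
# the other two entries are made up for this example.
items = [
    {"id": 8863, "time": 1175714200},
    {"id": "made-up-1", "time": 1175720000},
    {"id": "made-up-2", "time": 1175800000},
]


def posts_per_day(items):
    """Count items by the UTC calendar date of their time field."""
    return Counter(
        datetime.fromtimestamp(item["time"], tz=timezone.utc).date()
        for item in items
    )


counts = posts_per_day(items)
for day, count in sorted(counts.items()):
    print(day, count)
```

With real data, you would first retrieve many items from the API and then pass them to a function like this one.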