The Internet¶
In browsing websites, you've encountered HTTP before. It stands for HyperText Transfer Protocol.
To visit my website, you may have entered https://dfrieds.com into your browser. Your browser, also known as the client, made a request over HTTP to get the contents of my website. Once the request is complete, the browser renders my site's code and you see the homepage of my website!
The contents of my web page live on another computer on the Internet called a server. I host my site with GitHub Pages, so my site's code is stored on GitHub's servers.
I view the Internet as a large network of connected servers. Every page on the Internet is stored on a server.
APIs¶
An API is an Application Programming Interface. An API is part of the server that receives requests and sends a response.
Think of an API like a coding contract; it specifies the ways your program can interact with the application.
Four HTTP methods are commonly used to interact with an API. They are:
HTTP Method | Description |
---|---|
GET | retrieve information from a source |
POST | send new information to a source |
PUT | update existing information of a source |
DELETE | remove existing information from a source |
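To make the four methods concrete, here is a minimal sketch that builds one request object per method without sending anything over the network. It uses the standard library's urllib so it runs offline; the URL and the JSON payloads are hypothetical placeholders, not a real API. With the requests library, the corresponding functions are requests.get, requests.post, requests.put, and requests.delete.

```python
from urllib.request import Request

# Hypothetical API location, used only for illustration.
BASE_URL = "https://example.com/api/items"

# Build (but do not send) one request per HTTP method.
requests_by_method = {
    "GET": Request(f"{BASE_URL}/1", method="GET"),
    "POST": Request(BASE_URL, data=b'{"title": "new item"}', method="POST"),
    "PUT": Request(f"{BASE_URL}/1", data=b'{"title": "updated item"}', method="PUT"),
    "DELETE": Request(f"{BASE_URL}/1", method="DELETE"),
}

for name, request in requests_by_method.items():
    # get_method() reports the HTTP method the request would use.
    print(request.get_method(), request.full_url)
```

Note that GET and DELETE carry no body here, while POST and PUT send data describing the item to create or update.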
Popular APIs you may have interacted with are:
- Twilio to send and receive text messages. In the past, Uber used Twilio's API to send you a text message alerting you that your driver was nearby.
- Weather Underground to retrieve weather data. If you use a mobile app to view today's weather, that app is likely using an API from Weather Underground or a similar company to retrieve the proprietary weather data that company collects.
Typically, when a company such as Twilio or Weather Underground offers an API to customers, it means they've designated a number of URLs that return only data. So the responses usually don't contain the HTML and CSS styling you'd see on a typical webpage.
In the following example, I'll just cover the use of GET requests since that's common for retrieving information to perform data science.
Getting Started with the Hacker News API¶
In this lesson, we'll use Python to request data on Hacker News posts.
Hacker News is a social news website where anyone can submit a link to a story, pose a question, or comment on any of these items. It's similar to Reddit, though Hacker News is more geared towards technology and entrepreneurship.
Import module¶
from requests import get
API documentation¶
The API documentation on GitHub details how to retrieve and interpret data from Hacker News.
Hacker News calls each piece of content posted on its site an item, whether it's a story, comment, job, or poll. Each item is identified by a unique integer ID.
The URLs start with https://hacker-news.firebaseio.com/v0/.
Make a Simple Request¶
We'll use the get function from the requests library to get a response object from the Hacker News API.
The response object allows us to retrieve the data returned from a Hacker News item.
I'm curious about post/item number 8863 on Hacker News. We can simply append /item/8863.json to the API URL string. In doing so, we're hitting an API endpoint, which is an API-defined location where particular data is stored.
Hacker News has additional endpoints, such as one for user profiles, for which you'd append /user/{name_of_user} instead of an item path.
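The URL-building rule described above can be sketched as a pair of small helpers. The function names item_url and user_url are hypothetical, and the .json suffix on the user endpoint is an assumption carried over from the item URL used in this lesson.

```python
# Base of the Hacker News API URLs, as given in the API documentation section.
BASE_URL = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Return the endpoint URL for a single item (story, comment, job, or poll)."""
    return f"{BASE_URL}/item/{item_id}.json"


def user_url(username: str) -> str:
    """Return the endpoint URL for a user profile.

    Assumption: the user endpoint takes the same .json suffix as items.
    """
    return f"{BASE_URL}/user/{username}.json"


print(item_url(8863))
print(user_url("dhouston"))
```

Keeping the base URL in one constant means only the path segment changes from one endpoint to the next.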
Below, we assign the returned response object to a variable named hacker_news_response.
hacker_news_response = get('https://hacker-news.firebaseio.com/v0/item/8863.json')
Below, we can see that this object is a Response object from the requests library.
type(hacker_news_response)
requests.models.Response
We can access the text attribute on hacker_news_response to get the body of the response. The body is a string.
type(hacker_news_response.text)
str
Below we see the body of our response. We could convert this string to a Python dictionary to more easily parse the details.
hacker_news_response.text
'{"by":"dhouston","descendants":71,"id":8863,"kids":[9224,8952,8917,8884,8887,8943,8869,8940,8908,8958,9005,8873,9671,9067,9055,8865,8881,8872,8955,10403,8903,8928,9125,8998,8901,8902,8907,8894,8870,8878,8980,8934,8876],"score":104,"time":1175714200,"title":"My YC app: Dropbox - Throw away your USB drive","type":"story","url":"http://www.getdropbox.com/u/2/screencast.html"}'
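As a minimal sketch of the conversion mentioned above, the standard library's json module can turn the response body into a dictionary. To keep the example runnable offline, the string below is copied from the output above rather than fetched again; the variable names response_text and item are my own. With a live response object, hacker_news_response.json() from the requests library returns the same kind of dictionary.

```python
import json

# Response body copied from the output above (the value of hacker_news_response.text).
response_text = (
    '{"by":"dhouston","descendants":71,"id":8863,'
    '"kids":[9224,8952,8917,8884,8887,8943,8869,8940,8908,8958,9005,'
    '8873,9671,9067,9055,8865,8881,8872,8955,10403,8903,8928,9125,'
    '8998,8901,8902,8907,8894,8870,8878,8980,8934,8876],'
    '"score":104,"time":1175714200,'
    '"title":"My YC app: Dropbox - Throw away your USB drive",'
    '"type":"story","url":"http://www.getdropbox.com/u/2/screencast.html"}'
)

# Parse the JSON string into a Python dictionary.
item = json.loads(response_text)

# Individual fields can now be looked up by key.
print(item["title"])
print(item["by"], item["score"])
print(len(item["kids"]), "child item IDs")
```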
This item was posted by the user dhouston at Unix epoch time 1175714200, which is equivalent to Wednesday, April 4, 2007 at 7:16 PM UTC. The title of the item is My YC app: Dropbox - Throw away your USB drive.
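The epoch conversion can be checked with the standard library's datetime module. This is a small sketch that interprets the timestamp in UTC; the variable name posted_at is my own.

```python
from datetime import datetime, timezone

# The "time" field of the item is seconds since the Unix epoch (1970-01-01 UTC).
posted_at = datetime.fromtimestamp(1175714200, tz=timezone.utc)

print(posted_at.isoformat())  # 2007-04-04T19:16:40+00:00
print(posted_at.strftime("%A, %B %d, %Y at %I:%M %p %Z"))
```

Passing tz=timezone.utc matters: without it, fromtimestamp uses your machine's local time zone and the printed hour may differ.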
We used Python to programmatically retrieve the contents of a post on Hacker News. You could enter this same URL in your browser's address bar and see the same results. Try it!
Application to data science¶
Retrieving data from websites is valuable in data science because you can then perform analysis on that data.
For example, you could retrieve posts over time on Hacker News and see popular topics broken down by day or week.
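A breakdown by day could look roughly like the sketch below. It assumes you have already collected a list of item dictionaries; here, sample_items is a hypothetical stand-in in which only the first entry is the real item 8863, and the other two titles and timestamps are made up for illustration. The function name posts_per_day is also my own.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample: the first entry is item 8863; the other two are invented.
sample_items = [
    {"title": "My YC app: Dropbox - Throw away your USB drive", "time": 1175714200},
    {"title": "Hypothetical post A", "time": 1175720000},
    {"title": "Hypothetical post B", "time": 1175800000},
]


def posts_per_day(items):
    """Count items per calendar day (UTC), keyed by an ISO date string."""
    days = (
        datetime.fromtimestamp(item["time"], tz=timezone.utc).date().isoformat()
        for item in items
    )
    return Counter(days)


counts = posts_per_day(sample_items)
for day, number_of_posts in sorted(counts.items()):
    print(day, number_of_posts)
```

The same grouping idea extends to weeks by keying on the ISO week number instead of the date.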