Feedback should be sent to goran.milovanovic@datakolektiv.com. These notebooks accompany the Intro to Data Science: Non-Technical Background course 2020/21.
We want to practice REST API access from R, a topic covered in Session 04. In the following example we will access a free REST API from within our R environment, collect the API response as JSON, convert it to an R list, and play with the data.
In this example we will rely on the free https://datausa.io/ API to obtain statistical data. Here is the intro to their API: datausa.io API.
baseEndPoint <- "https://datausa.io/api/data"
We will use {httr} to get in touch with the API. It is installed together with {tidyverse}, but library(tidyverse) does not attach it, so we load it explicitly.
library(httr)
Step 1. Define API parameters.
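First we define the API parameters: a drilldown (the level at which the data are broken down, here Nation) and a measure (the quantity we ask for, here Population).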
### --- compose API call
# - use base API endpoint
# - and concatenate with API parameters
# - from the following example: https://datausa.io/about/api/
# - parameter: drilldowns
drilldowns <- paste0("drilldowns=", "Nation")
# - parameter: measures
measures <- paste0("measures=", "Population")
# - parameters:
params <- paste("&", c(drilldowns, measures),
                sep = "", collapse = "")
cat(params)
&drilldowns=Nation&measures=Population
Step 2. Compose API call.
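We put together baseEndPoint and the API call parameters. Because params already begins with "&", the resulting URL contains "?&"; the server accepts this, as the response below shows: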
api_call <- paste0(baseEndPoint, "?", params)
cat(api_call)
https://datausa.io/api/data?&drilldowns=Nation&measures=Population
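The leading "&" after the "?" is harmless here, but it can be avoided. As a minimal sketch, assuming we keep the parameters in a named character vector (the helper name compose_call and the variable names are hypothetical, not part of {httr} or of the course code), the call could be composed like this:

```r
# Hypothetical helper: build "key=value" pairs from a named vector
# and join them with "&", so that no stray "&" follows the "?".
compose_call <- function(endpoint, params) {
  query <- paste(names(params), params, sep = "=", collapse = "&")
  paste0(endpoint, "?", query)
}

baseEndPoint <- "https://datausa.io/api/data"
api_params <- c(drilldowns = "Nation", measures = "Population")
api_call_clean <- compose_call(baseEndPoint, api_params)
cat(api_call_clean)
```

This prints https://datausa.io/api/data?drilldowns=Nation&measures=Population, which is the same call without the "?&" sequence.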
Step 3. Make API call.
We use httr::GET() to contact the API, ask for data, and fetch the result:
response <- GET(URLencode(api_call))
class(response)
[1] "response"
The call to URLencode(api_call), a function that ships with R (in the utils package), takes care of percent-encoding where necessary. Hint: always use URLencode(your_api_call).
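To see what percent-encoding actually does, here is a small sketch on a query fragment with a space and a ">" sign (the string is made up for illustration, modeled on the parameters used later in this notebook). With its default arguments, URLencode() leaves reserved characters such as "&", "=", "?" and "/" untouched, and returns its input unchanged if it already contains a percent-encoded sequence.

```r
# An illustrative query fragment containing a space and a ">" sign
query <- "drilldowns=Wage Bin&Record Count>=5"
encoded <- URLencode(query)
cat(encoded)

# URLdecode() reverses the encoding
decoded <- URLdecode(encoded)
identical(decoded, query)
```

The space becomes %20 and ">" becomes %3E, while "&" and "=" keep their meaning as separators.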
We can see that response is now an object of class response. It is richly structured indeed:
str(response)
List of 10
$ url : chr "https://datausa.io/api/data?&drilldowns=Nation&measures=Population"
$ status_code: int 200
$ headers :List of 27
..$ date : chr "Sat, 05 Feb 2022 11:29:49 GMT"
..$ content-type : chr "application/json; charset=utf-8"
..$ x-dns-prefetch-control : chr "off"
..$ strict-transport-security : chr "max-age=15552000; includeSubDomains"
..$ x-download-options : chr "noopen"
..$ x-content-type-options : chr "nosniff"
..$ x-xss-protection : chr "1; mode=block"
..$ content-language : chr "en"
..$ etag : chr "W/\"55b-jEIUyvQphH/gM3DVlQl2pEdoLeo\""
..$ vary : chr "Accept-Encoding"
..$ content-encoding : chr "gzip"
..$ last-modified : chr "Sat, 05 Feb 2022 10:25:59 GMT"
..$ x-cache-status : chr "MISS"
..$ x-frame-options : chr "SAMEORIGIN"
..$ access-control-allow-origin : chr "*"
..$ access-control-allow-credentials: chr "true"
..$ access-control-allow-methods : chr "GET, POST, OPTIONS"
..$ access-control-allow-headers : chr "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type"
..$ x-cache-key : chr "GET/api/data?&drilldowns=Nation&measures=Population"
..$ cache-control : chr "max-age=1800"
..$ cf-cache-status : chr "HIT"
..$ age : chr "3830"
..$ expect-ct : chr "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""
..$ report-to : chr "{\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=FM8spWRDTZInGwYuO8rleDDLFu2qRN5h2Xy"| __truncated__
..$ nel : chr "{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}"
..$ server : chr "cloudflare"
..$ cf-ray : chr "6d8bcd9c7b3878ac-VIE"
..- attr(*, "class")= chr [1:2] "insensitive" "list"
$ all_headers:List of 1
..$ :List of 3
.. ..$ status : int 200
.. ..$ version: chr "HTTP/2"
.. ..$ headers:List of 27
.. .. ..$ date : chr "Sat, 05 Feb 2022 11:29:49 GMT"
.. .. ..$ content-type : chr "application/json; charset=utf-8"
.. .. ..$ x-dns-prefetch-control : chr "off"
.. .. ..$ strict-transport-security : chr "max-age=15552000; includeSubDomains"
.. .. ..$ x-download-options : chr "noopen"
.. .. ..$ x-content-type-options : chr "nosniff"
.. .. ..$ x-xss-protection : chr "1; mode=block"
.. .. ..$ content-language : chr "en"
.. .. ..$ etag : chr "W/\"55b-jEIUyvQphH/gM3DVlQl2pEdoLeo\""
.. .. ..$ vary : chr "Accept-Encoding"
.. .. ..$ content-encoding : chr "gzip"
.. .. ..$ last-modified : chr "Sat, 05 Feb 2022 10:25:59 GMT"
.. .. ..$ x-cache-status : chr "MISS"
.. .. ..$ x-frame-options : chr "SAMEORIGIN"
.. .. ..$ access-control-allow-origin : chr "*"
.. .. ..$ access-control-allow-credentials: chr "true"
.. .. ..$ access-control-allow-methods : chr "GET, POST, OPTIONS"
.. .. ..$ access-control-allow-headers : chr "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type"
.. .. ..$ x-cache-key : chr "GET/api/data?&drilldowns=Nation&measures=Population"
.. .. ..$ cache-control : chr "max-age=1800"
.. .. ..$ cf-cache-status : chr "HIT"
.. .. ..$ age : chr "3830"
.. .. ..$ expect-ct : chr "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\""
.. .. ..$ report-to : chr "{\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=FM8spWRDTZInGwYuO8rleDDLFu2qRN5h2Xy"| __truncated__
.. .. ..$ nel : chr "{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}"
.. .. ..$ server : chr "cloudflare"
.. .. ..$ cf-ray : chr "6d8bcd9c7b3878ac-VIE"
.. .. ..- attr(*, "class")= chr [1:2] "insensitive" "list"
$ cookies :'data.frame': 0 obs. of 7 variables:
..$ domain : logi(0)
..$ flag : logi(0)
..$ path : logi(0)
..$ secure : logi(0)
..$ expiration: 'POSIXct' num(0)
..$ name : logi(0)
..$ value : logi(0)
$ content : raw [1:1371] 7b 22 64 61 ...
$ date : POSIXct[1:1], format: "2022-02-05 11:29:49"
$ times : Named num [1:6] 0 0.0207 0.0442 0.1067 0.1485 ...
..- attr(*, "names")= chr [1:6] "redirect" "namelookup" "connect" "pretransfer" ...
$ request :List of 7
..$ method : chr "GET"
..$ url : chr "https://datausa.io/api/data?&drilldowns=Nation&measures=Population"
..$ headers : Named chr "application/json, text/xml, application/xml, */*"
.. ..- attr(*, "names")= chr "Accept"
..$ fields : NULL
..$ options :List of 2
.. ..$ useragent: chr "libcurl/7.77.0 r-curl/4.3.2 httr/1.4.2"
.. ..$ httpget : logi TRUE
..$ auth_token: NULL
..$ output : list()
.. ..- attr(*, "class")= chr [1:2] "write_memory" "write_function"
..- attr(*, "class")= chr "request"
$ handle :Class 'curl_handle' <externalptr>
- attr(*, "class")= chr "response"
There is one thing you always need to check: the HTTP status code of the server response.
response$status_code
[1] 200
200 means that your request was processed successfully. Get acquainted with server status codes and learn a bit about them from the following source: HTTP response status codes.
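Status codes come in ranges, and it is usually the range that matters. The following is a minimal sketch of that idea; the helper status_class is hypothetical and written only for illustration. In practice {httr} already offers http_error() and stop_for_status() for this purpose, as the commented line shows.

```r
# Hypothetical helper: translate a numeric status code into its class.
status_class <- function(code) {
  if (code >= 200 && code < 300) {
    "success"
  } else if (code >= 300 && code < 400) {
    "redirection"
  } else if (code >= 400 && code < 500) {
    "client error"
  } else if (code >= 500 && code < 600) {
    "server error"
  } else {
    "informational or unknown"
  }
}

status_class(200)
status_class(404)

# With a real {httr} response object one could stop early on failure:
# if (httr::http_error(response)) stop("API call failed: ", httr::status_code(response))
```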
The result is found in response$content, but…
class(response$content)
[1] "raw"
What is raw? It means that the data were received as raw bytes, which need to be decoded into an R character representation. It is easy:
resp <- rawToChar(response$content)
class(resp)
[1] "character"
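To make the idea of raw bytes concrete, here is a small sketch that decodes only the first four bytes listed for response$content in the str() output above (7b 22 64 61). As an alternative to rawToChar(), {httr} also provides content(response, as = "text", encoding = "UTF-8"), which performs the decoding for you.

```r
# The first four bytes of response$content, as shown by str(response)
first_bytes <- as.raw(c(0x7b, 0x22, 0x64, 0x61))
first_chars <- rawToChar(first_bytes)
cat(first_chars)

# charToRaw() goes the other way
identical(charToRaw(first_chars), first_bytes)
```

The four bytes decode to {"da, which is exactly how the JSON response begins.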
Is resp lengthy?
nchar(resp)
[1] 1371
cat(resp)
{"data":[{"ID Nation":"01000US","Nation":"United States","ID Year":2019,"Year":"2019","Population":328239523,"Slug Nation":"united-states"},{"ID Nation":"01000US","Nation":"United States","ID Year":2018,"Year":"2018","Population":327167439,"Slug Nation":"united-states"},{"ID Nation":"01000US","Nation":"United States","ID Year":2017,"Year":"2017","Population":325719178,"Slug Nation":"united-states"},{"ID Nation":"01000US","Nation":"United States","ID Year":2016,"Year":"2016","Population":323127515,"Slug Nation":"united-states"},{"ID Nation":"01000US","Nation":"United States","ID Year":2015,"Year":"2015","Population":321418821,"Slug Nation":"united-states"},{"ID Nation":"01000US","Nation":"United States","ID Year":2014,"Year":"2014","Population":318857056,"Slug Nation":"united-states"},{"ID Nation":"01000US","Nation":"United States","ID Year":2013,"Year":"2013","Population":316128839,"Slug Nation":"united-states"}],"source":[{"measures":["Population"],"annotations":{"source_name":"Census Bureau","source_description":"The American Community Survey (ACS) is conducted by the US Census and sent to a portion of the population every year.","dataset_name":"ACS 1-year Estimate","dataset_link":"http://www.census.gov/programs-surveys/acs/","table_id":"B01003","topic":"Diversity","subtopic":"Demographics"},"name":"acs_yg_total_population_1","substitutions":[]}]}
Now we can see that the API response is indeed JSON. To work with JSON in R, we need to convert it into a data structure that R knows, for example a list.
Step 4. Convert JSON data to an R list.
We will use {jsonlite}, which is also installed together with {tidyverse}, to convert from JSON to an R list:
library(jsonlite)
resp_list <- fromJSON(resp)
str(resp_list)
List of 2
$ data :'data.frame': 7 obs. of 6 variables:
..$ ID Nation : chr [1:7] "01000US" "01000US" "01000US" "01000US" ...
..$ Nation : chr [1:7] "United States" "United States" "United States" "United States" ...
..$ ID Year : int [1:7] 2019 2018 2017 2016 2015 2014 2013
..$ Year : chr [1:7] "2019" "2018" "2017" "2016" ...
..$ Population : int [1:7] 328239523 327167439 325719178 323127515 321418821 318857056 316128839
..$ Slug Nation: chr [1:7] "united-states" "united-states" "united-states" "united-states" ...
$ source:'data.frame': 1 obs. of 4 variables:
..$ measures :List of 1
.. ..$ : chr "Population"
..$ annotations :'data.frame': 1 obs. of 7 variables:
.. ..$ source_name : chr "Census Bureau"
.. ..$ source_description: chr "The American Community Survey (ACS) is conducted by the US Census and sent to a portion of the population every year."
.. ..$ dataset_name : chr "ACS 1-year Estimate"
.. ..$ dataset_link : chr "http://www.census.gov/programs-surveys/acs/"
.. ..$ table_id : chr "B01003"
.. ..$ topic : chr "Diversity"
.. ..$ subtopic : chr "Demographics"
..$ name : chr "acs_yg_total_population_1"
..$ substitutions:List of 1
.. ..$ : list()
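Why did the data element come out as a data.frame rather than a plain list? By default fromJSON() simplifies a JSON array of similar objects into a data.frame. The sketch below shows the difference on a shortened, hand-written version of the response (two records only), so it runs without calling the API.

```r
library(jsonlite)

# A shortened version of the API response, written by hand for illustration
json_txt <- '{"data":[{"Year":"2019","Population":328239523},
                      {"Year":"2018","Population":327167439}]}'

# Default behaviour: an array of objects becomes a data.frame
parsed <- fromJSON(json_txt)
class(parsed$data)

# With simplification switched off we get a list of records instead
parsed_raw <- fromJSON(json_txt, simplifyVector = FALSE)
class(parsed_raw$data)
length(parsed_raw$data)
```

The simplified form is usually more convenient for analysis; the unsimplified form mirrors the JSON structure one to one.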
Step 5. Inspect the result and play with the data.
What is the length of resp_list?
length(resp_list)
[1] 2
Let’s discover what is inside:
class(resp_list$data)
[1] "data.frame"
What does the resp_list$data data.frame look like?
head(resp_list$data)
Oh, nice! Let’s plot the time series of the US population over the years then:
library(ggplot2)
library(ggrepel)
ggplot(data = resp_list$data,
       aes(x = Year,
           y = Population,
           label = Population)) +
  geom_path(size = .25, color = "blue", group = 1) +
  geom_point(size = 2, color = "blue") +
  geom_label_repel(size = 3) +
  ggtitle("US Population") +
  theme_bw() +
  theme(panel.border = element_blank()) +
  theme(plot.title = element_text(hjust = .5))
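Note that Year is a character column and that the rows arrive in descending order. The plot still works because the year labels sort correctly as text, but for any arithmetic it is safer to convert and sort first. A minimal sketch, using three of the values from the API response above (the data.frame pop is built by hand here so that the example runs without a network connection):

```r
# Three records copied from the API response shown earlier
pop <- data.frame(
  Year = c("2019", "2018", "2017"),
  Population = c(328239523, 327167439, 325719178),
  stringsAsFactors = FALSE
)

# Convert Year to integer and sort in ascending order
pop$Year <- as.integer(pop$Year)
pop <- pop[order(pop$Year), ]

# Year-over-year change in population, labeled by the later year
yearly_change <- diff(pop$Population)
names(yearly_change) <- pop$Year[-1]
yearly_change
```

The population grew by 1448261 from 2017 to 2018 and by 1072084 from 2018 to 2019.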
What is the second element of resp_list?
class(resp_list$source)
[1] "data.frame"
Let’s see what is in it:
head(resp_list$source)
Oh, no: there is a nested data.frame in resp_list$source. We do not like such things in R, but they occur all too often when we work with API responses. There is a nice function to take care of such occurrences: jsonlite::flatten():
source <- flatten(resp_list$source, recursive = TRUE)
colnames(source)
[1] "measures" "name" "substitutions"
[4] "annotations.source_name" "annotations.source_description" "annotations.dataset_name"
[7] "annotations.dataset_link" "annotations.table_id" "annotations.topic"
[10] "annotations.subtopic"
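The same behaviour can be reproduced on a tiny example. The sketch below parses a hand-written JSON fragment (a shortened version of the source element) and flattens it. The function is called as jsonlite::flatten() explicitly because {purrr} also exports a function named flatten(), so the unqualified name may resolve to the wrong one if {tidyverse} is attached later.

```r
library(jsonlite)

# A shortened, hand-written version of the "source" element
json_txt <- '[{"name":"acs_yg_total_population_1",
               "annotations":{"source_name":"Census Bureau",
                              "topic":"Diversity"}}]'

nested <- fromJSON(json_txt)
# The "annotations" column is itself a data.frame
class(nested$annotations)

# Flatten: nested columns are pulled up and prefixed with the parent name
flat <- jsonlite::flatten(nested, recursive = TRUE)
colnames(flat)
flat$annotations.source_name
```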
I understand now: resp_list$source is the metadata! The API informed us about the sources of the data that it delivered:
source$measures
[[1]]
[1] "Population"
And then:
source$annotations.source_description
[1] "The American Community Survey (ACS) is conducted by the US Census and sent to a portion of the population every year."
Awesome: we get the data and the documentation for it!
For each API that you want to use you will need to read its documentation and learn about the parameters that you may pass to it.
I have taken this API call from https://datausa.io/profile/soc/education-legal-community-service-arts-media-occupations: just click on View data in the top-right corner.
You can copy and paste the entire API call into your browser's address bar to obtain the JSON response directly.
The data are on education, legal, community service, arts, & media occupations in the USA.
Make a call and check the server response status:
api_call <- paste0(baseEndPoint,
                   "?",
                   paste("PUMS Occupation=210000-270000",
                         "measure=Total Population,Total Population MOE Appx,Record Count",
                         "drilldowns=Wage Bin",
                         "Workforce Status=true",
                         "Record Count>=5",
                         sep = "&"))
response <- GET(URLencode(api_call))
response$status_code
[1] 200
Convert the response from JSON to a list, and then extract the data.frame:
response <- rawToChar(response$content)
response <- fromJSON(response)
data <- response$data
head(data)
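Since we have now repeated the same steps twice (call, check the status, decode, parse), they can be wrapped into small functions. This is only a sketch under stated assumptions: the helper names parse_body and fetch_data are hypothetical, and fetch_data is defined but not called here, so the example runs without a network connection.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: decode a raw response body and parse it as JSON.
parse_body <- function(raw_body) {
  fromJSON(rawToChar(raw_body))
}

# Hypothetical helper: call the API, stop with an error on a bad status,
# and return the "data" element of the parsed response.
fetch_data <- function(api_call) {
  response <- GET(URLencode(api_call))
  stop_for_status(response)
  parse_body(response$content)$data
}

# Offline check of the parsing step on a hand-written body
fake_body <- charToRaw('{"data":[{"Year":"2019","Population":328239523}]}')
parsed_data <- parse_body(fake_body)$data
parsed_data
```

With a working connection, fetch_data(api_call) would then replace the three conversion lines above.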
Visualize with {ggplot2}:
R Markdown is what I have used to produce this beautiful Notebook. We will learn more about it near the end of the course, but if you already feel ready to dive deep, here’s a book: R Markdown: The Definitive Guide, by Yihui Xie, J. J. Allaire, and Garrett Grolemund.
Goran S. Milovanović
DataKolektiv, 2020/21
contact: goran.milovanovic@datakolektiv.com
License: GPLv3 This Notebook is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This Notebook is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this Notebook. If not, see http://www.gnu.org/licenses/.