rbind_pages.Rd
The rbind_pages function combines a list of data frames into a single data frame. This is often needed when working with a JSON API that limits the amount of data per request. If we need more data than fits in a single request, we have to perform multiple requests that each retrieve a fragment of the data, not unlike pages in a book. In practice this is often implemented using a page parameter in the API. The rbind_pages function can be used to combine these pages back into a single dataset.
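The paging idea can be tried without a network connection. The sketch below is only an illustration: it simulates an API by cutting the built-in mtcars data set into pages of 10 rows (page_size and the way pages are built are assumptions, not part of any real API) and then recombines them with rbind_pages.

```r
library(jsonlite)

# Hypothetical page size; a real API would dictate this.
page_size <- 10

# Split the row indices of mtcars into consecutive chunks of page_size rows.
row_ids <- seq_len(nrow(mtcars))
chunks <- unname(split(row_ids, ceiling(row_ids / page_size)))

# Each "page" is a data frame holding one chunk of rows.
pages <- lapply(chunks, function(i) mtcars[i, ])

# Combine the pages back into a single data frame.
combined <- rbind_pages(pages)

nrow(combined) == nrow(mtcars)
```

The row count of the combined result should match the original data; row names are not something this sketch relies on.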
rbind_pages(pages)
Argument | Description
---|---
pages | a list of data frames, each representing a page of data
The rbind_pages function generalizes base::rbind and plyr::rbind.fill with added support for nested data frames. Not every column has to be present in each of the individual data frames; missing columns are filled with NA values.
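To illustrate the nested case, here is a minimal sketch in which each page carries a data frame column. The column names (id, info, name, age) are made up for the example. The second page has an extra nested column, which should come back as NA for the rows of the first page.

```r
library(jsonlite)

# Page 1: the nested data frame 'info' has only a 'name' column.
x <- data.frame(id = 1:2)
x$info <- data.frame(name = c("a", "b"), stringsAsFactors = FALSE)

# Page 2: the nested data frame 'info' also has an 'age' column.
y <- data.frame(id = 3L)
y$info <- data.frame(name = "c", age = 30, stringsAsFactors = FALSE)

# Combine both pages; nested data frames are combined column by column.
out <- rbind_pages(list(x, y))

# Inspect the structure, including the nested 'info' data frame.
str(out)
```

Using str rather than print makes it easier to see that info is still a data frame inside the result instead of being flattened.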
library(jsonlite)

# Basic example: 'bar' and 'col' each exist in only one of the data frames
x <- data.frame(foo = rnorm(3), bar = c(TRUE, FALSE, TRUE))
y <- data.frame(foo = rnorm(2), col = c("blue", "red"))
rbind_pages(list(x, y))
#>            foo   bar  col
#> 1 -1.400043517  TRUE <NA>
#> 2  0.255317055 FALSE <NA>
#> 3 -2.437263611  TRUE <NA>
#> 4 -0.005571287    NA blue
#> 5  0.621552721    NA  red

# \donttest{
# Paging example: retrieve pages 0 to 20 from a JSON API (requires network access)
baseurl <- "https://projects.propublica.org/nonprofits/api/v2/search.json"
pages <- list()
for(i in 0:20){
  mydata <- fromJSON(paste0(baseurl, "?order=revenue&sort_order=desc&page=", i))
  message("Retrieving page ", i)
  # Page numbers start at 0, list indices at 1
  pages[[i+1]] <- mydata$organizations
}

# Combine all pages into a single data frame
organizations <- rbind_pages(pages)
nrow(organizations)
#> [1] 2100
colnames(organizations)
#>  [1] "ein"           "strein"        "name"          "sub_name"
#>  [5] "city"          "state"         "ntee_code"     "raw_ntee_code"
#>  [9] "subseccd"      "has_subseccd"  "have_filings"  "have_extracts"
#> [13] "have_pdfs"     "score"
# }