Just Another Data Blog: Accessing Open Data Portal (India) using APIs

EDIT: I've wrapped up this code into an R package. You can find more info about it on this blog post and here on GitHub.

As I mentioned in my previous blog post, the Government of India has started an Open Data Portal to make various data public. Most of the datasets on the portal are available for manual download. Some of them, though, can also be accessed using APIs. In this post, I'll go over how to access that data using the APIs (specifically the JSON API) in R.

Again, the variety of R packages available makes this a fairly easy task. I've mainly used these packages: XML, RCurl, RJSONIO and plyr.

The complete process can be described in the following steps:

1. Get the resource id and API key needed to make the API call.
2. Call the API repeatedly until all the data is obtained.
3. Append all the data to create a single dataset.

Now I'll describe each of the above steps in detail. The resource id is the identifier for the dataset and can be found on the website (e.g. resource id 6176ee09-3d56-4a3b-8115-21841576b2f6 refers to the dataset on pin-code details). The other mandatory detail when making an API call is the API key, which can be obtained by signing up on the data portal. The API URL then looks something like this:

http://data.gov.in/api/datastore/resource.json?resource_id=6176ee09-3d56-4a3b-8115-21841576b2f6&api-key=<your API key>
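
As a minimal sketch, the URL above can be assembled in R from its parts. The helper name build_api_url and the placeholder key "YOUR_API_KEY" are made up for illustration; only the base URL and the two parameter names come from this post.

```r
# Hypothetical helper: glue the base URL, resource id and API key together.
build_api_url <- function(res_id, api_key,
                          base = "http://data.gov.in/api/datastore/resource.json") {
  paste0(base, "?resource_id=", res_id, "&api-key=", api_key)
}

# "YOUR_API_KEY" is a placeholder, not a working key.
url <- build_api_url("6176ee09-3d56-4a3b-8115-21841576b2f6", "YOUR_API_KEY")
print(url)
```

Keeping the key as a function argument means it never has to be typed into the script itself.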

The content of this URL can be downloaded into R using the getURL() function from RCurl. Currently, at most 100 elements can be downloaded in a single API call. This necessitates the 2nd step: making repeated API calls until all elements have been downloaded. To accomplish this, we add an offset parameter to the URL, which now looks like this:

http://data.gov.in/api/datastore/resource.json?resource_id=6176ee09-3d56-4a3b-8115-21841576b2f6&api-key=<your API key>&offset=1

Here offset is the number of calls already made, i.e. a zero-based page index rather than an element index. For example, if each call downloads 100 data elements, then after downloading the first set of 100 elements (offset=0) we'd specify offset=1 to download elements 101-200.
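
The paging idea can be tried out without a network connection or an API key. The sketch below is only an illustration: fake_fetch and fetch_all are hypothetical names, and fake_fetch serves pages from a local vector instead of calling the portal. It also stops on the first empty page, which is one possible stopping rule and an assumption on my part.

```r
# Stand-in for the real API call: returns one "page" of a local vector.
# offset is a zero-based page index, limit is the page size.
fake_fetch <- function(offset, limit, all_records = 1:250) {
  start <- offset * limit + 1
  if (start > length(all_records)) return(all_records[0])  # past the end
  all_records[start:min(start + limit - 1, length(all_records))]
}

# Keep requesting pages, bumping offset by one each time,
# until a page comes back empty.
fetch_all <- function(fetch_page, limit = 100) {
  pages <- list()
  offset <- 0
  repeat {
    page <- fetch_page(offset, limit)
    if (length(page) == 0) break
    pages[[length(pages) + 1]] <- page
    offset <- offset + 1
  }
  unlist(pages)
}

records <- fetch_all(fake_fetch)
print(length(records))
```

With 250 mock records and a limit of 100, the loop makes four calls: offsets 0, 1 and 2 return data and offset 3 returns an empty page.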

The data obtained from each API call can be converted to a data.frame using ldply() from plyr, and the resulting data.frames can be combined into a master data.frame using rbind().
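
Here is a small sketch of that conversion on mock data. To keep it free of package dependencies it uses base R's lapply() and do.call(rbind, ...) as a stand-in for ldply(); the record fields (id, pincode) and their values are invented for illustration and are not the portal's real field names.

```r
# Two mock "pages", each a list of records, as a JSON parser might return them.
page1 <- list(list(id = "1", pincode = "110001"),
              list(id = "2", pincode = "110002"))
page2 <- list(list(id = "3", pincode = "400001"))

# One record -> one-row data.frame (same idea as toDataFrame in the code below).
record_to_df <- function(rec) {
  as.data.frame(t(unlist(rec)), stringsAsFactors = FALSE)
}

# One page -> data.frame; base-R stand-in for plyr::ldply(page, record_to_df).
page_to_df <- function(page) do.call(rbind, lapply(page, record_to_df))

# Stack the per-page data.frames into a single master data.frame.
master <- rbind(page_to_df(page1), page_to_df(page2))
print(master)
```

Note that every column comes out as character, because unlist() flattens each record to a character vector; any type conversion has to happen afterwards.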

The following GitHub Gist contains the actual R code. You can also look at my GitHub project to properly understand the directory structure used in the code.

	#### Download the data from Government of India open data portal ####
	w_dir = getwd()
	# Core.R (from the blog's GitHub project) defines checkAndDownload(),
	# which is used here to make the required packages available
	source(file=file.path(w_dir,"Code/Core.R"))
	checkAndDownload(c("XML","RCurl","RJSONIO","plyr"))

	### Alternative - 1: Using APIs ###
	#JSON#
	# Build the request URL, download it and parse the JSON into an R list
	getJSONDoc <- function(link, res_id, api_key, offset, no_elements){
	  jsonURL = paste(link,
	                  "resource_id=",res_id,
	                  "&api-key=",api_key,
	                  "&offset=",offset,
	                  "&limit=",no_elements,
	                  sep="")
	  print(jsonURL)  # note: this prints the API key as part of the URL
	  doc = getURL(jsonURL)
	  fromJSON(doc)
	}

	# Accessors for the parsed response (t: list); elements are picked by position
	getFieldNames <- function(t){
	  names(t[[4]])
	}
	getCount <- function(t){
	  t[[3]]
	}
	getFieldType <- function(t){
	  t[[4]]
	}
	getData <- function(t){
	  t[[5]]
	}

	# Convert a single list element (one record) into a one-row data.frame
	toDataFrame <- function(lst_elmnt){
	  as.data.frame(t(unlist(lst_elmnt)), stringsAsFactors = FALSE)
	}

	# Download a complete resource, 100 elements per call
	# (x is accepted but not used inside the function)
	acquire_x_data <- function(x,res_id,api_key){
	  currentItr = 0   # offset: number of calls made so far
	  returnCount = 1
	  while(returnCount>0){
	    JSONList = getJSONDoc(link="http://data.gov.in/api/datastore/resource.json?",
	                          res_id=res_id,
	                          api_key=api_key,
	                          offset=currentItr,
	                          no_elements=100)
	    DataStage1 = ldply(getData(JSONList),toDataFrame)
	    print(currentItr)
	    print(is(DataStage1$id))
	    # the loop ends once a call reports a count of 0
	    returnCount = getCount(JSONList)
	    if(currentItr == 0) {
	      # first call: start the master data.frame and keep the field types
	      returnData = DataStage1
	      returnFieldType = ldply(getFieldType(JSONList),toDataFrame)
	    }
	    else if(returnCount > 0) returnData = rbind(returnData, DataStage1)
	    print(currentItr)
	    print(is(returnData$id))
	    currentItr = currentItr + 1
	  }
	  list(returnData,returnFieldType)
	}

	#get the resource list file
	#(it has resource names and resource ids used for the API call)
	resourceList = read.table(
	  file=file.path(w_dir,"Data/goi_api_resource_details.csv"),
	  header=TRUE,
	  sep=",",
	  as.is=TRUE)

	#get the API key (kept in a separate file that should not be shared)
	api_key = read.table(
	  file=file.path(w_dir,"Data/goi_api_key_do_not_share.csv"),
	  header=TRUE,
	  sep=",",
	  as.is=TRUE)

	#make the API call
	res = subset(resourceList, resource_name == "pincode")
	pincodeDetails = acquire_x_data(x = res[1], res_id = res[2], api_key = api_key)

	save(pincodeDetails, file=file.path(w_dir,"Data/pincodeDetails.RData"))
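
The accessor functions in the code above pick elements out of the parsed response by position: element 3 for the count, element 4 for the field metadata and element 5 for the records. A small mock makes that assumption visible. Everything in mock_response (the element names, the field types, the records) is made up for illustration and is not the portal's actual response.

```r
# Mock parsed response. Only the positions mirror the code in this post;
# all names and values are invented.
mock_response <- list(
  placeholder_1 = "unused",
  placeholder_2 = "unused",
  count   = 2,
  fields  = list(id = "int", pincode = "text"),
  records = list(list(id = "1", pincode = "110001"),
                 list(id = "2", pincode = "110002"))
)

# Positional accessors, mirroring getCount, getFieldNames and getData.
get_count       <- function(resp) resp[[3]]
get_field_names <- function(resp) names(resp[[4]])
get_data        <- function(resp) resp[[5]]

print(get_count(mock_response))
print(get_field_names(mock_response))
print(length(get_data(mock_response)))
```

Positional access breaks silently if the API ever reorders its response, so accessing elements by name would be the more robust choice once the real element names are known.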


9 comments:

Vijay Barve, April 16, 2014 at 5:51 PM
I liked this blog post very much and am interested to replicate this. I am not able to find the file Data/goi_api_resource_details.csv please guide me.

steadyfish, April 16, 2014 at 7:31 PM
You might want to take a look at the project (https://github.com/steadyfish/JustAnotherDataBlog.git) on GitHub repository to understand the directory structure I'm using. Specifically, you can also download the required data files from here - https://github.com/steadyfish/JustAnotherDataBlog/tree/master/Data

Ankit Chiplunkar, April 22, 2014 at 11:05 AM
How can i get the API key from this website?

steadyfish, May 4, 2014 at 12:18 AM
@Ankit Sorry for my late response. You can get the API key by signing up on the Open Government Data Portal, India (data.gov.in)

wayfarer, June 26, 2014 at 10:46 AM
Was unable to locate the resource ids for the data sets- specifically i was looking for the Bhindi under the agriculture category? Will appreciate any pointers. I have registered and logged in. i was able to download the files, though...Thanks

Unknown, September 26, 2015 at 1:37 AM
i fond the posting and many other things which needs to be done..get jobs for freshers Sarkari Naukri here

AppsLot, January 16, 2016 at 8:50 AM
Nice blog, I have registered and logged in with https://data.gov.in, but could not find how to get the resource ids for various catalog like, resource id for each state/city for pincode? thanks

Ram, August 16, 2017 at 8:50 AM
Could not find how to get the resource ids for various catalog like, resource id for each state/city for pincode, please help.

RTU, August 5, 2019 at 2:25 AM
hey i am trying to get this data with ajax but this is showing less data how can i get the full data from the api


Tuesday, April 15, 2014
