Getting JSON out of Google Spreadsheets

One of the nice features in Google Spreadsheets is the option to publish your data in various formats, including HTML, CSV, PDF, and XML. You’ll notice that list does not include everyone’s favorite, JSON. But it turns out that even though JSON is not included as one of the publish options there is a JSON feed available via the Data API.

But it also turns out that this feed is a bit of a mess. For one, the JSON output is chock-full of meta data and that means you have to do some digging to get to the data that you actually want. Not so bad but there are other problems. Google offers two formats to structure your JSON, “list” and “cells”. The list option puts all of your spreadsheet data into a single property and, oddly, commas in your data are not escaped, which makes the feed useless in many cases. The cell option is better but it spreads your data out across the JSON in an awkward way. In their raw state, I found neither option to be usable.

Pour some PHP on it

To get around these obstacles I started to write some PHP that stripped away the nasty meta data and reworked the problematic parts of the structure. As I got into it I realized that if I had to parse Google’s feed I might as well start with something easier to work with. In other words, instead of teasing apart quirky, bloated JSON maybe it would be better to start with clean data in another format and then convert to JSON.

The Google Documents CSV feed, which contains only the spreadsheet content (no meta data), is perfect for this. All you need is a way to convert the CSV to JSON.

So here’s how to get it set up.

Three easy steps

1. Set up a Google Documents spreadsheet using the first row as your keys. It’s a good idea to use camelCase for these names since you’re probably going to be looping over them later in JavaScript.

Screenshot of Google Spreadsheet showing example data

2. Click Collaborate > Publish as a web page. Click Start publishing and then select the CSV option in the select menu below. The URL it gives you is your CSV feed. Copy it so we can paste it into our script in the next step.
Screenshot showing exporting feed from Google Spreadsheet

3. Now we need to convert Google Spreadsheets’ CSV output to JSON. You can do this a few ways, but I’m handling it with a small PHP script that gobbles up the CSV feed and echoes out JSON. This all happens on-the-fly so, for example, if you’re requesting the data with an AJAX request you just make the request to your whatever.json.php file. The code is below as well as on Github.

animals.json.php

<?php
header('Content-type: application/json');
 
// Set your CSV feed
$feed = 'https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0Akse3y5kCOR8dEh6cWRYWDVlWmN0TEdfRkZ3dkkzdGc&single=true&gid=0&output=csv';
 
// Arrays we'll use later
$keys = array();
$newArray = array();
 
// Function to convert CSV into associative array
function csvToArray($file, $delimiter) { 
  if (($handle = fopen($file, 'r')) !== FALSE) { 
    $i = 0; 
    while (($lineArray = fgetcsv($handle, 4000, $delimiter, '"')) !== FALSE) { 
      for ($j = 0; $j < count($lineArray); $j++) { 
        $arr[$i][$j] = $lineArray[$j]; 
      } 
      $i++; 
    } 
    fclose($handle); 
  } 
  return $arr; 
} 
 
// Do it
$data = csvToArray($feed, ',');
 
// Set number of elements (minus 1 because we shift off the first row)
$count = count($data) - 1;
 
//Use first row for names  
$labels = array_shift($data);  
 
foreach ($labels as $label) {
  $keys[] = $label;
}
 
// Add Ids, just in case we want them later
$keys[] = 'id';
 
for ($i = 0; $i < $count; $i++) {
  $data[$i][] = $i;
}
 
// Bring it all together
for ($j = 0; $j < $count; $j++) {
  $d = array_combine($keys, $data[$j]);
  $newArray[$j] = $d;
}
 
// Print it out as JSON
echo json_encode($newArray);
 
?>

Which returns this:

[
    {
        "animal": "Giraffe",
        "favoriteFood": "Peaches",
        "isFluffy": "Not so much",
        "likesFishes": "Probably",
        "id": 0
    },
    {
        "animal": "Koala",
        "favoriteFood": "Huckleberries",
        "isFluffy": "Mostly",
        "likesFishes": "Probably",
        "id": 1
    },
    {
        "animal": "Porcupine",
        "favoriteFood": "Figs",
        "isFluffy": "No, not at all",
        "likesFishes": "Not sure",
        "id": 2
    }
]
Publishing

The nice thing about this is that edits can be made to the spreadsheet and the changes will automatically be reflected on the front end. In this way Google Documents functions as an extremely convenient CMS. One thing to note is that the “Automatically republish when changes are made” behavior seems to be inconsistent. When I worked on this a month or two ago, changes were reflected in the CSV feed the moment I saved the spreadsheet. Now it seems there’s a lag or that the feed is not updated at all. But there’s an easy solution. Just click Collaborate > Publish as Web Page > Republish Now when you want to publish your changes and you’ll see your changes take effect immediately.

Empty cells

Another important point concerns empty cells. Empty cells will be respected only if the last cell in the row has a value. If you have 5 columns in a row and only the first 2 have data, your CSV line will look like this: value,value. It looks like a row with only 2 columns. On the other hand if you place one of the values at the end of the row, the structure will be preserved and your CSV will look like this: value,,,,value.

So the solution is to make sure the last column always has a value. If possible, make it a required field. Otherwise you could may need to stuff the last row with some placeholder value.

Performance/Caching

There’s a good chance this won’t be an issue at all and you won’t have to do anything. When I test locally or on EC2 the JSON consistently returns in around 300ms. But when I host the proxy script on WebFaction there’s a lot of latency and it takes about 2 seconds to return. If your initial page load depends on this data, 2 seconds is a long time.

If performance is a problem, you’ll want to cache the JSON output of the script. (Using Memcache, nginx, Varnish, Squid, etc, whatever plays most nicely with your environment.)

Props

A few blog posts came in handy when I was researching this. One was uxebu’s JSONP for Google Spreadsheets and the other was Pamela Fox’s How to Convert a Google Spreadsheet into JSON, XML, and MySQL. And thanks to Paul Irish who steered me in the right direction with a few helpful tweets.

Update (3/23/12)

Here’s another option for getting JSON out of Google Spreadsheets. The Miso Project, a pretty badass-looking collaboration between The Guardian and Boucoup, has released a library called Dataset, which includes a utility for extrating data from Google Spreadsheets. It works differently than the above solution (all of the processing is done client side) but you should definitely give it a look.

Both comments and pings are currently closed.

Discussion

Rob,

I’m looking to leverage a JSON feed which will give me an output similar to what you have above. What I’m trying to figure out is the best way to go from A > Z, meaning, I’ve got my JSON feed, now I want to leverage your PHP above, then format it, and extract just the data I want within a function, and output that as static HTML content. Basically creating a “dashboard” into some Google Spreadsheet data. Any tips would be awesome! Thanks.

Steve,

Google Spreadsheets also has an option to export your spreadsheet as HTML. Not sure if this will suit your needs but it’s the easiest solution so I thought I’d mention it.

The benefit of getting the spreadsheet into JSON is that you can easily work on it in JavaScript. You can make an AJAX request to the PHP script and then loop over the data with jQuery or whatever you prefer. If you want to construct the HTML on the server you can skip the JSON output and just write out your HTML directly in PHP.

Rob,

Thanks for the reply. The HTML output won’t work well. I lack the knowledge of taking it one step further such as leveraging AJAX and PHP for output, etc.

Regarding your last line skipping JSON, what would that process look like? I could use the direct URL’s to specific cell’s, create variables, and then output those variables in my HTML I suppose as an alternative.

Wim Tibackx

Sir, you are awesome! You just saved me a lot of development time and regex headaches. Thanks!

Barun Saha

Excellent article, sir! Really helped me :)

Hi Rob, great post, thanks! I am wondering if one could modify your script to allow for nested JSON, e.g. if there are several favorite foods, have favoriteFood.1 = “bananas” favoriteFood.2 = “ants”…. and so forth. Even objects within arrays would be thinkable, by declaring/formatting the top line accordingly. How would I go about solving this?

I’m also interested in the question that Nick asked. This is very helpful!

This worked beautifully for me on the first try (after removing an empty first row that was throwing things off. Thanks!!

Wow, this is great, I’ve been looking for something like this for quite a while!! I’m also wondering the same thing as nick, is there a way to make nested JSON with this?

The greatest problem is that if my data contain commas, I cannot export them as json…

Alan,
This technique allows commas in your data. Commas are escaped in Google Doc’s CSV output.

To those asking about making this work with more complicated data structures, offhand I don’t have a one-size-fits-all solution. But customizing the script for your particular case to group/nest elements shouldn’t be too tricky.

THANK YOU for this. I had proposed a google spreadsheet to help a client easily make edits to their events page on their (non-CMS) site. I thought this would be super-easy to set up knowing that google provided a JSON feed, but was dismayed when I saw how the feed was structured. Your CSV-to-JSON script just saved me hours I would have spent trying to wrangle the google JSON into something workable.

i’ve one doubt. Is it possible to get the jsondata without the square bracket at each ends?

Justin,
The standard JSON feed does contain all of the data but extracting the values you want can be messy. I prefer working with clean data so that the client side processing can be as simple as possible. But using the PHP proxy script does introduce some overhead server side… so it’s a trade-off. Law of Conservation of Complexity, right? Just depends on where you want to engage the complexity.

For me, working with Google spreadsheet feeds on multiple projects, I think absorbing the complexity with the PHP script has made things easier.

Hi! I added a comment on the gist code, figured I’d crosspost here in case anyone finds it useful:

a. I was getting PHP warnings on ln. 54. Adding an @ in from of array_combine(); suppresses that.

b. If you’re wanting the JSON to be consumed on a different domain than the one hosting the PHP (i.e., through jQuery’s getJSON(); function — http://api.jquery.com/jQuery.getJSON/), you’ll need a way around AJAX’s same domain origin policy. I’ve done so by changing the header to script/javascript and replacing line 59 with

echo $_GET['callback']. '(' . json_encode($newArray) . ');';

This allows you to consume the output through JSONP by attaching a callback variable to the path (e.g., “?callback=mycallback”).

Really though, fantastic script, saved me a bunch of time. I’m currently writing a blog entry about how awesome it is. :)

@├ćndrew,
Thanks! I just left a comment on your gist. Interesting, I haven’t seen those PHP warnings. Are they associated with empty arrays? What version of PHP are you using?

Thanks for the tip about formatting the JSON into JSONP. Very useful.

Hi. I’m doing something, but I am processing the CSV on the client side. I just use a very simple php proxy to get around same-origin issues, ( it just echos the csv data without modification ) and convert the data to json / whatever on the client.

This is a good way of avoiding the server load, or having to cache the data.

Somnath

Thanks for such help full Script..

One thing can we set the some data return to the some xls sheet of the google Docs.

if Yes then please help for some.

Thanks You.
Somnath

Thanks a lot, the second step is really not obvious, I was stumped!

Brandon

How do you go about publishing the Json data on the webpage into a table? Newbie here, trying to dynamically access client data for easy updates!

Newbie here. This is fantastic.

I’ve come across an issue: I’ve been trying to implement this, but for whatever reason the script never accesses the CSV url for me. When I save the same CSV file locally and test it, the script works fine.

It’s strange because the CSV url works fine otherwise. Any ideas? Even tried the URL you included and it didn’t work.

@Brian,
Thanks, I’m glad you’ve found it useful. Are you sure the PHP script is running? Are you seeing error messages? Are you running this locally or on a server?

Hi! You don’t need to do all of this to get spreadsheet in json. There’s an indirect way.
http://shashankmehta.in/archive/2012/fetching-google-spreadsheet-in-json.html

PS: Seeing that this post one of the top results for relevant query, I though I should post it here, in case someone finds it useful.

@Shashank,
You’ve misunderstood the point of the post. As I mention in the first paragraph, Google does provide JSON output but the problem is that it’s difficult to work with. My post explains how to use the CSV output to produce a cleaner and easier to work with JSON feed.

Works like a charm, thank you so much!

Great work! Thanks so much!!!!
saved me a lot of time in trying to format and parse json directly from google.

I have the same issue as the previous commentor Brian did. I’m not able to execute the php when accessing the csv url directly but CSV file on my server worked fine. I figure it’s a server setting issue?

Anyway, this solution still works perfect for my purpose as I am not parsing something that changes often. (it just needed to be in a google doc.)

That was a live saver helped to make the Html snapshots for the google spreadsheet Ajax calls Seo friendly
great share thanks a lot Rob
found via Google + card

Thanks for the info. Great work

@Cheri,
Check your php.ini configuration:
allow_url_fopen = “1″

http://php.net/manual/en/filesystem.configuration.php

@Rob,
Thank you, I’m using your CSV to array method to export to plist and xml. Linked your site with this tutorial, but will be explaining on how you can use this data on IOS.

@Po Yu

Thanks for the tip. Yes, that looks to be the problem. I can’t really change the site’s .ini file. But I’ll suggest the change.

@Rob,

(I meant to come leave a comment about this earlier) I am including your script one of the plugins I developed.
http://cherihung.github.io/OpenStatesPHP/
The plugin is to handle OpenStates API but I’m using your script to add the ability to handle extra data not from the API. Thanks very much!

I trying to use this to format a table in Panic’s statusboard but I keep getting the error that I need too include an object before any arrays. Any advice for how to accomplish this?