Scraping a table from a webpage using =importHTML
In Google Docs, one of the most useful formulae for getting data from a webpage into your spreadsheet is =importHTML - you simply type the formula (with the appropriate values) into a cell, it fetches the data from the table on that webpage, and puts it into your spreadsheet.
Best of all, every time you open the spreadsheet the data will be updated with the webpage.
Here's how the formula is broken down:
=importHTML('URL containing table', 'table', number representing which table you want)
An example might look like this:
=ImportHTML("http://www.horsedeathwatch.com/", "table", 1)
What this does is tell the spreadsheet to
- go to the webpage http://www.horsedeathwatch.com/
- look for a HTML table (specifically, it is looking for the tag <table> in the raw HTML used to create the page)
- grab the contents of the first one (1)
You must put quotation marks around the web address and "table", or it won't work.
The number - what's called an index - does not need quotation marks.
How do you know what number to use? Well, there are two ways: you can look at the raw HTML and count how many tables there are - and which one you need. Or you can just use trial and error, beginning with 1, and going up until it grabs the table you want. That's normally quicker.
Note: it's a good idea to keep a copy of your snapshot in case the webpage is taken down or changed. You can do this by selecting all the cells and clicking on Edit>Copy then going to a new spreadsheet and clicking on Edit>Paste values only