Sunday, November 07, 2010

Making a Table of Contents for Sites with Google Apps Script

At work I was building a document using Google Sites. While building it, I gave the section headings their own numbers, like so:

Introduction
Part 1: Apples
  Introduction
  1.1: Fuji
  1.2: Macintosh
  1.3: Cameo
  Summary
Part 2: Pears
  Introduction
  2.1: Bosc
  2.2: D'Anjou
  2.3: Warden
  Summary

Yes, of course there's already a gadget you can use to create a table of contents for your page (Menu > Insert > Table of Contents). It's a rather nice gadget, but because I've intentionally numbered some sections and not others, there's a mental mismatch.

As you see, the Table of Contents generator gave section numbers to everything, so depending where you look, 2.2 is either Fuji apples or Bosc pears.

I didn't want to remove the numbers from my section headings, and there was no real way to turn off the generated numbers in the Table of Contents gadget.

If you're not familiar with Google Apps Script, it's what you'd think: a scripting engine for Google properties. It's particularly powerful for scripting spreadsheets, and, as of October 22, you can process Google Sites with Google Apps Script.

Before going any further, let me show you what I generated with Google Apps Script:

I'm no HTML Picasso, but it's not bad. Not bad at all. The links work, thanks to Google Sites automatically injecting <a name> tags with every <h[1-6]> tag.
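
To make the anchor mechanism concrete, here is a small sketch. The sample markup is an assumption based on the TOC-style anchor names Sites produced, not a documented format, and anchorFromHeading is a hypothetical helper; the script in this post reads the anchor through the XML API rather than a regular expression.

```javascript
// Assumed shape of a heading as stored by Sites: an empty named anchor
// followed by the heading text. Real pages may carry extra attributes.
var sampleHeading = '<h2><a name="TOC-Introduction"></a>Introduction</h2>';

// Hypothetical helper: pull the anchor name out of one heading's HTML.
// Returns "" when the heading has no <a name="..."> child.
function anchorFromHeading(headingHtml) {
  var match = /<a\s+name=["']([^"']*)["']/i.exec(headingHtml);
  return match ? match[1] : "";
}

// A link to "#<anchor name>" jumps to that heading on the same page.
var link = "<a href='#" + anchorFromHeading(sampleHeading) + "'>Introduction</a>";
```

Because the anchors already exist in the saved page, the table of contents only has to point at them; nothing in the headings themselves needs to change.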

The code

Let me share the code with you. First, here's how it's invoked:

function run() {
  createToc(
    'https://sites.google.com/site/sitename/source-page',
    'https://sites.google.com/site/sitename/destination-page');
}



Yes, so you can specify a page from which to read and generate a second page with the table of contents included. (Side note: This actually gives me lots of ideas about workflow -- imagine the source URL were a private site where drafts are written; you could use Google Apps Script as a way to push from a staging area to production.)
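
The staging idea could be sketched as follows. This is only an illustration: pushPage is a hypothetical helper, the URLs in the usage comment are placeholders, and the `sites` argument is assumed to behave like SitesApp (it is passed in so the function can be tried outside Apps Script).

```javascript
// Hypothetical helper: copy the HTML of one Sites page onto another.
// `sites` is expected to offer getPageByUrl(), and the pages it returns
// are expected to offer getHtmlContent() / setHtmlContent().
function pushPage(sites, fromUrl, toUrl) {
  var draft = sites.getPageByUrl(fromUrl);
  var live = sites.getPageByUrl(toUrl);
  live.setHtmlContent(draft.getHtmlContent());
}

// In Apps Script this might be called as (placeholder URLs):
//   pushPage(SitesApp,
//            'https://sites.google.com/site/drafts/source-page',
//            'https://sites.google.com/site/sitename/destination-page');
```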

OK, the code. It's mostly processing XML and generating HTML.

// I do not warrant this code in any way.
// If you break something important, don't come crying to me.
// Actually, check your Sites revision history first.
// Then go cry.

// Load content from fromUrl. Build the table of contents, and stick it in the document.
// Save at toUrl.
function createToc(fromUrl, toUrl) {
  var page = SitesApp.getPageByUrl(fromUrl);
  var content = page.getHtmlContent();
  var doc = Xml.parse(content, true);
  var root = doc.getElement();
  var toc = [];
  parse(toc, root);

  // Markers that bracket the generated TOC so a later run can find and replace it.
  var prefixDiv = "<div id='kberg-toc'>";
  var postfixHtml = "</div><div id='kberg-toc-coda'/>";
  var codaDiv = "<div id='kberg-toc-coda'/>";

  var html = prefixDiv + tocToHtml(toc) + postfixHtml;

  var index = content.indexOf(prefixDiv);

  var newContent;
  if (index == -1) {
    // No TOC yet: insert it right after the opening content div.
    var x = "<div dir='ltr'>";
    index = content.indexOf("<div dir='ltr'>") + x.length;
    Logger.log("3: " + index);
    var outdex = index;
    newContent = content.substr(0, index) + html + content.substr(outdex);
  } else {
    // TOC already present: replace everything up to and including the coda marker.
    var outdex = content.indexOf(codaDiv);
    newContent = content.substr(0, index) + html + content.substr(outdex + codaDiv.length);
  }
  var newpage = SitesApp.getPageByUrl(toUrl);
  newpage.setHtmlContent(newContent);
}

// Recurse through the XML, finding <h[1-5]> tags.
function parse(toc, element) {
  var name = element.getName().getLocalName().toUpperCase();
  if (name == "H1" || name == "H2" || name == "H3" || name == "H4" || name == "H5") {
    var level = parseInt(name.substring(1), 10);
    var anchorChildren = element.getElements("a");
    // Use the name of the first <a> child as the link target, if there is one.
    var anchorText = (anchorChildren.length > 0) ?
        anchorChildren[0].getAttribute("name").getValue() : "";

    var item = {
      level: level,
      anchor: anchorText,
      description: element.getText()};
    toc.push(item);
  } else {
    var children = element.getElements();
    for (var i = 0; i < children.length; i++) {
      parse(toc, children[i]);
    }
  }
}
     
// Convert the table of contents entries into HTML.
function tocToHtml(toc) {
  var html =
      "<div style='width: 250px; background-color: #e8e8e8;'>\n" +
      "<br/><b>Contents</b><br/><br/>\n";
  for (var i = 0; i < toc.length; i++) {
    var entry = toc[i];
    var html2 = indent(entry["level"]) + "<a href='#" + entry["anchor"] + "'>" +
        entry["description"] + "</a><br/>\n";
    html = html + html2;
  }
  html = html + "<br/></div>";
  return html;
}

// Given an indentation level, return (2 * (level - 1)) non-breaking spaces.
function indent(level) {
  var x = "";
  for (var i = 0; i < level - 1; i++) {
    x += "&nbsp;&nbsp;";
  }
  return x;
}
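
The insert-or-replace step in createToc is the part most worth trying in isolation. Below is a minimal sketch of the same idea as a pure string function, so it can be run outside Apps Script. spliceToc and the sample page are hypothetical; the marker strings are the ones used in the code above. The separate coda marker is presumably there because the generated TOC contains a nested div, so the first closing div tag after the prefix is not the end of the block.

```javascript
var PREFIX = "<div id='kberg-toc'>";
var CODA = "<div id='kberg-toc-coda'/>";
var BODY_OPEN = "<div dir='ltr'>";

// Insert tocHtml into content, or replace a TOC that an earlier run left behind.
function spliceToc(content, tocHtml) {
  var block = PREFIX + tocHtml + "</div>" + CODA;
  var start = content.indexOf(PREFIX);
  if (start == -1) {
    // First run: place the block right after the opening content div.
    var at = content.indexOf(BODY_OPEN) + BODY_OPEN.length;
    return content.slice(0, at) + block + content.slice(at);
  }
  // Later runs: swap out everything from the prefix marker through the coda marker.
  var end = content.indexOf(CODA) + CODA.length;
  return content.slice(0, start) + block + content.slice(end);
}

// Made-up page content for demonstration.
var samplePage = "<div dir='ltr'><h2>Intro</h2></div>";
var once = spliceToc(samplePage, "Contents v1");
var twice = spliceToc(once, "Contents v2");
```

Here `twice` holds a single TOC block containing only the second version, which is the property that lets the script be run repeatedly against the same page. Like the original, the sketch assumes the page contains the opening content div and that the stored HTML keeps the markers exactly as written.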


Final Thoughts

The remaining issue is triggering this script. Google Apps Script does have an event notification mechanism (e.g. run this script when the spreadsheet opens), but right now Google Sites only has a time-based trigger (e.g. run this script every n minutes or hours). To be honest, if an onSave trigger existed, would createToc(url, url), which saves the TOC back to the same page, trigger another onSave event? No big deal; if the second pass results in no change, I could tweak the code accordingly.
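
One possible form of that tweak is sketched below: skip the write when the regenerated HTML matches what the page already holds. saveIfChanged is a hypothetical helper, and the sketch assumes Sites hands back the HTML exactly as it was last saved, which is not verified here.

```javascript
// Hypothetical guard: write the page only when the HTML actually differs.
// `page` is anything with getHtmlContent() / setHtmlContent(), like a Sites page.
// Returns true if a write happened.
function saveIfChanged(page, newContent) {
  if (page.getHtmlContent() === newContent) {
    return false; // nothing changed, so no save and no follow-up save event
  }
  page.setHtmlContent(newContent);
  return true;
}
```

In createToc this would stand in for the final newpage.setHtmlContent(newContent) call, so a second pass over an unchanged page ends without saving.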

But the lack of an onSave trigger makes this only marginally usable for me. Running it every two minutes is far too frequent when the site isn't being updated. I don't know what the Google Apps Script team or the Google Sites team has planned, but this is a good use case for onSave.

I wonder how hard it would be to hide the default TOC renderer's section numbers with clever CSS, or just write my own Google Gadget...

9 comments:

Nicolas ANDRE said...

Hi Robert

Thanks for the script. Do you think you can do the same thing for Google Docs? That would be helpful for a lot of people.

Nicolas ANDRE said...

Hi Robert,

Thanks for your script. Do you think you can provide one for Google Docs as well?

konberg said...

Nicolas, I'm glad you like it. Since I've basically provided the code, if someone else wants to adapt it to Google Docs, they are more than welcome to do so.

Anonymous said...

Why not have the code run when the page is loaded? function doGet() runs when the Apps Script loads. So embed the Apps Script gadget, and the script runs every time the page reloads. Then you will not require a time-based trigger. Is this what you are trying to accomplish?

konberg said...

Anonymous, possibly because it wasn't available at the time, or, even if it was, what I had met my requirements at the time. I can assure you that now, two years later, I might do something different.

Tony Proctor said...

I've run into this double-numbering "big time", and I see lots of other people have it.

I fail to see why Google aren't acknowledging this, and tweaking the standard TOC gadget accordingly. How hard can it be to optionally leave out the auto-numbering?

Anonymous said...

I know this is an old post and you might not be monitoring it anymore, but I thought I would ask. I've tried installing the script, and it mostly works for me. Except the actual Table of Contents it generates is blank. It generates all the links, but no text in between. There must be a small detail I'm missing?

This is my test page:
https://sites.google.com/a/wellesleyps.org/wps-student-links-test/toc-test-fromurl

The TOC links look like this:
<a href="#TOC-846-Drawing-and-Painting-I"></a>

There is no text between the opening and closing <a> tags, so they appear blank. In the script it says that
entry["description"]
should go there, but something is wrong.

Anonymous said...

I think I solved my problem. I will just post it here in case anyone else has the same problem. My colleague who made the webpage originally had added a lot of formatting to the H1, H2, H3 tags on the page (fonts, colors, sizes etc.) and that somehow caused the table of contents not to work. I just cleared the formatting of the headers and reset them to just plain H1, H2, H3 tags. Then I re-ran the script and it worked fine.

Thank you!

(I don't know why Google doesn't include an unnumbered list in the Table of Contents gadget by default. Seems like a lot of people would use it.)

marvemudanca said...

This is a great blog