Wednesday, 23 July 2008

Google XML Pages: A Functional Markup Generation Tool


Google XML Pages (GXP) is a templating system we use at Google. Its main focus is markup: we mostly use it for generating HTML and XHTML, but it can work with other flavors of XML, like Atom, KML, and RSS. It also has some support for a few non-markup languages (JavaScript, CSS and plain text), though mostly for embedding them within markup.

GXP's story begins in 2001, when I was working on an early version of Google AdWords. We had been having a lot of issues with the templating system that the AdWords Java frontend was using at the time, so I started looking around for alternatives. I made a list of features I'd like from a Java-compatible templating system, which looked something like this:
  • Compile-time type checking and markup validation.

  • Convenient parameter passing and modularization (our old templating system just used "includes").

  • Automatic escaping of untrusted content (to protect against Cross Site Scripting attacks).

  • A way to prevent "business logic" from ending up in our templates without being too oppressive.

  • Easy-to-use internationalization support.

  • A lightweight runtime, preferably not tightly coupled to the Servlet API (so you can use it to generate email, not just web pages).
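
To make the automatic escaping item concrete, here is a minimal sketch in plain Java of what "escape by default" means. It is not GXP syntax or GXP's API: the class and method names (`AutoEscapeSketch`, `escapeHtml`, `greeting`) are hypothetical, and a real templating system would generate this kind of code rather than ask you to write it by hand.

```java
// Hypothetical illustration only: these names are not part of GXP.
final class AutoEscapeSketch {

    /** Escapes the characters that are significant in HTML text and attribute values. */
    static String escapeHtml(String untrusted) {
        StringBuilder out = new StringBuilder(untrusted.length());
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * A "template" modeled as a typed method. The parameter is always
     * escaped before it reaches the markup, so the caller cannot forget.
     */
    static String greeting(String userName) {
        return "<p>Hello, " + escapeHtml(userName) + "!</p>";
    }

    public static void main(String[] args) {
        // The script tag is rendered as inert text instead of being executed.
        System.out.println(greeting("<script>alert(1)</script>"));
    }
}
```

The point of the sketch is the design choice, not the escaping routine itself: because the template is a method with typed parameters, the compiler checks the call site, and escaping happens inside the template rather than being left to each caller.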

After investigating a number of templating systems, I couldn't find anything that met our criteria. I started playing around with ideas for what a templating system that met all of these criteria would look like, and I showed my eventual design to some other members of the team. They were enthusiastic about it, and our tech lead suggested that I go ahead and build it.

The original version of the compiler was written in Python and had a very tiny runtime system written in Java. We used this new language when we rewrote the AdWords system. Since then a number of other projects at Google have used GXP, including AdSense, Blogger, and Google Reader.

Over the next few years the language grew: the internationalization support was expanded, XHTML support was added (it originally only generated HTML), and the automatic escaping system was strengthened to provide significantly better protection against XSS. About two years ago I began a complete rewrite of the compiler in Java. One of my hopes was that this would encourage users of GXP to become contributors to the compiler.

Several months into the rewrite, Harry Heymann joined in. Not only did Harry help get the new compiler finished to the point where users of the old compiler could switch to the new one, but he also contributed extremely valuable new features like runtime compilation, instantiable templates, and support for generating non-HTML markup. Over the past several months Harry has also led the effort to open source GXP.

GXP is now available to the Open Source community. It has been a useful tool for us, and we hope that others can find a use for it as well. Like virtually all software, it is still a work in progress, so we're also hoping others would like to help build on this work. In particular, the new compiler was designed to enable generation of code in languages other than Java. We've started writing a C++ code generator, but it is not yet complete. Working on this, or developing code generators for other languages (JavaScript?, Ruby?, INTERCAL?) could be particularly interesting. Take a look and let us know what you think!

Also, if you happen to be in Portland for the O'Reilly Open Source Convention, stop by for our presentation on Wednesday.