Friday, 29 July 2011

Who’s New in Google Summer of Code: Part 9

Every Friday all summer long we have spotlighted three or four new organizations participating in Google Summer of Code. This time organization administrators from Evergreen, Astrometry.net and Xapian give more insight into their projects and discuss some of the tasks their students are working on this summer.

The Evergreen library system provides a public catalog interface for libraries and manages library operations such as circulation (checkouts and checkins), acquisition of library materials, and sharing resources among groups of libraries. In 2004, the Georgia Public Library Service chose to build an open source solution to satisfy their need for a scalable catalog shared by approximately 285 public libraries in the state of Georgia. The first version of Evergreen was released in 2006; today, almost 1,000 libraries across the United States, Canada and many other countries run Evergreen.

We were delighted to have two Google Summer of Code students join us in this, our first year in the Google Summer of Code program, as their projects are addressing two of the major pain points with Evergreen:
• A kinder, gentler configuration user interface. Evergreen's flexibility as a system that can be used by a consortium of hundreds of libraries or just a single library by itself has come at a price, as hundreds of configuration options were added over time, but the usability for Evergreen administrators has not kept pace. Joseph Lewis has made use of online usability testing in his quest to improve the experience of Evergreen administrators.
• Improved packaging and deployment. Evergreen is currently distributed only as a tarball, requiring administrators to go through the standard configure / make / make install cycle. After eliminating some long-standing issues with our build infrastructure, Ben Webb has automated the creation of Arch, Debian, Fedora, and Ubuntu packages, and is working towards automating the creation of LiveCDs and virtual machines to help with advocacy and testing.

By Dan Scott, Organization Co-Administrator for Evergreen

----------

Astrometry.net is a computer vision system that takes as input arbitrary images of the night sky (snapshot, amateur astronomer, or professional) and returns precise metadata about where those images are located in the sky, along with the identities of the astronomical objects visible in them. If you have a photograph of the night sky, Astrometry.net can identify the constellations, stars and galaxies in the picture. This is not just a cool trick: it makes citizen-scientist and badly calibrated astronomical imaging useful for science. That means amateur astronomical images taken in backyards and historical images gathering dust in library archives can be used to make novel astronomical discoveries.

Our Google Summer of Code students, Kevin Chen and Carlos Lalimarmo, are building an image processing and sharing web site where astronomers can process images, share data, use our code, and learn from one another. This site is powered by our core code, which can be downloaded and run anywhere, but the site itself is accessible to casual users and those who don't want to install the core code. Carlos and Kevin know a lot more about coding for the web than we do at Astrometry.net headquarters (we are astronomers, not coders), so they are making our user experience and our system far better than we ever could have without them. They have rebuilt our web presence from the ground up and created an integrated API for interfacing with other services, like Flickr, where we run on user-submitted images. With Carlos and Kevin's help, Astrometry.net is going from a set of static web pages to a user-generated, interactive community site that is inviting and easy to use. This sets us up for some qualitatively new kinds of citizen science.

By David W. Hogg (NYU) & Dustin Lang (Princeton), Astrometry.net Organization Administrators

----------

Xapian is a search engine library which aims to be fast, scalable, and flexible. It's used by many organizations around the world, including Debian, Ubuntu, One Laptop per Child, and the Gmane mailing list archive. It supports probabilistic ranking and a rich set of boolean query operators. The core library is written in C++, with bindings to allow use from C#, Java, Perl, PHP, Python, Ruby, and Tcl.

This is our first year as a mentoring org in Google Summer of Code, but we've been involved with Xapian-related projects for other orgs in previous years, and Xapian has been under development since 1999. We're mentoring four students this year:
• Nikita Smetanin from Russia is working on an assortment of enhancements to Xapian's existing spelling correction support. These include allowing multiple possible corrections to be suggested, making use of phonetic algorithms, and handling typos which run words together or move a character from one word to the next. He's also made some substantial performance improvements.
• Xiaona Han is from China, and has been adding support for using Xapian from the programming language Lua, which wasn't previously possible. An early portion of this work has already been merged, and appeared in last month's Xapian 1.2.6 release to make it easy for interested users to try out.
• Dai Youli from China is working on segmenting Chinese text to allow it to be more usefully indexed and searched. Chinese text is usually written without spaces or other indications of where the word breaks are, so Youli has been working on implementing an algorithm to determine where the word breaks are. This is a challenging problem, as the algorithm needs to be reliable, fast enough to process large volumes of text, and robust in the presence of words it doesn't know about.
• Parth Gupta is from India, and his project is adding a "Learning to Rank" framework to Xapian. Learning to Rank is one of the hot topics in Information Retrieval research at present: it's the application of machine learning to tuning the relative weighting of the different features used to determine the order in which results are presented to the end user. Parth recently completed a Master's thesis on Learning to Rank, which has given him a good theoretical background for this project.
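
To give a feel for the word-segmentation problem described above, here is a minimal sketch of one classic baseline, greedy forward maximum matching against a word list. This is an illustration only and not necessarily the algorithm Youli is implementing for Xapian; the function name `segment`, the `max_word_len` parameter, and the tiny vocabulary are all hypothetical.

```python
def segment(text, vocabulary, max_word_len=4):
    """Split unspaced text into words by greedy forward maximum matching.

    Illustrative baseline only (not Xapian's implementation): at each
    position, take the longest substring found in `vocabulary`. If no
    known word starts there, emit a single character so that unknown
    words never cause a failure.
    """
    words = []
    i = 0
    while i < len(text):
        longest = min(max_word_len, len(text) - i)
        # Try the longest candidate first, shrinking one character at a time.
        for length in range(longest, 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                i += length
                break
    return words


# Hypothetical toy vocabulary; a real system would use a large dictionary.
vocabulary = {"我们", "喜欢", "搜索", "引擎", "搜索引擎"}

print(segment("我们喜欢搜索引擎", vocabulary))
print(segment("我们爱搜索", vocabulary))  # "爱" is not in the vocabulary
```

Greedy matching is fast, but it can pick the wrong break when words overlap, which is part of why the problem is described as challenging.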

By Olly Betts, Xapian Organization Administrator

We had 48 new organizations participating this year in the Google Summer of Code. For a complete list of the 175 participating organizations, please visit our program site.

By Stephanie Taylor, Open Source Programs