Pages

(+)

Tuesday, 12 April 2011

Introducing CityHash

Recent Posts

We’re pleased to announce the new CityHash family of hash functions for strings. We’re releasing two functions today: CityHash64 and CityHash128. They hash strings to 64- and 128-bit hash codes, respectively. These functions aren’t suitable for cryptography, but our experience so far shows that they’re great for, say, hash tables.

We tried to optimize for CPUs that are common in Google’s datacenters, but it turns out that most PCs and laptops have the relevant features as well. The important ones are 64-bit registers, instruction-level parallelism, and fast unaligned memory accesses.

We were greatly inspired by previous work on hashing, especially Austin Appleby’s MurmurHash. The key advantage of our approach is that most steps contain at least two independent mathematical operations. Modern CPUs tend to perform best with this type of code.

The disadvantage of our approach is that the code is more complicated than most popular alternatives. We decided to optimize for speed rather than simplicity and even included special cases for short inputs.

Overall, we believe that CityHash64 and CityHash128 are exciting new ways to solve a classic problem. Under real-life conditions we expect CityHash64 to outperform previous work by at least 30% in speed, and perhaps as much as a factor of two. Also, as far as we know, these functions’ statistical properties are sound. Please don’t hesitate to try this fast new code!

By Geoff Pike and Jyrki Alakuijala, Software Engineering Team
(+)