[Updated 12.19.12: We are now using MongoDB (and ElasticSearch) for Gander. Still, we spent several months getting decently cozy with CouchDB, so if you have any questions about it specifically, feel free to ask in the comments or on Twitter!]

Why CouchDB?

CouchDB’s schemaless design, JSON document storage, HTTP API, and scalability make it the best tool to solve our problem. The schemaless design lets us add new properties as our application evolves. The HTTP API lets us easily query, insert, and update data. Scalability lets us grow our CouchDB cluster as our data requirements expand.
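To give a feel for the HTTP API, here is a minimal sketch using curl. It assumes CouchDB is reachable on localhost at its default port 5984; the helper function names, the database name "demo_db", and the document id "doc1" are made up for illustration and are not part of our setup.

```shell
#!/bin/sh
# Assumption: CouchDB answers at this base URL (override with COUCH_URL).
COUCH_URL="${COUCH_URL:-http://127.0.0.1:5984}"

# Print the URL of a document: couch_doc_url DB ID
couch_doc_url() {
    printf '%s/%s/%s\n' "$COUCH_URL" "$1" "$2"
}

# Create a database: couch_create_db DB
couch_create_db() {
    curl -s -X PUT "$COUCH_URL/$1"
}

# Store a JSON document under a chosen id: couch_put_doc DB ID JSON
couch_put_doc() {
    curl -s -X PUT "$(couch_doc_url "$1" "$2")" \
        -H 'Content-Type: application/json' -d "$3"
}

# Fetch a document back: couch_get_doc DB ID
couch_get_doc() {
    curl -s -X GET "$(couch_doc_url "$1" "$2")"
}

# Example calls (hypothetical names, run against a live CouchDB):
#   couch_create_db demo_db
#   couch_put_doc demo_db doc1 '{"name":"jim"}'
#   couch_get_doc demo_db doc1
```

Note that updating an existing document also requires sending its current _rev, which this sketch leaves out.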

Why ElasticSearch?

We chose ElasticSearch because it provides an easy way to add advanced search capabilities to data stored in CouchDB (via the ES CouchDB river plugin) and because of its distributed architecture. So far, it has given Gander a very fast, relevancy-based search engine.

CouchDB Installation and Configuration

This installation guide uses Ubuntu 11.04 Server Edition as the base OS, with each component installed in a separate virtual machine.

--A note on installing from source: I had several issues attempting to install CouchDB from source. The easiest method of installation is to use the package found in the repository. The drawback is that you may be a minor version behind, since repository packages tend to lag the latest release.

1. Install couchdb

  • #sudo apt-get install couchdb

2. Verify couchdb daemon is running

3. Change the IP address that couchdb listens on (by default it only listens locally)

  • Stop couchdb
    • #sudo /etc/init.d/couchdb stop
  • Un-comment the bind address and set it to the server's static IP
    • #sudo nano /etc/couchdb/local.ini
  • Start couchdb
    • #sudo /etc/init.d/couchdb start

4. Run the couch test suite in Futon (CouchDB's administrative web GUI) to verify an issue-free instance.
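If you'd rather script steps 2 and 3 than open nano, something like the following sketch could work. It assumes local.ini contains a commented-out `;bind_address = ...` line under `[httpd]` (as the stock file does) and that CouchDB listens on its default port 5984. The function names are my own, and 192.0.2.10 is a placeholder address, not a value from our setup.

```shell
#!/bin/sh
# Check that the daemon answers locally; CouchDB replies with a small JSON
# welcome message. Assumes the default port 5984.
couch_is_up() {
    curl -s -f "http://127.0.0.1:5984/" > /dev/null
}

# Un-comment (if needed) and set bind_address in an ini file.
# Usage: set_bind_address FILE IP
# Keeps a backup of the original as FILE.bak.
set_bind_address() {
    sed -i.bak \
        "s/^[;[:space:]]*bind_address[[:space:]]*=.*/bind_address = $2/" \
        "$1"
}

# Example (hypothetical IP; run with sudo while couchdb is stopped):
#   set_bind_address /etc/couchdb/local.ini 192.0.2.10
```

The sed expression only rewrites the bind_address line, so other commented-out settings such as the port are left alone.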

ElasticSearch Installation and Configuration

Don't rely on the tutorials listed on installation unless they are relatively new. Installation options have changed since the tutorials for installing on Debian and integrating with CouchDB were published in 2010.

1. Install the headless OpenJDK (headless = minus GUI crap)

  • #sudo apt-get install openjdk-6-jre-headless

2. Download the latest ElasticSearch Debian package

3. Install the package

  • e.g. #sudo dpkg -i elasticsearch-0.19.1.deb

4. Test that the ElasticSearch daemon is running

  • # curl
  • Example response: { "ok" : true, "status" : 200, "name" : "Ramrod", "version" : { "number" : "0.19.1", "snapshot_build" : false }, "tagline" : "You Know, for Search"}

5. Modify the default index storage directory (note the index can be stored directly in memory, though I haven't tested this).

  • Stop the elasticsearch daemon
    • #sudo /etc/init.d/elasticsearch stop
  • Create the new data directory and set permissions
    • #sudo mkdir /var/data/elasticsearch
    • #sudo chown elasticsearch /var/data/elasticsearch
    • #sudo chgrp elasticsearch /var/data/elasticsearch
  • Change the data directory in the configuration file
    • #sudo nano /etc/default/elasticsearch
    • change the DATA_DIR= line to /var/data/elasticsearch

6. Install the couchdb river plugin (elasticsearch should still be stopped)

  • #cd /usr/share/elasticsearch/
  • #sudo ./bin/plugin -install elasticsearch/elasticsearch-river-couchdb/1.1.0

7. Start the elasticsearch daemon

  • #sudo /etc/init.d/elasticsearch start

8. Test the elasticsearch couchdb integration

  • Add a few test documents to CouchDB

#curl -X PUT '' -d '{"name":"jim"}'
#curl -X PUT '' -d '{"name":"ben"}'
#curl -X PUT '' -d '{"name":"haley"}'
#curl -X PUT '' -d '{"name":"laura"}'

  • Create the river, which builds the index in ElasticSearch

#curl -XPUT '' -d '{
  "type" : "couchdb",
  "couchdb" : { "host" : "", "port" : 5984, "db" : "es_test_db2", "filter" : null },
  "index" : { "index" : "es_test_db2", "type" : "es_test_db2", "bulk_size" : "100", "bulk_timeout" : "10ms" }
}'
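Once the river is running, a quick way to confirm that documents made it across is to search the index for one of the test names. The sketch below assumes ElasticSearch is reachable on localhost at its default HTTP port 9200 and reuses the es_test_db2 index name from the river definition. The function names are hypothetical, so adjust the host to match your own setup.

```shell
#!/bin/sh
# Assumptions: ElasticSearch base URL and the index the river writes to.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
ES_INDEX="${ES_INDEX:-es_test_db2}"

# Print a query-string search URL: es_search_url FIELD VALUE
es_search_url() {
    printf '%s/%s/_search?q=%s:%s&pretty=true\n' \
        "$ES_URL" "$ES_INDEX" "$1" "$2"
}

# Run the search and print the raw JSON response: es_search FIELD VALUE
es_search() {
    curl -s -X GET "$(es_search_url "$1" "$2")"
}

# Example (run against a live ElasticSearch):
#   es_search name jim
```

If the integration works, the response should include a hit for the matching CouchDB document. An empty hit list usually means the river has not picked up the changes yet.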