Blog


A blog about security, privacy, algorithms, and email in the enterprise. 

Viewing entries tagged
mongodb


MongoDB Resource


Skimbox stores user data in Mongo, the Bon Iver of distributed databases. So far, we're digging Mongo, but it's definitely a mix of guns and roses. If you're using it or thinking about trying it, hopefully, some of the following advice and information will be helpful!

Tips 'n' Tricks

1. Pretty Printing from the Command Line Shell:

When using the shell, the results are often lumped together into a single line. Use the .pretty() method to format them nicely, as in:

 > db.emails.find({_id:/^b8037de0-1170/}, {headers:1}).pretty();

If you are only looking at one element, indexing it explicitly will also give pretty results:

 > db.emails.find({_id:/^b8037de0-1170/}, {headers:1})[0]

or

 > db.emails.findOne({_id:/^b8037de0-1170/}, {headers:1})
 

2. Back Up The Production Database Locally:

$ mongodump -h <your mongo instance> --port <port> -d Gander -u <user> -p '<password>'

(enclose the password in single quotes to prevent the shell from interpreting special characters).

This will create a subdirectory called ./dump that contains the exported database.

3. Restoring a Mongo Dump to a Local Meteor Instance

$ mongorestore -h localhost --port 3002

This assumes that the current directory has a subdirectory ./dump created from a previous backup and that Meteor is running locally.

The production database is named Gander, which is the database that will be created by this restore. By default, the local Meteor database is named 'meteor'. To copy the restored database to that name, open the mongo command line tool and run:

> db.copyDatabase( "Gander", "meteor" )

There may be a faster way to copy databases between servers, but I haven't tried it yet.

4. Retrieving Specific Subdocument Fields

To retrieve certain fields from the top level of a document, use the field selector:

> db.collection.find({}, {field1:1, field2:1});

If you want just some fields of a subdocument, quote the dotted field names:

> db.collection.find({}, {'field1.subfield1':1, 'field1.subfield2':1, field2:1});

5. Repairing a Corrupt Local Database

Over the weekend, my trusty MacBook shut down due to battery exhaustion. Usually it goes to sleep, but I guess it ran out of juice. When I rebooted into Ubuntu and started Meteor, I got this nasty message:

[[[[[ ~/Projects/Mahogany ]]]]]
Unexpected mongo exit code 100. Restarting.
Unexpected mongo exit code 100. Restarting.
Unexpected mongo exit code 100. Restarting.
Can't start mongod
MongoDB had an unspecified uncaught exception.

So I basically had a corrupt local database. Riffing off these instructions, I did the following operations specific to the Meteor install of Mongo:

  cd .meteor/local
  rm db/mongod.lock
  /usr/local/meteor/mongodb/bin/mongod --dbpath db --repair --repairpath db1

and all was good again. Ensure that no other instances of Meteor and Mongo are running when you do this procedure.

Gotchas we've found

1. Keys Cannot Contain Periods:

A key cannot contain a period "." or start with a "$" (ref). This is particularly annoying if the key is an email address like "joe@smith.com". This also occurred with folder_name, specifically for the last_uid collection. For Gander, in order to escape the period, I used:

  a = addr.gsub('.','#DOT#') # Ruby encoding
  i = item.replace(/#DOT#/g, '.'); // JavaScript decoding

Unfortunately, this was discovered the hard way: no error message when trying to do an update:

   @user_coll.update({"_id" => user['_id']},{"$set" => {'address_book' => vips } })

The update would fail silently and not generate an exception.  
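Since the failure is silent, it can help to keep the encoding and decoding in one place and to check keys before writing. The following is a minimal JavaScript sketch of that idea; the '#DOT#' token comes from the snippet above, while the helper names (encodeKey, decodeKey, isSafeKey) are hypothetical and not part of Gander or any Mongo driver.

```javascript
// Hypothetical helpers; only the '#DOT#' token is taken from the post.
var DOT_TOKEN = '#DOT#';

// Replace every period so the string can be used as a Mongo key.
function encodeKey(key) {
  return key.split('.').join(DOT_TOKEN);
}

// Reverse the encoding when reading the key back.
function decodeKey(key) {
  return key.split(DOT_TOKEN).join('.');
}

// True when the key has no period and does not start with '$'.
function isSafeKey(key) {
  return key.indexOf('.') === -1 && key.charAt(0) !== '$';
}

var encoded = encodeKey('joe@smith.com');
console.log(encoded);            // joe@smith#DOT#com
console.log(decodeKey(encoded)); // joe@smith.com
console.log(isSafeKey(encoded)); // true
```

Checking isSafeKey before an update turns the silent failure into something you can log or throw on.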

2. Exercise Caution When Using update

I was testing marking messages for deletion. I wanted to revert the change and retest, so I entered:

   db.emails.update({gander_status:'deleting'}, {gander_status:'gmail'});

Intuitively, you would think that this would simply change the value of the gander_status item. Wrong: it deletes all the other fields, leaving only the gander_status field (and _id, of course). The correct syntax uses $set:

   db.emails.update({gander_status:'deleting'}, {$set: {gander_status:'gmail'}});
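To make the difference concrete, here is a toy model of the two behaviours in plain JavaScript. It is only a sketch of the semantics described above, not the real driver: applyUpdate is a hypothetical function, it handles only $set, and the sample document is made up.

```javascript
// Toy model of update semantics; NOT the real Mongo driver.
function applyUpdate(doc, modifier) {
  var keys = Object.keys(modifier);
  var usesOperators = keys.length > 0 && keys.every(function (k) {
    return k.charAt(0) === '$';
  });

  if (!usesOperators) {
    // Plain object: the whole document is replaced, only _id survives.
    var replaced = { _id: doc._id };
    keys.forEach(function (k) { replaced[k] = modifier[k]; });
    return replaced;
  }

  // Operator form: copy the document, then apply $set field by field.
  var updated = {};
  Object.keys(doc).forEach(function (k) { updated[k] = doc[k]; });
  var setFields = modifier.$set || {};
  Object.keys(setFields).forEach(function (k) { updated[k] = setFields[k]; });
  return updated;
}

// Made-up sample document.
var email = { _id: 'b8037de0', subject: 'Hello', gander_status: 'deleting' };

console.log(JSON.stringify(applyUpdate(email, { gander_status: 'gmail' })));
// {"_id":"b8037de0","gander_status":"gmail"}
console.log(JSON.stringify(applyUpdate(email, { $set: { gander_status: 'gmail' } })));
// {"_id":"b8037de0","subject":"Hello","gander_status":"gmail"}
```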

3. Mongo Does Not (Easily) Support SSL

See http://docs.mongodb.org/manual/administration/ssl/ and a relevant discussion at: http://stackoverflow.com/questions/11310299/securing-mongodb-transport-in-the-cloud.

4. Count does not take into account skip and limit

Let's say you have the following code:

  • Example 1: limit

var x = Emails.find();
console.log("x=",x.count());
var y = Emails.find({},{limit:20});
console.log("y=",y.count());

You would expect:

     y = 20 (if x > 20)

  • Example 2: skip

var x = Emails.find();
console.log("x=",x.count());
var y = Emails.find({},{skip:50});
console.log("y=",y.count());

You would expect:

     y = x - 50 (if x > 50)

This is not the case. By default, Mongo's .count() does not take skip and limit into account, so in both of the above examples x = y: count() returns the count for the entire query. Different drivers (e.g. the Mongo shell, Perl, JavaScript) have other means of returning the adjusted cursor count. I have not found a way in Meteor's driver to find the adjusted count.
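One possible workaround is to do the arithmetic by hand from the unadjusted total. The sketch below assumes a hypothetical helper named adjustedCount that takes the total (for example the value of Emails.find().count()) and the same skip/limit options passed to find; it treats a missing or zero limit as "no limit".

```javascript
// Hypothetical helper: how many documents a skip/limit cursor would yield,
// given the unadjusted total returned by count().
function adjustedCount(total, options) {
  options = options || {};
  var skip = options.skip || 0;
  var remaining = Math.max(0, total - skip);
  // A missing or zero limit is treated here as "no limit".
  return options.limit ? Math.min(remaining, options.limit) : remaining;
}

console.log(adjustedCount(120, { limit: 20 }));            // 20
console.log(adjustedCount(120, { skip: 50 }));             // 70
console.log(adjustedCount(120, { skip: 110, limit: 20 })); // 10
console.log(adjustedCount(30, { skip: 50 }));              // 0
```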

Best Practices

Replica Set Configuration

  • Don't use IP addresses
  • Don't use /etc/hosts
  • Use DNS
    • Pick appropriate TTLs 
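As an illustration of the DNS advice, a replica set configuration object might look like the sketch below. The set name and hostnames are made up for the example; in the mongo shell an object of this shape would be passed to rs.initiate(). The small check at the end is a hypothetical helper that flags members declared by raw IPv4 address.

```javascript
// Hypothetical set name and hostnames, for illustration only.
var replicaSetConfig = {
  _id: 'rs0',
  members: [
    { _id: 0, host: 'mongo1.example.com:27017' },
    { _id: 1, host: 'mongo2.example.com:27017' },
    { _id: 2, host: 'mongo3.example.com:27017' }
  ]
};

// Hypothetical check: true when a host is a raw IPv4 address (optionally with a port).
function usesIpAddress(host) {
  return /^\d{1,3}(\.\d{1,3}){3}(:\d+)?$/.test(host);
}

var offenders = replicaSetConfig.members.filter(function (member) {
  return usesIpAddress(member.host);
});
console.log(offenders.length); // 0
```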

See Also

General:

Schema Design:

Indexing and Performance:

Books: MongoDB: The Definitive Guide. The title says it all.


Meteor Resources


When we started prototyping Skimbox using Meteor's open source web app platform, we weren't sure the Shiny New Thing with its CSS/jQuery/node.js/MongoDB stack and symmetric real-time data model had legs capable of carrying our email app into the real world. Our relationship in the two months since hasn't always been angels and butterflies, but it has been fruitful--enough so that we're (for now) planning on keeping it for Gander's live version (still in closed beta, but you can sign up for the waitlist here).

The following list contains information on Meteor's features, third-party tools, Q&As, things to watch out for, events, and best practices. They've proven useful to me and our team; hopefully they will be to you as well. If you have further questions, though, don't hesitate to ask them in the comments, on Twitter, or on Stack Overflow.

General Links

Meteor Main Page

  • This includes links to the documentation, lots of other resources, Stack Exchange tags, etc.
  • Worth following on twitter http://twitter.com/meteorjs

Matt DeBergalis talk at Realtime 2012

  • Matt describes an overall view of Meteor and the underlying DDP protocol.

David Greenspan introduces Spark 

  • Demo and brief explanation of reactivity, Live HTML and templates

DeBergalis, Oct 2012

  • Introduction to Meteor, including authentication and reconnection demos

Tom Coleman's Github Page

https://github.com/oortcloud

  • Includes some potentially useful tools for packaging Meteor, either to include new packages, or to deploy to Heroku instead of meteor.com

EventedMind training videos

  • This guy has produced an awesome set of training videos to learn Meteor. This will be very valuable for newbies to Meteor. The one on DDP is especially useful. Great stuff.

http://andrewscala.com/meteor/

  • Nice simple tutorial for Meteor. 

How to debug Meteor server-side. Client side debugging is typically done using the Chrome Developer Tools. See also https://github.com/meteor/meteor/pull/412

Test Driven Development in Meteor: a pretty comprehensive SO question about using TDD in Meteor.

How Does Reactivity Work Behind the Scenes: another pretty comprehensive SO answer explaining how Meteor's reactivity works.

Some Things to be Aware Of

Minimongo Restrictions

Since data is replicated on the client in 'minimongo', an in-memory copy of the data, there are some restrictions in the client:
  • The default MongoDB _id cannot be used because it is a BSON object (for efficiency). Instead, a plain text _id must be used, such as an RFC 4122 UUID.
  • Indexes are not available client side. Since minimongo is entirely in memory, this should still be fast, but don't send down 100,000 items and expect fast lookups.
  • Since entire collections are copied, there is no point in sorting server-side. For display purposes, sorting has to be done on the client.
  • Before Meteor 0.5.3, minimongo did not allow sorting on subkeys (i.e. obj.headers.Subject). This restriction was removed in Meteor 0.5.3 and later.
  • Minimongo supports only String, Number, Boolean, Array and Object types (ref). Binary dates must be converted to strings.

Meteor uses Fibers, not Node.js Async

Meteor uses fibers, which is a debatable choice. In some ways, it makes server-side code easier to write. On the other hand, it is inconsistent with the majority of node.js Node Packaged Modules. It isn't clear how to mix and match these two different synchronization approaches in a single Meteor app. It also isn't clearly documented how to use fibers in a custom package to avoid blocking and other messiness. Further reading: TechCrunch and the sparse Meteor docs, and this performance related SO question (see also the odd comment from Tom Wijsman at the bottom), Quora.

There's a pretty helpful gist that demonstrates a number of different techniques for handling async behavior and node integration with Meteor.

Mongo Access is Serialized and Synchronous

When looking at the packages/mongo-livedata package, it appears that Meteor runs all Mongo access in a fiber, which serializes it. It also uses Mongo's safe mode, which is synchronous. This leads to a simpler, more reliable implementation. However, this model will introduce scaling problems under higher loads. Mongo supports asynchronous writes with callbacks, as well as multiple concurrent calls, and it is typically used that way in node.js deployments. The Meteor team will likely have to do some non-trivial rewriting of their driver for higher performance.

Publishing User Defined Fields is Poorly Documented

If you add fields to the users collection, how to retrieve them later in the app is poorly documented. On the server, follow the documentation:

Meteor.publish("userData", function () {
 return Meteor.users.find({_id: this.userId}, {fields: {'other': 1, 'things': 1}}); 
});

On the client, subscribe to this collection:

Meteor.subscribe("userData");

Now the new fields will appear in Meteor.users, as in:

Meteor.users.findOne()['other'];
Meteor.users.findOne()['things']; 

So even though the subscription is named differently, the data still shows up in the users collection. If the client does not subscribe to this new collection, the additional fields will not be available. (I had to file a bug report to understand this).

Reactivity is not Guaranteed

We have found two places where Meteor's reactive context does not work as advertised.

  • The subscription callback for completeness. This callback will be invoked before the collection is completely loaded on the client. According to a recent SO question, DeBergalis contradicted the documentation, stating "subscribe() takes a callback that will run when the initial [emphasis mine] set of documents are on the client." It seems that the publisher will call OnComplete, triggering the client-side callback, prior to the low level shipping of the full collection. Bottom line: there is no deterministic way to know whether the collection is fully populated. Use Meteor.call when you need a guaranteed, complete result.
  • Autorun will not execute when some of the dependent Session variables or collections change.

While we don't know for certain, it seems like reactivity breaks down when there are multiple dependencies. If one reaction changes something that should set off a second reaction, the second reaction does not fire consistently.

As a workaround for the subscription callback, it is always possible to use a separate Meteor.call to the server. The server method could return an aggregate object. When the client Meteor.call callback is invoked, the full results are present in the callback. So this is still asynchronous processing but the results are definitely on the client when the callback is invoked.

Synchronous Usage of Meteor.call Does Not Return Results

When Meteor.call is used in the synchronous form (without a callback), it acts as just a stub and there will be no return value from the server. Instead, you have to set up an asynchronous call, with a callback, to capture the correct result.
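The asynchronous form can be sketched as follows. To keep the example runnable outside a Meteor app, the function takes a Meteor-like object as a parameter and is exercised with a stub; in a real app you would pass the global Meteor object. The method name 'emailSummary' and the shape of its result are assumptions made up for this example.

```javascript
// `meteorLike` stands in for the global Meteor object so this runs anywhere.
// 'emailSummary' is a hypothetical server method name.
function fetchEmailSummary(meteorLike, onDone) {
  meteorLike.call('emailSummary', function (error, result) {
    if (error) {
      onDone(error);
      return;
    }
    // The full server result is available here, inside the callback.
    onDone(null, result);
  });
}

// Stub that mimics the (error, result) callback contract of Meteor.call.
var fakeMeteor = {
  call: function (name, callback) {
    callback(undefined, { method: name, total: 42 });
  }
};

var received = null;
fetchEmailSummary(fakeMeteor, function (error, summary) {
  received = summary;
});
console.log(received.total); // 42
```

The stub invokes the callback immediately; against a real server the callback fires later, so any code that needs the result belongs inside it.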

Spurious Error Message Due to Mal-Formed HTML in a Template

It took a couple of days to track down a bug in Beaver. The browser's page would crash hard, where even a hard reload was not working consistently. In the browser console, there was a very long stack trace starting with:

Exception from Meteor.flush: Error: An invalid or illegal character was specified, such as in an XML name. at Function.Spark.Patcher.copyAttributes

The problem was a missing closing '>' in the HTML of the template. Meteor crashes hard if the HTML in a template is malformed. Unfortunately, HTML checking tools will not work on templates, especially partials, so this check must be done manually and carefully.

<td><input type="checkbox" id="show_subcategories"

 {{#if preferences.show_subcategories}}

 checked="true"

 {{/if}}

 >        <---- This was missing

</td>

No Deny without Allow

Meteor allows deny and allow methods in order to implement security on Insert, Update and Delete operations. While it is possible to have allow methods without deny, it is not possible to have deny without allow. If a given collection has a single deny method, it must have at least one allow method. This is buried in the docs as:

"If you never set up any allow rules on a collection then all client writes to the collection will be denied, and it will only be possible to write to the collection from server-side code."

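The rule can be pictured with a toy model: a write goes through only if no deny rule returns true and at least one allow rule returns true. The sketch below is plain JavaScript, not Meteor's implementation; isWriteAllowed and the two sample rules are hypothetical names used for illustration.

```javascript
// Toy model of allow/deny evaluation; NOT Meteor's implementation.
function isWriteAllowed(rules, userId, doc) {
  var denied = rules.deny.some(function (rule) { return rule(userId, doc); });
  if (denied) {
    return false;
  }
  // With no allow rules at all, this is always false.
  return rules.allow.some(function (rule) { return rule(userId, doc); });
}

// Hypothetical sample rules.
var ownerOnly = function (userId, doc) { return doc.owner === userId; };
var lockedDocs = function (userId, doc) { return doc.locked === true; };

// Deny without allow: every write is rejected.
console.log(isWriteAllowed({ allow: [], deny: [lockedDocs] }, 'u1', { owner: 'u1' }));
// false
// Allow plus deny: the owner can write unless the document is locked.
console.log(isWriteAllowed({ allow: [ownerOnly], deny: [lockedDocs] }, 'u1', { owner: 'u1' }));
// true
console.log(isWriteAllowed({ allow: [ownerOnly], deny: [lockedDocs] }, 'u1', { owner: 'u1', locked: true }));
// false
```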
Open Questions

None at this time.

Closed Questions

What storage limits are there to databases stored on meteor.com?

  • Hosting on meteor.com does not currently have any data caps, but it also has no SLA or guarantees =). If someone abuses the hosting service with unreasonable data or bandwidth usage, we may very well add data caps, but we haven't had a problem yet.

How often does the client do a full refresh? SO claims every 10 seconds. T/F?

  • True, but misleading. 10 seconds is the time it takes the server to notice documents added directly to the mongo database by an external process. When clients write to the database through Meteor, other connected clients see it immediately, not 10 seconds later.

Is packaged Mongo 32 or 64 bit?

  • MacOS and Linux x86_64 ship with 64 bit mongo. You'll only get a 32 bit mongo if you are running on a 32 bit linux machine. meteor.com is running 64 bit mongo (hosted by MongoHQ).

Does Meteor play nice with Mongo's Auth package, or does it require no Auth?

  • Meteor's auth works at a level above the database. It does not integrate directly with Mongo's auth.

What ports are required to be opened on a Meteor server besides standard 80/443?

  • Meteor listens for incoming HTTP requests on whatever port you tell it (3000 by default in local development, 80 by default when run via the meteor bundle command). The meteor bundle does not handle HTTPS; that is a feature of the meteor.com hosting. The meteor development mode runner takes an additional internal port and runs mongod, which takes 2 more ports. But these ports are not used externally, and should not be open.

According to SO, "$ meteor mongo -U" is valid for only one minute. What's a more reliable way to determine this persistently?

  • That's correct. The one minute thing is a feature, designed to limit the risk of your database being attacked. If you want a permanent URL, though, you can work around this. Your deployed server is passed a URL to a mongo endpoint as the "MONGO_URL" environment variable. You can print 'process.env.MONGO_URL' in server code to see it. Be careful though, because we don't expose a good way to change your database password short of deleting the app and re-deploying.

Can regular Mongo monitoring services be used against Meteor?

  • In practice, it doesn't matter. Any serious application should be using a stable Mongo database hosted elsewhere, like MongoHQ. When deploying to meteor.com, Meteor requires hardcoding credentials in a local copy of Meteor for accessing the MongoHQ instance. Long term, it is preferable to run a Meteor app elsewhere like AWS or Heroku, rather than meteor.com.

Related Projects

Meteor either encapsulates or plays nice with

  • Handlebars (HTML templating)
  • jQuery (client JavaScript library for eventing)
  • node.js (server JavaScript execution environment, v0.8)
  • bootstrap (the scaffolding / column layout manager used by Mahogany)

The partial list of third party packages for Meteor can be found at https://atmosphere.meteor.com/ (DW: doesn't the atmosphere cause meteors to burn up and vaporize?)

Alternatives to Meteor

See Also

Meteor Google+ which includes links to a bunch more talks.


New York’s Big Datascape, Part 1: Timehop, Parse.ly, Bitly, 10Gen, 2tor


I started writing about innovators in Boston’s big data scene in the earliest days of Riparian. Researching what other companies were building, analyzing, and selling provided me with a narrative to what might otherwise still be a murky set of concepts. It also introduced me to some fascinating ideas; Bluefin Labs’ TV Genome and Recorded Future’s event forecasting come to mind. And so, nearly two months into my New York sojourn, I’m expanding this series in the hopes of making the acquaintance of these companies’ NYC equivalents.

Some people like to say that New York and Boston are rivals. When it comes to sports, I think this is valid; when it comes to technology, I think it’s silly. By and large, the technology each city produces serves different sectors—life sciences, healthcare, and higher ed in Boston, fashion, media, finance, and consumer web in New York. Of course, there are exceptions (there are always exceptions)—but exceptions are testaments to heterogeneity, not (usually) harbingers of power shifts. Four of the following companies serve one or more of the city’s main sectors; the fifth serves higher ed, a sector that, especially these days, needs to be better served everywhere.

Timehop

Parse.ly

Bitly

10Gen

  • Product: 10Gen makes MongoDB, which is a distributed database that stores data in JSON/BSON documents (think MySQL with a document-based data model).
  • Founders: Dwight Merriman, CEO (@dmerr), Eliot Horowitz, CTO (@eliothorowitz)
  • Technology Used: MapReduce, Aggregation Framework, atomic operations
  • Target industries: Consumer web, Digital Media, Mobile
  • Location: Soho (Also, Palo Alto, CA)
  • Funders: Flybridge Capital Partners, Sequoia Capital, Union Square Ventures 

2tor

 
