This week, I took some time to evaluate Karmasphere  Analyst. Particularly, I was interested in how it worked with Hadoop (as opposed to MapR, which it also supports).

Setting up

The setup for Karmasphere is rather painless: a simple installer on windows and a shell script on Linux. However, the windows version does require cygwin. Once open, Karmasphere divides itself into three major steps.


This is where you set up connections to existing HDFS databases. Karmasphere only supports Hive, but it's pretty nice about it... kind of. It will go through the process of installing Hive for you through a rather nice GUI, which allows you to easily specify a Derby database, MySQL database, or whatever other database you have a Java connector for. The downside to this is you can't easily use an already-existing Hive installation. This was a major shortcoming for me, but I get the impression that it should be possible to import an existing Hive database. I'll let you know as soon as the Karmasphere rep gets back to me.


Once I decided to install a new Hive metastore (which was rather painless), importing new tables from sequence files was simple for all the steps that involved Karmasphere (making the sequence file was annoying though). I don't have a problem with how Karmasphere does this. My only real problem is that it seems to hide away the shell that interacts with the Hive cluster Karmasphere uses, which seems like it might be limiting. I could be wrong, but I don't see how you could ever import anything without working through Karmasphere.


Supposedly, this is where the magic happens. The interface here was much simpler compared to other analytic tools. But that may be because there is not fancy drag-and-drop interface, or amazing visual features. It turns out Karmasphere is a glorified query writer. But in its defense, it's very glorified. I've written queries against Hive before, but I've never managed to write them as quickly or as painlessly as Karmasphere allows me to. The bells and whistles it brings to the table include:

  • immediate and clear feedback regarding any errors or warnings in your queries
  • one-click execution of any written queries
  • caching of past queries and results
  • effective sampling of data to test queries on smaller subsets
  • Table, column, and function library indexes
  • A "Query Plan" which shows you just how exactly your query will translate into Hadoop map-reduces

Once you have your data, it's pretty simple to export that data into various useful mediums such as Excel files, SQL tables, or perhaps back into Hive. Also, there is some charting functionality that was relatively simple to use, although I didn't look too much into it since it wasn't of interest to me.


All this makes the tool worthwhile, but I'm not sure it's worth the price (we were unable to obtain pricing information at time of publication, but will update if they get back to us). Since ultimately, you are just making queries, it doesn't add any additional analytic functionality that we couldn't do before. Technically, once you make your query, you don't even need Karmasphere anymore. Although once you have your data, it does let you do several things with that data that would otherwise be difficult to do (export, graphing, etc...).

If you're looking to analyze your unstructured data, I would say Karmasphere is ill-suited for the task, as unstructured data tends to take more than just the SQL-like queries Hive offers. All in all, this product is useful. But once my trial runs up, I will discontinue use.