Friday, September 23, 2016

Query disjoint databases in parallel, then combine and return the output using Drill

Instead of using MongoDB as a single or clustered data store, we may partition the data across independent MongoDB instances that are hosted remotely. We can then use Drill's UNION (or UNION ALL) operator to combine the results of the individual queries.

Why would we want to do this?
i) Because the data may already be partitioned across different sources.
ii) Because with domain knowledge, we may do a better job of partitioning the data ourselves.
iii) Because even with naive partitioning, Drill scales and performs well.
iv) Because there are interesting research questions around leveraging data locality to produce better and faster outputs than a clustered or distributed Mongo deployment.

Be warned that Drill has limitations with certain data structures that may hurt performance, for example nested complex schemas such as multi-dimensional arrays. We have previously discussed a workaround for this.
In this post, we will look at the simplest example of querying partitioned MongoDB instances through Drill.

1. Define the Mongo Storage Plugins
For each Mongo server, define a separate storage plugin in Drill.

Figure: Multiple definitions of the Mongo storage plugin, pointing to various Mongo deployments

For example, mongo3 above is defined as follows at http://localhost:8047/storage/mongo3:

{
  "type": "mongo",
  "connection": "mongodb://",
  "enabled": true
}

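If there are many deployments, defining each plugin by hand gets tedious. The following is a minimal sketch, assuming Drill's REST endpoint for storage plugins (a POST to /storage/<name>.json) is reachable at the same address as the web console. The host names in DEPLOYMENTS are hypothetical placeholders, not the hosts used in this post.

```python
import json
import urllib.request

DRILL_URL = "http://localhost:8047"  # Drill web console address used in this post


def plugin_payload(name, connection):
    """Build the JSON body describing one Mongo storage plugin."""
    return {
        "name": name,
        "config": {"type": "mongo", "connection": connection, "enabled": True},
    }


def register_plugin(name, connection, drill_url=DRILL_URL):
    """POST the plugin definition to Drill (needs a running Drill instance)."""
    body = json.dumps(plugin_payload(name, connection)).encode("utf-8")
    request = urllib.request.Request(
        f"{drill_url}/storage/{name}.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical hosts (assumption): substitute your own connection strings.
DEPLOYMENTS = {
    "mongo": "mongodb://host-a.example.com:27017/",
    "mongo2": "mongodb://host-b.example.com:27017/",
    "mongo3": "mongodb://host-c.example.com:27017/",
}

# Building the payloads needs no network access.
payloads = [plugin_payload(name, conn) for name, conn in DEPLOYMENTS.items()]
```

With Drill running, calling register_plugin(name, conn) for each entry in DEPLOYMENTS should create the three plugins in one pass.
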
2. Query through the query browser
Query each of the Mongo deployments and combine the results with UNION ALL:

select last_name as id from mongo.employee.empinfo
union all
select first_name as id from mongo2.employee.empinfo
union all
select first_name as id from mongo3.employee.empinfo
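
The same query can be generated for any number of storage plugins instead of being written by hand. This is a rough sketch, assuming every deployment exposes the same employee.empinfo collection and that Drill's REST query endpoint (/query.json) is available at the web console address. The function names are made up for illustration, and the sketch selects the same column from every partition, unlike the query above.

```python
import json
import urllib.request


def build_union_query(plugins, column="first_name", alias="id",
                      table="employee.empinfo"):
    """Return one SELECT per storage plugin, glued together with UNION ALL."""
    selects = [
        f"select {column} as {alias} from {plugin}.{table}"
        for plugin in plugins
    ]
    return "\nunion all\n".join(selects)


def run_query(sql, drill_url="http://localhost:8047"):
    """Submit the SQL to Drill over REST (needs a running Drill instance)."""
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    request = urllib.request.Request(
        f"{drill_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["rows"]


# Building the query string needs no network access.
query = build_union_query(["mongo", "mongo2", "mongo3"])
```

With Drill running, run_query(query) should return the combined rows from all three deployments.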

Now you can execute this and get the results. Depending on the nature of the query, the partitioning, and the scale of the data, you may see performance benefits from the data partitioning. How to actually partition the data across the MongoDB deployments, with related items co-located in a single partition, is a research question and probably deserves another post.
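
As a starting point only, one naive way to co-locate related items is to route each document by a grouping key, so that documents sharing the key always land in the same deployment. This sketch is not a recommended partitioning strategy; the department field and the sample documents are hypothetical.

```python
import zlib

# Storage plugin names from this post, one per MongoDB deployment.
PARTITIONS = ["mongo", "mongo2", "mongo3"]


def partition_for(key, partitions=PARTITIONS):
    """Map a grouping key to one partition, deterministically."""
    index = zlib.crc32(key.encode("utf-8")) % len(partitions)
    return partitions[index]


def group_by_partition(documents, key_field, partitions=PARTITIONS):
    """Group documents by target partition, based on one field's value."""
    groups = {name: [] for name in partitions}
    for doc in documents:
        target = partition_for(str(doc[key_field]), partitions)
        groups[target].append(doc)
    return groups


# Hypothetical sample documents; "department" is an assumed field.
documents = [
    {"first_name": "Ann", "department": "sales"},
    {"first_name": "Bob", "department": "sales"},
    {"first_name": "Cyd", "department": "research"},
]
groups = group_by_partition(documents, "department")
```

Both sales documents end up in the same group, so a query that touches only one department needs to read from a single deployment.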
