Getting the most out of a big data project is an art as well as a science. That is why this month Qubole will be running a series of blog posts featuring big data tips from data scientists, industry experts, experienced big data users, and, of course, Qubole’s own experts. The first round of tips is below.
The full series of big data tips is presented in a slideshow below.
1. Grant Unlimited Access
Create a data lake and give your business and data analysts access to all your data – structured and unstructured – with SQL engines like Hive. They will surprise you with the insight and value they can extract, and your development team will spend less time answering ad-hoc queries. – Christian Prokopp, Principal Consultant at Big Data Partnership
2. Select the Right Tools
A common question is when to use MapReduce/Pig/Hive versus HBase/Cassandra/Impala. Non-functional requirements (NFRs) have to be considered when deciding on a framework. MapReduce/Pig/Hive is used for high-throughput, high-latency requirements, as in batch processing and ETL. HBase/Cassandra/Impala is used for low-throughput, low-latency requirements, as in the case of a customer filling out an online application. – Praveen Sripati, Hadoop trainer and author of Hadoop Tips
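The rule of thumb above can be written down as a tiny decision helper. This is only a sketch of the tip as stated: the `Workload` class, the `needs_low_latency` flag, and the function name are hypothetical names introduced for illustration, and a real selection would weigh many more non-functional requirements.

```python
from dataclasses import dataclass

# Framework families exactly as grouped in the tip above.
BATCH_FAMILY = "MapReduce/Pig/Hive"
INTERACTIVE_FAMILY = "HBase/Cassandra/Impala"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload, for illustration only."""
    name: str
    needs_low_latency: bool  # does a user or service wait on each answer?


def suggest_framework_family(workload: Workload) -> str:
    """Map the latency requirement to the family the tip associates with it."""
    if workload.needs_low_latency:
        return INTERACTIVE_FAMILY
    return BATCH_FAMILY


nightly_etl = Workload("nightly ETL", needs_low_latency=False)
online_form = Workload("online application form", needs_low_latency=True)

for w in (nightly_etl, online_form):
    print(f"{w.name}: {suggest_framework_family(w)}")
```

Latency is the only input here because it is the requirement the tip uses to separate the two families; throughput, consistency, and cost would be natural additions.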
3. Use Presto
Improve query performance by considering Presto with the RCFile or ORC file format. – Minesh Patel, Qubole
4. Incorporate Machine Learning
Use robust machine learning algorithms to extract insight from the data. Data collection and massive storage are only the enabling infrastructure. You should leverage existing and proprietary machine learning algorithms that will discover hidden patterns and learn from the data what is important for the analyst to view and examine, and what is not. – Idan Tendler, CEO of Fortscale
5. Automation is Key
There is a big need for automation in Big Data. Security is an important industry that has proven the value of Big Data, but it has just as quickly proved that Big Data is valueless without automation wrapped around it to make it practical. Only once you make Big Data practical can you begin to perform analytics, etc., which is where the value of Big Data in the security industry really gets unlocked. – Sean Brady, VP of Product Management at Vorstack
6. Identify Easy Wins
Segment the data based on demographic and/or firmographic information. This is an easy and inexpensive way to highlight trends among the primary customers and industries served. This information is very helpful when determining what new products and/or services should be offered. In addition, look for trends in behavioral transaction information to further optimize the customer’s experience with relevant marketing and messaging. – David Handmaker, CEO of Next Day Flyers
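As a rough illustration of this kind of segmentation, the sketch below groups order records by a firmographic field and ranks the segments by total spend. The records, field names (`industry`, `amount`), and the `segment_totals` helper are all made up for the example; real data would come from your own transaction store.

```python
from collections import defaultdict

# Hypothetical order records; the fields and values are illustrative only.
orders = [
    {"customer": "A", "industry": "retail", "amount": 120.0},
    {"customer": "B", "industry": "healthcare", "amount": 150.0},
    {"customer": "C", "industry": "retail", "amount": 80.0},
    {"customer": "D", "industry": "education", "amount": 40.0},
]


def segment_totals(records, key):
    """Sum order amounts per segment and return (segment, total) pairs, largest first."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


by_industry = segment_totals(orders, "industry")
for industry, total in by_industry:
    print(f"{industry}: {total:.2f}")
```

Passing a different `key`, such as a region or company-size field, gives the same ranking for another segment without changing the helper.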
7. Think Broad
Identify all of the data you have access to and/or will produce, and explore possible audiences and use cases for it. Oftentimes, big data plays are geared toward a fairly narrow audience and set of use cases based on the original inspiration for the solution, or there is no active and explicit exploration of the full potential of what you have to offer. I can all but assure you that there are major opportunities for your offering that you haven’t even considered yet. The earlier you have a crisp view of the potential of your big data and offers, the better able you will be to build the right thing, in the right way, to exploit the potential of that idea. – Dirk Knemeyer, founder of Involution Studios
For more big data tips, check out these posts on adapting strategy, setup, or adopting a data mindset.