Moving beyond storing large volumes of data to collecting business insights from that data requires more than having the right technology in place. A good business structure, a good team, and insight into best practices are all critical to big data success. Check out the following tips from experienced big data users to learn best practices for working with big data.
1. Data Science Mindset
“Have an always-on data science mindset. Successful big data initiatives start with a holistic 360 view of the problem space. This includes understanding the inputs (data types, sources, features), the desired outputs (decisions, goals, predictions), and the constraints (model parameters, boundary conditions, optimization constraints). To achieve this perspective, one must be thinking like a scientist from start to finish: collect data, infer a testable hypothesis, design an experiment, test and evaluate the results, refine your hypothesis, and repeat (if necessary).” – Kirk Borne, Data Scientist, Astrophysicist, and Big Data Science Consultant
Follow Kirk on Twitter @KirkDBorne
2. Return on Innovation
“The most important ROI in Big Data Analytics projects is Return On Innovation. What are you doing that’s different and consequential? What sets you apart from the rest of the multitudes in this space?” – Kirk Borne, Data Scientist, Astrophysicist, and Big Data Science Consultant
3. Focus on the Users
“Developing a big data platform requires focusing on the users. Serve a few users well, and let their processing scale up with your capabilities. ‘Premature Platformization,’ or trying to satisfy too many use cases too early in the project, leads to failures. Make the initial users successful, and the ecosystem will thrive and grow.” – Owen O’Malley, Sr. Architect and Co-founder of Hortonworks
Follow Owen on Twitter @owen_omalley
4. Use the API
“Using the API: samples for Java SDK, Python SDK, and REST” – Minesh Patel, Qubole
Follow Minesh on Twitter @m1nesh
5. Take Real-Time Action
“If you cannot take real-time action, you have no need for real-time processing. There will always be batch processing workloads supporting the enterprise, and increasingly dynamic decision areas can be effectively supported by analytical systems because of advances in data architectures.” – Sanjay Mathur, CEO, Silicon Valley Data Science
Follow Sanjay on Twitter @sanjaymathur
6. Store Denormalized State
“State (the full context of an event, like a customer visit or the completion of a step in a manufacturing process) can be expensive to reassemble after the fact. This is particularly true with highly relational systems: witness the complex ETL (extract, transform, load) workloads that enterprise data warehouse systems struggle to scale. Storing denormalized state, e.g., rich logs, for analysis has proven highly successful for the web businesses of Silicon Valley, and those techniques can be applied to industries across the economy.” – John Akred, CTO, Silicon Valley Data Science
Follow John on Twitter @BigDataAnalysis
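To make the denormalized-state tip concrete, here is a minimal sketch in Python. The tables, field names, and helper functions are hypothetical and chosen only for illustration; they do not come from the quote. The idea is that a rich log line captures the full context at event time, so later analysis does not depend on joining against tables that may have changed in the meantime.

```python
import json

# Hypothetical normalized tables, as a relational system might hold them.
customers = {101: {"name": "Ada", "segment": "premium"}}
products = {7: {"title": "Widget", "price": 19.99}}
visits = [{"visit_id": 1, "customer_id": 101, "product_id": 7, "action": "purchase"}]


def reassemble(visit):
    """Rebuild the context of a visit by looking up the current table contents."""
    return {
        **visit,
        "customer": customers[visit["customer_id"]],
        "product": products[visit["product_id"]],
    }


def log_denormalized(visit):
    """Serialize the full context at event time as one self-contained JSON line."""
    return json.dumps(reassemble(visit), sort_keys=True)


# Write the rich log line when the event happens.
log_line = log_denormalized(visits[0])

# Later, the customer's segment changes in the relational tables.
customers[101]["segment"] = "standard"

# The log line still carries the state as it was at event time,
# while a join performed after the fact sees only the current value.
logged = json.loads(log_line)
rejoined = reassemble(visits[0])
```

In this sketch, `logged["customer"]["segment"]` keeps the value recorded at purchase time, whereas `rejoined` reflects the later update. The trade-off is extra storage for repeated fields in exchange for cheaper, more faithful analysis.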
7. Build a Common Platform
“Whether you are thinking about migrating towards Big Data or whether you are just starting out with data altogether, it helps to focus on building and maintaining a common platform. Similar to software development platforms, data platforms should also include source control, change management, and testing scenarios. This will help reduce future migration costs and will lead to long-term sustainable, competitive data capabilities.” – Ryan Kirk, Sr. Data Scientist at Hipcricket
Follow Hipcricket on Twitter @Hipcricket
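As one possible reading of "testing scenarios" for a data platform, the sketch below shows a small data-quality check that could live in source control and run alongside other tests. The function name, field names, and sample rows are assumptions made for illustration, not something the quote prescribes.

```python
def validate_rows(rows, required_fields):
    """Return (row_index, problem) tuples; an empty list means the batch passed."""
    problems = []
    for i, row in enumerate(rows):
        for field in required_fields:
            # Treat a missing key, None, or an empty string as a missing value.
            if row.get(field) in (None, ""):
                problems.append((i, f"missing {field}"))
    return problems


# Hypothetical sample batch with one incomplete record.
sample = [
    {"user_id": 1, "event": "click"},
    {"user_id": None, "event": "view"},
]
issues = validate_rows(sample, ["user_id", "event"])
```

Here `issues` flags the second row for its missing `user_id`. A check like this can gate a pipeline run the same way a failing unit test gates a software release.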