A Data Science Central Community
Guest blog post by Nicholas Hartman
Excitement around Big Data has significantly increased the demand for data scientists. This hot position has been hailed as the "Sexiest Job of the 21st Century" by the Harvard Business Review.
When we started in the advanced analytics space, we didn't use the term 'data scientist.' We were more focused on solving problems than on defining a job description. Through hands-on experience, our initial teams evolved and adapted to the challenges they faced, gradually finding the methodology that worked. However, as demand for these resources grows, recruiting individuals who will quickly add value to our teams remains a constant challenge.
Quality data scientists may be elusive, but they do exist. From our experience in the trenches, we've identified seven core characteristics of a successful data scientist. This is a developing view, but it offers some insight into the position and the types of skills we are seeking for our team.
It's rare for any one individual to be an expert in all of these areas. However, a successful data scientist understands enough to collaborate intelligently with experts who can fill in knowledge and skill gaps. Below, we describe these seven core competencies and the seven key collaborators of a successful data scientist.
A good data scientist:
...and knows an experienced project manager
Many 'Big Data' projects fail because they do not set clearly defined objectives. With all the hype surrounding data analytics, there's often a false assumption that simply having a lot of data will magically produce valuable results. There is also frequently the unrealistic expectation that a vendor's black box will effortlessly spit out valuable answers. Such flashy products can be great tools, but without a good project blueprint, they are just tools.
Just like a science experiment, a Big Data project requires clearly defining the problem to be addressed and developing a targeted plan for quickly translating raw data into a solution. A data scientist also understands that the results of a complex experiment are frequently influenced by which data is studied and how it's measured, and is careful not to accidentally predetermine an outcome through the way the experiment is designed. Without effective project management, the stakeholders (who are also likely holding the checkbook) will quickly become disillusioned by a lack of tangible results.
...and knows a DBA familiar with the data sources
'Big Data' is largely a misnomer. Many companies have been managing huge volumes of data for years, and storage technologies have largely kept pace with storage demand. The real challenge of Big Data is that most of it is a complete mess. Traditional quantitative analysts (aka quants), who came into force during the 1990s and early 2000s, are trained mostly to work with highly structured, very clean data (aka 'dream data'). Dumping messy data on the desks of traditional analysts has proven problematic. Most of the skill and effort in Big Data goes into parsing, cleaning, de-normalizing, re-normalizing, linking, indexing, interpreting and otherwise preparing all this messy data for analysis. A data scientist thrives on this work, pulling together jumbled, disorganized data in order to solve a puzzle.
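As a rough illustration of the kind of preparation work described above, the following Python sketch parses a few inconsistently formatted transaction records, normalizes names, dates and amounts, and links the results to a customer table, setting aside rows that cannot be matched. The record layout, field names, date formats and the `clean_record` and `link_records` helpers are all hypothetical, invented for this example; it is a minimal sketch of the idea, not a production pipeline.

```python
from datetime import datetime

# Hypothetical raw records: inconsistent casing, whitespace, date formats and amounts.
RAW_TRANSACTIONS = [
    {"customer": "  ACME Corp ", "date": "2013-07-01", "amount": "1,200.50"},
    {"customer": "acme corp", "date": "07/02/2013", "amount": "$300"},
    {"customer": "Globex", "date": "2013/07/03", "amount": "75"},
    {"customer": "Initech", "date": "not recorded", "amount": "10"},
]

# Hypothetical reference table mapping normalized customer names to IDs.
CUSTOMERS = {"acme corp": 1, "globex": 2}

# Date formats assumed (for this example) to appear in the raw feed.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def parse_date(text):
    """Try each known format in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_record(raw):
    """Normalize one raw record: collapse whitespace, lowercase names, parse types."""
    return {
        "customer": " ".join(raw["customer"].split()).lower(),
        "date": parse_date(raw["date"]),
        "amount": float(raw["amount"].replace("$", "").replace(",", "")),
    }


def link_records(raw_records, customers):
    """Clean each record and attach a customer ID; set aside rows that fail."""
    linked, rejected = [], []
    for raw in raw_records:
        rec = clean_record(raw)
        rec["customer_id"] = customers.get(rec["customer"])
        if rec["date"] is None or rec["customer_id"] is None:
            rejected.append(rec)
        else:
            linked.append(rec)
    return linked, rejected


if __name__ == "__main__":
    linked, rejected = link_records(RAW_TRANSACTIONS, CUSTOMERS)
    for rec in linked:
        print(rec)
    print(f"{len(rejected)} record(s) need manual review")
```

Keeping the unmatched rows rather than silently dropping them reflects the point above: much of the real effort lies in interpreting the records that do not fit, which often means going back to whoever owns the source system.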
Most of the data used in a project is already stored somewhere else, be that an e-mail server, a transaction database or event logs. A data scientist will need to partner with the owners of those systems and rely on an experienced DBA and/or infrastructure expert who can coordinate access to all this information and integrate it into the project's compute environment, either through direct feeds or a separate consolidated data store.
Agree? Disagree? Have a different experience?
Let us know! Post a comment or write to us directly at [email protected]
About the author: Nicholas Hartman is a Director at CKM Advisors specializing in leveraging digital data for performance improvement.