Guest blog post by Michael Walker

 

Data veracity, meaning the uncertainty or imprecision of data, is often overlooked, yet it may be as important as the three V's of Big Data: volume, velocity and variety.

 

Traditional data warehouse / business intelligence (DW/BI) architecture assumes certain and precise data, achieved by spending unreasonably large amounts of human capital on data preparation, ETL/ELT and master data management.
 
Yet the big data revolution forces us to rethink the traditional DW/BI architecture to accept massive amounts of both structured and unstructured data arriving at great velocity. By its nature, unstructured data contains a significant amount of uncertainty and imprecision. Social media data, for example, is inherently uncertain.

Given the variety and velocity of big data, an organization can no longer commit time and resources to traditional ETL/ELT and data preparation to make the data certain and precise before analysis. While tools exist to help automate data preparation and cleansing, they are still in the pre-industrial age.
 
As a result, organizations must now analyze both structured and unstructured data that is uncertain and imprecise. The level of uncertainty and imprecision varies case by case, yet it must be factored into any analysis.
 
It may be prudent to assign a data veracity score and ranking to specific data sets, so that decisions are not unknowingly based on analysis of uncertain and imprecise data.
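The post does not prescribe how such a score would be computed, but one simple way to sketch the idea is a weighted combination of quality metrics. Everything below is a hypothetical illustration: the three component metrics (completeness, source reliability, consistency), their weights, and the sample data sets are assumptions, not a standard.

```python
# Hypothetical data-veracity scoring sketch. The metrics, weights
# and sample values are illustrative assumptions only.

def veracity_score(completeness, source_reliability, consistency,
                   weights=(0.4, 0.35, 0.25)):
    """Combine three 0-1 quality metrics into a single 0-1 score."""
    components = (completeness, source_reliability, consistency)
    if not all(0.0 <= c <= 1.0 for c in components):
        raise ValueError("each metric must be between 0 and 1")
    return sum(w * c for w, c in zip(weights, components))

def rank_datasets(datasets):
    """Rank data sets from most to least trustworthy.

    `datasets` maps a name to a (completeness, source_reliability,
    consistency) tuple.
    """
    return sorted(datasets,
                  key=lambda name: veracity_score(*datasets[name]),
                  reverse=True)

datasets = {
    "crm_records":  (0.95, 0.90, 0.92),  # curated, structured source
    "sensor_feed":  (0.85, 0.80, 0.70),  # automated, some gaps
    "social_media": (0.60, 0.40, 0.55),  # inherently uncertain
}

print(rank_datasets(datasets))
# -> ['crm_records', 'sensor_feed', 'social_media']
```

An analyst could then set a minimum score below which a data set's results are flagged as low-veracity rather than fed directly into decisions.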
 

Comment by Doug Laney on July 9, 2015 at 6:24am

Great to see others catching on to Gartner's 3Vs of Big Data that we first posited a decade and a half ago. For the professional courtesy of a citation, here's a link to the original piece I published in 2001: http://goo.gl/wH3qG. Also note that "veracity" is not a defining characteristic of Big Data. It is not a measure of magnitude, rather it is a qualitative characteristic of all data. Furthermore, while some like to cleverly (?) lop on additional "Vs" to the Big Data definition, "veracity" is entirely misplaced because the relationship is inverse. That is, Big Data tends to have greater veracity than other data, by virtue of it being automatically generated more often than other kinds of data. Therefore, veracity is *less* of a concern with Big Data. Note also that Gartner has identified over a dozen key characteristics of all/any data, "veracity" being just one of them. --Doug Laney, VP Research, Gartner, @doug_laney


© 2019   BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap
