Big data has become a buzzword and a vague term among entrepreneurs, consultants, scientists, and the media. At its core, big data is a straightforward adjective for data sets so large and complex that traditional data-processing applications are inadequate to deal with them.
Big data was initially described by the obvious three attributes: volume, variety, and velocity. Here I choose to focus on a fourth that has been overlooked for too long, for too many reasons, and on too many levels, and that deserves to be brought to the forefront: veracity.
Without veracity, poor decisions get made, inflated expectations go unmet, redundancy runs rampant, bankruptcy looms, and 500 errors become unfixable.
Big data challenges include capturing data, storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data-source integrity.
Veracity means cutting through the fat and getting to the muscle by culling unnecessary data.
Information-generation algorithms must detect and address invisible issues such as machine degradation and component wear, much like placing a tarp over a building structure to prevent rust.
Quantity vs. Quality
The growing maturity of the veracity concept more starkly delineates the difference between "big data" and "business intelligence". The quality of captured data can vary greatly, and inaccurate data undermines any analysis built on it. The size of a data set determines its value and potential insight, and whether it can be considered big data at all.
Success is what we measure
Business intelligence uses descriptive statistics on data with high information density to measure results and detect trends, i.e. analytics. Data is meaningless if it is interpreted wrongly.
Nevertheless, inflated data may work for Facebook (we all know what a catfish profile is), or for clueless angel groups and VC firms unaware of paid likes and social follows. But any responsible organization that follows an actual P&L, and keeps track of it responsibly, should forecast it as a potential disaster.
Peeva pulls and pairs very large data sets of pet medical records and microchip IDs via referent tracking, while providing real-time updates, alerts, and reminders to pet owners, and unfettered access to vital medical information for thousands of veterinary professionals.
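The pull-and-pair step can be thought of as joining medical-record rows to microchip registrations on the chip ID. The sketch below is a hypothetical illustration; the field names, record shapes, and in-memory join are my assumptions, not Peeva's actual schema or referent-tracking implementation.

```python
# Hypothetical sketch of pairing medical records with microchip IDs.
# Field names ("chip_id", "owner", "note") are assumptions for
# illustration; the real system uses referent tracking over far
# larger stores, not an in-memory dictionary.

microchips = [
    {"chip_id": "985112001111111", "owner": "A. Smith"},
    {"chip_id": "985112002222222", "owner": "B. Jones"},
]

medical_records = [
    {"chip_id": "985112001111111", "note": "rabies vaccine"},
    {"chip_id": "985112001111111", "note": "annual exam"},
    {"chip_id": "985112002222222", "note": "dental cleaning"},
]

# Index registrations by chip ID, then attach each record to its pet.
by_chip = {m["chip_id"]: m for m in microchips}
paired = [
    {"owner": by_chip[r["chip_id"]]["owner"], "note": r["note"]}
    for r in medical_records
    if r["chip_id"] in by_chip  # drop records with no matching chip
]

print(len(paired))  # 3 records paired to registered chips
```

Records whose chip ID has no registration simply fall out of the join, which is one simple way unmatched histories get surfaced for review.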
In our current beta, Peeva held well over 600 million medical histories for 5,333,775 companion animals. That is a lot of data, but a lot of it was shit data.
When we culled the data down, 984,150 of these pets turned out to be marked as deceased, 32,250 had been deleted, and 3,061,000 were marked inactive (not seen in a while, or reported missing). Needless to say, a significant share of the medical histories were not needed, and many more were redundant.
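The culling step above amounts to filtering records by status. Here is a minimal sketch; the status labels and record fields are assumptions for illustration, not Peeva's actual data model.

```python
# Minimal sketch of culling pet records by status.
# Status values mirror the categories described in the text
# (deceased, deleted, inactive); the field names are hypothetical.

ACTIVE = "active"

def cull(records):
    """Keep only records for pets still considered active."""
    return [r for r in records if r.get("status") == ACTIVE]

records = [
    {"chip_id": "985112003456789", "status": "active"},
    {"chip_id": "985112003456790", "status": "deceased"},
    {"chip_id": "985112003456791", "status": "deleted"},
    {"chip_id": "985112003456792", "status": "inactive"},
]

print(len(cull(records)))  # only the 1 active record survives
```

At the scale described in the text, the same filter removes roughly 4 million of the 5.3 million records, leaving the lean, actionable subset.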
"Big data" and veracity refer to the use of predictive analytics, user-behavior analytics, and other advanced analysis methods that extract value from data, and seldom to a particular size of data set. Inflated data is meaningless.
Big data very often means "dirty data", and the fraction of data inaccuracies grows with data volume. Human inspection at big-data scale is impossible, and health services desperately need intelligent tools for accuracy and believability control, and for handling information that would otherwise fall through the cracks. While most healthcare information is now electronic, it fits under the big data umbrella because most of it is unstructured and difficult to use. Peeva, in that regard, is kind of like a keto diet: we cut through all the fat and are left with lean data we can act on accordingly.