Guest blog post by Vincent Granville
Based on requests from clients (vendors of data processing platforms and products), as well as trends in popular blogs, job postings, and my own reading, here are a few topics recently gaining strong traction (items beyond #13 were recently added):
- The rise of data plumbing, to make big data run smoothly, safely, reliably, and fast through all "data pipes" (Internet, Intranet, in-memory, local servers, cloud, Hadoop clusters, etc.), optimizing redundancy, load balancing, data caching, data storage, data compression, signal extraction, data summarization and more. We bought the domain name DataPlumbing.com last week.
- The rise of the data plumber, system architect, and system analyst (a new breed of engineers and data scientists), a direct result of the rise of data plumbing
- Use of data science in unusual fields such as astrophysics, and the other way around (data science integrating techniques from these fields)
- The death of the fake data scientist
- The rise of right-sized data (as opposed to big data). Other keywords related to this trend are "light analytics", "big data diet", "data outsourcing", and the rebirth of "small data". Not that big data is going away; it is indeed getting bigger every second. But many businesses are trying to leverage an increasingly smaller portion of it, rather than being lost in a (costly) ocean of unexploited data.
- Putting more intelligence (sometimes called AI or deep learning) into rudimentary big data applications (currently lacking any true statistical science) such as recommendation engines, crowdsourcing or collaborative filtering. Purpose: detecting and eliminating spam, fake profiles, fake traffic, propaganda, attacks, scams, bad recommendations and other abuses, as early as possible.
- Increased awareness of data security and protection, against computer or business hackers.
- The rise of mobile data exploitation. For instance, processing billions of text messages to detect the spread of a disease or other global risks, to help design alarm systems, or to market the right product in real time (via opt-in, user-customized text messages) to a customer walking in a shopping mall. Not sure that even the NSA is capable of doing it as of today. The issue is more about capturing and reacting to the right signal than about absorbing and digesting big data. Another trend is optimization of revenue from mobile apps, leveraging mobile app dashboards.
- The rise of the "automated statistician": in short, automated, scalable, robust analytic solutions fit for batch processing, real-time processing, machine-to-machine communications, and black-box analytics used by non-experts. More on this in our upcoming book, entitled Data Science 2.0.
- Predictive modeling without models. Operations researchers and mathematicians contributing to the science of predicting, bringing mathematical optimization and simulation as an alternative to delicate and mysterious statistical models.
- High performance computing (HPC), which could revolutionize the way algorithms are designed.
- Increased collaboration between government agencies worldwide to standardize data and share it, for intelligence purposes. Imagine the census bureau sharing data with the IRS. Or banks in the US sharing data with security agencies in Switzerland.
- Forecasting space weather (best time / best location to land on Mars), and natural events on Earth (volcanoes, earthquakes, undersea weather patterns and implications for humans, when Earth's magnetic field will flip).
- Use of data science for automated content generation (including content aggregation and classification); for automated correction of student essays; in court, to strengthen the level of evidence (or lack thereof) against a defendant; for plagiarism detection; for car traffic optimization and computing optimum routes; for identifying, selecting and keeping ideal employees; for automated IRS audits sent to taxpayers, to avoid costly litigation and wasted time; for urban planning; and for precision agriculture
- Measuring yield of big data or data science initiatives (that is, benefit after software and HR costs, over baseline)
- Digital health: diagnostic/treatment offered by a robot (artificial intelligence, decision trees) and/or remote doctors; digital law: same thing, with attorneys replaced by robots, at least for mundane cases or tasks. Even lawyers and doctors could have their jobs replaced by robots! This assumes that a lot of medical or legal data gets centralized, processed and made well structured for easy querying, updating and retrieval by (automated) deep learning systems.
- Analytic processes (even in batch mode) accessible from your browser anywhere on any device. Growth of analytics apps and APIs.
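The yield item above ("benefit after software and HR costs, over baseline") can be made concrete with a little arithmetic. Here is a minimal sketch in Python, under assumptions that are mine and not spelled out in the list: "over baseline" is read as net benefit in excess of what would have been earned without the initiative, and the optional ratio divides that excess by total cost. The `Initiative` record, its field names, and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """Hypothetical record for one initiative; all amounts in the same currency."""
    benefit: float        # gross benefit attributed to the initiative
    software_cost: float  # licenses, infrastructure, tooling
    hr_cost: float        # salaries, contractors, training
    baseline: float       # net result expected without the initiative


def net_benefit(x: Initiative) -> float:
    """Benefit after software and HR costs."""
    return x.benefit - x.software_cost - x.hr_cost


def yield_over_baseline(x: Initiative) -> float:
    """Net benefit in excess of baseline (assumed reading of 'over baseline')."""
    return net_benefit(x) - x.baseline


def yield_ratio(x: Initiative) -> float:
    """Excess over baseline per unit of cost (an assumed normalization)."""
    total_cost = x.software_cost + x.hr_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive to compute a ratio")
    return yield_over_baseline(x) / total_cost


# Made-up figures, for illustration only.
example = Initiative(benefit=500_000, software_cost=120_000,
                     hr_cost=230_000, baseline=100_000)
print(net_benefit(example))          # 150000
print(yield_over_baseline(example))  # 50000
print(round(yield_ratio(example), 3))
```

The hard part in practice is the baseline, not the arithmetic: a credible estimate of what would have happened without the initiative usually requires a control group or a before/after comparison.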