A Data Science Central Community

We are in the process of writing and adding new material (compact eBooks), available exclusively to our members and written in simple English by world-leading experts in AI, data science, and machine learning. We invite you to sign up here so you do not miss these free books. …

Added by Vincent Granville on February 4, 2020 at 8:00am

In a recent article (February 2019) published in Forbes (see here), it was argued that there will be no data science job titles by 2029. The author wrote that *Automation is coming for many tasks data scientists perform, including machine learning*.

I disagree. If you haven't automated most of your tasks yet, you are not…

Added by Vincent Granville on February 4, 2019 at 4:30pm

Join the largest community of machine learning (ML), deep learning, AI, data science, business analytics, BI, operations research, mathematical and statistical professionals: Sign up here. If, instead, you are only interested in receiving our newsletter, you can subscribe here. There is no…

Added by Vincent Granville on September 8, 2018 at 10:18am

Black hat data science consists of techniques designed to fool existing algorithms (Google search, Amazon rankings, and so on) by compromising or tampering with the metrics -- especially ratios -- that they rely on, without actually *physically* touching or altering the data stored in their databases. It exploits flaws in these algorithms and relies on reverse engineering to achieve its goal. So black hat data science is different from traditional hacking,…

Added by Vincent Granville on May 23, 2018 at 9:00am

*These predictions for 2018 are from Infogix.*

“Metadata management and ensuring data privacy for regulations such as GDPR joins earlier trends like AI and IoT, but the unexpected trend of 2018 will be the convergence of data management technologies,” said Emily Washington, senior vice president of product management at Infogix. “Big data has been the next big technology phenomenon for a long time, but businesses are increasingly evaluating ways to…

Added by Vincent Granville on December 30, 2017 at 10:59am

Sign up here to receive (at no cost) our IoT Central weekly digest and full access to our professional network. Alternatively, click here if you are only interested in the newsletter.

The full membership includes, in addition to the newsletter…

Added by Vincent Granville on November 29, 2017 at 11:13am

This famous statement -- the six degrees of separation -- claims that there are at most 6 degrees of separation between you and anyone else on Earth. Here we feature a simple algorithm that simulates how we are connected, and indeed confirms the claim. We also explain how it applies to web crawlers: any web page is connected to any other web page by a path of at most 6 links.

The algorithm below is rudimentary and can be used for simulation purposes by any programmer: It does not even…
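The excerpt above stops before the article's own algorithm. As a separate, hypothetical sketch (not the author's code), the Python below builds a random "acquaintance" graph in which each of `n` people befriends `k` others chosen uniformly at random, then runs a breadth-first search (BFS) to count the degrees of separation from one person to everyone else. The random model and the names `random_acquaintance_graph` and `degrees_of_separation` are assumptions made for illustration. The same BFS would apply to a web crawl if nodes are pages and each link is treated as an edge.

```python
import random
from collections import deque


def random_acquaintance_graph(n, k, seed=0):
    """Hypothetical model: each of n people befriends k distinct others at random.

    Friendships are treated as mutual, so the graph is undirected.
    Returns an adjacency list (a list of sets of person ids).
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for person in range(n):
        # Sample from n - 1 slots and shift past `person` so nobody befriends themselves.
        for slot in rng.sample(range(n - 1), k):
            friend = slot + 1 if slot >= person else slot
            adj[person].add(friend)
            adj[friend].add(person)
    return adj


def degrees_of_separation(adj, source):
    """BFS: fewest hops from source to every node (-1 means unreachable)."""
    dist = [-1] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == -1:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


graph = random_acquaintance_graph(n=10_000, k=10, seed=42)
dist = degrees_of_separation(graph, source=0)
print("unreachable people:", dist.count(-1))
print("max degrees of separation from person 0:", max(dist))
```

The maximum distance this prints depends on the random model and on `n` and `k`, so treat it as a simulation under these assumptions rather than a proof of the claim.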

Added by Vincent Granville on October 24, 2017 at 11:30pm

Interesting infographics produced by Villanova University.

Originally posted here.


Added by Vincent Granville on August 21, 2017 at 9:34am

There are numbers that are so large that there is no compact formula to represent them. Think of a number so large that its number of digits is so large that the number of digits of its number of digits is so large... and it goes on and on -- you get the idea.

Sure, if you are able to *define* such a number, then add one, or even 0.5, and you get an even bigger number. But this is not the point. The issue is to come up with such massive numbers in the first place. The biggest…
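To make the "number of digits of its number of digits" idea concrete, here is a small illustrative Python sketch with hypothetical helper names, not taken from the article. Even a number that does have a short formula, such as 9^(9^9), is impractical to write out, but its digit count follows from the identity `digits(a^b) = floor(b * log10(a)) + 1`, and the chain of digit counts can then be followed down to a single digit.

```python
import math


def digits_of_power(base, exponent):
    """Decimal digit count of base**exponent, without building the number.

    Uses floor(exponent * log10(base)) + 1. Assumes floating-point precision is
    sufficient, which holds for moderate exponents but not for arbitrary ones.
    """
    return math.floor(exponent * math.log10(base)) + 1


def digit_chain(n):
    """n, then its number of digits, then that count's digit count, down to one digit."""
    chain = [n]
    while n >= 10:
        n = len(str(n))
        chain.append(n)
    return chain


d = digits_of_power(9, 9**9)  # how many digits 9^(9^9) has
print(f"9^(9^9) has {d:,} digits")
print("digit chain:", digit_chain(d))
```

Run as written, it reports that 9^(9^9) has 369,693,100 digits, a count that itself has only 9 digits. For genuinely huge numbers the float-based shortcut eventually loses precision, which is one reason such numbers resist compact treatment.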

Added by Vincent Granville on August 16, 2017 at 1:00pm

Big Data News is one of Data Science Central's channels. Below is a selection of popular articles published a while back:

Added by Vincent Granville on June 8, 2017 at 7:00pm

*Glimpsing the Far Side—How Healthcare Organizations are Applying Predictive Analytics*

*Guest blog post by Paul Bradley*

Anyone who works in healthcare—or who has pondered, even fleetingly, how it differs from other sectors of the American economy—likely won’t be surprised to hear it’s the final frontier for predictive analytics.

For indeed, while predictive analytics has long since reshaped retail, shipping & logistics, and even…

Added by Vincent Granville on August 7, 2015 at 10:30am

Nice infographic created by the Technology Services Group (TSG). TSG has also produced a blog post to complement the infographic, which you may find useful. It discusses how much technology has shrunk over the years while its power has grown.…

Added by Vincent Granville on May 5, 2015 at 9:26am

*Originally posted here.*

Retailers know they need Big Data and are charging forward to get in the game. But many retailers continue to face challenges. What type of data should be collected? How should the data be used to generate insights? How do I measure ROI?

101data recently surveyed US retailers, across a range of…

Added by Vincent Granville on April 23, 2015 at 9:47am

Infographics by Adeptia. You’re probably familiar with the terms byte, megabyte, and gigabyte — but do you know what a terabyte is? How about a petabyte, or an exabyte?
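As a quick reference for the units mentioned above, the Python snippet below lists each one in bytes. It assumes the decimal (SI) convention, where each step up is a factor of 1,000. Some sources, possibly including the infographic, use the binary convention instead, where each step is 1,024 (standardized as kibibyte, mebibyte, and so on), so both are shown. The names `UNITS` and `unit_sizes` are illustrative.

```python
# Unit names in increasing order; each is one "step" larger than the previous.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]


def unit_sizes(step=1000):
    """Map each unit name to its size in bytes; step=1000 (SI) or step=1024 (binary)."""
    return {name: step**power for power, name in enumerate(UNITS)}


decimal = unit_sizes(1000)
binary = unit_sizes(1024)
for name in UNITS:
    print(f"1 {name:<9} = {decimal[name]:>25,} bytes (binary: {binary[name]:,})")
```

For example, a terabyte is 10^12 bytes in the decimal convention and 2^40 = 1,099,511,627,776 bytes in the binary one.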


Added by Vincent Granville on April 15, 2015 at 12:15pm

Added by Vincent Granville on March 19, 2015 at 3:00pm

Let's say you have to cluster 10 million points, for instance keywords. You have a dissimilarity function, available as a text file with 100,000,000 entries, each entry consisting of three fields:

*Keyword A, Keyword B, distance between A and B denoted as d(A,B)*

So, in short, you can perform k-NN (k-nearest neighbors) clustering or some other type of clustering, which is typically O(n^2) or worse in terms of computational complexity.…
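The excerpt stops before the article's own method. One common way to avoid the O(n^2) cost when dissimilarities arrive as a sparse list of pairs is to read the file once and merge any two keywords whose distance is at or below a threshold, using a union-find (disjoint-set) structure. This is threshold single-linkage clustering, named here as a swapped-in technique rather than the author's algorithm. The Python below is a minimal sketch that assumes comma-separated `Keyword A, Keyword B, d(A,B)` lines with no commas inside keywords; the `max_distance` threshold, the helper names, and the sample keywords are hypothetical.

```python
import csv
import io


class DisjointSet:
    """Union-find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:  # register a new keyword as its own cluster
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def threshold_clusters(rows, max_distance):
    """Single pass over (A, B, d(A,B)) rows; link pairs with d <= max_distance."""
    ds = DisjointSet()
    for row in rows:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        a, b, d = (field.strip() for field in row)
        ds.find(a)
        ds.find(b)  # keep unlinked keywords as singleton clusters
        if float(d) <= max_distance:
            ds.union(a, b)
    clusters = {}
    for keyword in list(ds.parent):
        clusters.setdefault(ds.find(keyword), set()).add(keyword)
    return list(clusters.values())


sample = """data, dataset, 0.2
data, analytics, 0.9
dataset, datasets, 0.1
machine learning, deep learning, 0.3
"""
rows = csv.reader(io.StringIO(sample), skipinitialspace=True)
print(threshold_clusters(rows, max_distance=0.5))
```

Because each entry is processed once in near-constant amortized time, the cost grows with the number of entries (100,000,000 here) rather than with n^2. For 10 million keywords, the in-memory dictionaries are the main resource to watch. A real run would pass a `csv.reader` over the file instead of the sample string.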

Added by Vincent Granville on January 27, 2015 at 10:54am

*Guest blog post by Bernard Marr, first published here.*

The field of Big Data requires more clarity and I am a big fan of simple explanations. This is why I have attempted to provide simple explanations for some of the most important technologies and terms you will come across if you’re looking at getting into big…

Added by Vincent Granville on December 12, 2014 at 1:42pm

There have been a few people questioning the value of big data recently, and predicting that big data is going to get smaller in the future. While most of these would-be oracles are traditional statisticians working on small data and worried about their careers, or practitioners in small countries (Canada and France in particular) who do not have access to big data, I was surprised to see Mike Jordan - a famous machine learning professor at Berkeley - …

Added by Vincent Granville on October 31, 2014 at 11:32pm

*This article was originally posted on Wikibon. Here I selected a few of the dozens of statistics. Enjoy the reading, and visit the original article: it also features a nice infographic on big data. It would be interesting to add stats about sensor data, or data used in engineering (NASA, etc.). For instance, how many data points are used to make weather forecasts? How many synthetic molecules are simulated each…*

Added by Vincent Granville on October 21, 2014 at 3:00pm

Defining big data is now a hot topic. UC Berkeley posted 40 very short definitions by thought leaders (including me). Here, our goal is to offer a very detailed, comprehensive definition that (hopefully) suits everyone.

**First, there are three…**

Added by Vincent Granville on October 8, 2014 at 8:52am

- Weekly Digest - August 26
- Unstructured Data: InfoGraphics
- Internet Topology - Massive and Amazing Graphs
- Fast clustering algorithms for massive datasets
- [Book] Mining of Massive Data Sets
- A Comprehensive List of Big Data Statistics
- Another large data set - 250 million data points - available for download

© 2020 BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap.


**Most Popular Content on DSC**

To not miss this type of content in the future, subscribe to our newsletter.

- Book: Statistics -- New Foundations, Toolbox, and Machine Learning Recipes
- Book: Classification and Regression In a Weekend - With Python
- Book: Applied Stochastic Processes
- Long-range Correlations in Time Series: Modeling, Testing, Case Study
- How to Automatically Determine the Number of Clusters in your Data
- New Machine Learning Cheat Sheet | Old one
- Confidence Intervals Without Pain - With Resampling
- Advanced Machine Learning with Basic Excel
- New Perspectives on Statistical Distributions and Deep Learning
- Fascinating New Results in the Theory of Randomness
- Fast Combinatorial Feature Selection

**Other popular resources**

- Comprehensive Repository of Data Science and ML Resources
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- 100 Data Science Interview Questions and Answers
- Cheat Sheets | Curated Articles | Search | Jobs | Courses
- Post a Blog | Forum Questions | Books | Salaries | News

**Archives:** 2008-2014 | 2015-2016 | 2017-2019 | Book 1 | Book 2 | More

**Most popular articles**

- Free Book and Resources for DSC Members
- New Perspectives on Statistical Distributions and Deep Learning
- Time series, Growth Modeling and Data Science Wizardry
- Statistical Concepts Explained in Simple English
- Machine Learning Concepts Explained in One Picture
- Comprehensive Repository of Data Science and ML Resources
- Advanced Machine Learning with Basic Excel
- Difference between ML, Data Science, AI, Deep Learning, and Statistics
- Selected Business Analytics, Data Science and ML articles
- How to Automatically Determine the Number of Clusters in your Data
- Fascinating New Results in the Theory of Randomness
- Hire a Data Scientist | Search DSC | Find a Job
- Post a Blog | Forum Questions