A Data Science Central Community
This famous statement -- the six degrees of separation -- claims that there are at most six degrees of separation between you and anyone else on Earth. Here we feature a simple algorithm that simulates how we are connected, and indeed confirms the claim. We also explain how it applies to web crawlers: any web page is connected to any other web page by a path of at most six links.
The algorithm below is rudimentary and can be used for simulation purposes by any programmer: It does not even…Continue
Added by Vincent Granville on October 24, 2017 at 11:30pm — No Comments
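The teaser above is cut off before the algorithm itself, so the following is only a minimal sketch of the general idea, not the article's method. It assumes a uniformly random acquaintance graph (the helper names `random_graph` and `degrees_of_separation` and the parameters `n` and `avg_degree` are hypothetical) and uses breadth-first search to count how many links separate one node from every other reachable node.

```python
import random
from collections import deque


def random_graph(n, avg_degree, seed=0):
    """Build an undirected random graph as a list of neighbor sets.

    Assumption: connections are drawn uniformly at random, which is a
    simplification of how real social or web graphs form.
    """
    rng = random.Random(seed)
    neighbors = [set() for _ in range(n)]
    for _ in range(n * avg_degree // 2):
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:  # skip self-links
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def degrees_of_separation(neighbors, source):
    """Return {node: number of links from source} for all reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:  # first visit gives the shortest path in BFS
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


if __name__ == "__main__":
    graph = random_graph(n=10_000, avg_degree=10)
    dist = degrees_of_separation(graph, source=0)
    print("reachable nodes:", len(dist))
    print("largest separation from node 0:", max(dist.values()))
```

Running the script with different values of `n` and `avg_degree` gives a feel for how slowly the largest separation grows as the graph gets bigger.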
Indexing is commonly used among programmers. Without fully grasping the idea behind the technique, programmers are often eager to reach for it whenever they encounter a query performance problem, only to be disappointed by the result on many occasions. By analyzing the principle of indexing, this article tries to show programmers when it is appropriate to use an index and how to use it.
The purpose of indexing is to quickly find…Continue
Added by JIANG Buxing on August 29, 2017 at 12:30am — No Comments
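As a rough illustration of the principle the teaser refers to (finding records quickly without scanning all of them), here is a minimal sketch of a sorted index searched by binary search. It is an assumption-laden toy, not the article's own example: the helper names `build_index` and `lookup` and the sample records are hypothetical.

```python
import bisect


def build_index(records, key):
    """Return a sorted list of (key value, position) pairs for one field."""
    return sorted((rec[key], pos) for pos, rec in enumerate(records))


def lookup(index, records, value):
    """Find all records whose indexed field equals value via binary search."""
    # (value,) sorts before every (value, pos), so this lands on the first match.
    i = bisect.bisect_left(index, (value,))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(records[index[i][1]])
        i += 1
    return matches


if __name__ == "__main__":
    orders = [
        {"id": 3, "item": "pen"},
        {"id": 1, "item": "ink"},
        {"id": 3, "item": "pad"},
    ]
    by_id = build_index(orders, "id")
    print(lookup(by_id, orders, 3))
```

The trade-off is visible even in this toy: building the index costs a full sort up front, so it only pays off when the same field is searched many times.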
By JIANG Buxing
In the previous article, we discussed why a computing layer is necessary in the reporting architecture. Reporting tools support programming against user-defined interfaces in their host language (i.e. the programming language used to develop the reporting tool) to provide the functionality of a computing layer for complex computational logic, but this strategy runs into some real-life problems. An explicit data computing layer…Continue
Added by JIANG Buxing on August 24, 2017 at 10:30pm — No Comments
If you want to relax, travelling is probably the best pick for most people. Choosing the right place to stay on your vacation is one of the most important parts of a trip, but how to do so can be a problem. Reading through reviews of a hotel may be a good choice: drawing on visitors’ experience, you get to know more specific details about the hotel. However, this method is not comprehensive enough, and reading a bunch of reviews can be irritating. Here is a…Continue
Added by Zhouyiming on August 28, 2017 at 12:00am — No Comments
There are numbers that are so large that there is no compact formula to represent them. Think of a number so large, that its number of digits is so large, that the number of digits of its number of digits is so large... and it goes on and on -- you get the idea.
Sure, if you are able to define such a number, then add one, or even 0.5, and you get an even bigger number. But this is not the point. The issue is to come up with such massive numbers in the first place. The biggest…Continue
Added by Vincent Granville on August 16, 2017 at 1:00pm — No Comments
Interesting infographics produced by Villanova University.
Originally posted here.
Added by Vincent Granville on August 21, 2017 at 9:34am — No Comments
There are an estimated 50 petabytes of data in the health care realm, predicted to grow to 25,000 petabytes by 2020, according to a new infographic from Oracle. This astonishing figure shows that the healthcare industry is generating a huge amount of data, driven by clinical records, medical care, and compliance and regulatory requirements.
Luckily, big data analytics applications have been widely used in…Continue
Added by Paul Black on August 10, 2017 at 7:00pm — No Comments
Big data and analytics can help a business predict consumer behavior, improve decision-making across the board and determine the ROI of its marketing efforts. By addressing these aspects adequately, the business can not only protect its market share, but also expand into new territories. The infographic below, by Villanova University School of Business Online, takes a detailed look at this…Continue
Probability and physics are helping make even roulette seem ultimately predictable.
In his new book, The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling, Adam Kucharski details how trying to understand dice games led one mathematician to develop probability theory,…Continue
Added by Edward Turner on July 19, 2016 at 4:30pm — No Comments
Big Data News is one of the Data Science Central channels. Below is a selection of popular articles published a while back: Continue
Added by Vincent Granville on June 8, 2017 at 7:00pm — No Comments
When starting out with sentiment analysis, the first question that comes to mind is where and how to crawl oceans of data for analysis. Normally, a web crawler, or crawling social media on the web, is a reasonable way to get access to public opinion data. In this piece, I want to share how I crawled a website using a web crawler and then processed the data for…Continue
Added by Paul Black on February 28, 2017 at 10:00pm — No Comments
Advanced analytics continues to permeate more functional areas of the enterprise. From marketing campaigns and sales optimization to supply chain and human capital management, business users are deploying newer, easier to use technologies to gain deeper insights, improve decision outcomes at the…Continue
Added by Gabriel Lowy on April 11, 2017 at 8:00am — No Comments
A recent LinkedIn post linking to an Innovation Enterprise article entitled 'Hadoop Is Failing' certainly got our attention, as you might expect.
Apart from disagreeing with the assertion that 'Hadoop...is very much the foundation on which data today is built', the main thrust of the article…Continue
Added by Richard Jackson on April 14, 2017 at 12:30am — No Comments
Many social media platforms, such as Twitter and Facebook, are evolving into sources of information from which people scrape various kinds of data, since microblogs on which users post real-time messages show millions of opinions, attitudes and sentiments about hot topics and current issues. Recently, I decided to learn how regional sentiment analysis can help people make specific decisions or policy…Continue
Added by Paul Black on March 20, 2017 at 2:30am — No Comments
(picture from www.re-work.co)
Most people keep a close eye on fast-moving technology trends. There’s no doubt that deep learning is one of the most popular buzzwords today. Deep learning has made significant breakthroughs and is applied in many areas, such as facial recognition, image recognition and AlphaGo. Thus…Continue
Added by Paul Black on December 14, 2016 at 11:30pm — No Comments
As part of Twitter Data Analysis, I have so far completed Movie review using R and Document Classification using R. Today we will discover topics in tweets, i.e. mine the tweet data to uncover underlying topics, an approach known as topic modeling.
Added by suresh kumar gorakala on December 23, 2015 at 8:30pm — No Comments
Here we discuss two potential algorithms that can perform clustering extremely fast on big data sets, as well as the graphical representation of such complex clustering structures. By extremely fast, we mean a computational complexity of order O(n), or even faster, such as O(n/log n). This is much faster than good Hierarchical Agglomerative Clustering…Continue
Added by Vincent Granville on November 18, 2013 at 10:30am — No Comments
This is the full-resolution GDELT event dataset, running from January 1, 1979 through March 31, 2013 and containing all data fields for each event record. The years 1979 through 2005, inclusive, are available as yearly downloads containing all records for each year, while from January 2006 onward the data is available as monthly downloads because of the larger number of records per month over time.…Continue
Yes, we know you have lots of questions, such as how Big Data is collected, how organizations gather Big Data, and how to gather information for quantitative research. Don't stress: if you are here looking for answers to these questions, you are on the right page, as we are going to give you a complete article on Big Data collection strategies. …Continue
Added by Ayushi Mishra on November 4, 2016 at 3:00am — No Comments