A Data Science Central Community
Web scraping (also termed web data extraction, screen scraping, or web harvesting) is a technique for extracting data from websites and turning unstructured web content into structured data that can be stored on your local computer or in a database.
The web scraping technique is implemented by web scraping software tools. These tools interact with websites in the same way as you do when using a…Continue
Added by Paul Black on September 22, 2016 at 11:00pm — No Comments
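To make the "unstructured to structured" idea in the web scraping excerpt above concrete, here is a minimal sketch using only Python's standard library. The sample HTML, the `TableScraper` class and the `scrape_table` helper are hypothetical names invented for this illustration; they are not taken from the post or from any particular scraping tool.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the cell text of every <tr> row into a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>product</th><th>price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Each record is a plain dictionary, so it could then be written to a CSV file or a database table. A real scraper would also need to download the page and cope with messier markup.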
The modern world seems fast and dynamic, with a multitude of new products being launched. Marketing agencies are making a fortune by monitoring markets and delivering reports on consumers’ opinions. Today, feedback analysis is a separate area, arguably a growing industry with an array of products and services, and the prices for those services are pretty exorbitant.
So, do vendors have a chance to cut down…Continue
Added by Yana Yelina on August 12, 2016 at 12:00am — No Comments
Big Data is an accumulation of data that is too large and complex for processing by traditional database management tools.
Yeah But, What Really Makes Big Data Big Data? This question is as fundamental to data science as the chicken/egg question should be to researchers at KFC. But we’re not dealing with an A/B chicken model here. It’s more elephant to the dark room or scaling it up, the nearest star to our galactic…Continue
Added by Orion Stallard on July 8, 2016 at 12:54pm — No Comments
I want to share an interesting article about data scraping that you might need in your business. The article below is mainly reprinted from here.
Text in an HTML document is the content that is placed between HTML tags such as <a> </a> and <title> </title>. Sometimes we want to extract the text in the HTML document and there are two methods that can…Continue
Added by Nora Choi on May 31, 2016 at 2:30am — No Comments
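As an illustration of pulling the text that sits between HTML tags, here is one possible approach built on Python's standard `html.parser` module. It is not necessarily either of the two methods the article goes on to describe; the `TextExtractor` class, the `html_to_text` helper and the sample markup are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string as a single line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


# Hypothetical document used only to exercise the helper.
SAMPLE = (
    "<html><head><title>Demo page</title>"
    "<style>p {color: red}</style></head>"
    "<body><p>Hello <a href='#'>world</a></p></body></html>"
)

extracted = html_to_text(SAMPLE)
print(extracted)
```

The parser treats the title, paragraph and link text alike, so the result is the concatenated text of all tags except the skipped ones.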
YARN ResourceManager (the YARN service master component)
1) Controls the total resource capacity of the cluster
2) Sets the minimum size of any container requested in the cluster, which is controlled by the YARN configuration property
-> yarn.scheduler.minimum-allocation-mb 1024 (this value changes based on cluster RAM capacity)
Description: The minimum allocation for every container request at the RM, in MBs.…Continue
Added by skumar T on May 30, 2016 at 8:00pm — No Comments
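The effect of yarn.scheduler.minimum-allocation-mb on a container request can be sketched with a little arithmetic. The helper below is hypothetical and is not part of any YARN API. It assumes the scheduler rounds every memory request up to the next multiple of the minimum allocation, which is a common configuration but depends on the scheduler in use, so treat it as an approximation rather than a statement of how a given cluster behaves.

```python
def normalize_container_mb(requested_mb, minimum_allocation_mb=1024):
    """Round a memory request up to a multiple of the minimum allocation.

    Assumption: the scheduler uses the minimum allocation as its rounding
    step. 1024 mirrors the example value quoted in the post.
    """
    if requested_mb <= 0:
        raise ValueError("requested_mb must be positive")
    if minimum_allocation_mb <= 0:
        raise ValueError("minimum_allocation_mb must be positive")
    # Integer ceiling division, then scale back up to megabytes.
    steps = -(-requested_mb // minimum_allocation_mb)
    return steps * minimum_allocation_mb


# Requests below the minimum are raised to it; others round up.
for request in (500, 1024, 1500):
    print(request, "->", normalize_container_mb(request))
```

Under this assumption, a larger minimum allocation means fewer but bigger containers fit into the same cluster memory.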
What do the Library of Alexandria, the Normans and a book have to do with data? I never thought about
...at Alexandria was in charge of collecting all the world's knowledge, and most of the staff was occupied with the task of translating works onto papyrus paper... 1
Or The Normans and the...
Domesday Book (Latin: Liber de Wintonia "Book of…
Added by George Psistakis on May 20, 2016 at 5:20am — No Comments
As a central repository and processing engine, data lakes hold great promise for raising return on data assets (RDA). Bringing analytics directly to different data in its native formats can accelerate time-to-value by providing data scientists and business users with increased flexibility and…Continue
Added by Gabriel Lowy on April 11, 2016 at 12:00pm — No Comments
As we evolve toward a software-defined world, there’s a new user experience urgency emerging. That’s because the definition of “user” is going to be vastly expanded. In the Internet of Things (IoT) era, users include machines.…Continue
Added by Gabriel Lowy on March 30, 2016 at 9:43am — No Comments
Is your company poised to take advantage of three key trends in Big Data? Syncsort, a global leader in Big Data and mainframe software, recently released the results of its second annual Hadoop survey. Based on the survey results there are three areas that companies will focus on in 2016, to realize the full potential of Big Data analytics.
First, Apache Spark will move from a talking point into deployment. Nearly 70 percent of survey respondents are interested in Apache…Continue
Added by John McCure on January 22, 2016 at 4:00pm — No Comments
Curse of Dimensionality:
One of the most common problems in data analytics tasks such as recommendation engines and text analytics is high-dimensional, sparse data. Often we have a large set of features and few data points, or data with very high-dimensional feature vectors. In such scenarios, fitting a model to the dataset results in lower predictive power. This scenario is often termed as…Continue
Added by suresh kumar gorakala on February 28, 2016 at 9:30pm — No Comments
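One way to see why high-dimensional data is hard to model is to watch distances lose their meaning as the number of features grows. The sketch below is an illustration chosen for this summary, not code from the post: it draws uniformly random points (an assumption, real data is rarely uniform) and measures how much farther the farthest point is from a query than the nearest one. The function name `relative_contrast` and all parameter values are hypothetical.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (farthest - nearest) / nearest distance from a random query.

    Points are drawn uniformly from the unit hypercube of the given
    dimension; the seed keeps the run reproducible.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    nearest, farthest = min(distances), max(distances)
    return (farthest - nearest) / nearest


contrast_by_dim = {dim: relative_contrast(dim) for dim in (2, 10, 100, 1000)}
for dim, contrast in contrast_by_dim.items():
    print(f"dimensions={dim:5d}  relative contrast={contrast:.3f}")
```

As the dimension rises, the contrast shrinks toward zero, so the "nearest" and "farthest" neighbours become almost equally far away. Methods that rely on distances or local neighbourhoods then have little signal to work with unless the number of data points grows as well.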
With increasing frequency, CIOs are being asked by their senior management, “What’s our big data strategy?” But do you really need a big data strategy?
In our view, companies should instead focus on data…Continue
Added by Gabriel Lowy on January 26, 2016 at 11:48am — No Comments
Gets Tweets from Twitter:
Added by suresh kumar gorakala on January 11, 2016 at 6:00am — No Comments
Virtually Print Receipts for Easier Tax Audits
While many individuals will make personal resolutions this December, the New Year is also the perfect time for business leaders to consider what steps they…Continue
Added by Sai Gundavelli on December 7, 2015 at 1:52pm — No Comments
Yes, we are marching towards New Year 2016! What happened to the resolutions of 2014 and 2015? Quit habits? Practice habits? The road ahead? I am into all of them, but I was not able to keep them up. Hence, New Year 2016 means no more resolutions, just implementing the plan.
Beyond that, as we know, big data is bringing more business value to the enterprise by leveraging the data lake. Data lake... what is that? Data lake is a loosely defined term, and the definition changes during implementation…Continue
Added by Kumar Chinnakali on December 2, 2015 at 6:00am — No Comments
Guest blog post by Bernard Marr
It’s all well and good to talk about customer experience and managing inventory flow, but what has big data done for me lately?
I’ve rounded up seven of the most interesting — and unique — applications for big data I’ve seen recently and how they may be impacting your life.…Continue
To build a career as a Hadoop developer, one must be clear on Hadoop concepts and have a working knowledge of analysing data using MapReduce, Hive and Pig. Typical Hadoop interview questions include topics such as replication factor, node failures and distributed caching.
Here are some basic Hadoop interview questions
Added by John on October 22, 2015 at 9:30am — No Comments
Guest blog post by Bernard Marr
From time to time, you still come across someone with the opinion that Big Data is nothing more than a fad, which will be forgotten about soon enough.
You might not expect to hear this from me, but they’re actually right. Well – half right, at least!
As I’ve written before, I’m not actually a fan of the term “Big Data”, which puts overemphasis on the…Continue
Added by Andrei Macsin on October 20, 2015 at 3:59pm — No Comments
Originally posted here.
Added by suresh kumar gorakala on October 13, 2015 at 6:30am — No Comments
It’s the age of digital and we are all wired in. Smart phones, tablets, hundreds of television channels, thousands of apps, social media, and online shopping are part of our…
Added by Michael Meyers on October 2, 2015 at 8:30am — No Comments
Benjamin Graham’s The Intelligent Investor is considered by many, including his disciple Warren Buffett, to be the finest book ever written on the subject of investing. At the heart of Graham’s thesis is the question, “Can investors uncover intrinsic value in specific securities if…Continue
Added by Gabriel Lowy on October 1, 2015 at 7:33am — No Comments