Featured Blog Posts (212)

Top 30 Free Web Scraping Software

Web scraping (also termed web data extraction, screen scraping, or web harvesting) is a technique for extracting data from the web and turning unstructured web content into structured data that can be stored on your local computer or in a database.

Web scraping is implemented by web scraping software tools. These tools interact with websites in the same way as you do when using a…

Added by Paul Black on September 22, 2016 at 11:00pm — No Comments
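
The step of turning unstructured pages into structured records can be illustrated with a minimal Python sketch that uses only the standard-library html.parser, csv and io modules. It assumes the page has already been downloaded (the SAMPLE markup is invented) and that the data of interest sits in an HTML table. TableScraper, scrape_table and to_csv are hypothetical helper names and are not taken from any of the tools in the post.

```python
import csv
import io
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the cell text of every <tr> row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


def scrape_table(html):
    """Use the first row as headers and turn the remaining rows into dicts."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def to_csv(records):
    """Serialize the structured records as CSV text (could go to a file or DB)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE = """
<table>
  <tr><th>Tool</th><th>Price</th></tr>
  <tr><td>Scraper A</td><td>Free</td></tr>
  <tr><td>Scraper B</td><td>$10</td></tr>
</table>
"""

records = scrape_table(SAMPLE)
print(records)
print(to_csv(records))
```

The output counts as "structured" because each row becomes a dictionary keyed by the header row. The same records could be inserted into a database table just as easily as they are written out as CSV.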

Smart Business: automated sentiment analysis on top

The modern world seems really fast and dynamic, with a multitude of new products being launched. Marketing agencies are making a fortune by monitoring the markets and delivering reports on consumers’ opinions. Today, feedback analysis is a separate field, arguably a growing industry with an array of products and services. And the prices for those services are pretty exorbitant.

So, do vendors have a chance to cut down…

Added by Yana Yelina on August 12, 2016 at 12:00am — No Comments

Data Wars: Dawn of the Yottabyte

Big Data is an accumulation of data that is too large and complex for processing by traditional database management tools.

- Merriam-Webster

Yeah, but what really makes Big Data Big Data? This question is as fundamental to data science as the chicken/egg question should be to researchers at KFC. But we’re not dealing with an A/B chicken model here. It’s more like the elephant in the dark room or, scaling it up, the nearest star to our galactic…

Added by Orion Stallard on July 8, 2016 at 12:54pm — No Comments

7 Tools to extract text from HTML documents

I want to share an interesting article about data scraping that you might need in your business. The article below is mainly reprinted from here.

Text in an HTML document is the content placed between HTML tags such as <a> </a> and <title> </title>. Sometimes we want to extract the text from an HTML document, and there are two methods that can…

Added by Nora Choi on May 31, 2016 at 2:30am — No Comments
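
Keeping only the content that sits between tags can be sketched with Python's standard-library html.parser. This is one possible approach, and it assumes that text inside script and style elements should be discarded. It is not necessarily one of the seven tools or the two methods the article covers. TextExtractor and html_to_text are hypothetical names, and the sample page is invented.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Keep the text between tags, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs defaults to True, so entities like &amp; become "&".
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html):
    """Return the visible text fragments of an HTML string, space-separated."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = ('<html><head><title>Data scraping</title>'
        '<style>p {color: red}</style></head>'
        '<body><p>Read <a href="/x">this &amp; that</a> today</p>'
        '<script>var x = 1;</script></body></html>')
print(html_to_text(page))
```

Joining fragments with single spaces is a simplification. Dedicated tools usually also preserve paragraph and line breaks based on block-level tags.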

Hadoop YARN explanation and container memory allocations

YARN ResourceManager (the master component of the YARN service)

1) Controls the total resource capacity of the cluster

2) Whenever a container is needed in the cluster, it enforces a minimum container size, which is controlled by the YARN configuration property

→ yarn.scheduler.minimum-allocation-mb = 1024 (this value changes based on the cluster's RAM capacity)

Description: The minimum allocation for every container request at the RM, in MBs.…

Added by skumar T on May 30, 2016 at 8:00pm — No Comments
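
The effect of yarn.scheduler.minimum-allocation-mb can be illustrated with a small Python model. This is a simplified sketch and not YARN's actual scheduler code. It rests on three assumptions:

- Memory requests are rounded up to a multiple of the minimum allocation, as the CapacityScheduler does.
- Requests above yarn.scheduler.maximum-allocation-mb are treated as rejected.
- The values for all three properties are hypothetical, including yarn.nodemanager.resource.memory-mb (the memory a NodeManager offers to containers).

normalize_request and max_containers_per_node are illustrative names.

```python
import math

# Hypothetical values for three YARN memory properties (in MB).
MIN_ALLOCATION_MB = 1024   # yarn.scheduler.minimum-allocation-mb
MAX_ALLOCATION_MB = 8192   # yarn.scheduler.maximum-allocation-mb
NODE_MEMORY_MB = 40960     # yarn.nodemanager.resource.memory-mb


def normalize_request(request_mb, min_mb=MIN_ALLOCATION_MB, max_mb=MAX_ALLOCATION_MB):
    """Round a container memory request up to a multiple of the minimum allocation.

    Simplified model: a request above the maximum is rejected, not granted.
    """
    if request_mb <= 0:
        raise ValueError("request must be positive")
    if request_mb > max_mb:
        raise ValueError(f"{request_mb} MB exceeds maximum allocation {max_mb} MB")
    return math.ceil(request_mb / min_mb) * min_mb


def max_containers_per_node(node_mb=NODE_MEMORY_MB, min_mb=MIN_ALLOCATION_MB):
    """Upper bound on containers one node can host if each uses the minimum size."""
    return node_mb // min_mb


for req in (512, 1024, 1500, 3000):
    print(req, "->", normalize_request(req))
print("max containers per node:", max_containers_per_node())
```

With a 1024 MB minimum, a 1500 MB request is granted 2048 MB. This is one reason the minimum is sized relative to cluster RAM: a large value wastes memory on small containers, while a small value lets a node be split into many containers.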

Data has always existed; the key is the right data

What do the Library of Alexandria, the Normans and a book have to do with data? I never thought about

The Library...

...at Alexandria was in charge of collecting all the world's knowledge, and most of the staff was occupied with the task of translating works onto papyrus paper... [1]

Or the Normans and the...

Domesday Book (Latin: Liber de Wintonia "Book of…

Added by George Psistakis on May 20, 2016 at 5:20am — No Comments

Data Lakes Still Need Governance Life Vests

As a central repository and processing engine, data lakes hold great promise for raising return on data assets (RDA). Bringing analytics directly to diverse data in its native formats can accelerate time-to-value by providing data scientists and business users with increased flexibility and…

Added by Gabriel Lowy on April 11, 2016 at 12:00pm — No Comments

The IoT User Experience Urgency

As we evolve toward a software-defined world, there’s a new user experience urgency emerging.  That’s because the definition of “user” is going to be vastly expanded.  In the Internet of Things (IoT) era, users include machines.…

Added by Gabriel Lowy on March 30, 2016 at 9:43am — No Comments

Three Big Data Trends for 2016

Is your company poised to take advantage of three key trends in Big Data? Syncsort, a global leader in Big Data and mainframe software, recently released the results of its second annual Hadoop survey. Based on the survey results, companies will focus on three areas in 2016 to realize the full potential of Big Data analytics.

First, Apache Spark will move from a talking point into deployment. Nearly 70 percent of survey respondents are interested in Apache…

Added by John McCure on January 22, 2016 at 4:00pm — No Comments

Principal Component Analysis using R

Curse of Dimensionality:

One of the most commonly faced problems when dealing with data analytics problems such as recommendation engines and text analytics is high-dimensional, sparse data. We often face a situation where we have a large set of features and few data points, or data with very high-dimensional feature vectors. In such scenarios, fitting a model to the dataset results in lower predictive power. This scenario is often termed…

Added by suresh kumar gorakala on February 28, 2016 at 9:30pm — No Comments
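
Principal component analysis addresses high dimensionality by projecting the data onto the few directions that carry most of its variance. The post works in R, where the built-in prcomp function is the usual tool. The sketch here is instead a self-contained Python illustration of the first principal component, not the author's code. It computes the sample covariance matrix and finds its top eigenvector by power iteration. The data is invented: three features where the second and third are linear functions of the first, so one component should explain all the variance.

```python
import math


def mean_center(rows):
    """Subtract each column's mean so the data is centred at the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_principal_component(rows, iters=1000, tol=1e-12):
    """Top eigenvector of the covariance matrix, found by power iteration.

    Returns the unit direction, the share of total variance it explains,
    and each row's coordinate (score) along that direction.
    """
    centered = mean_center(rows)
    cov = covariance(centered)
    v = [1.0] * len(cov)  # starting guess; must not be orthogonal to the answer
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        converged = max(abs(a - b) for a, b in zip(v, w)) < tol
        v = w
        if converged:
            break
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    scores = [sum(a * b for a, b in zip(r, v)) for r in centered]
    return v, eigenvalue / total_variance, scores


# Hypothetical data: 3 features, but only 1 independent direction of variation.
rows = [[x, 2 * x, -x] for x in (1, 2, 3, 4, 5)]
direction, explained, scores = first_principal_component(rows)
print(direction, explained)
print(scores)
```

Keeping only the scores replaces three features with one while losing no variance in this toy case. Real data needs more components and a numerical library for the eigendecomposition.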

Do You Really Need a Big Data Strategy?

With increasing frequency, CIOs are being asked by their senior management, “What’s our big data strategy?”  But do you really need a big data strategy?

In our view, companies should instead focus on data…

Added by Gabriel Lowy on January 26, 2016 at 11:48am — No Comments

Learn Everything about Sentiment Analysis using R

Today I will explain how to create a basic movie review engine in R, based on people’s tweets. The implementation of the review engine is as follows:
  • Get tweets from Twitter
  • Clean the data
  • Create a word cloud
  • Create a data dictionary
  • Score each tweet

Get Tweets from Twitter:

The first step is to fetch the data from Twitter. In R, we can call the Twitter API using the package…

Added by suresh kumar gorakala on January 11, 2016 at 6:00am — No Comments
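
The cleaning, word-count and scoring steps listed above can be sketched in Python as simple lexicon-based scoring. Fetching tweets through the Twitter API is left out. The sample tweets and the tiny POSITIVE and NEGATIVE word lists are invented stand-ins for real data and a real "data dictionary". clean and score are hypothetical names, not functions from the author's R code.

```python
import re
from collections import Counter

# Hypothetical mini "data dictionary" of opinion words.
POSITIVE = {"good", "great", "love", "awesome", "brilliant"}
NEGATIVE = {"bad", "boring", "hate", "awful", "worst"}


def clean(tweet):
    """Lowercase, then drop URLs, @mentions, the 'rt' marker and non-letters."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"\brt\b", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def score(words):
    """Number of positive matches minus number of negative matches."""
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


tweets = [
    "RT @fan: Love it, great movie! http://t.co/xyz",
    "Worst plot ever... boring and bad",
    "Watching the movie tonight",
]
cleaned = [clean(t) for t in tweets]
scores = [score(words) for words in cleaned]
# Word frequencies are what a word cloud is drawn from.
word_freq = Counter(w for words in cleaned for w in words)
print(scores)
print(word_freq.most_common(3))
```

Counting dictionary hits ignores negation ("not good") and sarcasm. It is the simplest form of lexicon-based scoring and is best treated as a baseline.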

The CFO’s New Year’s Resolution

Virtually Print Receipts for Easier Tax Audits

While many individuals will make personal resolutions this December, the New Year is also the perfect time for business leaders to consider what steps they…

Added by Sai Gundavelli on December 7, 2015 at 1:52pm — No Comments

The Collective Definition of Data Lake by the Big Data Community

Yes, we are marching towards New Year 2016! What happened to the resolutions of 2014 and 2015? Quitting habits? Practicing habits? The road ahead? I was into all of them, but I could not keep them up. Hence, for New Year 2016, no more resolutions: just implement the plan.

Beyond that, as we know, big data is bringing more business value to the enterprise by leveraging the data lake. Data lake... what is that? "Data lake" is a loosely defined term, and its definition changes during implementation…

Added by Kumar Chinnakali on December 2, 2015 at 6:00am — No Comments

The 7 Most Unusual Applications of Big Data You’ve Ever Seen!

Guest blog post by Bernard Marr

It’s all well and good to talk about customer experience and managing inventory flow, but what has big data done for me lately?

I’ve rounded up seven of the most interesting — and unique — applications for big data I’ve seen recently and how they may be impacting your life.…

Added by Andrei Macsin on November 3, 2015 at 5:52pm — 1 Comment

Top Hadoop interview questions

To build a career as a Hadoop developer, one must be clear on Hadoop concepts and have a working knowledge of analysing data using MapReduce, Hive and Pig. Typical Hadoop interview questions include topics such as replication factor, node failures and distributed caching.

Here are some basic Hadoop interview questions:

  1. What is a SequenceFile in Hadoop?
  2. What is meant by replication factor?
  3. List the key components of HBase and tell when you…

Added by John on October 22, 2015 at 9:30am — No Comments
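
The replication factor question lends itself to a quick worked example. The sketch assumes the stock HDFS defaults: a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication). Both can be changed per cluster and per file. hdfs_footprint is a hypothetical helper, not a Hadoop API.

```python
MB = 1024 * 1024
BLOCK_SIZE = 128 * MB   # default dfs.blocksize in Hadoop 2.x and later
REPLICATION = 3         # default dfs.replication


def hdfs_footprint(file_bytes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return (blocks, block replicas, raw bytes stored) for one file.

    A partial last block only occupies the bytes it actually holds, so raw
    storage is simply the file size multiplied by the replication factor.
    """
    blocks = (file_bytes + block_size - 1) // block_size  # ceiling division
    return blocks, blocks * replication, file_bytes * replication


blocks, replicas, raw = hdfs_footprint(1000 * MB)
print(blocks, replicas, raw // MB)  # prints: 8 24 3000
```

A 1000 MB file is split into 8 blocks. With replication factor 3, the NameNode tracks 24 block replicas spread across DataNodes, and the file consumes about 3000 MB of raw disk.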

Big Data = Hype; But Why That Doesn't Matter

Guest blog post by Bernard Marr

From time to time, you still come across someone with the opinion that Big Data is nothing more than a fad, which will be forgotten about soon enough.

You might not expect to hear this from me, but they’re actually right. Well – half right, at least!

As I’ve written before, I’m not actually a fan of the term “Big Data”, which puts overemphasis on the…

Added by Andrei Macsin on October 20, 2015 at 3:59pm — No Comments

Basic recommendation engine using R

Originally posted here

In our day-to-day life, we come across a large number of recommendation engines, such as Facebook’s recommendation engine for friend suggestions and suggestions of similar Pages to like, or YouTube’s recommendation engine suggesting videos similar to our previous searches and preferences. In today’s blog post I will explain how to build a basic…

Added by suresh kumar gorakala on October 13, 2015 at 6:30am — No Comments
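
One common way to build a basic recommendation engine is user-based collaborative filtering: find users whose ratings resemble yours and recommend what they liked. The Python sketch here has two parts:

- Similarity: cosine similarity computed over co-rated items.
- Prediction: a similarity-weighted average of the neighbours' ratings.

The post's own R approach is cut off in this excerpt, so this should not be read as the author's method. The ratings and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 3, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2, "E": 1},
}


def cosine(u, v):
    """Cosine similarity between two users over the items both have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = math.sqrt(sum(u[i] ** 2 for i in common))
    nv = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv)


def recommend(user, ratings=RATINGS, top_n=2):
    """Predict ratings for unseen items as a similarity-weighted average."""
    mine = ratings[user]
    totals, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = cosine(mine, theirs)
        if sim <= 0:
            continue  # ignore users with no useful overlap
        for item, r in theirs.items():
            if item not in mine:
                totals[item] = totals.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


print(recommend("alice"))
```

Restricting the cosine to co-rated items keeps the example short. With few overlaps, though, it can overstate similarity. This is one reason practical systems often mean-centre ratings or require a minimum number of co-rated items.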

How Big Data Drives Digital Marketing Success

It’s the digital age, and we are all wired in. Smartphones, tablets, hundreds of television channels, thousands of apps, social media, and online shopping are part of our…

Added by Michael Meyers on October 2, 2015 at 8:30am — No Comments

Benjamin Graham Meets Advanced Analytics

Benjamin Graham’s The Intelligent Investor is considered by many, including his disciple Warren Buffett, to be the finest book ever written on the subject of investing. At the heart of Graham’s thesis is the question, “Can investors uncover intrinsic value in specific securities if…

Added by Gabriel Lowy on October 1, 2015 at 7:33am — No Comments

© 2019 BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap.