A Data Science Central Community

The term “big data” was everywhere in 2017, and it will keep its place in the years to come. In our previous post, we introduced some concepts about big data, machine learning, and data mining (see post: Understanding Big data, Data mining, and Machine Learning in 5 Minutes). Now let's dig deeper into machine learning with a brief walk-through of some of the most commonly used ML algorithms. We skip the abstract theory and rely on pictures and examples of how the algorithms are used.

The algorithms covered in this article are:

- Decision tree
- Random forest
- Logistic regression
- Support vector machine
- Naive Bayes
- k-nearest neighbors
- k-means
- AdaBoost
- Neural network
- Markov chain

**1. Decision Tree**

A decision tree classifies a set of data into groups using certain attributes. A test is executed at each node, and the outcome of the test decides which branch the data follows, splitting the data into two distinct groups. The process then repeats on each group.

The tests are learned from existing data, so when new data comes in, the computer can follow the branches and sort it into the right leaf.
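The idea of learning tests from existing data and then routing new data to a leaf can be sketched in a few lines. This is only a minimal illustration, not a production implementation: the toy rows, the labels "A" and "B", and the use of Gini impurity to choose each test are assumptions made for the example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of labels (0 means the group is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, depth=0, max_depth=3):
    """Learn threshold tests from existing data; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no test separates the rows
        return majority
    _, f, t = best
    go_left = [r[f] <= t for r in rows]
    l_rows = [r for r, g in zip(rows, go_left) if g]
    l_lab = [l for l, g in zip(labels, go_left) if g]
    r_rows = [r for r, g in zip(rows, go_left) if not g]
    r_lab = [l for l, g in zip(labels, go_left) if not g]
    return {"feature": f, "threshold": t,
            "left": build_tree(l_rows, l_lab, depth + 1, max_depth),
            "right": build_tree(r_rows, r_lab, depth + 1, max_depth)}

def classify(tree, row):
    """Follow the branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Hypothetical existing data: two numeric features per row.
rows = [(1, 1), (2, 1), (1, 4), (6, 2), (7, 5), (8, 3)]
labels = ["A", "A", "A", "B", "B", "B"]
tree = build_tree(rows, labels)
print(classify(tree, (1.5, 3)), classify(tree, (9, 1)))
```

On this toy data a single test on the first feature is enough to separate the two groups, so the learned tree has one node and two leaves.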

**2. Random Forest**

Select rows at random from the original data to form different subsets.

Matrix S is the original data. It contains rows 1 to N. A, B, and C are the features, and the final column C stands for the category.

Create random subsets from S. Let’s say we get M subsets.

We then train one decision tree on each subset, which gives us M decision trees.

Feed new data into these trees and we get M results. We count which result occurs most often among the M, and take that as the final result.
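The subset-then-vote procedure above can be sketched as follows. To keep the example short, each "tree" here is only a one-level tree (a single threshold test), which is a simplification and not what a real random forest uses. The toy rows, the labels "cat" and "dog", and the fixed seed are assumptions for illustration.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """A one-level tree: the single threshold test with the fewest errors."""
    majority = Counter(labels).most_common(1)[0][0]
    # Fallback: send every row left and predict the majority label.
    best = (len(labels) + 1, 0, float("inf"), majority, majority)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_label = Counter(left).most_common(1)[0][0]
            r_label = Counter(right).most_common(1)[0][0]
            errors = sum(l != l_label for l in left) + sum(l != r_label for l in right)
            if errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    return best[1:]  # (feature, threshold, left label, right label)

def predict_stump(stump, row):
    f, t, l_label, r_label = stump
    return l_label if row[f] <= t else r_label

def train_forest(rows, labels, n_trees=25, seed=0):
    """n_trees plays the role of M: one tree per random subset of S."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest

def predict_forest(forest, row):
    """Collect one result per tree and return the most common one."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data S: two features per row plus a category.
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["cat"] * 4 + ["dog"] * 4
forest = train_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

Sampling with replacement is one common way to form the subsets. The text does not say how the subsets are drawn, so treat that choice as an assumption.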

**3. Logistic Regression**

When the prediction target is a probability, its value must be greater than or equal to 0 and less than or equal to 1. A simple linear model cannot guarantee this: when the input is not restricted to a certain range, the output exceeds the specified interval.

We are better off with a model of this kind.

So how can we get this model?

This model needs to fulfill two conditions: “greater than or equal to 0” and “less than or equal to 1”.

Transforming the formula gives us the logistic regression model:

Fitting the model to the original data gives us the corresponding coefficients.

And we get the logistic model plot.
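As a rough sketch of the steps above, the standard logistic model is p = 1 / (1 + e^-(w0 + w1·x)), which always stays between 0 and 1. The code below fits w0 and w1 with plain gradient descent. The toy data, the learning rate, and the number of iterations are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Estimate the coefficients w0 and w1 by gradient descent on the log loss."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w0 + w1 * x) - y  # predicted probability minus label
            g0 += err
            g1 += err * x
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
    return w0, w1

# Hypothetical data: one input value per sample and a 0/1 label.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w0, w1 = fit_logistic(xs, ys)

def predict_proba(x):
    return sigmoid(w0 + w1 * x)

for x in (-100, 0.5, 4.0, 100):
    # The linear part can be any real number; the probability cannot leave [0, 1].
    print(x, round(w0 + w1 * x, 2), round(predict_proba(x), 4))
```

The printed linear part grows without bound as x moves away from the data, while the probability stays inside the interval, which is the reason for preferring this model over a straight line.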

**4. Support Vector Machine**

To separate two classes with a hyperplane, the best choice is the hyperplane that leaves the maximum margin to both classes. Because Z2 > Z1, the green one is better.

Express the hyperplane as a linear equation: for the class above the line the expression is greater than or equal to 1, and for the other class it is less than or equal to -1.

Calculate the distance from a point to the hyperplane using the equation in the graph:

This gives the expression for the total margin below. The aim is to maximize the margin, so what we need to do is minimize the denominator.

For example, we use 3 points to find the optimal hyperplane. Define the weight vector along the direction (2, 3) - (1, 1).

This gives the weight vector (a, 2a). Substitute these two points into the equation.

Once a is determined, substituting it into (a, 2a) gives the weight vector, and the points that lie on the margin are the support vectors.

The equation with a and w0 substituted in is the support vector machine.
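The arithmetic of this small example can be checked with a short script. It assumes that (2, 3) lies on the +1 side of the margin and (1, 1) on the -1 side. The third point is not given in the text, so it is left out, and the margin is taken as 2 / ‖w‖, the standard expression for this setup.

```python
import math
from fractions import Fraction

pos = (2, 3)  # assumed support vector of the class with g(x) >= 1
neg = (1, 1)  # assumed support vector of the class with g(x) <= -1

# The weight vector points along pos - neg = (1, 2), so w = (a, 2a).
direction = (pos[0] - neg[0], pos[1] - neg[1])

# Substituting both points into w.x + w0 gives two equations:
#   a * (direction . pos) + w0 =  1
#   a * (direction . neg) + w0 = -1
dot_pos = direction[0] * pos[0] + direction[1] * pos[1]  # 8
dot_neg = direction[0] * neg[0] + direction[1] * neg[1]  # 3

# Subtracting the second equation from the first eliminates w0.
a = Fraction(2, dot_pos - dot_neg)
w0 = 1 - a * dot_pos
w = (a * direction[0], a * direction[1])

def g(x):
    """Decision function of the resulting classifier."""
    return w[0] * x[0] + w[1] * x[1] + w0

margin = 2 / math.sqrt(float(w[0] ** 2 + w[1] ** 2))
print("a =", a, " w =", w, " w0 =", w0, " margin =", margin)
```

Under these assumptions the script gives a = 2/5 and w0 = -11/5, and the margin equals the distance between the two points, as expected when both lie exactly on the margin.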

**5. Naive Bayes**

Here’s an example from NLP:

Given a piece of text, determine whether its attitude is positive or negative.

To solve the problem, we can look at only some of the words:

The text is then represented by only these words and their counts.

The original question is: given a sentence, which category does it belong to?

With Bayes’ rule, this becomes an easy question.

The question becomes: within this class, what is the probability that this sentence occurs? And don’t forget the other two probabilities in the equation.

Example: the probability of occurrence of the word “love” is 0.1 in the positive class, and 0.001 in the negative class.
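A minimal sketch of this calculation is shown below. Only the two probabilities for “love” come from the example above. The other words, their probabilities, and the equal class priors of 0.5 are made up for illustration.

```python
import math

# P(word | class). Only the "love" values come from the text.
word_probs = {
    "positive": {"love": 0.1, "great": 0.05, "boring": 0.005},
    "negative": {"love": 0.001, "great": 0.005, "boring": 0.05},
}
# P(class): assumed equal priors.
priors = {"positive": 0.5, "negative": 0.5}

def posterior(words):
    """P(class | words) by Bayes' rule, treating words as independent."""
    log_scores = {}
    for cls, prior in priors.items():
        score = math.log(prior)
        for word in words:
            if word in word_probs[cls]:  # words outside the vocabulary are ignored
                score += math.log(word_probs[cls][word])
        log_scores[cls] = score
    # Normalizing by the sum over classes plays the role of P(words).
    top = max(log_scores.values())
    unnormalized = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {cls: v / total for cls, v in unnormalized.items()}

print(posterior(["love"]))
print(posterior(["boring", "boring", "love"]))
```

With equal priors, a sentence containing only “love” is about 99% likely to be positive, because 0.1 is a hundred times larger than 0.001.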

**6. k-Nearest Neighbors**

When a new data point arrives, it is assigned to the category that has the most points among its nearest neighbors.

For example: to distinguish “dog” from “cat”, we judge by two features, “claws” and “sound”. Circles and triangles are the known categories. What about the “star”?

When K=3, the three lines connect the star to its 3 nearest points. Circles are in the majority, so the “star” belongs to “cat”.
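The vote among the K nearest points can be written in a few lines. The coordinates below are hypothetical values for the two features (“claws”, “sound”), since the original figure does not give numbers, and Euclidean distance is an assumed choice of distance.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    """Return the most common label among the k training points nearest to point."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical known data: ((claws, sound), category).
train = [
    ((1.0, 1.2), "cat"), ((1.5, 0.8), "cat"), ((2.0, 1.5), "cat"),
    ((4.0, 4.2), "dog"), ((4.5, 3.8), "dog"), ((3.0, 2.5), "dog"),
]
star = (2.2, 2.0)
print(knn_predict(train, star, k=3))
```

Here two of the three nearest neighbors of the star are cats and one is a dog, so the majority vote returns “cat”.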

**7. k-means**

Separate the data into 3 classes. The pink part is the biggest, while the yellow is the smallest.

Pick 3, 2, and 1 as the initial centers, calculate the distance between each remaining data point and these centers, and assign each point to the class whose center is closest.

After classification, calculate the mean of each class and set it as the new center.

After some rounds, we can stop when the classes no longer change.
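The assign-then-update loop described above can be sketched like this. The 2-D points and the choice of three data points as initial centers are assumptions for illustration.

```python
import math

def kmeans(points, initial_centers, max_rounds=100):
    """Alternate between assigning points and moving centers until nothing changes."""
    centers = list(initial_centers)
    assignment = None
    for _ in range(max_rounds):
        # Step 1: assign every point to the class with the nearest center.
        new_assignment = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:  # classes no longer change: stop
            break
        assignment = new_assignment
        # Step 2: move each center to the mean of the points assigned to it.
        for c in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, assignment

# Hypothetical data with three loose groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (5, 5), (5.5, 6), (6, 5), (9, 1), (9.5, 1.5), (8.5, 0.5)]
centers, assignment = kmeans(points, [points[0], points[3], points[6]])
print(centers)
print(assignment)
```

The result depends on the initial centers, so in practice the algorithm is often run several times with different starting points.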

**8. AdaBoost**

AdaBoost is one method of boosting.

Boosting combines classifiers that do not give satisfactory results on their own into one classifier that may perform better.

As shown below, tree 1 and tree 2 do not perform well individually, but if we feed them the same data and sum up their results, the final result is more convincing.

As an example of AdaBoost, in handwriting recognition the panel can extract many features, such as the starting direction of a stroke and the distance between the starting point and the ending point.

During training, the machine learns a weight for each feature. For instance, 2 and 3 begin in very similar ways, so this feature contributes little to classification and its weight is small.

But this alpha angle is highly distinctive, so the weight of this feature is large. The final outcome is the result of considering all of these features.
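To make the weighted combination concrete, here is a small sketch of AdaBoost with one-threshold classifiers on a single feature. It is not the handwriting example above: the ten data points, the labels, and the three boosting rounds are assumptions chosen so that no single threshold classifies everything correctly.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: predict polarity below the threshold, -polarity above it."""
    return polarity if x < threshold else -polarity

def train_adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # every sample starts equally important
    ordered = sorted(xs)
    thresholds = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        best = None
        for t in thresholds:
            for polarity in (1, -1):
                preds = [stump_predict(x, t, polarity) for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, t, polarity, preds)
        err, t, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # accurate classifiers get more say
        ensemble.append((alpha, t, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted sum of the weak classifiers' votes."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = train_adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

Each weak classifier alone misclassifies several points, but the weighted sum of the three classifies all ten training points correctly.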

**9. Neural Network**

In a neural network, an input may end up in one of at least two classes.

A neural network is made up of neurons and the connections between them.

The first layer is the input layer, and the last layer is the output layer.

The hidden layers and the output layer each have their own classifiers.

When an input enters the network and is activated, the calculated score is passed on to the next layer. The scores shown in the output layer are the scores for each class. The example below gets class 1 as the result.

The same input passed to different nodes generates different scores because each node has its own weights and bias. This is forward propagation.
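The forward pass described above can be sketched with a tiny network of two inputs, two hidden nodes, and two output nodes. All weights, biases, and the input values are made-up numbers, and the sigmoid activation is an assumed choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Each node computes its own weighted sum plus bias, then activates it."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

# Hypothetical parameters: one row of weights per node.
hidden_weights = [[0.5, -0.4], [0.3, 0.8]]
hidden_biases = [0.1, -0.2]
output_weights = [[1.2, -0.7], [-0.6, 0.9]]
output_biases = [0.05, -0.05]

x = [1.0, 0.5]  # the input
hidden = layer_forward(x, hidden_weights, hidden_biases)        # same input, different scores
scores = layer_forward(hidden, output_weights, output_biases)   # one score per class
predicted_class = scores.index(max(scores)) + 1                 # classes numbered from 1

print(hidden)
print(scores, "-> class", predicted_class)
```

Both hidden nodes receive the same input but produce different scores because their weights and biases differ. With these particular numbers, the first output node has the higher score.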

**10. Markov Chain**

A Markov chain consists of states and transitions.

For example, build a Markov chain from “the quick brown fox jumps over the lazy dog”.

First, we make every word a state, and then we calculate the probabilities of the state transitions.

These are the probabilities calculated from one single sentence. When you train the computer on a massive amount of text, you get a bigger state transition matrix, for example with the words that can follow “the” and their corresponding probabilities.
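Counting the transitions in the example sentence takes only a few lines. This is a minimal sketch that treats each whitespace-separated word as a state. The function name is made up for the example.

```python
from collections import Counter, defaultdict

def transition_probabilities(text):
    """Estimate P(next word | current word) from adjacent word pairs."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        current: {word: c / sum(followers.values()) for word, c in followers.items()}
        for current, followers in counts.items()
    }

probs = transition_probabilities("the quick brown fox jumps over the lazy dog")
print(probs["the"])    # words that can follow "the"
print(probs["quick"])
```

In this sentence “the” is followed once by “quick” and once by “lazy”, so each gets probability 0.5. Every other word has a single follower with probability 1, and “dog” has no outgoing transition because it ends the sentence.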

Source: Octoparse

© 2019 BigDataNews.com is a subsidiary of DataScienceCentral LLC and not affiliated with Systap.
