A Data Science Central Community

Data science is one of the most popular fields right now, and I believe many people are looking for a way into the industry. I recently read an article listing some great data science books that may help, so I have summarized it here and added a brief introduction to each book so you can choose the ones you'd like to read. Some of the books are available online, and I have included the links. For most of the others, you will probably need to look on Amazon.

**Part I: Data Scientist Core Skills**

- Data Science
- Math
- Probability and Statistics
- Machine Learning
- Data Mining
- SQL
- R
- Python
- Data Scientist Interview
- Algorithm
- Handbook
- Web Scraping and Data Wrangling
- Data Visualization and Storytelling
- A/B Testing

**Part II: Data Science Advanced Skills**

- Neural Network and Deep Learning
- Information Theory
- Causal Inference
- Sampling
- Convex
- Growth Analytics
- Text Mining and Natural Language Processing
- Anomaly Detection
- Recommender Systems
- Social Network Analysis
- Time Series Analysis and Forecasting
- Reinforcement Learning and Artificial Intelligence

**Part III: Leisure Reading**

**Part I: Data Scientist Core Skills**

**Data Science**

*1. The Data Science Handbook: Advice and Insights from 25 Amazing Data Scientists*

In this handbook, 25 industry experts share their advice. It is very helpful for beginners.

*2. Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking*

Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.

*3. Doing Data Science: Straight Talk from the Frontline*

In many of these chapter-long lectures, data scientists from companies such as Google, Microsoft, and eBay share new algorithms, methods, and models by presenting case studies and the code they use. If you’re familiar with linear algebra, probability, and statistics, and have programming experience, this book is an ideal introduction to data science.

**Math**

*4. Multivariate Calculus*

https://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/index.htm

*5. Linear Algebra*

https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/index.htm

**Probability and Statistics**

*6. Introduction to Probability, Statistics, and Random Processes*

This book introduces students to probability, statistics, and stochastic processes. It can be used by both students and practitioners in engineering, various sciences, finance, and other related fields. It provides a clear and intuitive approach to these topics while maintaining mathematical accuracy. You can also find courses and videos online.

https://www.probabilitycourse.com

*7. OpenIntro Statistics*

The OpenIntro project was founded in 2009 to improve the quality and availability of education by producing exceptional books and teaching tools that are free to use and easy to modify. Its inaugural effort is OpenIntro Statistics. Corresponding courses and videos can be found at:

https://www.openintro.org

*8. Statistical Inference*

It is used as a textbook for first-year graduate students at many colleges. It discusses both theoretical statistics and the practical applications of the theoretical developments, and includes a large number of exercises covering both theory and applications.

*9. Applied Linear Statistical Models*

Applied Linear Statistical Models is the long-established leading authoritative text and reference on statistical modeling. The fifth edition makes increased use of computing and graphical analysis throughout, without sacrificing concepts or rigor. In general, it uses larger data sets in examples and exercises, and methods are automated within software wherever that can be done without loss of understanding.

*10. An Introduction to Generalized Linear Models*

As the title says, this is an introduction to generalized linear models.

*11. All of Statistics: A Concise Course in Statistical Inference*

This book is for people who want to learn probability and statistics quickly. It is suitable for graduate or advanced undergraduate students in computer science, mathematics, statistics, and related disciplines.

*12. Computer Age Statistical Inference: Algorithms, Evidence, and Data Science*

In this book, Efron and Hastie give a comprehensive introduction to statistics in the big data era.

*13. Statistics in a Nutshell: A Desktop Quick Reference*

A quick reference, as the title says.

*14. Bayes' Rule: A Tutorial Introduction to Bayesian Analysis*

*15. Think Bayes: Bayesian Statistics in Python*

A brief introduction to doing Bayesian statistics with Python.

http://www.greenteapress.com/thinkbayes/thinkbayes.pdf

*16. Bayesian Methods for Hackers*

Advanced tutorials on doing Bayesian statistics with Python.

https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

*17. Practical Statistics for Data Scientists: 50 Essential Concepts*

This practical guide explains how to apply various statistical methods to data science, tells you how to avoid their misuse, and gives you advice on what's important and what's not.

You can find it here: https://github.com/andrewgbruce/statistics-for-data-scientists

**Machine Learning**

*18. An Introduction to Statistical Learning: with Applications in R*

Without a doubt a good book. Everyone in the field has probably heard of it.

http://www-bcf.usc.edu/~gareth/ISL/

https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about

*19. Applied Predictive Modeling*

Applied Predictive Modeling covers the overall predictive modeling process. A must-read before an interview or starting a job.

*20. Python Machine Learning*

Python Machine Learning Second Edition now includes the popular TensorFlow deep learning library. The scikit-learn code has also been fully updated to include recent improvements and additions to this versatile machine learning library.

*21. Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies*

A comprehensive introduction to the most important machine learning approaches used in predictive data analytics, covering both theoretical concepts and practical applications.

*22. Real-World Machine Learning*

This book tells you how to use machine learning to solve real-world problems. I strongly recommend that all data scientists read it before an internship or a job.

*23. Learning From Data*

Explains machine learning theory that many other books do not cover, such as the VC dimension.

https://work.caltech.edu/telecourse.html

*24. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition*

This book describes the important ideas in a variety of fields such as medicine, biology, finance, and marketing in a common conceptual framework. This is the great ESL. I think it is best suited to thumbing through and excerpting rather than reading cover to cover.

*25. Pattern Recognition and Machine Learning*

The book presents approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. It uses graphical models to describe probability distributions, which no other book on machine learning did at the time.

**Data Mining**

*26. Principles of Data Mining*

A basic introduction to data mining that spends a lot of time on association rules.

*27. Introduction to Data Mining*

Introduction to Data Mining presents fundamental concepts and algorithms for those learning data mining for the first time.

*28. Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management*

Uses practical examples to show how data mining can be used to generate revenue from customers.

**SQL**

*29. SQL Cookbook: Query Solutions and Techniques for Database Developers*

This cookbook points out many pitfalls in SQL queries and gives the query code for each popular database.

**R**

*30. R in Action*

The book begins by introducing the R language, including the development environment. Focusing on practical solutions, the book also offers a crash course in practical statistics and covers elegant methods for dealing with messy and incomplete data using features of R.

*31. R for Data Science*

*32. R Packages*

*33. Advanced R*

Written by Professor Hadley Wickham.

R for Data Science, with Garrett Grolemund, introduces the key tools for doing data science with R.

R Packages teaches good software engineering practices for R, using packages for bundling, documenting, and testing your code.

Advanced R helps you master R as a programming language, teaching you what makes R tick.

**Python**

*34. Think Python*

This hands-on guide takes you through the language a step at a time, beginning with basic programming concepts before moving on to functions, recursion, data structures, and object-oriented design. Suitable for beginners.

*35. Fluent Python*

Author Luciano Ramalho takes you through Python’s core language features and libraries, and shows you how to make your code shorter, faster, and more readable at the same time.

*36. Python for Probability, Statistics, and Machine Learning*

This book covers the key ideas that link probability, statistics, and machine learning illustrated using Python modules in these areas.

*37. Python Data Science Handbook*

A very comprehensive handbook on using Python to solve data science problems.

https://github.com/jakevdp/PythonDataScienceHandbook

**Data Scientist Interview**

*38. Data Science Interviews Exposed*

Data Science Interviews Exposed offers data science career advice and REAL interview questions to help you land a six-figure-salary job!

*39. Cracking the PM Interview: How to Land a Product Manager Job in Technology*

In the U.S., many data scientists work closely with product teams, and some are even employed as product managers, so this book on PM interviews is a useful reference for data scientists too.

**Algorithm**

*40. Grokking Algorithms: An illustrated guide for programmers and other curious people*

Grokking Algorithms is a fully illustrated, friendly guide that teaches you how to apply common algorithms to the practical problems you face every day as a programmer.

*41. Problem Solving with Algorithms and Data Structures Using Python*

The study of algorithms and data structures is central to understanding what computer science is all about, and that is what this book is all about.

Electronic edition: http://interactivepython.org/runestone/static/pythonds/index.html

*42. Algorithms in a Nutshell: A Practical Guide*

An algorithm guide for quick review.

**Handbook**

*43. The Data Science Handbook*

A comprehensive overview of data science covering the analytics, programming, and business skills necessary to master the discipline.

**Web Scraping and Data Wrangling**

*44. Web Scraping with Python: Collecting Data from the Modern Web*

With this practical guide, you'll learn how to use Python scripts and web APIs to gather and process data from thousands, or even millions, of web pages at once. Alternatively, a tool such as Octoparse may be enough to cover your web scraping needs.

*45. Data Wrangling with Python: Tips and Tools to Make Your Life Easier*

This book teaches you how to clean messy raw data and wrangle it into the shape you want.

*46. Regular Expressions Cookbook*

Regular expressions can be annoying, but you have to face them. You can use this book to look up the regular expressions you need.

**Data Visualization and Storytelling**

*47. Communicating Data with Tableau: Designing, Developing, and Delivering Data Visualizations*

This practical guide shows you how to use Tableau Software to convert raw data into compelling data visualizations that provide insight or allow viewers to explore the data for themselves.

*48. Interactive Data Visualization for the Web: An Introduction to Designing with D3*

This fully updated and expanded second edition takes you through the fundamental concepts and methods of D3, the most powerful JavaScript library for expressing data visually in a web browser.

*49. Data Visualization with Python and JavaScript: Scrape, Clean, Explore & Transform Your Data*

With this hands-on guide, author Kyran Dale teaches you how to build a basic dataviz toolchain with best-of-breed Python and JavaScript libraries (including Scrapy, Matplotlib, Pandas, Flask, and D3) for crafting engaging, browser-based visualizations.

*50. Storytelling with Data: A Data Visualization Guide for Business Professionals*

This book demonstrates how to go beyond conventional tools to reach the root of your data, and how to use your data to create an engaging, informative, compelling story.

**A/B Testing**

*51. A/B Testing: The Most Powerful Way to Turn Clicks Into Customers*

*52. Designing with Data: Improving the User Experience with A/B Testing*

**Part II: Data Science Advanced Skills**

The books in this part are recommended for those who wish to become a Saiyan among data scientists.

**Neural Network and Deep Learning**

*53. Make Your Own Neural Network*

A step-by-step gentle journey through the mathematics of neural networks, and making your own using the Python computer language. This guide will take you on a fun and unhurried journey, starting from very simple ideas, and gradually building up an understanding of how neural networks work.

*54. Deep Learning*

An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

*55. Hands-On Machine Learning with Scikit-Learn and TensorFlow*

This practical book shows you how to use simple and efficient tools to implement programs capable of learning from data.

**Information Theory**

*56. Data Science and Information Theory*

This is an article that introduces the importance of information theory in the data science field.

*57. Information Theory: A Tutorial Introduction*

In this richly illustrated book, accessible examples are used to introduce information theory in terms of everyday games like ‘20 questions’ before more advanced topics are explored.

*58. Information, Entropy, Life and the Universe: What We Know and What We Do Not Know*

If you are interested in exploring the world of information, entropy, and probability, or just the world in general, this is a great place to start. Arieh takes the reader through a detailed unfolding of these topics while providing numerous common examples to help with these sometimes difficult-to-grasp topics.

**Causal Inference**

*59. Causal Inference in Statistics: A Primer*

Judea Pearl presents a book ideal for beginners in statistics, providing a comprehensive introduction to the field of causality.

*60. Field Experiments: Design, Analysis, and Interpretation*

A brief, authoritative introduction to field experimentation in the social sciences.

**Sampling**

*61. Sampling*

Sampling provides an up-to-date treatment of both classical and modern sampling design and estimation methods, along with sampling methods for rare, clustered, and hard-to-detect populations.

**Convex**

*62. Convex Optimization*

A comprehensive introduction to the subject, this book shows in detail how such problems can be solved numerically with great efficiency.

**Growth Analytics**

*63. Lean Analytics: Use Data to Build a Better Startup Faster (Lean Series)*

Written by Alistair Croll (Coradiant, CloudOps, Startupfest) and Ben Yoskovitz (Year One Labs, GoInstant), the book lays out practical, proven steps to take your startup from initial idea to product/market fit and beyond.

*64. Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity*

Web Analytics 2.0 provides specific recommendations for creating an actionable strategy, applying analytical techniques correctly, solving challenges such as measuring social media and multichannel campaigns, achieving optimal success by leveraging experimentation, and employing tactics for truly listening to your customers.

**Text Mining and Natural Language Processing**

*65. Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit*

This book offers a highly accessible introduction to natural language processing, the field that supports a variety of language technologies, from predictive text and email filtering to automatic summarization and translation.

Read online: http://www.nltk.org/book/

*66. Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data*

Text Analytics with Python teaches you the techniques related to natural language processing and text analytics, and you will gain the skills to know which technique is best suited to solve a particular problem.

*67. Introduction to Information Retrieval*

Class-tested and coherent, this groundbreaking new textbook teaches web-era information retrieval, including web search and the related areas of text classification and text clustering from basic concepts.

Read online: https://nlp.stanford.edu/IR-book/

**Anomaly Detection**

*68. Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection*

*Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques* is an authoritative guidebook for setting up a comprehensive fraud detection analytics solution.

*69. Outlier Analysis*

This book provides comprehensive coverage of the field of outlier analysis from a computer science point of view. It integrates methods from data mining, machine learning, and statistics within the computational framework and therefore appeals to multiple communities.

**Recommender Systems**

*70. Recommender Systems: The Textbook*

This book comprehensively covers the topic of recommender systems, which provide personalized recommendations of products or services to users based on their previous searches or purchases.

**Social Network Analysis**

*71. Network Science*

This pioneering textbook, spanning a wide range of topics from physics to computer science, engineering, economics and the social sciences, introduces network science to an interdisciplinary audience.

*72. Social and Economic Networks*

In Social and Economic Networks, Matthew Jackson offers a comprehensive introduction to social and economic networks, drawing on the latest findings in economics, sociology, computer science, physics, and mathematics.

*73. Social Network Analysis for Startups: Finding connections on the social web*

You'll learn concepts and techniques for recognizing patterns in social media, political groups, companies, cultural trends, and interpersonal networks.

**Time Series Analysis and Forecasting**

*74. Practical Time Series Forecasting with R: A Hands-On Guide*

The book introduces popular forecasting methods and approaches used in a variety of business applications. The book offers clear explanations, practical examples, and end-of-chapter exercises and cases.

*75. Forecasting: principles and practice*

This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.

**Reinforcement Learning and Artificial Intelligence**

*76. Reinforcement Learning: An Introduction*

Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning. Their discussion ranges from the history of the field's intellectual foundations to the most recent developments and applications.

*77. Artificial Intelligence: A Modern Approach*

Artificial Intelligence: A Modern Approach, 3e offers the most comprehensive, up-to-date introduction to the theory and practice of artificial intelligence. Number one in its field, this textbook is ideal for one or two-semester, undergraduate or graduate-level courses in Artificial Intelligence.

**Part III: Leisure Reading**

*78. Soft Skills: The software developer's life manual*

Soft Skills: The software developer's life manual is a unique guide, offering techniques and practices for a more satisfying life as a professional software developer.

*79. The Healthy Programmer: Get Fit, Feel Better, and Keep Coding*

This is an excellent book for any professional who sits too much for the job. It contains informative suggestions to improve your health in ways that fit into your busy day. What makes this book different is its practical suggestions which fit into the hectic lifestyle.

*80. Exposing the Magic of Design*

This book offers a way of thinking about complicated, multifaceted problems with a repeatable degree of success. Design synthesis methods can be applied in business to produce new and compelling products and services, or these methods can be applied in government with the goal of changing culture and bettering society.

*81. Thinking, Fast and Slow*

The book has about 3k reviews on Amazon. No specific description was given, but I believe it is a great and interesting book for everyone.

*82. Naked Statistics: Stripping the Dread from the Data*

Perhaps the most interesting statistics textbook you will ever read.

*83. Uncertainty: The Soul of Modeling, Probability & Statistics*

This book presents a philosophical approach to probability and probabilistic thinking, considering the underpinnings of probabilistic reasoning and modeling, which effectively underlie everything in data science.

*Source: Octoparse*
