A reader (Ian) recently asked me what I felt was the best way to learn data science in his spare time.
Great question, Ian, especially considering the field is red-hot right now and shows no signs of slowing down!
In 2018, demand for data scientists grew by 29%, contributing to a 344% increase since 2013, according to reports by Indeed and Dice. The supply of qualified candidates, however, lags drastically behind.
My first instinct was to refer Ian to the contemporary King of AI, Siraj Raval, and his free curriculum “Learn Data Science in 3 Months”. The video, however, was created six months ago (a lifetime in the ever-changing world of data science), and I saw some opportunities to update it.
Some courses were gone or had changed… so I updated or replaced them.
Some people prefer a written guide… so I typed it out.
Some people need to know why a topic matters before they can understand it… so I added explanations.
Finally, based on personal experience, some people can get stuck on any single course, no matter how interesting it is. I always like to have an additional course available that may explain something a different way or fill in some knowledge gaps… so I added some alternatives and additions.
Hopefully, this complete curriculum for becoming a data scientist helps Ian and anyone else interested in the field!
1. Learn Python
Tools you’ll use: Python
Massachusetts Institute of Technology (MIT) | Introduction to Computer Science and Programming in Python
Kaggle | Python
Siraj Raval | Learn Python for Data Science
2. Learn Statistics and Probability
Tools you’ll use: Math
As a data scientist, you’ll have to extract useful information from extremely imperfect data. You can’t completely eliminate uncertainty, but you can reduce it with a strong grasp of statistics and probability fundamentals.
Khan Academy | Statistics and Probability
UC San Diego | Probability and Statistics in Data Science using Python
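To make “reducing uncertainty” concrete, here is a minimal sketch using only Python’s standard library. It estimates a mean from simulated noisy measurements and reports an approximate 95% confidence interval using the normal approximation (mean ± 1.96 × standard error). The true mean of 50, the noise level of 15, and the helper name `mean_with_ci` are illustrative assumptions, not part of any course above.

```python
import math
import random
import statistics


def mean_with_ci(sample, z=1.96):
    """Return the sample mean and an approximate 95% confidence interval.

    Hypothetical helper. Uses the normal approximation: mean +/- z * s / sqrt(n).
    """
    n = len(sample)
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return m, (m - z * se, m + z * se)


rng = random.Random(42)  # fixed seed so the run is reproducible
TRUE_MEAN = 50.0  # assumed "true" value we are trying to estimate

for n in (10, 100, 1000, 10000):
    # Simulated imperfect data: the true value plus Gaussian noise (sd = 15)
    sample = [rng.gauss(TRUE_MEAN, 15) for _ in range(n)]
    m, (lo, hi) = mean_with_ci(sample)
    print(f"n={n:>5}  mean={m:6.2f}  95% CI width={hi - lo:5.2f}")
```

The noise never goes away, but the interval narrows roughly in proportion to 1/√n: collecting 100 times more data shrinks the uncertainty by about a factor of 10.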
3. Learn Data Analysis
Tools you’ll use: Pandas, R
Data analysis enables you to summarize the characteristics of a data set. This deeper understanding of the data points you toward the best way to draw useful, actionable conclusions from it. In short… learn how to understand and clean data. It’s what you’ll spend 90% of your time doing.
Georgia Tech | Computing for Data Analysis
Kaggle | Pandas
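As a rough illustration of “understand and clean data”, here is a minimal pandas sketch. The tiny housing dataset and the `clean` helper are hypothetical; they simply reproduce a few common problems (inconsistent text, numbers stored as strings, a missing value, and a duplicate that only shows up after normalizing the text) and then summarize the result.

```python
import pandas as pd

# Hypothetical raw data with typical real-world problems
raw = pd.DataFrame({
    "city": ["Boston", "boston ", "Chicago", "Denver", "Denver"],
    "price": ["250000", "250000", "310000", None, "420000"],
    "bedrooms": [3, 3, 4, 2, 5],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; returns a new DataFrame and leaves df untouched."""
    out = df.copy()
    # Normalize text: trim whitespace and use consistent capitalization
    out["city"] = out["city"].str.strip().str.title()
    # Convert numeric strings to numbers; missing or unparseable values become NaN
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Drop rows that became exact duplicates once the text was normalized
    out = out.drop_duplicates()
    # Drop rows with no price, since they can't contribute to price analysis
    out = out.dropna(subset=["price"])
    return out.reset_index(drop=True)


clean_df = clean(raw)
print(clean_df)
# Summarize the characteristics of the cleaned data
print(clean_df.describe())
print(clean_df.groupby("city")["price"].mean())
```

Dropping the row with a missing price is only one option; depending on the question, filling it in (for example, with a group median) may be more appropriate. Deciding which is right is exactly the kind of judgment data analysis trains.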