Want to learn data science in 2021? Here’s the internet’s best curriculum
Curated by David Venturi for the Not a Real Degree community
Note: Not a Real Degree is learner-supported. Some of the resources I recommend may be affiliate links, meaning I receive a commission (at no extra cost to you) if you use that link to make a purchase.
In a previous post, I announced Not a Real Data Science Degree, a community of learners following a curated curriculum made up of the internet’s best resources. I believe it’s the best bang-for-your-buck method for learning data skills in the digital era of education.
In this post, I list the courses, books, and other resources included in my latest version of the curriculum, plus the rationale behind those picks.
A note from the founder
Hey, it’s David. I wrote this article back in 2021. Since then, I’ve refined Not a Real Degree by niching down to Data Analysts and changing the branding. Do you want to become a data analyst without spending 4 years and $41,762 to go to university? Follow my 90-day curriculum below.
While the courses, articles, etc. in the above (new) curriculum are different, the design of the curriculum that you’ll read in this article is the same, so I will keep this article up.
Okay, back to the article.
Here’s what you’ll read in this article:
- Curriculum overview
- Learn: courses, books, and tutorials
- Frame: blog posts and YouTube videos
- Assess: adaptive tests
- Create: self-directed projects
- Career services
- How I created the curriculum
If you want to get started now, follow the steps in this post.
Before we begin, here’s a quick TL;DR of the announcement post, which provides a general overview of the curriculum and the community.
- Motivation: To create a curriculum made up of the best online courses, books, etc. and build a community to compete with universities and bootcamps. I did this in 2015, and am updating my picks after five years of experience in the EdTech industry.
- Just show me your picks: Sure! Curriculum.
- Prerequisites: Basic arithmetic and high school algebra.
- Target role: The analyst-machine learning expert hybrid.
- Languages: 85% Python, 10% SQL, 5% R.
- Two terms: Term 1 covers basic data analysis. Term 2 covers machine learning and advanced data analysis topics.
- Time commitment: Each term is roughly 65 days, where one day contains 4–5 hours of focused learning. You set your schedule and location.
- Price: Varies since some resources require a subscription. As little as $249 for those who complete the curriculum in six months, and $375 for a year.
- Community: I’m creating a community so you don’t have to learn alone. Parts of the community will be paid ($8/month or $67/year) so members are invested and engaged. Join the waitlist!
- Why I’m doing this: To help democratize education. I chose an affiliate revenue model blended with a paid community to make maintaining the curriculum and building the community my full-time job.
Let’s get to the curriculum.
Curriculum overview
Term 1: Data Analysis
- Introduction to Data Science
- Introduction to Python Programming
- Setting Up Your Computer
- Python Data Science Toolbox
- Importing Data
- Preparing Data
- Exploratory Data Analysis
- Statistics
- Data Visualization
- More Statistics
- Databases & SQL
- Data Engineering
- Data Warehouses & Cloud Computing
- Analytics Engineering
Term 2: Machine Learning & More
- Objects & Algorithms
- Introduction to Machine Learning
- More Python Programming
- Supervised Learning
- More Data Visualization
- Unsupervised Learning
- Introduction to Neural Networks
- Data Science Ethics
- Scalable Data Science
- Time Series Analysis
- Text Analysis
- Other Fun Stuff
Learn: courses, books, and tutorials
Below, I’ll link to each course, book, and tutorial I selected in the order they appear in the curriculum. I’ll also briefly explain why I put it there.
I don’t include individual explanations for DataCamp courses because one explanation can be applied to the 39 courses I recommend. In my opinion, DataCamp’s combination of product and content creates the most beginner-friendly experience for learning data skills online.
- By product, I am referring to their in-browser course software that delivers their short videos and interactive coding exercises.
- By content, I am referring to how they’ve unbundled the multi-week course into four-hour mini-courses taught by subject matter experts.
I filtered through their 300+ course catalog and identified the courses that I think are the best options for their specific subject. These courses represent 30% of the curriculum. The other courses, books, and tutorials I recommend either 1) are unique in some way that outweighs this product/content combo or 2) teach a subject/tool for which DataCamp does not have a course.
Term 1: Data Analysis
Introduction to Data Science
First, you’ll acquire a framework for understanding the data science industry in a theory-only course. Then, you’ll do a little data science using Python. You’ll make your first coding errors under the instructor’s guidance, so no need to be intimidated!
- Data Science for Everyone by Lis Sulmont, Sara Billen, and Hadrien Lacroix on DataCamp (1 day)
- Intro to Data Science in Python by Hillary Green-Lerman on DataCamp (1 day)
Introduction to Python Programming
Next, you’ll learn Python programming and the fundamentals of computer science, which are foundational to the data skills you’ll learn next.
My research suggests that Dr. David Joyner’s Introduction to Python Programming series on edX is the clear winner for this subject area. This series is identical to Georgia Tech’s first class in undergraduate computer science:
Over 400 students on campus have completed this version of the course, and our analysis shows that they exit the course with the same learning outcomes as students taking the traditional on-campus version. This Professional Certificate uses the same instructional material and assessments as learning Python on campus, giving you a Georgia Tech-caliber introduction into the field of computing at your own pace.
You’ll take the first three courses in Term 1. These courses and a fourth in Term 2 make up 25% of the curriculum.
- Fundamentals and Procedural Programming on edX (8 days)
- Control Structures on edX (8 days)
- Data Structures on edX (8 days)
Setting Up Your Computer
Next, you’ll set up your computer and learn how to work in your own computing environment (as opposed to the environment set up for you by DataCamp or edX, for example).
First, you’ll learn how to interact with the command line:
- Elements of the Command Line on Dataquest (1 day)
- Command Line: Intermediate on Dataquest (1 day)
Then you’ll learn how to set up and manage data science software using conda:
- Conda Essentials by Team Anaconda on DataCamp (0.5 days)
- How to install conda on davidventuri.com (0.5 days)
Then you’ll learn how to use JupyterLab, a popular web-based user interface for data science:
- How to use JupyterLab on davidventuri.com (0.5 days)
And finally, you’ll learn how to keep track of your work and collaborate on projects in team environments with Git:
- Git and Version Control on Dataquest (1 day)
For the command line and Git courses, I like Dataquest’s offerings because of their depth, plus how they teach those in the context of the data science workflow. Note: use this referral link for $15 off.
I like DataCamp’s conda course because you learn conda without having to install it first, which is a stumbling block for many beginners (myself included a few years ago). You’ll install conda on your computer next by following my recommended installation process for this curriculum.
The JupyterLab blog post I compiled contains various online resources (documentation, videos, and tutorials) that together resemble a course.
Python Data Science Toolbox
Next, you’ll add some more programming tools to your Python toolbox. These tools will come in handy later in the curriculum.
- Chapters 2 & 3 of Python Data Science Toolbox (Part 1) by Hugo Bowne-Anderson on DataCamp (0.5 days)
- Chapters 1 & 2 of Python Data Science Toolbox (Part 2) by Hugo Bowne-Anderson on DataCamp (0.5 days)
Chapter 1 of Part 1 is skippable because you already learned how to write functions in Introduction to Python Programming. Chapter 3 of Part 2 is skippable because it is a review of the topics you just learned (i.e., a case study) and this curriculum uses adaptive tests and self-directed projects instead.
Importing Data
As you’ll have learned in Data Science for Everyone, importing data is part of the first step of the data science workflow. You’ll learn how to import data using pandas, the most popular analytics library in Python.
- Chapters 1, 2 & 4 of Streamlined Data Ingestion with pandas by Amany Mahfouz on DataCamp (0.5 days)
I suggest you skip Chapter 3 because you will learn the “Importing Data from Databases” skill in a later course after you have learned some SQL.
Preparing Data
You’ll then learn how to prepare your data for analysis. The following courses teach the skills you’ll use most often.
- Data Manipulation with pandas by Richie Cotton and Maggie Matsui on DataCamp (1 day)
- Joining Data with pandas by Aaren Stubberfield on DataCamp (1 day)
- Cleaning Data in Python by Adel Nehme on DataCamp (1 day)
- Working with Dates and Times in Python by Max Shron on DataCamp (1 day)
You will learn more advanced data preparation skills later in the curriculum.
Exploratory Data Analysis
Exploratory data analysis (EDA) is the process of exploring data to summarize their main characteristics. You’ll learn EDA next, and you’ll do it often in your data career.
- Exploratory Data Analysis in Python by Allen Downey on DataCamp (1 day)
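To make "summarizing main characteristics" concrete, here is a minimal sketch that uses only Python’s standard library. The course itself works with pandas. The `ages` values and the `summarize` helper are made up for illustration.

```python
import statistics

def summarize(values):
    """Return a few common summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical data: ages of eight survey respondents
ages = [23, 25, 31, 35, 35, 42, 51, 62]
summary = summarize(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean that sits above the median, as it does here, is a quick hint that a few large values are pulling the average up, which is the kind of observation EDA is meant to surface.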
Statistics
Statistics is the study of how to collect, analyze, and draw conclusions from data. You’ll learn how to do that in Python, while also learning some of the probability theory that underlies statistical inference.
- Introduction to Statistics in Python by Maggie Matsui on DataCamp (1 day)
Data Visualization
You’ll then learn how to visualize your data. First, you’ll learn the theory behind data visualization, then how to use the most popular data viz libraries in Python.
- Data Visualization for Everyone by Richie Cotton on DataCamp (1 day)
- Introduction to Data Visualization with Matplotlib by Ariel Rokem on DataCamp (1 day)
- Introduction to Data Visualization with Seaborn by Erin Case on DataCamp (1 day)
More Statistics
You’ll then dive a little deeper into statistics, which will prepare you for machine learning in Term 2. You’ll also learn the basics of R, a programming language that is optimized for statistics.
- ModernDive: Statistical Inference via Data Science by Chester Ismay and Albert Y. Kim on moderndive.com (4 days)
- Introduction to Regression in Python with statsmodels by Maarten Van den Broeck on DataCamp (1 day)
- Experimental Design in Python by Luke Hayden on DataCamp (1 day)
- Fundamentals of Bayesian Data Analysis in R by Rasmus Bååth on DataCamp (1 day)
ModernDive is an online book that is frequently recommended for learning statistical inference. I read it in 2020 and was amazed by how intuitively the authors teach the subject, which is why I placed it first in this section. I think the content more than makes up for the book’s lack of software features (e.g., interactive grading) and choice of language (it uses R, i.e., not the language of focus for this curriculum).
The Bayesian course is in R because 1) Rasmus Bååth is a great teacher and 2) (as far as I am aware) there isn’t anything comparable in quality and length in Python right now.
Databases & SQL
Nearly every data role requires the basics of databases and SQL, and you’ll acquire them next.
- Introduction to SQL by Nick Carchedi on DataCamp (1 day)
- Introduction to Relational Databases in SQL by Timo Grossenbacher on DataCamp (1 day)
- Joining Data in SQL by Chester Ismay on DataCamp (1 day)
- Intermediate SQL by Mona Khalil on DataCamp (1 day)
- Improving Query Performance in PostgreSQL by Amy McCarty on DataCamp (1 day)
- Chapters 1–4 of Introduction to Databases in Python by Jason Myers on DataCamp (1 day)
Chapter 5 of the last course is skippable because it is a case study.
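For a taste of what querying and joining tables looks like from Python, here is a small sketch. It swaps in SQLite (via the standard library’s `sqlite3` module) instead of the PostgreSQL setup the courses use, and the `customers` and `orders` tables are invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 15.5), (3, 2, 40.0);
""")

# Join the two tables and total the order amounts per customer
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

The same `SELECT ... JOIN ... GROUP BY` pattern carries over to other relational databases, though connection details and some syntax differ.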
Data Engineering
You’ll then play data engineer so you can understand how data analysts (and later, analytics engineers) interact with data engineers. You’ll also dive deeper into modeling data.
- Introduction to Data Engineering by Vincent Vankrunkelsven on DataCamp (1 day)
- Database Design by Lis Sulmont on DataCamp (1 day)
Data Warehouses & Cloud Computing
You’ve learned the basics of storing and querying data. Now you’ll scale these skills out to data warehouses in the cloud. First, you’ll learn about data warehouses and how they’re different from regular databases. Next, the basics of cloud computing. You’ll then learn how to use Snowflake, a cloud-based data warehouse that is rapidly conquering its industry.
- What is a data warehouse? How is it different than a database? on davidventuri.com (0.1 days)
- Cloud Computing for Everyone by Sara Billen, Lis Sulmont, and Hadrien Lacroix on DataCamp (0.5 days)
- Snowflake: The Data Warehouse Built for the Cloud by James Harnischmacher on YouTube (0.1 days)
- WebUI Essentials by Snowflake Inc. on Snowflake University (1.5 days)
Analytics Engineering
An analytics engineer exists somewhere between the data engineers and the data analysts. You’ll learn how the role came to exist and how to use the hottest tool for analytics engineering in 2021 — dbt.
- dbt Fundamentals by Kyle Coapman on dbt Learn (2 days)
Term 2: Machine Learning & More
Objects & Algorithms
To kick off Term 2, you’ll wrap up Dr. David Joyner’s Introduction to Python Programming series. Object-oriented programming and algorithms are concepts that beginners can struggle with, so I delayed this course until right now. These topics are important foundations for machine learning.
- Objects & Algorithms on edX (8 days)
Introduction to Machine Learning
Next, you’ll learn the basics of machine learning. First, you’ll acquire a framework for understanding it from Karolis Urbonas (Head of Machine Learning and Science at Amazon) in a theory-only course. Then, you’ll start doing machine learning in Python with Aurélien Géron’s Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow book.
- Machine Learning for Business by Karolis Urbonas on DataCamp (1 day)
- Preface of Hands-On Machine Learning (0.1 days)
- The Machine Learning Landscape (Chapter 1 of Hands-On Machine Learning) (0.5 days)
- End-to-End Machine Learning Project (Chapter 2 of Hands-On Machine Learning) (0.5 days)
I prefer having one instructor teach me machine learning with a unified narrative, so I chose Aurélien’s book (and accompanying notebooks) over DataCamp’s bite-sized courses from various instructors. I also like Aurélien’s choice of machine learning libraries (more on this shortly), the book’s rave reviews, and its recent release (late 2019). These factors make up for the lack of video and interactive grading in the book and notebooks.
More Python Programming
These advanced Python skills will round out your programming for data science toolbox. With them, you’ll feel fully in command of the code you write.
- Writing Functions in Python by Shayne Miel on DataCamp (1 day)
- Writing Efficient Python Code by Logan Thomas on DataCamp (1 day)
- Software Engineering for Data Scientists in Python by Adam Spannbauer on DataCamp (1 day)
Writing Functions in Python covers advanced concepts like context managers and decorators. You’ll build on the functions skills you learned early on in Term 1, allowing you to write “complex and beautiful functions so that you can contribute research and engineering skills to your team.”
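If context managers and decorators are new to you, the sketch below shows one small example of each. It is not taken from the course. The `timer` and `count_calls` helpers are hypothetical names chosen for illustration.

```python
import time
from contextlib import contextmanager
from functools import wraps

@contextmanager
def timer(label, log):
    """Context manager: record how long the enclosed block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs when the with-block exits, even if it raised an error
        log.append((label, time.perf_counter() - start))

def count_calls(func):
    """Decorator: count how many times the wrapped function is called."""
    @wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def square(x):
    return x * x

log = []
with timer("squares", log):
    results = [square(n) for n in range(5)]

print(results, square.calls, log[0][0])
```

A decorator wraps a function to add behavior around every call, while a context manager wraps a block of code to guarantee setup and cleanup.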
Supervised Learning
You’ll then continue with the supervised learning chapters of Aurélien Géron’s Hands-On Machine Learning book. These chapters teach the Scikit-Learn library, which is described by the author in the following fashion:
Scikit-Learn is very easy to use, yet it implements many Machine Learning algorithms efficiently, so it makes for a great entry point to learn Machine Learning.
- Classification (Chapter 3 of Hands-On Machine Learning) (2 days)
- Training Models (Chapter 4 of Hands-On Machine Learning) (1.5 days)
- Support Vector Machines (Chapter 5 of Hands-On Machine Learning) (1.5 days)
- Decision Trees (Chapter 6 of Hands-On Machine Learning) (1 day)
- Ensemble Learning and Random Forests (Chapter 7 of Hands-On Machine Learning) (1 day)
More Data Visualization
To break up the machine learning content, you’ll up your data visualization game. First, more Seaborn. Second, tips to make your visualizations more compelling. Third, interactive data visualization with plotly.
- Intermediate Data Visualization with Seaborn by Chris Moffitt on DataCamp (1 day)
- Improving Your Data Visualizations in Python by Nicholas Strayer on DataCamp (1 day)
- Introduction to Data Visualization with Plotly in Python by Alex Scriven on DataCamp (1 day)
Unsupervised Learning
You’ll then hop back into Aurélien Géron’s Hands-On Machine Learning, where you’ll learn unsupervised learning techniques in Scikit-Learn.
- Dimensionality Reduction (Chapter 8 of Hands-On Machine Learning) (1.5 days)
- Unsupervised Learning Techniques (Chapter 9 of Hands-On Machine Learning) (2 days)
Note that dimensionality reduction can be supervised (e.g., LDA), but the techniques you’ll learn in this chapter are unsupervised.
Introduction to Neural Networks
You’ll then learn how to train neural networks (i.e., do deep learning) with my final recommended chapters of Aurélien Géron’s Hands-On Machine Learning. You’ll pick up Keras, which is a high-level deep learning library that makes it simple to train and run neural networks.
- Introduction to Artificial Neural Networks with Keras (Chapter 10 of Hands-On Machine Learning) (1 day)
- Training Deep Neural Networks (Chapter 11 of Hands-On Machine Learning) (1.5 days)
Why stop at Keras in this curriculum? In Chapter 12 (i.e., the chapter after these ones), Aurélien writes:
Up until now, we’ve used only TensorFlow’s high-level API, tf.keras, but it already got us pretty far: we built various neural network architectures, including regression and classification nets, Wide & Deep nets, and self-normalizing nets, using all sorts of techniques, such as Batch Normalization, dropout, and learning rate schedules. In fact, 95% of the use cases you will encounter will not require anything other than tf.keras.
If you’re interested, you’re welcome to continue with Chapter 12 onwards, where you’ll learn how to build more advanced models with TensorFlow 2.0.
Data Science Ethics
Next, you’ll learn how to navigate the ethical dilemmas that arise when you exercise your new data skills. The main resource you’ll use is H.V. Jagadish’s University of Michigan course, where you’ll learn about informed consent, data ownership, privacy, anonymity, data validity, and algorithmic fairness. You’ll then learn how to use deon, a command line tool that allows you to add an ethics checklist to your data science projects. Finally, you’ll learn about ethics in AI at a deeper level.
- Data Science Ethics by H.V. Jagadish on edX (4 days)
- deon: introduction by Vincent D. Warmerdam on calmcode.io (0.5 days)
- Forget the robots! Here’s how AI will get you by Cassie Kozyrkov on Towards Data Science (0.1 days)
Scalable Data Science
You’ll then learn how to scale up your work to “big data” using parallel computing, GPUs, and the cloud. I selected Dask, BlazingSQL, and Coiled for this curriculum because they are the easiest to learn given the Python skills you’ve acquired thus far, they have strong development teams, and they are gaining industry adoption.
- GPUs: Explained by Alex Hudak on YouTube (0.1 days)
- How to learn Dask in 2021 by James Bourbeau and Matthew Rocklin on coiled.io (1.5 days)
- How to learn BlazingSQL in 2021 by Rodrigo Aramburu and Tom Drabas on app.blazingsql.com (1 day)
- How to learn Coiled in 2021 by James Bourbeau on coiled.io (1 day)
Dask scales up the existing Python ecosystem to multi-core machines and distributed clusters. It allows you to use your NumPy, Pandas, and Scikit-Learn skills on big data, instead of having to learn a new programming style, as you would with big data tools like Scala or Spark.
BlazingSQL provides a high-performance distributed SQL engine in Python. Like Dask, it will feel natural for Python users. A quote from Dask co-creator Matthew Rocklin:
One of the common requests we get for Dask is, “Hey, do you support SQL? I love that [with Dask] I can do some custom Python manipulation, but then I want to hand it off to a SQL engine.” And my answer has always been, “No, there is no good SQL system in Python.” But now there is — if you have GPUs.
Coiled is a startup that aims to make parallel computing and cloud computing easy for Python and Dask users. Per their website, “Dask scales Python for data science and machine learning, Coiled makes it easy to scale on the cloud.”
Note that “Coiled runs on AWS today, with Azure support coming soon.” Google Cloud is on the roadmap, and Google’s Head of Decision Intelligence is excited about that.
Learning a tool that is still being built out shows the benefits of an opinionated curriculum curated by an individual. I can be a little more agile than a school or company — compiling online resources only takes a few hours. I can also take a little more “tool risk.” In this case, I believe the risk is worth the reward. Plus, you’ll still learn the basic mechanics of scaling to the cloud with Coiled.
I helped curate the learning materials for each of the resources above, which is why they have the same naming convention.
Time Series Analysis
Next, you’ll hop back into developing your analyst skills. A time series is a series of data points indexed in time order. This type of data is ubiquitous, particularly in finance and applied science disciplines. First, you’ll learn how to handle time series data, then you’ll learn how to forecast based on that data.
- Manipulating Time Series Data in Python by Stefan Jansen on DataCamp (1 day)
- ARIMA Models in Python by James Fulton on DataCamp (1 day)
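As a small illustration of "data points indexed in time order," the sketch below sorts some observations by date and smooths them with a moving average, using only the standard library. The courses above use pandas and ARIMA models instead. The sales figures and the `moving_average` helper are made up for this example.

```python
from datetime import date

# Hypothetical daily sales, deliberately out of order
observations = [
    (date(2021, 1, 3), 14.0),
    (date(2021, 1, 1), 10.0),
    (date(2021, 1, 4), 18.0),
    (date(2021, 1, 2), 12.0),
    (date(2021, 1, 5), 16.0),
]

# A time series is indexed in time order, so sort by date first
series = sorted(observations)

def moving_average(points, window):
    """Average each value with the (window - 1) values before it."""
    averaged = []
    for i in range(window - 1, len(points)):
        values = [value for _, value in points[i - window + 1 : i + 1]]
        averaged.append((points[i][0], sum(values) / window))
    return averaged

smoothed = moving_average(series, window=3)
for day, value in smoothed:
    print(day.isoformat(), value)
```

A moving average is only a simple smoothing step, not a forecast, but it shows why the ordering matters: each output depends on the observations that came before it.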
Text Analysis
You’ll then develop your text analysis skills, learning the basics of regular expressions and natural language processing.
- Regular Expressions in Python by Maria Eugenia Inzaugarat on DataCamp (1 day)
- Introduction to Natural Language Processing in Python by Katharine Jarmul on DataCamp (1 day)
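Here is a rough sketch of the two ideas with Python’s standard library: a regular expression that pulls email-like strings out of text, and a very basic word count of the sort that natural language processing builds on. The sample text is invented, and the email pattern is deliberately simplified rather than a complete validator.

```python
import re
from collections import Counter

text = "Contact ada@example.com or grace@example.org. Thanks!"

# Simplified email pattern: local part, "@", then dot-separated domain labels
email_pattern = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
emails = re.findall(email_pattern, text)

# Basic tokenization: lowercase the text, then keep runs of letters
sentence = "Data, data, and more DATA!"
tokens = re.findall(r"[a-z]+", sentence.lower())
counts = Counter(tokens)

print(emails)
print(counts.most_common(1))
```

Real NLP libraries handle tokenization far more carefully (punctuation, contractions, other languages), so treat this as a starting intuition only.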
Other Fun Stuff
You’ll wrap up the program by learning skills that don’t have obvious curriculum categories. First, you’ll experience common machine learning pitfalls and how to fix them in real-life workflows. Then, you’ll learn A/B testing, a critical skill for running successful online experiments. Then, web scraping, which is a hacky but effective way of importing data from the internet. Next, you’ll learn how to analyze data that has a geographic component to it. Finally, you’ll learn an exciting new data analysis tool.
- Designing Machine Learning Workflows in Python by Christoforos Anagnostopoulos on DataCamp (1 day)
- How do A/B tests work? by Cassie Kozyrkov on Towards Data Science (0.1 days)
- Customer Analytics and A/B Testing in Python by Ryan Grossman on DataCamp (1 day)
- Web Scraping in Python by Thomas Laetsch on DataCamp (1 day)
- Working with Geospatial Data in Python by Joris Van den Bossche and Dani Arribas-Bel on DataCamp (1 day)
- Intro Data Science with Siuba by Michael Chow on learn.siuba.org (1 day)
Siuba, born in 2019, is a new library that emulates an R library called dplyr that you’ll learn in ModernDive. Though Siuba doesn’t have much adoption yet, I’m including it because doing EDA in a dplyr-like way in Python would be a massive addition to an analyst’s toolbox, and early feedback on it is positive. Plus, the creator of the package built an online course (and the software to deliver that course) to promote adoption.
Frame: blog posts and YouTube videos
Interspersed between the resources above are blog posts and YouTube videos. These high-level resources frame your new skills in the context of the data industry in real life.
For example, after your introduction to data science, you’ll read a piece called, “Is data science a bubble?” You’ll gain an appreciation for where the industry is today and where the author thinks it is going.
That piece is a Cassie Kozyrkov creation, as are most of the Frame resources I selected. She’s an excellent communicator, and her pieces strike a nice balance between informative and humorous. She also has major industry experience, so her opinions carry weight.
Other examples of blog posts of hers that you’ll read include:
- What makes a data analyst excellent? (after you learn how to do exploratory data analysis)
- Machine learning — Is the emperor wearing clothes? (after you learn how to do a little machine learning)
- Data science and AI are a mess… and your startup might be making it worse (after you learn how to use Dask, BlazingSQL, and Coiled, three tools designed to work within the Python data science ecosystem)
An excerpt from the last linked piece to get a sense of her writing:
Don’t build tools for their own sake, build them to fulfill your users’ needs and make your users happy. Focus on integration — it’s important to make these tools play well with the rest of the ecosystem, because no one wants to stop what they’re doing to give your tool special treatment unless it’s a cure-all.
I’ve personally learned a lot from Cassie’s pieces. They’ve also shaped many of my decisions for this curriculum. I think you’ll find them valuable, too.
Assess: adaptive tests
After you learn a new skill and frame it, you’ll assess how proficient you are at it. You’ll use DataCamp Signal, a new adaptive testing tool launched in 2019. You’ll mainly use this tool to:
- See if you need to revisit any of the Learn resources before starting your project.
- Create a digital transcript using your test scores to prove what you learned.
- Track your scores throughout the curriculum to visualize and gamify your progress.
Each assessment is a series of 15 challenges. The difficulty of your next challenge changes based on how well you’ve scored up until that point. The entire assessment takes 5–10 minutes total.
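To give a feel for how an adaptive test can work, here is a toy sketch. It is not DataCamp’s actual algorithm, which I don’t have visibility into. The five difficulty levels, the starting level, and the `run_adaptive_test` function are all assumptions made for illustration.

```python
def run_adaptive_test(answers, start=3, lowest=1, highest=5):
    """Return the difficulty level served for each challenge.

    `answers` is a list of booleans: True if the learner got that
    challenge right. Difficulty steps up after a correct answer and
    down after an incorrect one, staying within [lowest, highest].
    """
    level = start
    served = []
    for correct in answers:
        served.append(level)
        step = 1 if correct else -1
        level = max(lowest, min(highest, level + step))
    return served

# Hypothetical learner: 15 challenges, 11 answered correctly
answers = [True, True, False, True, True, True, False, False,
           True, True, True, False, True, True, True]
levels = run_adaptive_test(answers)
print(levels)
```

The point of adapting is that a strong learner quickly reaches harder challenges while a beginner is not stuck on questions far beyond their level, so a short test can still be informative.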
I strategically interspersed the following assessments within the curriculum to leverage a memory phenomenon called the spacing effect, which describes how our brains learn more effectively when we space out our learning over time.
- Python Programming
- Importing & Cleaning Data with Python
- Data Manipulation with Python
- Data Visualization with Python
- Statistics Fundamentals with Python
- Data Analysis in SQL (PostgreSQL)
- Understanding and Interpreting Data
- Machine Learning Fundamentals in Python
At points in these adaptive tests, you’ll encounter some skills that you haven’t learned yet, and that’s okay. Again, these tests are designed to adapt to your skill level. Skip those questions or give your best guess. You’ll come back to that assessment later and you’ll be able to visualize your progress.
At the end of Term 1 and Term 2, you’ll revisit all of the skill assessments you’ve completed up until that point. These scores will provide a quantitative gauge of how prepared you are for the analyst role (Term 1) and the analyst-ML expert hybrid role (Term 2).
Note that I didn’t include any R assessments because learners are unlikely to score well on those even if they master the R resources I recommend. Learners should, however, be able to score well on the SQL assessment.
Create: self-directed projects
Here’s where you’ll set yourself apart from the crowd.
A self-directed project is a project with no defined end goal, no starter code or dataset, and no templated grading. These projects, in my opinion, are the only kind of projects that employers and clients truly want to see.
You’ll use your newly acquired skills to create something unique on a subject that you’re passionate about. There are eight projects spread throughout the curriculum (four in each term) and a capstone project at the end. I recommend spending two days on each regular project, and four days on the capstone.
You’ll feature some or all of the skills you learned in the courses immediately preceding each project. Here’s one potential outcome:
- Project 1: Python programming
- Project 2: Importing and preparing data
- Project 3: Exploring and visualizing data
- Project 4: SQL
- Project 5: Object-oriented programming
- Project 6: Supervised machine learning
- Project 7: Deep learning
- Project 8: Scalable data science
- Capstone: Whatever you want!
Not a Real Degree community members get access to my guides for creating self-directed projects, plus recordings of me creating my self-directed projects.
You can tailor the curriculum to an industry you’d like to target. Interested in Bitcoin? Interested in fashion? Interested in healthcare? Find a dataset (using Google’s Dataset Search, for example) and create a project on it. You can dedicate all of your projects to that industry if you’d like!
You’ll include all of these projects in your digital transcript with your skill assessment scores to prove what you learned.
The main drawbacks of self-directed projects are:
- What happens if I get stuck?
- Grading is hard. How do I know if my work is correct?
For the first one, DataCamp’s adaptive tests will help. Right before you start a project, you’ll get quizzed on your new skills. You’ll receive a score and a diagnosis of your skill gaps. You can revisit learning materials if necessary. If you score well, these will serve as 10-minute skill refreshers that will make starting your project a little less daunting.
The community will also help mitigate these concerns.
First, I’ve set up a Circle community with dedicated spaces for each project.
I’ve also set up a Deepnote team. Deepnote (the tool) is a new kind of data science notebook with real-time collaboration. Think Google Docs, but for data science.
If you get stuck, post in the community and someone (me, a fellow learner, or a community mentor) can help you debug in Circle and/or Deepnote. In 2021, my main priority will be solving the grading problem with these tools.
I’m excited to see the projects that you create.
Career services
The final piece that weaves the curriculum together is Build a Career in Data Science by Emily Robinson and Jacqueline Nolis. Published in March 2020, it’s comprehensive and up-to-date. It even has an accompanying podcast.
The book is divided into four parts, which are spread evenly throughout this curriculum.
- Part 1: Getting Started with Data Science
- Part 2: Finding Your Data Science Job
- Part 3: Settling Into Data Science
- Part 4: Growing in Your Data Science Role
I will also experiment with additional career services, such as resume reviews and interview coaching, as a part of the paid community throughout 2021.
That’s the curriculum! Read on to learn about the curation process, and your next steps.
How I created the curriculum
My process for curating this curriculum started with two questions:
- What subjects does an analyst-ML expert hybrid need to know in 2021?
- What are the best resources for learning those subjects?
First, I consulted my list of subjects from when I curated my personal data science curriculum in 2015. I then compared this list to the subjects covered in data science curricula from colleges, universities, bootcamps, and EdTech companies in 2020. I also asked for feedback from some data analyst and data scientist friends to ensure I was covering the latest tools used in industry.
I made some adjustments to my subject list, then got to selecting individual resources based on the explanations I provide above. This process took fifty hours, spread over multiple weeks.
Next steps
Interested in taking the program? Here are your next steps:
- If you haven’t yet, read the announcement post for a general overview of the curriculum and the community.
- Follow the steps in this post to set up your learning experience.
- Start learning.