Want to learn data science? Here’s the internet’s best curriculum

Curated by David Venturi for the Not a Real Degree community

datascience.notarealdegree.com 🎒
The process you’ll use to build your new data skills.

Curriculum overview

Learn: courses, books, and tutorials

Term 1: Data Analysis

A video from Data Science for Everyone on DataCamp.

Over 400 students on campus have completed this version of the course, and our analysis shows that they exit the course with the same learning outcomes as students taking the traditional on-campus version. This Professional Certificate uses the same instructional material and assessments as learning Python on campus, giving you a Georgia Tech-caliber introduction into the field of computing at your own pace.

Dr. Joyner teaching in Georgia Tech’s Introduction to Python Programming series on edX.
The JupyterLab interface, where you can interact with the command line, conda, and Git, as well as do fancy data science as displayed in the notebooks.
An exercise from Introduction to Statistics in Python on DataCamp.
The ModernDive website.
An exercise from Joining Data in PostgreSQL on DataCamp.
Snowflake is a cloud-based data warehousing company.
dbt is pioneering modern analytics engineering.

Term 2: Machine Learning & More

Scikit-Learn is very easy to use, yet it implements many Machine Learning algorithms efficiently, so it makes for a great entry point to learn Machine Learning.

An exercise from Intermediate Data Visualization with Seaborn on DataCamp.

Up until now, we’ve used only TensorFlow’s high-level API, tf.keras, but it already got us pretty far: we built various neural network architectures, including regression and classification nets, Wide & Deep nets, and self-normalizing nets, using all sorts of techniques, such as Batch Normalization, dropout, and learning rate schedules. In fact, 95% of the use cases you will encounter will not require anything other than tf.keras.

keras.io

One of the common requests we get for Dask is, “Hey, do you support SQL? I love that [with Dask] I can do some custom Python manipulation, but then I want to hand it off to a SQL engine.” And my answer has always been, “No, there is no good SQL system in Python.” But now there is — if you have GPUs.

Built with the PyData ecosystem in mind, Dask and BlazingSQL work nicely together.

Frame: blog posts and YouTube videos

Don’t build tools for their own sake, build them to fulfill your users’ needs and make your users happy. Focus on integration — it’s important to make these tools play well with the rest of the ecosystem, because no one wants to stop what they’re doing to give your tool special treatment unless it’s a cure-all.

Assess: adaptive tests

From the DataCamp Signal white paper: “Assessment results include a score (0–200), a percentile (0%–100%), and an associated knowledge level (Novice, Intermediate, Advanced).”
The screen before you start DataCamp’s Python Programming assessment.
My Python skills measured over time. June 8th: A little rusty. June 9th: After refreshing my skills, I scored 149 (95th percentile). December 24th: Rusty again (plus a little tired). Just like any skill, your data skills can erode over time if you don’t keep them sharp!

Create: self-directed projects

DataCamp Signal telling me my current strengths and skill gaps for Python programming.
How we’ll collaborate in Deepnote.

Career services

How I created the curriculum

Next steps
