Read her story here.
Data analysis of my Dataquest learning curve
The first and most frequently asked question from beginners starting a course is probably: how long will it take to finish? I know I had that question when I started the Data Scientist in Python path on Dataquest.
Paradoxically, that’s a question you won’t have an answer to until you’ve completed the course and no longer need an answer. However, I’ll attempt to provide some ‘clues’ based on the weekly accomplishment emails from Dataquest and the Data Scientist in Python curriculum I reviewed. These insights can be particularly valuable when exploring concepts for junior developers.
Tribute to The Dataquest Community
Before I get carried away with numbers, I want to pay tribute to the awesome Dataquest Community, the best online learning community in my experience. It has been a year of grief, but in the Dataquest community, I see people from all over the world trying their best to learn and help each other every day.
So, one of my biggest incentives to do this project is to encourage beginners on this journey by giving them a peek into the road ahead. Please keep in mind, though, that the time and effort needed to complete this course depend heavily on your personal situation. I will briefly go over mine later in this article.
Special thanks to
Otávio Simões Silveira and his amazing project How web scraping helped me go from Learning to teaching.
“Clues” to Questions Every Beginner Has
To better understand the answers, I want to first clarify what “Steps”, “Courses” and “Missions” are on Dataquest with a screenshot
Now, let’s get some answers!
1. How many days did it take for me to finish the Data Scientist in Python path on Dataquest? (timespan, including intervals I didn’t spend on studying)
175 days. From June 19th, 2020 to December 11th, 2020.
2. What’s my best learning streak and average learning streak?
My best learning streak was 20 days, and my average streak was about 6.7 days (6.6875). From my personal experience, it’s important to get into the groove and keep going. I took a week-long break in October and it took another week to get back to the same learning efficiency as before.
3. How much time was spent in total?
I spent 306.4 hours in total finishing the path. This means that if I had studied 24/7, I could have finished the path in roughly 13 days. Instead, it took me 175 days. I’m sure the robots are laughing at us humans.🙂
4. How many hours did I spend on average in the weeks I studied?
Assuming I studied 5 days a week on average, the 24 weeks I studied come to 120 study days. Dividing 306.4 hours by 120 days gives roughly 2.5 hours a day on Dataquest. That sounds about right, but note that it’s a rough estimate. I also spent quite some time in the community and reading extracurricular materials, which is not counted in this project.
5. What’s the average time spent to finish a mission?
111.43 minutes, in other words, close to 2 hours. That looks like a dauntingly long time to finish a mission, but it also includes time spent on guided projects, which are definitely more time-consuming than regular learning missions. It’s not uncommon to spend days on a guided project. I wish I had more granular data on the time spent on each mission so I could compare the average time for projects and non-project missions, but I don’t know if that data even exists.
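The back-of-the-envelope numbers in answers 1, 3, 4 and 5 can be re-derived in a few lines. This is only a sketch of the arithmetic, not the code from the project notebook. It assumes the mission count is the 165 missions listed in the curriculum and reuses the "5 study days in each of 24 weeks" assumption from answer 4.

```python
from datetime import date

# Figures quoted in the answers above.
start, end = date(2020, 6, 19), date(2020, 12, 11)
total_hours = 306.4

# Assumptions: 165 missions in the path, and 5 study days in each of 24 weeks.
total_missions = 165
study_days = 24 * 5

timespan_days = (end - start).days                       # calendar days between the two dates
nonstop_days = total_hours / 24                          # days needed if studying around the clock
hours_per_study_day = total_hours / study_days           # average hours on a study day
minutes_per_mission = total_hours * 60 / total_missions  # average minutes per mission

print(f"Timespan: {timespan_days} days")
print(f"Studying 24/7: {nonstop_days:.1f} days")
print(f"Hours per study day: {hours_per_study_day:.2f}")
print(f"Minutes per mission: {minutes_per_mission:.1f}")
```

The per-mission figure lands at about 111.4 minutes, which agrees with the 111.43 above once you allow for the total hours being rounded to one decimal place.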
6. What are the speed bumps in the Data Scientist in Python curriculum on Dataquest?
Step 2 (Data Analysis and Visualization), Step 4 (Working with Data Sources), Step 5 (Probability and Statistics), and Step 6 (Machine Learning Introduction) took more weeks to finish than the others. Among them, Steps 2 and 6 have the most missions, and Step 2 also has the most guided projects, so their length explains the extra weeks. Steps 4 and 5 have fewer missions yet still took a long time, which makes their missions the most time-consuming of all. Between the two, Step 4 was more time-consuming than Step 5, which matches my memory pretty well. In Step 4, the time-consuming part was SQL, and in Step 5, it was the probability courses.
Put the Data in Context
A Little Context About My Personal Learning Situations:
- I started the Data Scientist in Python path on Dataquest on June 19th, 2020, and finished it on December 11th, 2020. I didn’t spend a lot of time on it in the last two weeks. That time mostly went into finishing the last two guided projects (which count as 2 missions) and extracurricular projects. That’s probably why I didn’t get any weekly accomplishment emails after the end of November.
- I was a Marketing Account Manager for 5 years and had close to no coding experience. I learned Python fundamentals from a data science course on Udemy for a couple of weeks right before I decided to switch to Dataquest.
- I finished Andrew Ng’s Machine Learning course on Coursera a few weeks before starting the path. I learned Machine Learning fundamentals and basic Octave during that course.
- I’m currently unemployed so I have a lot of spare time for learning.
A Closer Look at The Project
A) Data collection
The data I used in this project come from two sources:
1. The learning progress data in this project comes from the weekly accomplishment emails I got from Dataquest on Mondays if I made enough progress the previous week. It consists of:
- missions_completed: Number of missions completed.
- missions_increase_pct: Percentage increase/decrease compared to last week on the number of missions completed.
- minutes_spent: Minutes spent on learning.
- minutes_increase_pct: Percentage increase/decrease compared to last week on the minutes spent.
- learning_streak(days): Number of consecutive days spent on learning.
- best_streak: Best learning streak.
I first created a label in my Gmail to group the weekly accomplishment emails, then went to Google Takeout to download them. You can choose the file format in the process. What I downloaded was a .mbox file. I used mailbox, a module in the Python Standard Library, to parse the file. You will find the code used in the GitHub link at the end of the post.
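For readers who haven’t used the mailbox module, here is a minimal sketch of the parsing step. It is not the code from my notebook. The function names and the two regular expressions are hypothetical, and the patterns assume the email body contains text like "Missions completed: 7", which may not match the real wording of the Dataquest emails.

```python
import mailbox
import re

# Hypothetical patterns: adjust them to the actual wording of the emails.
MISSIONS_RE = re.compile(r"missions completed\D*(\d+)", re.IGNORECASE)
MINUTES_RE = re.compile(r"minutes spent\D*(\d+)", re.IGNORECASE)


def body_text(message):
    """Return the first text/plain part of a message as a string."""
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is None:
                continue
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return ""


def parse_weekly_emails(mbox_path):
    """Collect one row per email that contains both statistics."""
    rows = []
    # create=False makes a wrong path raise instead of creating an empty file.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for message in box:
            text = body_text(message)
            missions = MISSIONS_RE.search(text)
            minutes = MINUTES_RE.search(text)
            if missions and minutes:
                rows.append({
                    "date": message["Date"],
                    "missions_completed": int(missions.group(1)),
                    "minutes_spent": int(minutes.group(1)),
                })
    finally:
        box.close()
    return rows


# Usage (hypothetical path to the Google Takeout export):
# rows = parse_weekly_emails("path/to/export.mbox")
```

Emails that don’t contain both numbers are skipped, so unrelated messages caught by the same label won’t produce empty rows.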
2. The curriculum data in this project comes from the Dataquest dashboard for the Data Scientist path. It consists of 8 Steps, 32 courses, and 165 missions including 22 guided projects in hierarchical order.
As mentioned at the beginning of the post, I used Selenium and ChromeDriver to scrape the curriculum. The Dataquest dashboard you saw earlier contains a grid of Steps and collapsible lists of Courses and Missions. ChromeDriver let me automate the login and the clicks needed to flip through the full curriculum.
While it’s not the focus of this article, I want to share the code used for scraping the page
B) Data imputation
The weekly email dataset in this project is very small, with only 16 rows containing data from 16 weeks. But my learning span was in fact 26 weeks. There were weeks where I didn’t study at all, but still, for such a small dataset, I can’t really afford to lose 10 weeks of data.
Luckily, on the profile page, Dataquest provides the learning curve throughout a path. So I came up with an imputation strategy:
- Fill in the blanks where possible, plot the learning curve of existing data
- Compare it with the learning curve generated by Dataquest, and combine that with my personal experience (guilty memories of taking vacations and slacking 🙂) to impute the missing values for missions completed.
- Impute minutes spent based on average minutes spent on a mission.
You will find more details of the imputation strategy in the project notebook.
While I think the imputation was pretty successful (in serving the needs in this project), I wish we could have more data on our learning journey from Dataquest.
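The last step of the strategy, imputing minutes from the average time per mission, could look roughly like the sketch below. It is an illustration and not the notebook code: the function name is hypothetical, the sample rows are made up, and it assumes the mission counts have already been filled in for every week.

```python
def impute_minutes(weeks):
    """Fill missing minutes_spent with missions_completed * average minutes per mission.

    The average is computed only from weeks where minutes_spent was observed.
    """
    observed = [
        w for w in weeks
        if w["minutes_spent"] is not None and w["missions_completed"] > 0
    ]
    total_missions = sum(w["missions_completed"] for w in observed)
    if total_missions == 0:
        raise ValueError("need at least one observed week with completed missions")
    avg_minutes = sum(w["minutes_spent"] for w in observed) / total_missions

    filled = []
    for w in weeks:
        row = dict(w)  # copy, so the input rows stay untouched
        row["imputed"] = w["minutes_spent"] is None
        if row["imputed"]:
            row["minutes_spent"] = round(w["missions_completed"] * avg_minutes)
        filled.append(row)
    return filled


# Made-up sample data for illustration only; these are not my real numbers.
weeks = [
    {"week": "2020-06-22", "missions_completed": 8, "minutes_spent": 800},
    {"week": "2020-06-29", "missions_completed": 4, "minutes_spent": None},
    {"week": "2020-07-06", "missions_completed": 2, "minutes_spent": 300},
    {"week": "2020-07-13", "missions_completed": 0, "minutes_spent": None},
]

filled = impute_minutes(weeks)
for row in filled:
    print(row["week"], row["missions_completed"], row["minutes_spent"], row["imputed"])
```

Keeping an imputed flag on each row makes it easy to style estimated weeks differently in a plot, so readers can tell measured values from estimates.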
C) Visualizations in this project
I used Plotly to plot all the visualizations in this project. I’m especially pleased with the Hours Spent vs Missions Completed plot below. It helped me make quite a few interesting observations and answered the curriculum-related questions at the beginning of this article.
- My learning curve
- Hours spent weekly and the corresponding number of missions completed, with the steps they belong to
- Number of missions and guided projects in each learning Step
Last but not least, to the beginners of the Data Scientist in Python path on Dataquest: what I’ve done in this project is mostly data collection, data cleaning, and imputation, which you will learn in the first 4 Steps of the curriculum. That means you will be equipped to do all of this halfway through the path!
P.S. If anyone has more questions regarding this project or the Data Scientist in Python path on Dataquest, feel free to ask me in the comments or reach me at veratsien@gmail.com. I will try my best to provide an answer. Also, click here for a referral discount ($15 off) if you want to get a Dataquest subscription.
You can click here to view the full project on GitHub.
Thanks for reading!