AWS for Data Scientists: Learn and Earn

ARTICLE SUMMARY

Darya Petrashka, data scientist, shares her beginner-friendly user guide on how to start using AWS if you are a data scientist.

Learning cloud technologies can be scary: there are a lot of services, tons of tutorials, and the fear of accidentally spending a lot of money. But more and more companies are adopting cloud services, and working with them requires the appropriate knowledge. This article is a beginner-friendly guide to getting started with AWS as a Data Scientist. You will learn how to use the main AWS services for Data Science, how to combine them, and where to find learning resources. You will also find out which certifications to focus on and what opportunities await you during your cloud journey.

HOW TO START

There are many ways to learn AWS, and there is no need to spend enormous amounts of money on bootcamps. The first proven resource is AWS Skill Builder. It offers many free learning plans, and you can be sure they are up to date. A paid subscription is available for $29 per month.

For students at higher education institutions, there is AWS Academy. It helps students prepare for industry-recognized certifications and careers in the cloud. There are also plenty of AWS courses on popular platforms like Udemy or Coursera.

No matter which resource you choose, the most important thing is to go hands-on. Simply reading or watching videos gives you theoretical knowledge but no practical experience. Try to repeat course demos and exercises in your personal account, take a look at the AWS Blog for project ideas, or explore AWS public datasets for inspiration. If you want to build something simple, safe, and predictable, check out the hands-on labs on Cloud Academy or A Cloud Guru. You pay only for the subscription and use their AWS accounts for practice.

WHAT IF I FORGET TO DELETE RESOURCES AND SPEND ALL MY MONEY?

With the pay-as-you-go model used by most AWS services, no upfront payments are required. Many services are included in the Free Tier, meaning that they are free up to a certain usage limit. You can set an alarm that notifies you when you are close to exceeding the Free Tier or have spent more than a certain amount of money. You can also use the official AWS Pricing Calculator to estimate your project costs. There are no hidden payments, and all prices are transparent. AWS is on your side here: it also publishes best practices for reducing resource usage and costs.
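If you prefer to set up a spending alarm from code, here is a minimal sketch using boto3, the AWS SDK for Python. It assumes that billing alerts are already enabled in your account's billing preferences and that you have an SNS topic to notify. The function names, the alarm name, and the topic ARN are illustrative placeholders, not an official recipe. Billing metrics are published in the us-east-1 Region, so the client is created there.

```python
def billing_alarm_params(threshold_usd, sns_topic_arn, alarm_name="monthly-spend-alarm"):
    """Build the keyword arguments for CloudWatch put_metric_alarm.

    alarm_name and sns_topic_arn are placeholders you choose yourself.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # evaluate the metric over six-hour windows
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # who gets notified
    }


def create_billing_alarm(threshold_usd, sns_topic_arn):
    """Create the alarm in your account (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without AWS access

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
    cloudwatch.put_metric_alarm(**billing_alarm_params(threshold_usd, sns_topic_arn))
```

Calling create_billing_alarm(10, your_topic_arn) would then create an alarm that fires once the estimated charges go above 10 USD.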

AWS SERVICES FOR DATA SCIENCE

At the moment AWS offers more than 200 fully featured services. But to start with, a Data Scientist should focus on Amazon SageMaker and Amazon S3.

Amazon SageMaker is the heart of Machine Learning on AWS. It helps you build, train, and deploy ML models. SageMaker has dozens of built-in algorithms and can work with a huge variety of data sources. You can learn all the details in the SageMaker documentation.

To get started quickly, log in to your AWS account and go to SageMaker. You can find any service using the search field.

Then you will see the SageMaker welcome screen. There is a ‘Get Started’ button where you can do initial setup and explore tutorials.

Let’s try to run a sample notebook. Find Notebook instances in the left-side panel and click the ‘Create notebook instance’ button.

Give your instance a name, then choose ml.t2.medium as the Notebook instance type. Notebook instances are billed by the hour, and the more powerful the instance, the more it costs. Leave all other settings at their defaults, scroll down, and click the ‘Create notebook instance’ button.

Wait several minutes until your notebook status becomes InService. Then you can click ‘Open Jupyter’ under the Actions heading.
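The same steps can also be done programmatically. Below is a rough sketch with boto3, assuming you already have an IAM role that SageMaker is allowed to use (the console can create one for you, but the API expects its ARN). The function names and the role ARN are placeholders for illustration.

```python
def notebook_instance_params(name, role_arn, instance_type="ml.t2.medium"):
    """Build the keyword arguments for SageMaker create_notebook_instance."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,  # the small instance type used in this guide
        "RoleArn": role_arn,            # placeholder: an existing SageMaker execution role
    }


def create_notebook_instance(name, role_arn):
    """Create the instance and block until its status is InService."""
    import boto3  # imported here so the helper above works without AWS access

    sagemaker = boto3.client("sagemaker")
    sagemaker.create_notebook_instance(**notebook_instance_params(name, role_arn))

    # The waiter polls the instance status, like refreshing the console page.
    sagemaker.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)

    description = sagemaker.describe_notebook_instance(NotebookInstanceName=name)
    return description["NotebookInstanceStatus"]
```

Remember that billing starts as soon as the instance is running, exactly as it does when you create it from the console.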

You will be redirected to the Jupyter environment, where you can select the ‘SageMaker Examples’ tab. There are plenty of ready-to-use notebooks, and you can choose any of them. To start working with one, click its ‘Use’ button.

Then you simply create a notebook copy by clicking on the ‘Create copy’ button.

And that’s it! You can interact with the selected notebook: read instructions, run cells, change code. Each notebook is an end-to-end solution so you can learn SageMaker best practices.

Important! After you have finished exploring the notebooks, don’t forget to go back to Notebook instances, select the notebook instance you created, Stop it, and then Delete it. Make sure that it has been deleted. Otherwise, you will continue to be charged (even if it is stopped).
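If you created the instance from code, you can clean it up from code as well. Here is a minimal sketch, assuming boto3 and valid credentials; the function name and the optional client argument are illustrative choices. An instance has to be stopped before it can be deleted, so the sketch waits between the two calls.

```python
def stop_and_delete_notebook_instance(name, client=None):
    """Stop a notebook instance, then delete it, waiting after each step."""
    if client is None:
        import boto3  # imported lazily; only needed for the real AWS call

        client = boto3.client("sagemaker")

    # Step 1: stop the instance and wait until it is fully stopped.
    client.stop_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=name)

    # Step 2: delete it and wait until the deletion has completed.
    client.delete_notebook_instance(NotebookInstanceName=name)
    client.get_waiter("notebook_instance_deleted").wait(NotebookInstanceName=name)
```

It is still worth opening the Notebook instances page afterwards to confirm that nothing is left running.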

You can find various kinds of SageMaker example notebooks here.

While exploring the sample notebooks, you have probably noticed that SageMaker often uses S3 for data manipulation. Simple Storage Service (S3) is a data storage service where you can store files of different types. It is very simple to start using: go to the S3 Management Console, create your first bucket, and upload files. To learn more, just refer to the official User Guide.
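Creating a bucket and uploading a file can be sketched in a few lines of boto3 as well. The bucket name, Region, file path, and function names below are placeholders; bucket names must be globally unique, so you would pick your own. One detail worth knowing: outside us-east-1 the Region has to be passed as a location constraint, while for us-east-1 it is left out.

```python
def create_bucket_params(bucket_name, region):
    """Build the keyword arguments for S3 create_bucket."""
    params = {"Bucket": bucket_name}
    if region != "us-east-1":
        # Other Regions must be named explicitly; us-east-1 is the default.
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def create_bucket_and_upload(bucket_name, region, local_path, key):
    """Create a bucket and upload one local file to it (needs AWS credentials)."""
    import boto3  # imported here so the helper above works without AWS access

    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_params(bucket_name, region))
    s3.upload_file(local_path, bucket_name, key)
    return f"s3://{bucket_name}/{key}"
```

The returned s3:// path is the kind of location that SageMaker notebooks typically read their training data from.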

CERTIFICATIONS

To prove your AWS knowledge, we recommend looking at the various certifications available. For those who are completely new to the cloud, there is the AWS Certified Cloud Practitioner certification. It covers all the core concepts. If you prefer something more advanced, there are the AWS Certified Data Analytics and AWS Certified Machine Learning specialties. The main difference is that Data Analytics doesn’t contain Machine Learning questions and focuses mainly on databases, ETL, and data visualization, while Machine Learning covers data engineering, SageMaker, and high-level ML services.

MORE OPPORTUNITIES

It is always great to learn a technology that has a community around it, and this is certainly true of AWS. Once you get certified, you become part of the AWS certified community. AWS experts called AWS Heroes share a lot of useful and free content.

The AWS Community Builders program is open to everyone who wants to learn, build, and share their knowledge. Many community builders share experience, tips, and tutorials on dev.to. There is also a dedicated portal with an amazing community, AWS re:Post, where you can ask technical questions.

Besides this, there are almost 500 free AWS User Groups in online, offline, and hybrid formats, located in different countries and covering different topics. AWS also holds a lot of events: local Community Days, the annual AWS re:Invent, global summits, etc. At such events you can meet like-minded people, network, and upgrade your AWS skills.

CONCLUSION

Getting cloud skills is not as overwhelming and scary as it may seem at first sight. With the right resources and approach, you will unlock the huge potential of the AWS cloud and join its awesome community.

DARYA PETRASHKA, DATA SCIENTIST

About Darya:

Darya is an AWS Community Builder and works as a Data Scientist at SLB. She is passionate about data and its use in problem-solving. Her areas of interest include classical ML and NLP, as well as working with AWS services.

An eternal student, she likes taking part in online schools, courses, and workshops. She shares insights on her LinkedIn page and Medium blog.
