
How to become a Data Scientist? — A detailed step by step guide!

Originally posted on Medium by Anupran Trivedi

You couldn’t have missed the buzz.

Whether it’s the media, articles, job postings or interviews of top leaders from companies such as Google and Facebook, everyone seems to have been talking about Data Science and Artificial Intelligence. If you’re like most, you would also be thinking — How to become a Data Scientist?

It’s time to take that question seriously. In 2012, Harvard Business Review dubbed data scientist "the Sexiest Job of the 21st Century". The demand and hype around it have made it a very lucrative career option for college students & software professionals.

Is it easy to become a Data Scientist?

As enticing as it seems, data science is not an easy field to enter, as it requires strong prerequisites in many areas. People with good programming skills, a solid grounding in mathematics and a love for data have a good chance of becoming a Data Scientist.


In this guide, I have tried to cover almost every aspect of data science, to help you work out the fastest and most efficient way of becoming a data scientist.

What is Data Science?

Data science is all about uncovering meaningful insights and findings (usage, trends, consumer behaviour, retention, etc.) by using complex algorithms & tools, machine learning processes, mathematics & statistics, programming & technology.

That was quite a mouthful. Let’s take some examples to understand how data science is actually being used. Uber & Google are using data science to build driverless cars. Flipkart & Amazon keep cookies and use your personal data (age, location, sex, etc.) to improve the overall shopping experience. Their recommendation engines also use these properties & data science to recommend the products you are most likely to buy.

Here’s a quick video that shows the importance of data at Uber:

Basically, businesses today are using data science to outperform the competition, reduce costs, increase retention and make smart business decisions.

But how exactly do they do this? How do the awesome data scientists make this random and unstructured data meaningful?

Who are Data Scientists & What do they do?

Given the wide range of stuff data scientists do, there seems to be confusion around the roles of data scientists. Are they statisticians, mathematicians or software engineers?

This statement pretty much puts things in perspective:

A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.

Here are some things data scientists are normally asked to do:

  • Identify & frame data analytics based problems which can have a direct positive impact on the company or the clients.
  • Collect, cleanse, transform and process the structured and unstructured data from different sources.
  • Build statistical models & use machine learning algorithms if necessary to perform in-depth analysis of processed data.
  • Interpret the models to identify patterns & find solutions to the company’s problems and opportunities for its growth.
  • Communicate the discoveries to stakeholders in a comprehensible way. Storytelling is one of the most important skills a data scientist must have.
Ref. Data Science Report — Crowdflower

Outside of these finer tasks, the overall role of a data scientist is to advise teams and management to take data-based decisions rather than ad hoc ones. Watch Mayur Datar, Principal Data Scientist at Flipkart, talk about what being a data scientist is all about:

What are the skills required to become a data scientist?

The skill set of a good data scientist consists of modular expertise in many fields, such as data mining, data analysis, programming, mathematics & statistics, machine learning, business, data hacking, data visualization, databases & (big) data. The following is a brief description of the major skills required to become a data scientist and how to acquire them:

Mathematics (Probability, Statistics, Linear Algebra):

Let’s get this straight — mathematics is the core foundation of data science.

To take an example, let’s say you work at a drone company that does crowd surveillance & you want to find out the number of male and female attendees at an event. To do that, and from a distance too, you need a strong grip on probability & statistics (concepts such as maximum likelihood estimation). Probability will help you estimate how likely it is that a given person is male or female on the basis of their face and physical appearance.
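To make the maximum likelihood idea a little more concrete, here is a minimal sketch. It assumes every attendee has already been labelled by some upstream detector (the labels and the function names `mle_proportion` and `log_likelihood` are made up for illustration) and estimates the share of one group. For independent yes/no observations, the maximum likelihood estimate of a proportion is simply the observed fraction.

```python
import math


def mle_proportion(labels, target):
    """Maximum likelihood estimate of P(label == target) for i.i.d. labels."""
    if not labels:
        raise ValueError("need at least one observation")
    hits = sum(1 for label in labels if label == target)
    return hits / len(labels)


def log_likelihood(p, hits, n):
    """Bernoulli log-likelihood of `hits` successes in `n` trials (0 < p < 1)."""
    return hits * math.log(p) + (n - hits) * math.log(1 - p)


# Hypothetical labels from an upstream detector (not real data).
observed = ["F", "M", "F", "F", "M", "F", "M", "F"]
p_hat = mle_proportion(observed, "F")

# Sanity check: the estimate scores at least as well as nearby values.
hits, n = observed.count("F"), len(observed)
assert log_likelihood(p_hat, hits, n) >= log_likelihood(p_hat - 0.1, hits, n)
assert log_likelihood(p_hat, hits, n) >= log_likelihood(p_hat + 0.1, hits, n)
```

In a real surveillance setting the hard part is the detector itself; this sketch only covers the estimation step that comes after it.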


Mathematics is important for a data scientist because working on data or building data products requires the ability to view data, patterns or textures through a mathematical mindset. Even after converting data into a structured form, you need a good knowledge of statistics to analyse or visualize it. Linear algebra is one of the most important foundations of machine learning. It is also very important if you want to uncover characteristics of users in big data sets, where the data is typically represented as a matrix.
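As a small illustration of the matrix view of user data, the sketch below stores each user as a row vector and uses cosine similarity to find the user with the most similar tastes. The ratings, user names and helper names are invented for the example; real systems would work on far larger matrices and usually rely on a numerical library.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either is all zeros."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot(u, v) / (norm_u * norm_v)


def most_similar(user, matrix):
    """Return the other user whose row is closest in direction to `user`'s."""
    others = (name for name in matrix if name != user)
    return max(others, key=lambda name: cosine_similarity(matrix[user], matrix[name]))


# Rows are users, columns are hypothetical product categories (made-up data).
ratings = {
    "alice": [5, 0, 3],
    "bob": [4, 0, 3],
    "carol": [0, 5, 0],
}

closest = most_similar("alice", ratings)
```

Here `closest` is the user whose ratings point in nearly the same direction as Alice's, which is the kind of signal a recommendation engine can build on.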

Following are some resources which will help you in learning & improving these skills:

Introduction to probability & data by Duke University | Introduction to Statistics | Linear Algebra by MIT OCW


Programming:

For prototyping small & quick solutions or stitching together complex data systems, a data scientist must know how to code. Coding helps you clean and organize unstructured data. The most important programming languages & technologies to know or learn to excel in this field are Python, R, SAS, SPSS, Perl & SQL/NoSQL.
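Here is a rough sketch of what cleaning unstructured data can look like in Python. The log lines and their format are made up for illustration: the code pulls a date, a user and an amount out of free text, normalises the user name and skips lines it cannot parse.

```python
import re

# Hypothetical free-text records; the format is invented for this example.
raw_lines = [
    "  2021-03-01  user=Asha  spent=120.50 ",
    "2021-03-02 user=ravi spent=80",
    "corrupted line without fields",
    "2021-03-02 user=ASHA spent=19.5",
]

# date, then user=<name>, then spent=<number>, separated by whitespace
PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})\s+user=(\w+)\s+spent=([\d.]+)")


def parse_line(line):
    """Turn one raw line into a dict, or return None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    date, user, spent = match.groups()
    return {"date": date, "user": user.lower(), "spent": float(spent)}


# Keep only the lines that could be parsed.
records = [rec for rec in map(parse_line, raw_lines) if rec is not None]
```

After this step `records` is a list of uniform dictionaries that can be loaded into a database or a data frame.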


Trust me, if you are genuinely passionate about getting into data science, you must have a good command of programming. It will be your best support in reaching your KRAs on time.

Following are some resources which will help you in learning & improving these skills:

Learn Python | Learn R | Learn SAS | Learn SPSS | Learn PERL

Machine Learning (ML):

Machine learning is used to train computers to learn & improve continuously by themselves as they are fed new data. Recommendation engines, self-driving cars, recruitment companies, etc. rely heavily on ML today to improve their user experience.

To clear up the confusion: ML is a core subset of Artificial Intelligence. Machine learning helps companies automate important processes in real time, reducing the cost of operations that depend on human intervention. Data scientists must know ML because it helps them build systems that can make high-value predictions & take decisions in real time.
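To give a feel for what "learning from data" means, here is about the simplest possible model: a straight line fitted by ordinary least squares and then refitted when new data arrives. This is only a sketch with made-up numbers and hypothetical function names; production systems use libraries and far richer models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data, e.g. sessions per week vs. items bought.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)

# "Feeding new data" here simply means refitting with the extra point.
xs.append(5)
ys.append(11)
model = fit_line(xs, ys)
forecast = predict(model, 10)
```

The same pattern (fit on known examples, predict on unseen ones, refit as data grows) carries over to the more powerful algorithms taught in the courses below.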

Following are some resources which will help you in learning & improving your ML skills:

ML by Coursera | Learn ML by Udacity

Data skills

Knowledge of Databases:

Data scientists need to access, manipulate and store data all the time. Knowledge of relational databases such as MySQL as well as NoSQL databases such as MongoDB & Cassandra is very important to do this effectively.
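The basic store, access and aggregate cycle can be tried without installing anything. The sketch below uses SQLite through Python's built-in `sqlite3` module as a stand-in for a server database such as MySQL (that substitution, the table name and the sample rows are my own choices for illustration); the SQL itself is the part that transfers.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")

# Store: create a table and insert a few made-up rows.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("asha", 120.5), ("ravi", 80.0), ("asha", 19.5)],
)

# Access and manipulate: total spend per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()

conn.close()
```

Using `?` placeholders instead of building SQL strings by hand is a habit worth forming early, since it avoids quoting bugs and SQL injection.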

Following are some resources which will help you in learning & improving these skills:

Learn SQL | MongoDB University | Learn Cassandra

Big Data:

Big data is basically a huge amount of data, generated from multiple sources at high velocity and with high variability, which can’t be handled easily by traditional database management systems such as relational databases.

Big Data is a problem and tools like Hadoop & Spark are solutions to it. Hadoop is an open-source software framework used for distributed storage and processing of very large datasets.
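Hadoop's processing side is built around the MapReduce model, which also appears in the resource list below. The following is a single-process sketch of that model in plain Python, not Hadoop's actual API: the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle and reduce over a list of documents."""
    pairs = []
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data is big", "data is everywhere"])
```

Because each document is mapped independently and each key is reduced independently, the work can be split across machines, which is what makes the model suit very large datasets.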

Following are some resources which will help you in learning & improving these skills:

Introduction to Big Data | Introduction to Hadoop & MapReduce | Big Data Analysis with Scala & Spark

Data Munging/Wrangling & Visualization:

Data munging/wrangling is the process of transforming data from one “raw” form into another form that is more convenient to understand and use.
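A typical wrangling task is reshaping a "wide" table, with one column per year, into a "long" one with one row per observation. The sketch below does this with Python's `csv` module; the sample data and the function name `wide_to_long` are made up for illustration, and missing values are simply dropped.

```python
import csv
import io

# Made-up raw data: one column per year, with one missing value.
RAW = """city,2019,2020
Pune,10,12
Delhi,7,
"""


def wide_to_long(text):
    """Turn 'city,<year>,<year>,...' rows into one dict per (city, year)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        city = record["city"]
        for column, value in record.items():
            # Skip the id column and any empty or missing cells.
            if column == "city" or not value:
                continue
            rows.append({"city": city, "year": int(column), "value": int(value)})
    return rows


tidy = wide_to_long(RAW)
```

The long form is usually easier to filter, group and plot, which is why many analysis tools expect it.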

Data Visualization & Reporting: Data visualization is the creation & study of the visual representation of the data by using statistical graphics, plots and information graphics. Data reporting is the process of arranging data into informational reports in order to gain meaningful insights for improving & monitoring different areas within a business.
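In practice you would use a plotting library or a tool such as Tableau for this. Still, the core idea of mapping numbers to visual lengths fits in a few lines, so here is a text-only sketch with made-up sales figures and a hypothetical `bar_chart` helper.

```python
def bar_chart(totals, width=20):
    """Render a dict of label -> non-negative number as text bars, largest first."""
    if not totals:
        return []
    largest = max(totals.values())
    lines = []
    for label, value in sorted(totals.items(), key=lambda item: -item[1]):
        # Scale each bar relative to the largest value.
        bar = "#" * round(width * value / largest) if largest else ""
        lines.append(f"{label:<8} {bar} {value}")
    return lines


# Made-up regional sales figures.
sales = {"north": 40, "south": 10, "east": 20}
report = bar_chart(sales)
print("\n".join(report))
```

Sorting the bars and scaling them to a common maximum are small choices, but they are what lets a reader compare regions at a glance.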

Following are some resources which will help you in learning & improving these skills:

Learning Tableau | Data Wrangler | Vega – A Visualization Grammar | Getting & Cleaning Data

So, how to become a Data Scientist?

With all this background, let’s just get to the steps needed to become a data scientist:

  1. Learn all the skills mentioned above.
  2. Apply the skills: After learning all those skills, it’s time to get your hands dirty. You can begin with sites such as Kaggle that give you not only interesting problems but also real data dumps to solve them with.

Resources: Kaggle Competitions | TopCoder Competition | Data Science Test

3. Get a real-world project: Now you know all the skills, you have already done a few projects, passed multiple tests and you are well aware of the whole data science scenario. What’s next? It’s time to take the litmus test.

If you are a student, it’s easier. Companies ranging from startups to big names like Amazon offer data science internships throughout the year. Getting an internship in data science is not difficult if you have spent sufficient time getting your basics clear and have hands-on experience.

If you are already working and want to switch to data science, don’t worry. Demand for data scientists is growing rapidly.


4. Connect with people hiring data scientists: Don’t waste your time on regular job portals like Indeed, Monster or Naukri. These portals are very noisy and the output is really low. Try intelligent platforms (we built CutShort, “an AI-based platform to find just the right jobs in product based companies and startups”) that can really cut short your path to becoming a data scientist.

Here are real teams hiring for data scientists currently:


Data Science Jobs | Machine Learning Jobs | Deep Learning Jobs | Artificial Intelligence Jobs | Big Data Jobs


Data science is a great career option that is interesting as well as rewarding. The demand for data scientists is just going to explode in the next decade.

However, it’s a challenging role too. You may be able to get into it in the short term, but for longer-term success, you really need to build a strong foundation in this domain. All the best!

If you liked this article, please share with your friends. They will thank you for it!

Other important notes

  1. The most important factor that will differentiate you from the crowd is the projects and internships you have done in the field of data science. Try to solve more complex, real problems and take on projects which require a deep understanding of the subject.
  2. Make your own website and upload your projects/portfolio, internship details, achievements & so on. Presenting your thoughts, storytelling & communication are among the most important skills a data scientist must have.
  3. Grow your network, offline as well as online. Try to stay in touch with industry experts and take their help in getting things done. LinkedIn is one of the best ways to grow your network regardless of boundaries.