A Guide On How To Become A Data Scientist (Step By Step Approach)
Journal of the Egyptian Society for Information Systems and Computer Technology
Article 17, Volume 25, Issue 25, August 2021, Pages 13-15
Document Type: Research, studies, and articles that meet the recognized scientific standards, conducted or co-conducted by faculty members and researchers at Egyptian and Arab universities and research centers, in Arabic and English.
DOI: 10.21608/jstc.2021.191421
Author
محمد الهادي
Emeritus Professor of Computers and Information Systems, Department of Computers and Information Systems, Sadat Academy for Management Sciences
Tags: Career Advice, Data Science Skills, Data Scientist, Python, R, SQL, Statistics
Becoming a data scientist is an exciting path, but you cannot learn data science in six months or a year. It is a lifelong process that takes dedication and hard work. To guide your journey, the skills outlined here are the first ones you should acquire on the way to becoming a data scientist.
 By Aditya Agarwal, Graduate Student at Northeastern U.
 There are plenty of resources and links out there, but it is easy to get confused about which ones to follow. Don't worry, I have got you covered. I have attached links to several YouTube channels, blogs, courses, and other websites that I found appropriate for beginners. You can also use data science community websites like Analytics Vidhya and Kaggle to apply what you learn and gain hands-on experience in data science.
 Data Science Roadmap 
 STEP 1: Choose A Programming Language (Python / R) 
 The first step in the data science journey is to get familiar with a programming language. Of the two, Python is the more popular choice and is widely adopted by data scientists. It is easy to learn, versatile, and backed by a rich ecosystem of third-party libraries such as NumPy, pandas, Matplotlib, Seaborn, SciPy, and many more.
 NOTE: While learning Python, you should know the essentials: variables, data types, object-oriented programming (OOP) concepts, NumPy, pandas, Matplotlib, and Seaborn.
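As a rough illustration of the NOTE above, the sketch below touches variables, core data types, a small class hierarchy (OOP), NumPy arrays, and a pandas DataFrame. The class names `Dataset` and `LabeledDataset` and all data values are hypothetical; plotting with Matplotlib and Seaborn is left out to keep the example dependency-light.

```python
import numpy as np
import pandas as pd

# Variables and core data types
name = "Ada"                          # str
age = 36                              # int
height_m = 1.65                       # float
is_student = False                    # bool
skills = ["Python", "SQL"]            # list
profile = {"name": name, "age": age}  # dict


class Dataset:
    """Hypothetical wrapper around a list of numbers (encapsulation + a method)."""

    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


class LabeledDataset(Dataset):
    """Inheritance: reuses Dataset.mean and adds a label."""

    def __init__(self, label, values):
        super().__init__(values)
        self.label = label

    def describe(self):
        return f"{self.label}: mean={self.mean():.2f}"


scores = LabeledDataset("scores", [70, 85, 90])
print(scores.describe())  # scores: mean=81.67

# NumPy: vectorized arithmetic on a whole array at once
arr = np.array([1, 2, 3, 4])
print(arr * 2)  # [2 4 6 8]

# pandas: tabular data and a simple aggregation
df = pd.DataFrame({"city": ["Cairo", "Giza", "Cairo"], "sales": [100, 80, 120]})
print(df.groupby("city")["sales"].sum())
```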
 STEP 2: Statistics
 To become a data scientist, knowledge of statistics and probability is as essential as salt in food. It helps data scientists interpret large data sets, extract insights from them, and analyze them more rigorously.
 NOTE: Statistics covers ideas such as mean, median, mode, range, variance, standard deviation, graphs and plotting, populations, and samples.
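The measures listed in the NOTE can be computed with Python's standard `statistics` module. This minimal sketch uses a small made-up sample; it also shows the population-versus-sample distinction, where population variance divides the sum of squared deviations by N and sample variance divides by N - 1.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, not real data

mean = st.mean(data)                # 5
median = st.median(data)            # 4.5 (average of the two middle values)
mode = st.mode(data)                # 4 (most frequent value)
data_range = max(data) - min(data)  # 7

# Squared deviations from the mean sum to 32 here.
pop_var = st.pvariance(data)    # population variance: 32 / 8 = 4
pop_std = st.pstdev(data)       # 2.0
sample_var = st.variance(data)  # sample variance: 32 / 7, about 4.571
sample_std = st.stdev(data)     # square root of the sample variance

print(mean, median, mode, data_range, pop_var, pop_std, sample_var, sample_std)
```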
 STEP 3: Learn SQL 
 Structured Query Language (SQL) is used to query and manage data in relational databases, including very large ones. Focus on understanding the different types of normalization, writing nested queries, using correlated subqueries, grouping with GROUP BY, and performing join operations to extract data in raw format. This data is then cleaned further, either in Microsoft Excel or with Python libraries.
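To make the query techniques above concrete, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their rows are hypothetical, split into two tables as a simple normalized design; the queries show a JOIN with GROUP BY, a nested query, and a correlated subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Hypothetical normalized schema: customer details stored once, orders reference them
cur.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     amount REAL);
INSERT INTO customers VALUES (1, 'Ali', 'Cairo'), (2, 'Mona', 'Giza'), (3, 'Omar', 'Cairo');
INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 70.0), (4, 3, 20.0);
""")

# JOIN + GROUP BY: total order amount per city
city_totals = cur.execute("""
    SELECT c.city, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()
print(city_totals)  # [('Cairo', 100.0), ('Giza', 70.0)]

# Nested query: customers whose total spend exceeds the average order amount
big_spenders = cur.execute("""
    SELECT name FROM customers
    WHERE id IN (
        SELECT customer_id FROM orders
        GROUP BY customer_id
        HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    )
    ORDER BY name
""").fetchall()
print(big_spenders)  # [('Ali',), ('Mona',)]

# Correlated subquery: the inner query is re-evaluated for each outer row o1
above_own_avg = cur.execute("""
    SELECT o1.id, o1.amount FROM orders AS o1
    WHERE o1.amount > (
        SELECT AVG(o2.amount) FROM orders AS o2
        WHERE o2.customer_id = o1.customer_id
    )
""").fetchall()
print(above_own_avg)  # [(1, 50.0)]

conn.close()
```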
 NOTE: In SQL, you should know how to create tables, insert, update, and delete data, and perform basic query operations.
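The basic operations in this NOTE map directly onto SQL statements. The following sketch runs them through Python's `sqlite3` module against a hypothetical `employees` table; the `?` placeholders let the driver bind values safely instead of pasting them into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a table (hypothetical schema)
cur.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# INSERT: add several rows at once with bound parameters
cur.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Sara", 5000.0), ("Karim", 4200.0), ("Nour", 6100.0)],
)

# UPDATE: give one employee a fixed raise
cur.execute("UPDATE employees SET salary = salary + 300 WHERE name = ?", ("Karim",))

# DELETE: remove a row
cur.execute("DELETE FROM employees WHERE name = ?", ("Nour",))
conn.commit()

# Basic query: filter and sort
rows = cur.execute(
    "SELECT name, salary FROM employees WHERE salary > ? ORDER BY salary DESC",
    (4000,),
).fetchall()
print(rows)  # [('Sara', 5000.0), ('Karim', 4500.0)]

conn.close()
```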
 STEP 4: Data Cleaning
 When a data scientist is given a project, most of the time goes into cleaning the data set: removing unwanted values and handling missing values. Python libraries such as pandas and NumPy make this work easier. You should also know how to manipulate data in Microsoft Excel.
 NOTE: In Microsoft Excel, you should know basic data filtering and sorting, functions and formulas, VLOOKUP, pivot tables and charts, tables, etc.
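A typical pandas cleaning pass might look like the sketch below. The data frame, its columns, and the cleaning rules (treating ages outside 0 to 120 as invalid, filling gaps with the median) are illustrative assumptions; real projects need rules chosen for their own data.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with common problems: duplicates, missing values,
# an implausible outlier, and inconsistent text formatting
raw = pd.DataFrame({
    "name": ["Ali", "Mona", "Mona", "Omar", None, "Sara"],
    "age":  [25, np.nan, np.nan, 200, 40, 35],
    "city": ["Cairo", "giza ", "giza ", "Cairo", "Luxor", "alexandria"],
})

clean = (
    raw.drop_duplicates()            # remove exact duplicate rows
       .dropna(subset=["name"])      # drop rows missing a required field
       .assign(city=lambda d: d["city"].str.strip().str.title())  # normalize text
)

# Treat implausible ages as missing (NaN rows also match this mask),
# then fill all missing ages with the median of the remaining valid ages
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan
clean["age"] = clean["age"].fillna(clean["age"].median())

print(clean)
```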
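Several of the Excel skills in the NOTE have close pandas counterparts. This sketch, on hypothetical `sales` and `products` tables, uses boolean filtering plus `sort_values` in place of AutoFilter and Sort, a left `merge` in place of VLOOKUP, and `pivot_table` in place of an Excel pivot table.

```python
import pandas as pd

sales = pd.DataFrame({
    "product_id": [1, 2, 1, 3, 2],
    "region": ["North", "South", "South", "North", "North"],
    "units": [10, 5, 7, 3, 8],
})
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product": ["Pen", "Notebook", "Bag"],
})

# Filter + sort (Excel: AutoFilter, then Sort descending)
big = sales[sales["units"] >= 5].sort_values("units", ascending=False)

# VLOOKUP: look up each row's product name by matching product_id
merged = sales.merge(products, on="product_id", how="left")

# Pivot table: total units per product (rows) and region (columns)
pivot = merged.pivot_table(index="product", columns="region",
                           values="units", aggfunc="sum", fill_value=0)

print(big)
print(pivot)
```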
 STEP 5: Exploratory Data Analysis 
 Exploratory data analysis is an essential part of data science. At this stage the data scientist finds patterns in the data, analyzes it, identifies relevant trends, and draws valuable insights from it using various graphical and statistical methods, including: (A) data analysis using pandas and NumPy, (B) data manipulation, and (C) data visualization.
 
 Types of plots in the Seaborn Python library. 
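A first exploratory pass over a data set might follow the sketch below, covering the three areas named in Step 5. The data is synthetic, and the linear link between study time and score is an assumption built in for illustration. Plotting is replaced by a text histogram to avoid extra dependencies; in practice you would typically reach for Seaborn functions such as `histplot` or `scatterplot`.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only
rng = np.random.default_rng(seed=0)
n = 200
df = pd.DataFrame({
    "hours_studied": rng.uniform(0, 10, n),
    "group": rng.choice(["A", "B"], n),
})
# Assumed relationship: score rises with study time, plus noise
df["score"] = 50 + 4 * df["hours_studied"] + rng.normal(0, 5, n)

# (A) Data analysis: shape, types, missing values, summary statistics
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
print(df.describe())

# (B) Data manipulation: derive a column, then aggregate by group
df["passed"] = df["score"] >= 70
print(df.groupby("group")[["score", "passed"]].mean())

# Relationship between numeric columns
corr = df[["hours_studied", "score"]].corr()
print(corr)

# (C) Visualization stand-in: a quick text histogram of scores
counts, edges = np.histogram(df["score"], bins=5)
for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{lo:6.1f}-{hi:6.1f} | {'#' * int(count // 2)}")
```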
 STEP 6: Learn Machine Learning Algorithms 
 According to Google, “Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.” This is the most crucial step in a data scientist's life cycle: building models with machine learning algorithms that make accurate predictions, and arriving at the best solution for the problem at hand.
 
 Machine Learning landscape. 
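The core model-building loop of Step 6 (split the data, fit on the training set, evaluate on held-out data, compare against a baseline) can be shown with plain NumPy. This is a minimal sketch using ordinary least squares linear regression on synthetic data whose true coefficients are assumed for illustration; libraries such as scikit-learn wrap the same steps behind higher-level APIs.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Synthetic data (assumed): y = 5 + 3*x1 - 2*x2 + noise
n = 500
X = rng.normal(size=(n, 2))
y = 5 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Train/test split: shuffle the row indices and hold out 20% for evaluation
idx = rng.permutation(n)
split = int(0.8 * n)
train, test = idx[:split], idx[split:]


def add_bias(features):
    """Prepend a column of ones so the model learns an intercept."""
    return np.column_stack([np.ones(len(features)), features])


# Fit ordinary least squares on the training rows only
w, *_ = np.linalg.lstsq(add_bias(X[train]), y[train], rcond=None)

# Predict on unseen rows and measure mean squared error
y_pred = add_bias(X[test]) @ w
mse = np.mean((y_pred - y[test]) ** 2)

# Baseline: always predict the training mean; a useful model should beat it
baseline_mse = np.mean((y[train].mean() - y[test]) ** 2)

print("weights (intercept, x1, x2):", np.round(w, 2))
print(f"test MSE: {mse:.3f}  baseline MSE: {baseline_mse:.3f}")
```

Evaluating only on rows the model never saw during fitting is what makes the MSE an honest estimate of performance on new data.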
 STEP 7: Practice On Analytics Vidhya and Kaggle
 After acquiring the basics of data science, it is time to get hands-on experience. Online platforms such as Kaggle and Analytics Vidhya offer both beginner and advanced data sets to practice on. They can help you understand various machine learning algorithms, different analysis techniques, and more. You can follow the approach below to learn how to use these platforms effectively.
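Many practice competitions follow a similar shape: a training file with labels, a test file without them, and a submission file of predictions. The sketch below is a hypothetical starting point, not the author's approach. The file contents, the `Id`, `feature`, and `target` column names, and the submission format are all assumptions, since each competition defines its own.

```python
import io

import pandas as pd

# Stand-ins for a competition's train.csv and test.csv (hypothetical columns)
train_csv = io.StringIO("Id,feature,target\n1,2.0,4.1\n2,3.0,6.2\n3,4.0,7.9\n")
test_csv = io.StringIO("Id,feature\n4,5.0\n5,6.0\n")
train = pd.read_csv(train_csv)
test = pd.read_csv(test_csv)

# Explore before modeling
print(train.describe())

# A trivial baseline (predict the training mean) that later models should beat
baseline = train["target"].mean()

# Build predictions in an assumed Id,target submission format
submission = pd.DataFrame({"Id": test["Id"], "target": baseline})
csv_text = submission.to_csv(index=False)  # pass a path such as "submission.csv" to write a file
print(csv_text)
```

Starting from a simple baseline gives you a reference score, so you can tell whether each new model or feature actually helps.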