Hello everyone,
In my previous article, I talked about what I did before starting this road.
In this article, I will talk about my roadmap to becoming a data scientist. Below, I have listed the topics I will cover in the Data Science and Machine Learning Bootcamp at Data Science School.
What are the training topics?
- Introduction to Data Science and Artificial Intelligence
- Python Programming
- Data Literacy
- Data Manipulation
- Data Visualization
- Statistics for Data Science
- Data Pre-Processing
- Machine Learning
- Natural Language Processing and Text Mining
- Databases and SQL
- Big Data Analytics
- Data Science Project Management
- Production Level Data Science (git, github, linux, makefile, flask, model deployment)
I will share my experience with the subjects listed above as much as possible.
I have also listed below the projects that the bootcamp aims to complete:
- Association Analysis and Recommendation Systems
- Customer Segmentation with RFM Analysis
- AB Test: Conversion Rate Test
- AB Test: Achievement Test of ML Model
- House Price Prediction Model
- Diabetes Prediction with Machine Learning
- Corporate Customer Abandonment Modeling
- Credit Risk Modeling
- Price Strategy Decision Support System
- Performance Impact Measurement of In-House Training
- Homepage Content Strategy Determination
- Creating a Work Environment in AWS and Google Cloud
- Developing Large Scale Projects with PyCharm
So what basic knowledge is required to do these?
Let’s look at the graph:
When we look at the graph of intersecting disciplines, we see the fields of knowledge needed to extract useful information from data. So far, I have probably only covered what everyone already knows. Improving in each of these fields plays an important role in getting valuable results in data science, which sits at their common intersection.
So, we can start from math and statistics.
How do we extract meaningful information from data? By meaningful, I mean outputs that are valuable for making predictions.
We have to use probability and math for these predictions. The algorithms we teach the system, which we call machine learning, are also built on math. Before we get to those parts, we have to master a programming language, and there again we need knowledge of mathematics and statistics. To summarize, the first priority is to improve our numeracy skills.
By using basic statistics such as the mean, median, and standard deviation, we can interpret the data and make the improvements needed to reach the required values. Before these terms, it is useful to take a look at some basic math concepts.
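To make these terms concrete, here is a minimal sketch using Python's built-in statistics module. The numbers in the list are made up for illustration and do not come from any real dataset.

```python
import statistics

# Hypothetical data, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value of the sorted data

# pstdev treats the data as a whole population (divides by n);
# stdev treats it as a sample (divides by n - 1).
population_sd = statistics.pstdev(values)
sample_sd = statistics.stdev(values)

print("mean:", mean_value)
print("median:", median_value)
print("population standard deviation:", population_sd)
print("sample standard deviation:", round(sample_sd, 3))
```

The two standard deviations differ because the sample version divides by n - 1 instead of n, which matters most when the dataset is small.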
Basic concepts in math that we need to know:
- Linear algebra
- Matrix Algebra
Four big terms in statistics are population, sample, parameter, and statistic:
- A population is the entire group of individuals you want to study, and a sample is a subset of that group.
- A parameter is a quantitative characteristic of the population that you’re interested in estimating or testing (such as a population mean or proportion).
- A statistic is a quantitative characteristic of a sample that often helps estimate or test the population parameter (such as a sample mean or proportion).
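The four terms above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the population is simulated (10,000 heights in centimeters drawn from a normal distribution), and the sample size of 100 is arbitrary.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: simulated heights (cm) of 10,000 people.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a characteristic of the whole population.
population_mean = statistics.mean(population)

# Sample: a random subset of the population, drawn without replacement.
sample = rng.sample(population, 100)

# Statistic: the same characteristic, computed on the sample only.
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

In practice we rarely have access to the whole population, so the statistic is usually all we can compute, and we use it to estimate the parameter.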
What are descriptive statistics and statistical inference?
- Descriptive statistics are single results you get when you analyze a set of data — for example, the sample mean, median, standard deviation, correlation, regression line, margin of error, and test statistic.
- Statistical inference refers to using your data (and its descriptive statistics) to make conclusions about the population. Major types of inference include regression, confidence intervals, and hypothesis tests.
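As a rough illustration of inference, the sketch below builds a 95% confidence interval for a population mean from a sample. The data is simulated, and the interval uses the normal approximation (a z-value of about 1.96), which is an assumption that is only reasonable for fairly large samples. For small samples a t-distribution is the usual choice.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical sample of 200 simulated measurements.
sample = [rng.gauss(50, 8) for _ in range(200)]

n = len(sample)
sample_mean = statistics.mean(sample)  # descriptive statistic
sample_sd = statistics.stdev(sample)   # sample standard deviation

# z-value that leaves 2.5% in each tail of the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)

# Margin of error and the resulting 95% confidence interval.
margin_of_error = z * sample_sd / math.sqrt(n)
lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error

print(f"sample mean: {sample_mean:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The sample mean and standard deviation describe the data we have, and the interval is the inferential step: a range of plausible values for the population mean.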
After improving our mathematical and statistical skills, we can focus on programming. We need to choose one programming language and move forward with it. I chose Python. First, I took a basic online course. Then I learned the syntax of the language and practiced by writing simple functions. After gaining this basic programming knowledge without spending too much time on it, I continued to reinforce it in the machine learning part.
Finally, let’s talk about domain knowledge!
The most important part of developing a project is a clear understanding of the business needs. In this context, we need to ask what the customer expects from the project and focus on what it will bring to the customer. Will this project really meet the need? What is the success goal of the project? Who are the project stakeholders? After analyzing the current situation, these questions can be answered more clearly.
The second part can be called the study of best practices. We should investigate how the same scenario and business need have been approached before. This gives us a useful route to follow toward the solution.
The last part is understanding the data. The size of the data, where it is stored, and how it will be used are all very important for the project plan.
To summarize, in this article, I talked about the three disciplines at the outermost part of the circle.
I will write about my projects in my next articles.
See you in my next article!