Data-scientist Programme
Data Science Course Objectives
- Apply quantitative modeling and data analysis techniques to the solution of real-world business problems, communicate findings, and effectively present results using data visualization techniques.
- Apply principles of Data Science to the analysis of business problems.
- Use data mining software to solve real-world problems.
- Employ cutting edge tools and technologies to analyze Big Data.
- Demonstrate the use of teamwork, leadership skills, decision making, and organization theory.
About the Course
We have selected the most important tools and techniques used in the analytics industry and built a course that prepares aspiring data scientists at an economical price. The course is highly recommended for beginners who want to move into the analytics industry. You will learn 2 of the most important analytics tools along with 6 different machine learning algorithms. You will also learn SQL and MS Excel, the supporting tools most often used alongside SAS and R. We also provide a complimentary course on CV building and mock analytics interview sessions, conducted by associates employed at CMMI Level 5 companies. These sessions help candidates get ready for real-life interviews.
Machine Learning with Python
Every day, more than 36,000 weather forecasts are calculated across the United States. All 36,000 forecasts are gathered, stored in a database, and compared with the actual conditions recorded at that location on that day. All that collection, analysis, and reporting takes a lot of analytical horsepower, and it is done with one programming language: Python. Over 40% of all data scientists use Python in their day-to-day work. Python has long been known as an easy programming language to pick up, which has helped make it a preferred tool for data scientists. In this course you will learn how to use Python to analyze data, create visualizations, and apply machine learning algorithms to formulate business strategies.
Class 1: Introduction to Python Programming Language
- Introduction and installation of Python software
- Python packages: Pandas and NumPy
- Concepts of Data frame
- Filtering
- Loc and iloc for filtering
- Usage of Boolean in filtering
- Appending
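The Class 1 filtering topics can be sketched as follows. This is a minimal illustration, not course material: the `df` table, its column names, and its values are invented, and only standard pandas calls are used.

```python
import pandas as pd

# Hypothetical sales data, invented purely for illustration.
df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Mumbai", "Delhi"],
        "units": [12, 30, 7, 18],
        "price": [250.0, 199.0, 320.0, 210.0],
    }
)

# Boolean filtering: keep the rows where the condition is True.
delhi = df[df["city"] == "Delhi"]

# loc is label-based: rows by a Boolean mask, columns by name.
big_orders = df.loc[df["units"] > 10, ["city", "units"]]

# iloc is position-based: first two rows, first two columns.
top_left = df.iloc[:2, :2]

# Appending rows: pd.concat stacks DataFrames vertically.
extra = pd.DataFrame({"city": ["Chennai"], "units": [5], "price": [275.0]})
combined = pd.concat([df, extra], ignore_index=True)

print(big_orders)
print(combined.tail(2))
```

The practical difference is that `loc` addresses rows and columns by label or mask, while `iloc` addresses them by integer position, so the two can return different rows once the index is no longer 0, 1, 2, and so on.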
Class 2: Data handling in Python
- Handling of missing values
- If-else statement
- Extra trick of using the if-else statement
- Removal of duplicates
- Frequency distribution
- Merging: inner, outer, left and right
- Binding and appending
- Descriptive statistics
- Inbuilt numeric functions of Python
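A short sketch of the Class 2 data-handling steps, assuming pandas and NumPy. The `customers` and `orders` tables are hypothetical, and `np.where` is shown as one common vectorised form of if/else, not necessarily the approach taught in class.

```python
import numpy as np
import pandas as pd

# Hypothetical customer and order tables, invented for illustration.
customers = pd.DataFrame(
    {"cust_id": [1, 2, 2, 3], "age": [34.0, 28.0, 28.0, np.nan]}
)
orders = pd.DataFrame({"cust_id": [1, 3, 4], "amount": [120, 80, 45]})

# Removal of duplicates: the repeated row for customer 2 is dropped.
customers = customers.drop_duplicates().reset_index(drop=True)

# Handling of missing values: fill the missing age with the column median.
customers["age"] = customers["age"].fillna(customers["age"].median())

# Vectorised if/else: np.where(condition, value_if_true, value_if_false).
customers["age_band"] = np.where(customers["age"] >= 30, "30+", "under 30")

# Frequency distribution of a categorical column.
freq = customers["age_band"].value_counts()

# Merging: the `how` argument selects inner, outer, left or right joins.
inner = customers.merge(orders, on="cust_id", how="inner")
outer = customers.merge(orders, on="cust_id", how="outer")
left = customers.merge(orders, on="cust_id", how="left")
right = customers.merge(orders, on="cust_id", how="right")

# Descriptive statistics for the numeric columns.
summary = customers.describe()

print(freq)
print(outer)
```

The four joins differ only in which unmatched keys they keep: inner keeps none, left keeps unmatched customers, right keeps unmatched orders, and outer keeps both.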
Class 3: More data handling using Python
- Pivot Table of Excel in Python
- Grouping function
- Learning of SQL queries using Python
- Grouping numeric data
Class 4: Additional functions of Python
- Text functions
- Data cleaning with efficient text functions
- Inbuilt string functions of Python
- Reshape functions of Python
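The grouping, pivoting, text-cleaning and reshaping topics of Classes 3 and 4 might look like this in pandas. The `sales` table and the bin edges are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical, deliberately messy sales data (invented for illustration).
sales = pd.DataFrame(
    {
        "region": [" north", "South ", "north", "SOUTH"],
        "quarter": ["Q1", "Q1", "Q2", "Q2"],
        "revenue": [100, 150, 120, 130],
    }
)

# Text cleaning with string functions: trim spaces, normalise case.
sales["region"] = sales["region"].str.strip().str.title()

# Grouping, comparable to SQL's
#   SELECT region, SUM(revenue) FROM sales GROUP BY region
by_region = sales.groupby("region", as_index=False)["revenue"].sum()

# Excel-style pivot table: regions as rows, quarters as columns.
pivot = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)

# Reshape from wide back to long format with melt.
long_form = pivot.reset_index().melt(
    id_vars="region", var_name="quarter", value_name="revenue"
)

# Grouping numeric data into bands (bin edges chosen arbitrarily here).
sales["band"] = pd.cut(sales["revenue"], bins=[0, 125, 200], labels=["low", "high"])

print(by_region)
print(pivot)
```

Cleaning the text column first matters here: without it, " north" and "north" would be treated as two different groups.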
Class 5: Statistics
- Everything you want to know about statistics... well, sort of!
- Mean, Median, Mode
- Standard Deviation, Variance, Normal Distribution
- Hypothesis testing
- T-test, ANOVA, Normality test
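As a rough illustration of the Class 5 topics, the sketch below computes the descriptive measures with Python's standard `statistics` module and a two-sample t statistic by hand. The two samples are invented, and the Welch (unequal-variance) form of the t-test is an assumption; the course may use a different variant.

```python
import math
import statistics

# Two hypothetical samples (invented numbers).
group_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
group_b = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2]

mean_a = statistics.mean(group_a)
median_a = statistics.median(group_a)
mode_demo = statistics.mode([3, 5, 5, 7])   # most frequent value
var_a = statistics.variance(group_a)        # sample variance (n - 1)
sd_a = statistics.stdev(group_a)            # sample standard deviation


def welch_t(x, y):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / se


t_stat = welch_t(group_a, group_b)
print(f"mean={mean_a:.3f} median={median_a:.3f} sd={sd_a:.3f} t={t_stat:.2f}")
```

A large absolute t value suggests the group means differ by more than sampling noise would explain. Turning the statistic into a p-value needs the t distribution, which SciPy provides through `scipy.stats.ttest_ind`; SciPy also offers `scipy.stats.f_oneway` for ANOVA and `scipy.stats.shapiro` for a normality test.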
Class 6: Linear Regression
- Predictive Analytics: Linear Regression
- Concepts of Linear Regression
- Simple and Multiple Linear Regression
- Automatic dummy variable creation technique
- Model validation parameters
- Model assumption testing
- Splitting of data for validation and testing
- Business case study with real data to model in Python
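The Class 6 workflow (dummy variables, a train/test split, fitting, validation) can be sketched with NumPy and pandas alone. Everything here is an assumption for illustration: the data is simulated, the column names are hypothetical, and ordinary least squares via `np.linalg.lstsq` stands in for whichever modelling library the class uses.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Simulated housing-style data (not real): price depends on area and city.
df = pd.DataFrame(
    {
        "area": rng.uniform(50, 150, n),
        "city": rng.choice(["A", "B", "C"], n),
    }
)
city_effect = df["city"].map({"A": 0.0, "B": 10.0, "C": 25.0})
df["price"] = 20 + 0.8 * df["area"] + city_effect + rng.normal(0, 2, n)

# Automatic dummy variables: one 0/1 column per city, dropping the first
# level ("A") so it acts as the baseline.
X = pd.get_dummies(df[["area", "city"]], columns=["city"], drop_first=True, dtype=float)
X.insert(0, "intercept", 1.0)
y = df["price"].to_numpy()

# Split the rows: 150 for training, 50 held out for testing.
idx = rng.permutation(n)
train, test = idx[:150], idx[150:]
X_mat = X.to_numpy()

# Multiple linear regression by ordinary least squares.
beta, *_ = np.linalg.lstsq(X_mat[train], y[train], rcond=None)

# Validation on the held-out rows: R-squared and RMSE.
pred = X_mat[test] @ beta
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2 = 1 - ss_res / ss_tot
rmse = np.sqrt(np.mean((y[test] - pred) ** 2))

print(dict(zip(X.columns, beta.round(2))))
print(f"test R^2={r2:.3f}, RMSE={rmse:.2f}")
```

Scoring the model on rows it never saw during fitting is what makes the R-squared an honest estimate; the same figure computed on the training rows would be optimistic.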
Class 7: Linear Regression Practice Case Study
Participants will be asked to develop a Linear Regression model on real-life data in the presence of the instructor. The time given is 2.5 hours. Participants will be treated like industry employees, although when it comes to help the instructor will certainly not be as ruthless as a boss. After the model is complete (with the instructor's help wherever required), the instructor will show how to present a model to a real-life client.
Class 8: Logistic Regression
- Predictive Analytics: Logistic Regression
- Concepts of Logistic Regression
- Difference between Linear Regression and Logistic Regression
- Automatic dummy variable creation technique
- Model validation parameters
- Model assumption testing
- Splitting of data for validation and testing
- Business case study with real data to model in Python
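To make the contrast with linear regression concrete, here is a from-scratch sketch of logistic regression fitted by gradient descent in NumPy. The data is simulated, the helper names (`fit_logistic`, `predict_proba`) are hypothetical, and the learning rate and iteration count are arbitrary choices; a library implementation would normally be used in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Simulated binary outcome (not real data): P(y = 1) follows a sigmoid.
X = rng.normal(size=(n, 2))
true_w = np.array([2.0, -1.0])
p = 1 / (1 + np.exp(-(X @ true_w + 0.5)))
y = (rng.uniform(size=n) < p).astype(float)


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit weights by gradient descent on the average log-loss."""
    Xb = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        grad = Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
        w -= lr * grad
    return w


def predict_proba(X, w):
    Xb = np.column_stack([np.ones(len(X)), X])
    return sigmoid(Xb @ w)


# Split: first 300 rows for training, last 100 for testing.
w = fit_logistic(X[:300], y[:300])
prob = predict_proba(X[300:], w)
pred = (prob >= 0.5).astype(float)
accuracy = (pred == y[300:]).mean()

print("weights:", w.round(2), "test accuracy:", round(accuracy, 3))
```

Linear regression predicts an unbounded number, whereas logistic regression passes the same linear combination through the sigmoid, so the output is a probability that can be thresholded into a class.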
Class 9: Logistic Regression Practice Case Study
Participants will be asked to develop a Logistic Regression model on real-life data in the presence of the instructor. The time given is 2.5 hours. Participants will be treated like industry employees, although when it comes to help the instructor will certainly not be as ruthless as a boss. After the model is complete (with the instructor's help wherever required), the instructor will show how to present a model to a real-life client.
Class 10: Time Series Forecasting
- Time series forecasting: ARIMA
- Difference between forecasting and prediction
- Concepts of time series data
- Concepts of ARIMA
- Descriptive analytics for ARIMA
- Development of model
- Best model selection
- Forecasting with the best model
- Residual analysis
- Business case study with real data to model in R software
- Participants will be asked to develop a model in the presence of the instructor.
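The ARIMA idea can be illustrated in Python, the language used elsewhere in this section, with a deliberately simplified stand-in: an ARIMA(1,1,0) model fitted by least squares on a simulated series. The helper names are hypothetical and this is not a full ARIMA implementation (no moving-average terms, no constant, no order selection); statsmodels offers a complete one as `statsmodels.tsa.arima.model.ARIMA`.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate an ARIMA(1,1,0) series: the differences follow an AR(1) process.
n, phi_true = 300, 0.6
diffs = np.zeros(n)
for t in range(1, n):
    diffs[t] = phi_true * diffs[t - 1] + rng.normal()
series = 100 + np.cumsum(diffs)


def fit_arima_110(y):
    """Difference once (the 'I' part), then fit AR(1) by least squares."""
    d = np.diff(y)
    x, target = d[:-1], d[1:]
    phi = (x @ target) / (x @ x)    # AR(1) coefficient, no constant term
    resid = target - phi * x        # one-step-ahead residuals
    return phi, resid


def forecast_arima_110(y, phi, steps):
    """Forecast the differences, then add them back onto the last level."""
    level, diff = y[-1], y[-1] - y[-2]
    out = []
    for _ in range(steps):
        diff = phi * diff
        level = level + diff
        out.append(level)
    return np.array(out)


phi_hat, resid = fit_arima_110(series)
forecast = forecast_arima_110(series, phi_hat, steps=5)

print(f"estimated phi={phi_hat:.3f}, residual mean={resid.mean():.3f}")
print("5-step forecast:", forecast.round(2))
```

Residual analysis then checks whether `resid` looks like unstructured noise (mean near zero, no remaining autocorrelation); if it does not, a different model order is worth trying.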
Class 11: Cluster Analysis
- Unsupervised Machine Learning with Python
- Cluster Analysis: Concepts
- Cluster analysis with Python: K-Means, Hierarchical, etc.
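A compact sketch of K-Means (Lloyd's algorithm) in NumPy shows what the clustering step does. The `kmeans` helper is hypothetical, the two blobs are simulated, and the initial centres are picked at random from the data, which is one of several possible initialisation schemes.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centre-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centre moves to the mean of its points
        # (an empty cluster keeps its previous centre).
        new_centers = np.array(
            [
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ]
        )
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Two simulated, well-separated blobs (invented data).
rng = np.random.default_rng(3)
blob_a = rng.normal(loc=[0, 0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[5, 5], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])

labels, centers = kmeans(X, k=2)
print("cluster sizes:", np.bincount(labels))
print("centres:", centers.round(2))
```

No labels are supplied at any point, which is what makes this unsupervised: the algorithm recovers the two groups from distances alone. For hierarchical clustering, SciPy provides `scipy.cluster.hierarchy.linkage`.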
Class 12: Decision Tree and Random Forest
- Concepts of Decision Tree
- Decision Tree with Python
- Concepts of Random Forest
- Random Forest with Python
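The core ideas behind these two models can be sketched in NumPy: a decision tree chooses the split that minimises Gini impurity, and a forest averages many trees trained on bootstrap samples. This is a simplified illustration on simulated data with hypothetical helper names: the "trees" are one-split stumps, and unlike a real random forest no random feature subsets are drawn. scikit-learn provides full versions as `DecisionTreeClassifier` and `RandomForestClassifier`.

```python
import numpy as np


def gini(y):
    """Gini impurity of a vector of 0/1 labels."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j])[:-1]:
            left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, thr, score)
    return best[0], best[1]


def fit_stump(X, y):
    """A one-split tree: majority class on each side of the best split."""
    j, thr = best_split(X, y)
    left, right = y[X[:, j] <= thr], y[X[:, j] > thr]
    return j, thr, int(left.mean() >= 0.5), int(right.mean() >= 0.5)


def predict_stump(stump, X):
    j, thr, left_label, right_label = stump
    return np.where(X[:, j] <= thr, left_label, right_label)


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample of the rows."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, len(X), size=len(X))
        forest.append(fit_stump(X[rows], y[rows]))
    return forest


def predict_forest(forest, X):
    """Majority vote across all stumps."""
    votes = np.mean([predict_stump(s, X) for s in forest], axis=0)
    return (votes >= 0.5).astype(int)


# Simulated data: the class depends only on the first feature.
rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] > 0.3).astype(int)

stump = fit_stump(X, y)
forest = fit_forest(X, y)
stump_acc = (predict_stump(stump, X) == y).mean()
forest_acc = (predict_forest(forest, X) == y).mean()
print("split feature:", stump[0], "threshold:", round(float(stump[1]), 3))
print("stump accuracy:", stump_acc, "forest accuracy:", forest_acc)
```

Averaging over bootstrap samples is what reduces the variance of a single tree; on noisy real data the gain is usually larger than on this clean toy example.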
Important points:
1. After each class, assignments will be given as homework, which must be completed before the next class. The first 15 minutes of every class will be reserved for answering participants' queries.
2. After every session, the code, presentations, and handouts discussed will be emailed to all participants. Participants are advised to bring them to class either as soft copies or as printouts.
3. Participants are advised to bring their own computers so that they can practice the code along with the instructor.
4. Classes normally last 3 hours, with a break of at most 5-10 minutes depending on the participants' requirements. If not all of the participants' queries are answered within the stipulated 3 hours, the instructor will extend the class by 15 to 30 minutes.
5. After completing the module, all participants will have the option to work on other case studies with real-life data for further practice. (This is optional and will not be considered when calculating your final grade.)
6. If a participant feels that they need further help on a certain topic, they can attend the same session with another batch.