This week introduces students to the Python tools and libraries that data scientists use regularly. The program also reviews basic college-level statistical concepts.
The objective is for students to become comfortable with the tools they will use for the next four weeks to write programs in Python.
Preprocessing is crucial before conducting any analysis, especially when the dataset contains thousands of rows and values. We cover several data manipulation techniques, including scaling features to zero mean and unit variance (standardisation).
Normalisation is important because it puts features measured on different scales onto a single common scale, so that no feature dominates simply because of its units. After preprocessing, students learn how to build basic classification algorithms. This serves as their introduction to machine learning.
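As a rough illustration of the scaling step described above, the following sketch standardises two features to zero mean and unit variance using only the Python standard library. The function name `standardise` and the sample values are hypothetical and chosen purely for illustration; in practice a library routine would normally be used instead.

```python
from statistics import mean, pstdev


def standardise(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column has no spread; map every entry to zero.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features measured on very different scales.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
incomes = [20000.0, 35000.0, 50000.0, 65000.0, 80000.0]

scaled_heights = standardise(heights_cm)
scaled_incomes = standardise(incomes)

# After scaling, both features are expressed in the same unit:
# standard deviations from their own mean.
print(scaled_heights)
print(scaled_incomes)
```

Because both sample features are evenly spaced, they end up with the same scaled values even though the raw numbers differ by orders of magnitude.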
Students dive into more machine learning topics, including linear and logistic regression and ensemble learning. scikit-learn is used heavily to build and train classifiers.
Students will be comfortable with the library by the end of the week. Several techniques for reducing training time are introduced, including dimensionality reduction with principal component analysis (PCA).
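The kind of workflow covered this week might look like the sketch below, which chains scaling, principal component analysis, and logistic regression in a single scikit-learn pipeline. The choice of the built-in iris dataset, two components, and the split parameters are assumptions made for illustration, not part of the course material.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, project them onto 2 principal components,
# then fit a logistic regression classifier on the reduced data.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy with 2 principal components: {test_accuracy:.2f}")
```

Fitting the scaler and PCA inside the pipeline means they are learned from the training split only, which avoids leaking information from the test data.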
Students learn how to evaluate and measure the accuracy of classifiers. Evaluation is important when selecting a model for analyzing and classifying data.
After covering the basic process of building machine learning models, students will dive into additional topics that help further optimise their models.
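To make the evaluation step concrete, here is a minimal sketch that computes accuracy, precision, and recall for a binary classifier from its predictions. The function names and the label lists are hypothetical examples, and libraries such as scikit-learn provide equivalent, more complete metrics.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when a class is never predicted
        # or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical ground-truth labels and classifier predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
print(metrics)
```

Comparing these numbers across candidate models is one simple way to choose between them, although accuracy alone can be misleading when the classes are imbalanced.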
Students learn some basic software engineering skills so that they can deploy the machine learning models they have built to the web using Flask.
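A deployment of this kind could be sketched as a small Flask app that exposes a single prediction endpoint. Everything specific here is an assumption for illustration: the `/predict` route, the JSON field name `features`, and the `predict_label` function, which is a hypothetical stand-in for a trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_label(features):
    """Hypothetical stand-in for a trained model's prediction."""
    return "positive" if sum(features) > 0 else "negative"


@app.route("/predict", methods=["POST"])
def predict():
    # Parse the JSON body; silent=True returns None instead of raising.
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None

    # Reject requests that do not carry a non-empty list of numbers.
    is_valid = (
        isinstance(features, list)
        and len(features) > 0
        and all(isinstance(x, (int, float)) for x in features)
    )
    if not is_valid:
        return jsonify(error="expected a non-empty numeric 'features' list"), 400

    return jsonify(prediction=predict_label(features))


# Exercise the endpoint with Flask's built-in test client,
# without starting a real web server.
with app.test_client() as client:
    response = client.post("/predict", json={"features": [0.5, 1.5, -0.25]})
    print(response.status_code, response.get_json())
```

In a real deployment the stand-in function would be replaced by a call to a trained model loaded at startup, and the app would be served by a production web server rather than the test client.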
Students apply the techniques from the first five weeks to the task of image recognition and classification, working with the MNIST database of handwritten digits.
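The sketch below shows what a first attempt at digit classification might look like. To stay self-contained it uses scikit-learn's small built-in digits dataset (8x8 images) as a stand-in for MNIST, which is an assumption made here for convenience; the k-nearest-neighbours classifier and its parameters are likewise illustrative choices rather than the course's prescribed method.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Built-in digits dataset: 1,797 grayscale images of 8x8 pixels,
# each flattened into a 64-element feature vector.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.25,
    random_state=0,
    stratify=digits.target,
)

# Classify each test image by majority vote among its 3 nearest
# training images in pixel space.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)

digit_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy on held-out digits: {digit_accuracy:.2f}")
```

The same train, predict, and score pattern carries over to the full MNIST images, which are larger (28x28 pixels) and far more numerous.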
After learning more about visualising data, students focus on their final projects. Once the projects are complete, students receive an official certificate and start their internship.