1 Introduction
Description
In this course, we will develop the skills needed to move through the data science life cycle and to understand the mathematical and statistical foundations behind it.
Learning Objectives
LO1: Develop literacy in the fundamentals of coding in R and Python.
LO2: Select the appropriate language and package/module for numerical and visual exploration of a variety of data types, including numerical, text, and geographical data.
LO3: Train, test, validate, and stack multiple models using the tidymodels package in R and the scikit-learn library in Python.
LO4: Specify a variety of models; identify and describe parameters; describe the behavior of loss functions and investigate optimization.
LO5: Comprehend the mathematical fundamentals of neural networks and implement them using PyTorch.
LO6: Articulate and explore modern challenges and opportunities in machine learning, including covariate shift and transfer learning.
LO7: Implement end-to-end machine learning workflows for multiple data types using MLflow.
Course Outline
Topic | Language/Package | Book Sections/Notes/Resources | Time Frame (Tentative) |
---|---|---|---|
Intro/Overview/Defining Roles | R/Python | LTX; IDL.2,3; TMR.1,2 | WEEK 1 (Jan 16) |
Data Types: Numerical, Text, Image, Sound, Geographical. Descriptive Statistics and Shiny/Learnr Apps | Numerical – R, Python; Sound – Python; Image – Python; Geographical – R; Shiny – R | | WEEK 2 – 4 (Jan 23 & 30, Feb 6) |
Predictive Modeling | R – caret/tidymodels; Python – scikit-learn; Python/R – PyTorch/TensorFlow/Keras | https://www.tidymodels.org/find/parsnip/ https://agua.tidymodels.org/articles/auto_ml.html https://scikit-learn.org/stable/index.html ESL.3-5,8-10,12,15,16; CASI.8,16,17,19; TMR.6-8; NN: ESL.11; CASI.18; https://www.tensorflow.org/overview | WEEK 4 – 6 (Feb 6, 13 & 20) |
Loss Functions and Regularization | | ESL.7; TMR.9-15 | WEEK 7 (Feb 27) |
Traditional Learning Schemes: Supervised and Unsupervised | | ESL.2,3: Sup; ESL.14: Unsup | WEEK 8 – 9 (Mar 6 & 13) |
Modern Learning Challenges (MLC): Covariate Shift | Python: Various | Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation | WEEK 10 – 11 (Mar 20 & Apr 3) |
MLC: Transfer Learning | Python: Various | Hands-On Transfer Learning with Python | WEEK 12 – 13 (Apr 10 & 17) |
MLC: Self-Supervised Learning | | Papers: data2vec.pdf | WEEK 14 (Apr 24) |
Project: MLflow Implementation | | https://mlflow.org/ | WEEK 15 – FINALS (May 1 & 8) |
Textbooks
Selections from the following will be incorporated:
The Elements of Statistical Learning (ESL)
Computer Age Statistical Inference (CASI)
Tidy Modeling with R (TMR)
Introduction to Deep Learning (IDL)
Neural Networks and Deep Learning (NNDL)
Mastering Shiny (MS)