1  Introduction

Description 

In this course, we will develop the skills needed to work through the data science life cycle and understand the mathematical and statistical foundations behind it.

 

Learning Objectives 

LO1: Develop literacy in the fundamentals of coding in R and Python.

LO2: Select the appropriate language and package/module for numerical and visual exploration of a variety of data types, including numerical, text, and geographical data.

LO3: Train, test, validate, and stack multiple models using the tidymodels package in R and the scikit-learn library in Python.

LO4: Specify a variety of models; identify and describe parameters; describe the behavior of loss functions and investigate optimization. 

LO5: Comprehend the mathematical fundamentals of neural networks and implement them using PyTorch.

LO6: Articulate and explore modern challenges and opportunities in machine learning, including covariate shift and transfer learning.

LO7: Implement end-to-end machine learning workflows for multiple data types using MLflow.

Course Outline

Topic: Intro/Overview/Defining Roles
Language/Package: R/Python
Book Sections/Notes/Resources: LTX; IDL.2,3; TMR.1,2
Time Frame (Tentative): Week 1 (Jan 16)

Topic: Data Types (Numerical, Text, Image, Sound, Geographical); Descriptive Statistics and Shiny/learnr Apps
Language/Package:
    Numerical: R, Python
    Sound: Python
    Image: Python
    Geographical: R
    Shiny: R
Book Sections/Notes/Resources:
    https://www.tidyverse.org/
    https://plotly.com/r/
    https://shiny.rstudio.com/
Time Frame (Tentative): Weeks 2-4 (Jan 23 & 30, Feb 6)

Topic: Predictive Modeling
Language/Package:
    R: caret/tidymodels
    Python: scikit-learn
    Python/R: PyTorch/TensorFlow/Keras
Book Sections/Notes/Resources:
    https://www.tidymodels.org/find/parsnip/
    https://agua.tidymodels.org/articles/auto_ml.html
    https://scikit-learn.org/stable/index.html
    ESL.3-5,8-10,12,15,16; CASI.8,16,17,19; TMR.6-8
    NN: ESL.11; CASI.18
    https://www.learnpytorch.io/
    https://www.tensorflow.org/overview
    https://keras.io/
    https://tensorflow.rstudio.com/
    https://gitlab.com/ShirinG/keras_tutorial_user2020
Time Frame (Tentative): Weeks 4-6 (Feb 6, 13 & 20)

Topic: Loss Functions and Regularization
Book Sections/Notes/Resources: ESL.7; TMR.9-15
Time Frame (Tentative): Week 7 (Feb 27)

Topic: Traditional Learning Schemes: Supervised and Unsupervised
Book Sections/Notes/Resources: ESL.2,3 (supervised); ESL.14 (unsupervised)
Time Frame (Tentative): Weeks 8-9 (Mar 6 & 13)

Topic: Modern Learning Challenges (MLC): Covariate Shift
Language/Package: Python: Various
Book Sections/Notes/Resources: Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation
Time Frame (Tentative): Weeks 10-11 (Mar 20 & Apr 3)

Topic: MLC: Transfer Learning
Language/Package: Python: Various
Book Sections/Notes/Resources:
    Hands-On Transfer Learning with Python
    Hands-On Transfer Learning with TensorFlow (video)
    Transfer Learning for NLP
Time Frame (Tentative): Weeks 12-13 (Apr 10 & 17)

Topic: MLC: Self-Supervised Learning
Book Sections/Notes/Resources: Papers: data2vec.pdf
Time Frame (Tentative): Week 14 (Apr 24)

Topic: Project: MLflow Implementation
Book Sections/Notes/Resources: https://mlflow.org/
Time Frame (Tentative): Week 15 to Finals (May 1 & 8)

 

Textbooks 

Selections from the following will be incorporated: 

The Elements of Statistical Learning (ESL)

Computer Age Statistical Inference (CASI)

Tidy Modeling with R (TMR)

Introduction to Deep Learning (IDL)

Neural Networks and Deep Learning (NNDL)

Mastering Shiny (MS)

LaTeX in 24 Hours (LTX)

https://www.learnpytorch.io/