Grundlagen des "Machine Learning" für die Physik
Essential Machine Learning for Physicists

Module NAT3009

In addition to the descriptions of the content, learning outcomes, teaching and learning methods, and examination formats, this module description also contains references to the current courses and to the dates of the module examination in the respective sections.

Basic data

NAT3009 is a one-semester module that is offered irregularly.

The module is part of the following catalogs in the physics degree programs.

  • Specific catalog of special courses for condensed matter physics
  • Specific catalog of special courses for nuclear, particle, and astrophysics
  • Specific catalog of special courses for biophysics
  • Specific catalog of special courses for Applied and Engineering Physics

Unless a different student workload was specified when the module was exported to a degree program outside physics, the workload is given in the following table.

Total workload    Contact hours    Credits (ECTS)
150 h             60 h             5 CP

Zinonas Zinonos is responsible for the content of module NAT3009.

Content, learning outcomes, and prerequisites

Content

Machine Learning (ML) is often described as the new electricity of the 21st century's technological revolution and now underpins many everyday applications, from spam filtering to face recognition. ML has also contributed to major scientific accomplishments that would have been hard to achieve by conventional means. Recent examples include the Higgs boson discovery in proton collisions and the detection of gravitational waves from merging compact binary systems.

This course is designed to be a comprehensive resource for physicists who want to learn how ML works and how it can be applied to physics research problems involving large scientific datasets. We will start with the basics, learning how to handle and manipulate large datasets using the NumPy and Pandas libraries in Python. Then we will go deeper into working with data by learning about visualization with the Pandas, Matplotlib, and Seaborn libraries.

Afterward, we will get to the heart of the course, covering general prediction models. We will begin with the basics of regression, using classic statistical approaches with SciPy and Statsmodels. Then we will move on to Scikit-Learn, one of the most widely used Machine Learning libraries, as well as other state-of-the-art libraries such as XGBoost, LightGBM, and CatBoost. We will learn how to build ML systems that combine numerous features and provide reliable predictions, for example in weather forecasting or air quality prediction.

Then we will use these libraries for supervised classification tasks, such as separating physics signals from background events with high accuracy. We will extend this knowledge to more complex supervised learning methods for imbalanced classification problems, such as the detection of very rare phenomena, where our machine learning models will pick out patterns and key characteristics from very large volumes of data.

During the course, we will cover fundamental ML concepts such as the curse of dimensionality, feature selection for building robust models, model underfitting and overfitting, model metrics, and data leakage. You will also learn efficient techniques for optimizing learning algorithms, evaluating your trained models, and cross-validating them with a variety of methods.

Moreover, the course is packed with practical exercises based on physics and real-life examples. So you will not only learn the theory but also get essential hands-on practice building your own models.

By the end of the course, physics students will have acquired a thorough understanding of ML concepts as well as the skills needed to apply ML principles to challenging physics problems with real data.

So, what are you waiting for? Get enrolled now and become a real machine-learning professional in physics!

Learning outcomes

In this course, you will be walked step by step into the world of Machine Learning with applications in Physics. With every lecture and tutorial, you will develop new skills and improve your understanding of this challenging and rapidly developing field of Data Science.

This course is fun and exciting, but at the same time, we dive deep into the foundations of Machine Learning. The content is structured as follows:

Part 1: Data Management & Data Visualization

  • NumPy
  • Pandas Dataframes
  • Data visualization with Python libraries (Matplotlib, Seaborn)
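
As a rough illustration of the kind of data handling covered in Part 1, here is a minimal sketch. The "events" table, its column names, and the units are hypothetical and invented for this example; they are not taken from the course material.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset: 1000 simulated "events" (assumption for illustration).
rng = np.random.default_rng(seed=42)
n_events = 1000
events = pd.DataFrame({
    "energy": rng.exponential(scale=50.0, size=n_events),   # arbitrary units
    "angle": rng.uniform(0.0, np.pi, size=n_events),         # radians
    "detector": rng.choice(["barrel", "endcap"], size=n_events),
})

# Vectorized arithmetic on whole columns, no explicit Python loop.
events["transverse_energy"] = events["energy"] * np.sin(events["angle"])

# A boolean mask selects a subset of rows.
high_energy = events[events["energy"] > 100.0]

# Group-wise summary statistics per detector region.
summary = events.groupby("detector")["transverse_energy"].agg(["count", "mean", "std"])
print(summary)
```

The same pattern (build a DataFrame, derive columns with vectorized NumPy operations, filter, then aggregate) carries over to much larger tables.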

Part 2: Data Preprocessing & Feature Engineering

  • Handle missing data
  • Encode categorical (nominal and ordinal) data
  • Handle outliers
  • Feature scaling
  • Data partitioning into train and test samples
  • Imputation of missing class values 
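
The preprocessing steps listed in Part 2 can be chained so that every transformation is learned from the training sample only, which is one common way to avoid data leakage. The following is a sketch under assumptions: the column names, the synthetic data, and the choice of median imputation are illustrative, not prescribed by the course.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical measurements with one categorical column (assumption).
rng = np.random.default_rng(seed=0)
n = 200
data = pd.DataFrame({
    "temperature": rng.normal(300.0, 15.0, size=n),
    "pressure": rng.normal(1.0, 0.1, size=n),
    "material": rng.choice(["Si", "Ge", "GaAs"], size=n),
})
# Remove some values to mimic missing measurements.
data.loc[rng.choice(n, size=20, replace=False), "temperature"] = np.nan
target = rng.normal(size=n)

# Split first, so that the test sample never influences the fitted transformers.
X_train, X_test, y_train, y_test = train_test_split(
    data, target, test_size=0.25, random_state=0
)

# Numeric columns: fill gaps with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical column: one-hot encode; sparse_threshold=0.0 forces a dense array.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["temperature", "pressure"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["material"]),
    ],
    sparse_threshold=0.0,
)

X_train_prepared = preprocess.fit_transform(X_train)  # fit on train only
X_test_prepared = preprocess.transform(X_test)        # reuse the fitted statistics
```

Calling `fit_transform` on the training sample and only `transform` on the test sample is the design choice that keeps test information out of the fitted medians, means, and scales.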

Part 3: Regression

  • Simple Linear Regression
  • Multiple Linear Regression
  • Polynomial Regression
  • Support Vector Regression
  • Decision Tree Regression
  • Random Forest Regression
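
To make the difference between simple linear and polynomial regression concrete, here is a small sketch on a hypothetical free-fall measurement. The data are synthetic and the noise level is an assumption. The example covers only two of the model families listed above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical measurement: height vs. time with Gaussian noise (assumption).
rng = np.random.default_rng(seed=1)
t = np.linspace(0.0, 2.0, 50)
height = 20.0 - 0.5 * 9.81 * t**2 + rng.normal(scale=0.2, size=t.size)
X = t.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix

# Simple linear regression: height ~ a + b * t
linear = LinearRegression().fit(X, height)

# Polynomial regression of degree 2: height ~ a + b * t + c * t^2
quadratic = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X, height)

# R^2 on the training data for both models.
r2_linear = linear.score(X, height)
r2_quadratic = quadratic.score(X, height)

# The t^2 coefficient estimates -g/2, so g can be recovered from the fit.
c = quadratic.named_steps["linearregression"].coef_[1]
g_estimate = -2.0 * c
print(f"R2 linear={r2_linear:.3f}, R2 quadratic={r2_quadratic:.3f}, g={g_estimate:.2f}")
```

Because the underlying law is quadratic in time, the straight line underfits, while the degree-2 model recovers a physically meaningful parameter.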

Part 4: Regression Project

  • Train a regression model on big data and make predictions
  • Control model overfitting & underfitting
  • Regression metrics
  • Feature selection
  • Model optimization with hyperparameter grid search 
  • Cross-validation
  • Model evaluation
  • Full model training and deployment to make predictions
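
The workflow of Part 4 can be sketched roughly as follows. The synthetic dataset, the Ridge model, and the grid of `alpha` values are assumptions chosen to keep the example short; the course project may use different models and data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem standing in for a real dataset (assumption).
X, y = make_regression(
    n_samples=500, n_features=20, n_informative=5, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold is scaled independently.
model = make_pipeline(StandardScaler(), Ridge())

# Hyperparameter grid: regularization strength of the Ridge step.
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# 5-fold cross-validated grid search on the training sample only.
search = GridSearchCV(model, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

# Final evaluation on the held-out test sample with the refitted best model.
y_pred = search.predict(X_test)
test_rmse = np.sqrt(mean_squared_error(y_test, y_pred))
test_r2 = r2_score(y_test, y_pred)
print(search.best_params_, f"RMSE={test_rmse:.2f}", f"R2={test_r2:.3f}")
```

The test sample is touched only once, after the hyperparameters have been chosen, so the reported metrics are not biased by the search.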

Part 5: Classification

  • Logistic Regression
  • Support Vector Machines
  • Kernel SVM
  • Naive Bayes
  • Decision Tree Classification
  • Random Forest Classification
  • Boosting methods

Part 6: Classification Project

  • Train a classification model on big data and make predictions
  • Control model overfitting & underfitting
  • Handle class (label) imbalance
  • Classification metrics
  • Feature selection
  • Model optimization with hyperparameter grid search 
  • K-fold cross-validation
  • Model evaluation
  • Full model training and deployment to make predictions
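
For the classification project, a sketch of one possible way to deal with a rare signal class is shown below. The 5% signal fraction, the logistic regression model, the class weighting, and the average-precision metric are assumptions for illustration; other choices such as resampling or boosted trees are equally valid.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic "signal vs. background" sample with about 5% signal (assumption).
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.95, 0.05], random_state=0,
)
# Stratified split keeps the signal fraction equal in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# class_weight="balanced" reweights the loss so the rare class is not ignored.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

# Stratified k-fold cross-validation with a metric suited to rare positives.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring="average_precision")

# Refit on the full training sample and evaluate once on the test sample.
clf.fit(X_train, y_train)
test_scores = clf.predict_proba(X_test)[:, 1]
test_ap = average_precision_score(y_test, test_scores)
print(f"CV average precision: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
print(classification_report(y_test, clf.predict(X_test),
                            target_names=["background", "signal"]))
```

Plain accuracy would be misleading here, since always predicting "background" already scores about 95%; precision- and recall-based metrics are more informative for rare signals.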

Part 7: State-of-the-art machine learning libraries

  • XGBoost
  • CatBoost
  • LightGBM (Microsoft)

Part 8: Dimensionality Reduction

  • Principal Component Analysis
  • Linear Discriminant Analysis
  • Kernel PCA
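
As a small illustration of dimensionality reduction, the sketch below applies Principal Component Analysis to synthetic data. The setup (ten features driven by two hidden factors) is an assumption made so that the expected result is easy to interpret.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 10 correlated features driven by 2 hidden factors plus noise.
rng = np.random.default_rng(seed=3)
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 10))

# PCA is sensitive to feature scales, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of largest variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

# Fraction of the total variance retained by the two components.
explained = pca.explained_variance_ratio_.sum()
print(f"Variance explained by 2 components: {explained:.3f}")
```

Since the data were generated from two factors, two components should capture most of the variance; on real data the explained-variance ratio helps decide how many components to keep.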

Part 9: Clustering

  • k-Means Clustering
  • Hierarchical Clustering
  • Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
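
The three clustering approaches can be compared on a toy dataset as sketched below. The blob positions and the DBSCAN parameters `eps` and `min_samples` are assumptions tuned to this synthetic example and would need adjusting for real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated synthetic clusters in 2D (assumption for illustration).
X, true_labels = make_blobs(
    n_samples=300, centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.6, random_state=0,
)

# k-means and hierarchical clustering need the number of clusters up front.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
hierarchical = AgglomerativeClustering(n_clusters=3).fit(X)

# DBSCAN infers the number of clusters from the point density;
# the label -1 marks points it treats as noise.
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)

n_kmeans_clusters = len(set(kmeans.labels_))
n_dbscan_clusters = len(set(dbscan.labels_) - {-1})

# The adjusted Rand index compares found clusters with the known truth.
ari_kmeans = adjusted_rand_score(true_labels, kmeans.labels_)
ari_hierarchical = adjusted_rand_score(true_labels, hierarchical.labels_)
print(n_kmeans_clusters, n_dbscan_clusters, ari_kmeans, ari_hierarchical)
```

In practice the true labels are unknown, so the adjusted Rand index is only available in controlled tests like this one.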

Part 10: Model Deployment

  • Model persistence
  • Model API
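
Model persistence can be sketched as follows, assuming the `joblib` library is used for serialization; the course may use a different mechanism. The model, the data, and the file name are placeholders.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train a small model on synthetic data (assumption for illustration).
rng = np.random.default_rng(seed=5)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X[:, 0] ** 2 + X[:, 1]
model = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)

# Save the fitted model to disk and load it back, as a separate process would.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")  # placeholder file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
same_predictions = np.allclose(model.predict(X_new), restored.predict(X_new))
print("Predictions identical after reload:", same_predictions)
```

A persisted model should be loaded with the same library versions it was saved with, and only from trusted sources, since loading a pickle-based file can execute arbitrary code.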

Prerequisites

  • No prior knowledge is required beyond the admission requirements for the Master's program.

Courses, learning and teaching methods, and literature

Courses and dates

VO (lecture), 2 SWS: Essential Machine Learning for Physicists
  Lecturer(s): Zinonos, Z.
  Dates: Wed, 12:00–14:00, LMU-HS, plus individual or rescheduled dates
  Links: Course materials

UE (exercise), 2 SWS: Exercise to Essential Machine Learning for Physicists
  Lecturer(s): Hessler, J. (lead/coordination: Zinonos, Z.)
  Dates: Wed, 14:00–16:00, PH HS3, plus individual or rescheduled dates

Learning and teaching methods

The course content will be presented interactively in Jupyter Notebooks that are shared in the classroom. The course balances theory and practical implementation, with complete Jupyter notebook guides containing code and easy-to-reference notes. There are also plenty of exercises to practice your new skills along the way. By the end of the course, you will be proficient in the following areas:

  • Python Programming
  • Data manipulation and visualization with widely used Python libraries
  • Numerical processing with Python libraries
  • Principles of Machine Learning
  • Supervised Machine Learning: Regression and Classification
  • Building efficient Machine Learning systems and solving physics problems with data

Media

The lectures, as well as the exercises, will be delivered in Jupyter notebooks, a web-based interactive computing platform.

The material will be versioned and distributed over GitHub.

Literature

Links to Jupyter Notebooks over GitHub repositories will be shared during the lectures and labs. References to further online reading material will be distributed throughout the lectures.

Recommended textbooks:

  1. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, and Jerome Friedman
  2. Applied Predictive Modeling. Max Kuhn and Kjell Johnson
  3. Introduction to Machine Learning with Python: A Guide for Data Scientists. Andreas C. Müller and Sarah Guido
  4. The Hundred-Page Machine Learning Book. Andriy Burkov
  5. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. Sebastian Raschka and Vahid Mirjalili
  6. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Aurélien Géron
  7. Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David
  8. Machine Learning. Tom M. Mitchell

Module examination

Description of examinations and coursework

There will be a written exam of 60 minutes. Using comprehension questions and sample calculations, it tests by way of example whether the competencies described in the Learning outcomes section have been achieved at least at the cognitive level specified there.

An exam task might be, for example:

  • Explain the regression and classification metrics
  • How do classification models work?
  • How do regression models work?
  • How do ensemble methods work?
  • How does boosting work in decision trees?
  • How can a learning model be optimized?
  • How can overfitting and underfitting of learning models be controlled?
  • What are the most important methods of data preprocessing?

Repeat examination

A repeat examination is offered at the end of the semester.
