Project title: Feature Role Identification/Interpretation in ML Models
Description:
The project develops new algorithms for determining the roles of features in predictive ML models and for assessing their importance (Feature Importance and Shapley Values). A typical application is determining the roles of genes in disease development. Such feature interpretation capabilities already exist in well-known models such as Random Forest, Gradient Boosting (LightGBM, CatBoost), and specialized Neural Networks. Of particular interest is implementing these techniques in specialized Neural Trees (hybrids of decision trees and neural networks) and Kolmogorov-Arnold Networks (KAN). Features can be relevant (necessary, informative) or irrelevant (carrying no information), and more or less important. They may affect the predicted variable independently of each other, or they may act only in special combinations (interactive features).
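To make the feature roles above concrete, here is a minimal sketch of exact Shapley values on a toy problem. It is an illustration only, not the group's algorithm: the value function `toy_value` and the helper `shapley_values` are hypothetical names, and the setup (feature 0 acts independently, features 1 and 2 contribute only jointly, feature 3 is irrelevant) is an assumption chosen for clarity.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions (cost grows as 2^n)."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(coalition):
    """Hypothetical value function (an assumption for illustration):
    feature 0 contributes on its own, features 1 and 2 contribute
    only together, feature 3 carries no information."""
    v = 0.0
    if 0 in coalition:
        v += 2.0
    if 1 in coalition and 2 in coalition:
        v += 4.0
    return v


phi = shapley_values(toy_value, 4)
print([round(p, 6) for p in phi])  # → [2.0, 2.0, 2.0, 0.0]
```

The irrelevant feature receives zero, and the values sum to the value of the full feature set. The interacting pair splits its joint effect equally, so features 1 and 2 look exactly like the independent feature 0. Shapley values alone therefore do not reveal interactions, and the exhaustive enumeration becomes infeasible as the number of features grows.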
Existing methods for assessing Feature Importance and Feature Interaction have many drawbacks: their accuracy, reliability/stability, and performance suffer. This is because features depend on each other, have different natures (categorical and numerical), and can be very numerous (for example, in bioinformatics). Our Interpretable/eXplainable AI group is currently developing new modifications of Gradient Boosting, Neural Trees, and Shapley Values in which these shortcomings are eliminated.
An internship with us will provide experience in developing modern, in-demand ML algorithms, writing high-quality research-intensive code for new software libraries, parallelizing it on GPUs, conducting high-quality computational experiments, and even writing Q1/A* level scientific papers.
Candidate requirements:
➔ Knowledge of Machine Learning basics, Probability & Statistics
➔ Willingness to read scientific papers and documentation for ML algorithm libraries
➔ Proficiency in mathematical tools including combinatorics
➔ Python
➔ Interest in Cython/C++
➔ A strong mathematical background, as needed for deriving complexity and accuracy estimates of algorithms, is highly welcome
Supervisor: Andrei Lange
Internship duration: 4 months, extendable based on results
Deadline for submission: August 24, 2025
The start date of the internship: September 1, 2025
Compensation: no monetary compensation; the internship offers work experience in a skilled team on real applied projects
Contact person: Andrei Lange
A.Lange@skoltech.ru