An intern is required for the project titled "Feature Role Identification/Interpretation in ML Models"

Project title: Feature Role Identification/Interpretation in ML Models

Description:
The project develops new algorithms for determining the roles of features in predictive ML models and for assessing feature importance (Feature Importance and Shapley Values). A typical application is determining the roles of genes in the development of a disease. Feature interpretation of this kind is already available in well-known models such as Random Forest, Gradient Boosting (LightGBM, CatBoost), and specialized Neural Networks. Of particular interest is implementing these techniques in specialized Neural Trees (hybrids of decision trees and neural networks) and Kolmogorov-Arnold Networks (KAN). Features can be relevant (necessary, informative) or irrelevant (carrying no information), and they can be more or less important. They may affect the predicted variable independently of each other, or they may act in specific combinations (interacting features).
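To make the notions of relevant, irrelevant, and interacting features concrete, here is a minimal sketch of exact Shapley values computed by enumerating all coalitions. It is an illustration only, not the group's algorithm: the payoff function `toy_value` and its four features are hypothetical, and the pairwise check at the end is a plain second-order difference rather than a full Shapley interaction index.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values of a set function `value` over features 0..n-1."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight shared by every coalition of this size.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


def toy_value(coalition):
    """Hypothetical payoff: feature 0 acts alone, features 1 and 2 act
    only together, feature 3 is irrelevant."""
    payoff = 0.0
    if 0 in coalition:
        payoff += 2.0
    if 1 in coalition and 2 in coalition:
        payoff += 4.0
    return payoff


def pairwise_interaction(value, i, j):
    """Second-order difference of the payoff, measured from the empty coalition."""
    return value({i, j}) - value({i}) - value({j}) + value(set())


phi = shapley_values(toy_value, 4)
print("Shapley values:", phi)
print("interaction(1, 2):", pairwise_interaction(toy_value, 1, 2))
print("interaction(0, 1):", pairwise_interaction(toy_value, 0, 1))
```

In this toy setup, features 0, 1, and 2 each receive a Shapley value of about 2 and feature 3 receives 0. Importance scores alone therefore do not separate an independent feature from an interacting pair, while the pairwise difference does. The enumeration visits every coalition, so its cost grows exponentially with the number of features.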

Existing methods for assessing Feature Importance and Feature Interaction have many drawbacks: their accuracy, reliability/stability, and performance all suffer. The reasons are that features depend on each other, that they differ in nature (categorical and numerical), and that their number can be very large (for example, in bioinformatics). Our Interpretable/eXplainable AI group is currently developing new modifications of Gradient Boosting, Neural Trees, and Shapley Values that eliminate these shortcomings.
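The effect of dependent features can be illustrated with a small synthetic experiment. The sketch below is an assumption-laden toy, not one of the group's methods: it fits ordinary least squares with NumPy, measures permutation importance as the increase in mean squared error, and then adds an exact copy of one feature. All data, names, and coefficients are made up for the illustration.

```python
import numpy as np


def permutation_importance(X, y, weights, rng):
    """Increase in mean squared error when one column at a time is shuffled."""
    base_mse = np.mean((y - X @ weights) ** 2)
    scores = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        scores.append(np.mean((y - X_perm @ weights) ** 2) - base_mse)
    return np.array(scores)


rng = np.random.default_rng(0)
n = 2000
x0, x1 = rng.normal(size=n), rng.normal(size=n)
y = 3.0 * x0 + 1.0 * x1  # synthetic, noise-free target

# Case A: two independent features.
X_a = np.column_stack([x0, x1])
w_a, *_ = np.linalg.lstsq(X_a, y, rcond=None)
imp_a = permutation_importance(X_a, y, w_a, rng)

# Case B: the same data plus an exact duplicate of x0.
# lstsq returns the minimum-norm solution, which splits the
# coefficient of x0 between the two identical columns.
X_b = np.column_stack([x0, x1, x0])
w_b, *_ = np.linalg.lstsq(X_b, y, rcond=None)
imp_b = permutation_importance(X_b, y, w_b, rng)

print("importance without duplicate:", imp_a)
print("importance with duplicate:   ", imp_b)
```

Although the role of `x0` in generating `y` is the same in both cases, its measured importance drops sharply once the duplicate is present, because the fitted model spreads its weight over the two copies.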

An internship with us provides experience in developing modern, in-demand ML algorithms, writing high-quality research-intensive code for new software libraries and parallelizing it on GPUs, conducting rigorous computational experiments, and even writing scientific papers at the Q1/A* level.

Candidate requirements:
➔ Knowledge of Machine Learning basics and Probability & Statistics
➔ Willingness to read scientific papers and documentation for ML algorithm libraries
➔ Proficiency with mathematical tools, including combinatorics
➔ Python
➔ Interest in Cython/C++
➔ A strong mathematical background, as needed for deriving complexity and accuracy estimates of algorithms, is highly welcome

Supervisor: Andrei Lange

Internship duration: 4 months, extendable based on results

Application deadline: August 24, 2025

Internship start date: September 1, 2025

Compensation: no monetary compensation; the internship offers work experience in a skilled team on real applied projects

Contact person: Andrei Lange, A.Lange@skoltech.ru

Description of the internship in Russian
Your CV (pdf format)
Contact us
e-mail: admissions@skoltech.ru
phone: +7 (495) 280-1481, ext. 3387

address: 30 bld. 1, Bolshoi boulevard,
Skolkovo, 121205, Russian Federation
Room E-R3-2026
We are more than happy to meet visitors Monday to Friday from 9:00 to 18:00. Please arrange a visit 48 hours in advance by contacting admissions@skoltech.ru.