Self-Optimizing ML Pipeline (AutoML)
Development of a framework for automated feature selection, model validation, and cross-platform deployment via ONNX.
- Python
- LightGBM
- ONNX
- C#
- ML.NET
The Challenge
Manually selecting features and testing different models on large datasets is time-consuming. The training environment (Python) and the production environment (C#/.NET) often differ significantly, which leads to deployment issues. The goal was to fully automate this process so that the system can react faster to data changes without compromising the stability of the production environment.
The Solution
An end-to-end pipeline decouples training from inference while still connecting the two. ONNX (Open Neural Network Exchange) serves as the bridge between the Data Science world (Python) and the Backend Engineering world (.NET).
Workflow
- Training (Python): A script analyzes incoming data, selects relevant features (Feature Selection), and trains multiple LightGBM models with different hyperparameters.
- Validation: The system automatically selects the model with the best validation score (e.g., AUC or F1 score).
- Export: The winning model is converted into the platform-independent ONNX format.
- Inference (C#): The .NET backend loads the ONNX model at runtime and serves real-time predictions (under 10 ms) on production data.
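The validation step above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual code: `roc_auc`, `select_best`, and the candidate names are hypothetical, and the hard-coded scores stand in for the validation predictions of trained LightGBM models.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg),
    which is fine for a sketch but too slow for large validation sets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def select_best(
    labels: Sequence[int], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Score every candidate on the same validation labels and return the winner."""
    scored = {name: roc_auc(labels, preds) for name, preds in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical validation labels and per-candidate predicted probabilities.
y_val = [0, 0, 1, 1, 0, 1]
candidates = {
    "num_leaves=15": [0.1, 0.4, 0.35, 0.8, 0.2, 0.7],
    "num_leaves=31": [0.1, 0.2, 0.6, 0.9, 0.3, 0.7],
}
best_name, best_auc = select_best(y_val, candidates)
print(best_name, best_auc)
```

Scoring all candidates on the same held-out data keeps the comparison fair; only the winner would then be passed on to the export step.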
Key Features
- Automated Hyperparameter Tuning: Hyperparameters are optimized without manual intervention.
- Platform Independence: Models can be executed wherever ONNX Runtime is available.
- Zero-Downtime Updates: The backend can "hot swap" new models without a restart.
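The hot-swap idea can be illustrated with a short sketch. The production backend is .NET, so this Python version only shows the pattern, and it rests on assumptions: `ModelHolder`, `swap`, and the smoke-test input are hypothetical names, and plain callables stand in for loaded ONNX inference sessions.

```python
import threading
from typing import Callable, Sequence, Tuple

# A "model" here is any callable from a feature vector to a score.
Predictor = Callable[[Sequence[float]], float]


class ModelHolder:
    """Holds the active model and lets a new one be swapped in while serving."""

    def __init__(self, model: Predictor, version: str) -> None:
        self._lock = threading.Lock()
        self._model = model
        self._version = version

    def swap(
        self, new_model: Predictor, version: str, smoke_input: Sequence[float]
    ) -> None:
        # Run the candidate once before publishing it. If this raises,
        # the current model stays active.
        new_model(smoke_input)
        with self._lock:
            self._model = new_model
            self._version = version

    def predict(self, features: Sequence[float]) -> Tuple[float, str]:
        # Take a consistent snapshot under the lock, then run inference
        # outside it so a slow prediction never blocks a swap.
        with self._lock:
            model, version = self._model, self._version
        return model(features), version


def broken_model(features: Sequence[float]) -> float:
    raise RuntimeError("model failed to load")


holder = ModelHolder(lambda x: sum(x), "v1")
first = holder.predict([1.0, 2.0])

holder.swap(lambda x: max(x), "v2", smoke_input=[0.0])
second = holder.predict([1.0, 2.0])

try:
    holder.swap(broken_model, "v3", smoke_input=[0.0])
except RuntimeError:
    pass  # the failed candidate is rejected
third = holder.predict([1.0, 2.0])
```

Requests that are already in flight finish on the model they started with, while new requests pick up the replacement, so no restart is needed.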
The Result
The time from data analysis to deployment of a new model was reduced from days to hours. At the same time, prediction quality increased due to consistent validation, and the dependency on manual intervention was eliminated.