IT

Machine Learning Engineer

Stuttgart, Germany | 100% remote | Freelance | Start 12/2026 | Duration 12 months | 80% workload
Posted by
Robert Bosch GmbH
Contact
LegendsLab Team
Project ID
2944765
Research, Unit Testing, Continuous Integration, ETL, Forecasting, Control, Machine Learning, Performance Tuning, Power BI, Training Activities, Versioning, Workflows, Feature Engineering, Azure Data Factory, Apache Spark, Cost Optimization, Git, Data Lake, PySpark, Integration Tests, XGBoost, Machine Learning Operations, Terraform, Databricks

Description

We are seeking a Machine Learning Engineer with the following skills:

- Expert-level PySpark development, including Spark optimizations (broadcast joins, caching strategy, partitioning, AQE, cluster sizing)
- Production-grade Databricks experience, including job clusters, workflows, notebooks-to-repos migration, and Delta Lake optimization
- End-to-end orchestration with Azure Data Factory, including data ingestion, mapping data flows, event triggers, and REST operators
- Deep MLflow experience, including model registry, tracking, deployment flows, experiment governance
- Experience implementing and maintaining regressors (e.g., LightGBM, XGBoost, CatBoost) including hyperparameter tuning and distributed training
- Strong MLOps knowledge: CI/CD for ML (model training pipelines, evaluation, drift detection, retraining logic)
- Experience implementing robust feature pipelines (feature engineering, feature store usage, versioning)
- Ability to work with large, messy datasets, including performance tuning and incremental ETL patterns
- Experience with Databricks Repos & Git integration (branching, versioning, approvals)
- Terraform knowledge
- Good skills with dashboard tools such as Power BI
- Experience with Sqoop and Oozie would be a plus
- Good skills in Scala and Spark would be a plus

Tasks:

- Design and implement scalable data pipelines in ADF and Databricks for ingestion, preprocessing, validation, and feature engineering.
- Design new regressors and improve existing LightGBM models, including feature selection, hyperparameter tuning, and model evaluation.
- Set up end-to-end CI/CD for ML:
• Model training pipeline
• Model evaluation & approval workflow
• Automated deployment
• Promotion in MLflow Model Registry
- Build and maintain monitoring dashboards (data drift, model drift, pipeline health, inference errors)
- Collaborate with Data Scientists to translate research models into production-grade code.
- Implement best practices for testing (unit tests for ETL & ML, integration tests for pipelines).
- Ensure cost optimization of Databricks clusters and data processing workloads
- Maintain our legacy infrastructure: rerun failed pipelines, investigate issues, and deploy small changes on the schedulers
- Discuss new requirements with stakeholders and implement them to improve forecasting
- Set up monitoring and pipelines for our machine learning model
- Optimize the training of the machine learning model
- Integrate MLflow model stages with ADF / Databricks Jobs
- Develop robust, fault-tolerant pipelines with retry logic, alerting, and monitoring

We thank all applicants for their interest; however, only those selected for the next recruiting phase will be contacted.
