Machine Learning Engineer

Stuttgart, Germany | 100% remote | Freelance | Start 12/2026 | Duration 12 months | 80% workload

Posted by

Robert Bosch GmbH

Contact

LegendsLab Team

Project ID

2944765

Research, Unit Testing, Continuous Integration, ETL, Forecasting, Control, Machine Learning, Performance Tuning, Power BI, Training Activities, Versioning, Workflows, Feature Engineering, Azure Data Factory, Apache Spark, Cost Optimization, Git, Data Lake, PySpark, Integration Tests, XGBoost, Machine Learning Operations, Terraform, Databricks

Description

We are seeking a Machine Learning Engineer with the following skills:

- Expert-level PySpark development, including Spark optimizations (broadcast joins, caching strategy, partitioning, AQE, cluster sizing)

- Production-grade Databricks experience, including job clusters, workflows, notebooks-to-repos migration, and Delta Lake optimization

- End-to-end orchestration with Azure Data Factory, including data ingestion, mapping data flows, event triggers, and REST operators

- Deep MLflow experience, including model registry, tracking, deployment flows, and experiment governance

- Experience implementing and maintaining regressors (e.g., LightGBM, XGBoost, CatBoost) including hyperparameter tuning and distributed training

- Strong MLOps knowledge: CI/CD for ML (model training pipelines, evaluation, drift detection, retraining logic)

- Experience implementing robust feature pipelines (feature engineering, feature store usage, versioning)

- Ability to work with large, messy datasets, including performance tuning and incremental ETL patterns

- Experience with Databricks Repos & Git integration (branching, versioning, approvals)

- Terraform knowledge

- Good skills with dashboard tools such as Power BI

- Good skills with Sqoop and Oozie would be a plus

- Good skills in Scala and Spark would be a plus

Tasks:

- Design and implement scalable data pipelines in ADF and Databricks for ingestion, preprocessing, validation, and feature engineering.

- Design new regressors and improve existing LightGBM models, including feature selection, hyperparameter tuning, and model evaluation.

- Set up end-to-end CI/CD for ML:

• Model training pipeline

• Model evaluation & approval workflow

• Automated deployment

• Promotion in MLflow Model Registry

- Build and maintain monitoring dashboards (data drift, model drift, pipeline health, inference errors).

- Collaborate with Data Scientists to translate research models into production-grade code.

- Implement best practices for testing (unit tests for ETL & ML, integration tests for pipelines).

- Ensure cost optimization of Databricks clusters and data processing workloads

- Maintain our legacy infrastructure: rerun failed pipelines, investigate issues, and deploy small changes to the schedulers

- Discuss new requirements with stakeholders and implement them to improve forecasting

- Set up monitoring and pipelines for our machine learning model

- Optimize the training of the machine learning model

- Integrate MLflow model stages with ADF / Databricks Jobs

- Develop robust, fault-tolerant pipelines with retry logic, alerting, and monitoring

We thank all applicants for their interest; however, only those selected for the next recruiting phase will be contacted.
