Updated 18.05.2025


Machine Learning Engineer / Data Scientist / Data Engineer / Data Science Project Manager
Fürth, Germany
Worldwide
Skills
Python, TensorFlow, PyTorch, Data Science, Big Data, Deep Learning, Cloud Computing (AWS/GCP), Machine Learning, Research & Development, Artificial Intelligence, Data Scientist, NLP, Computer Vision, LLM, GPT
Highly skilled and experienced freelance machine learning engineer and consultant with a deep understanding of business needs, specialized in state-of-the-art deep learning, machine learning, and data science, with a proven track record of delivering high-quality, production-ready results in fast-paced environments.
I have worked on projects for clients in a range of industries, using my expertise to help their organisations improve efficiency, reduce costs, and increase revenue through data-driven solutions.
Frameworks:
- Keras, PyTorch, scikit-learn, TensorFlow, XGBoost
- Conda/Anaconda, Jupyter, Matplotlib, NumPy, OpenCV, pandas, Plotly, Poetry
- MLflow, SageMaker, Vertex AI
- Anomaly Detection, Audio Analysis and Synthesis, Clickstream Analysis, Computer Vision, Content Understanding, Data Analysis, Data Mining, Data Visualisation, Deep Learning, Dynamic Pricing, Fraud Detection, Image Processing, Image Recognition/Classification, Machine Learning, Natural Language Processing (NLP), Natural Language Understanding, Product Similarities, Recommendation Systems, Speech Recognition
- Deep Neural Networks, Convolutional Neural Networks, LSTM, (Variational-)Autoencoder, Transformers
- Hyperparameter Tuning, Transfer Learning
- Model/Feature Analysis using SHAP
- Dimensionality Reduction (PCA, t-SNE, LDA, Autoencoder, UMAP)
- Python
- C/C++, Java, MATLAB/GNU Octave, PHP
- Clean Code, PyTest, Static Code Analysis, Unittest
- Bamboo, Bitbucket, Jenkins, Git, GitHub, GitLab
- Software Development and Software Architecture
- Linux, macOS, Windows
- Apache Spark, BigQuery, Elasticsearch, Exasol, Graylog, Kibana, MS-SQL, MySQL, Oracle DB
- Amazon Web Services (AWS), EMR, SageMaker, Apache Spark
- Google Cloud Platform (GCP), BigTable, BigQuery, Vertex AI
- Hadoop, PySpark
- FFmpeg for Video Processing
- Docker
- Kubernetes
- Confluence, Jira, Miro, Slack, Teams, Trello
Languages
- German: native speaker
- English: business fluent
- Hungarian: business fluent
Project History
As a machine learning engineer and data scientist in the search team at OTTO, my main task is to use
state-of-the-art machine learning techniques to improve the search experience for our customers.
The Solr search engine, which processes 1,000 queries per second and supports around 20 million
product variants 24/7, is central to OTTO's e-commerce platform. All improvements are extensively
tested and validated through online experiments.
Learning to Select: Improved query precision by filtering out irrelevant results through
comprehensive data-driven solutions on clickstream data. Also identified and removed fraudulent
and bot-generated queries to improve model performance and data integrity.
Hybrid Search: Collaborated with two teams to develop a system that integrates both lexical and
semantic search approaches to provide more relevant search results.
Advanced Spell Check: Designed, implemented, validated and brought to production a leading-edge
spell checking system. This solution not only corrects customer spelling errors but also guides them
towards the most relevant products.
Query Intent Detection: Led the development of a customer query intent detection approach
to identify non-product and navigation queries, and to recognize brand names and their context
within search queries (named entity recognition and classification).
Toolkit: AWS, GCP, BigQuery, Clickstream Data, FastText, Hugging Face Transformers, MLflow, OpenAI
API, SageMaker, Airflow, Docker, Jenkins, Terraform, Grafana, Prometheus, Elasticsearch, Kibana,
Confluence, Jira, Miro, Agile/Scrum, FastAPI, Poetry, Python, PyTorch, GitHub, Online
Experiments/Testing, Solr, Pair Programming
As an external consultant, I helped startups use GPT and other large language models (LLMs). I
provided training, evaluated use cases, assessed limitations such as security, performance, and
accuracy, and explored alternatives to the OpenAI API.
Toolkit: Haystack, Hugging Face models, LangChain, Ollama, OpenAI API, Python
As a freelance consultant and expert in machine learning applications for content understanding, I
supported the RTL Data team in building the next-generation multi-purpose platform "RTL+" in
cooperation with Deezer, using visual (video), audio and text data. An integral part of my role was to
manage and balance the needs and expectations of the various stakeholders involved in the project.
The primary goal of this project is to derive and provide additional metadata from the raw content
that can be used by downstream applications such as search, recommendation, and personalization.
The key challenge is to establish a clean, reliable, scalable, and production-ready state-of-the-art
solution for a large number of building blocks and to create an efficient execution pipeline on top of
it.
Video-based models (using and optimizing both pre-trained and self-trained models): Aesthetic Ranking, Dominant Color Extraction, End Credits Detection, Face Detection, Image Quality Detection, Logo Detection, Mood Detection, Object Detection and Recognition, Place Prediction, Scene and Shot-Boundary Detection, Shot Type Detection.
Audio-based models and solutions: Speech-to-text transcription of podcasts and other audio sources using Google’s Speech-to-Text API and OpenAI's Whisper, as well as music identification.
NLP solutions: language detection (fastText), festivity detection, kids content detection, adult content detection, topic modeling (BERTopic), keyword extraction (KeyBERT) and text summarization.
Toolkit: Argo Workflows, Confluence, Docker, Elasticsearch, FFmpeg, GitLab CI/CD, Google BigQuery, Google Cloud Platform (GCP), Google Data Studio, Grafana, Hugging Face models, Jira, Jupyter/JupyterLab, Kafka, Kibana, Kubernetes, MLflow, NumPy, pandas, Poetry, Pub/Sub, Python, PyTorch, Scrum, spaCy, SQL, Streamlit, TensorFlow, Terraform
Certificates
Neural Networks for Machine Learning by University of Toronto on Coursera
Coursera Course Certificates, 2016
Machine Learning: Clustering & Retrieval by University of Washington on Coursera
Coursera Course Certificates, 2016
Machine Learning: Classification by University of Washington on Coursera
Coursera Course Certificates, 2016
Machine Learning With Big Data (2015) by University of California, San Diego on Coursera
Coursera Course Certificates, 2016
iSAQB® Certified Professional for Software Architecture
iSAQB, 2015
Certified Scrum-Master
Boris Gloger, 2008