Updated: 11.09.2025


Premium customer
100% available
Data Engineer
Krefeld, Germany
Worldwide
B.Sc. Information Systems
Skills
Java, APIs, Agile Methodology, Artificial Intelligence, Data Analysis, Microsoft Azure, Big Data, Cloud Computing, Information Systems, Computer Programming, Databases, Information Engineering, ETL, Data Mining, DevOps, Fraud Prevention, GitHub, R (programming language), Apache Hive, System Analysis, Python, Machine Learning, Natural Language Processing, NumPy, Scrum, Power BI, Software Development, SQL, Docker Containers, Qlik Sense, Google Cloud, Feature Engineering, Data Science, PyTorch, Deep Learning, Pandas, Scikit-learn, Kubernetes, QlikView, Full-Stack Development
agile, Hive, API, Big Data, Programming, analyzing data, data analysis, Data Analytics, data extraction, Database Systems, deep learning, ETL process, ETL, feature engineering, fraud prevention, GitHub, Data Engineering, Information Systems, Java, machine learning, machine learning algorithms, natural language processing, Numpy, Pandas, PowerBI, Python, Python (Programming Language), Pytorch, Qlik Sense, Qlik, R, SQL, Scikit-Learn, scrum, Software Engineering, Software Development, System Analysis, AI Engineer, Azure, Langchain, OpenAI, Azure AI Studio, LLM
Languages
German (native), English (business fluent), French (good), Spanish (basic)
Project history
• Engineered an enterprise AI chatbot system leveraging Azure OpenAI, PromptFlow, and Azure Cognitive Search to process internal documentation and ticketing data, enhancing support capabilities.
• Developed an automated Confluence & ServiceNow content scraping pipeline to extract, process, and index internal documentation, facilitating AI-powered knowledge retrieval.
• Implemented secure vector search functionality using Azure AI Search and custom embedding models, optimizing document retrieval processes while ensuring data security compliance.
• Built and optimized PromptFlow pipelines for query intent extraction and contextual response generation, establishing a modular flow architecture with real-time monitoring.
• Created a responsive and interactive chat interface with Streamlit, applying custom styling to improve user experience and support real-time interactions.
• Developed a PII detection system leveraging Azure Language Service and Azure Text Analytics to identify and mask personally identifiable information (PII) before persisting data into the database, ensuring compliance and preventing chatbot or indexer access to sensitive data.
• Architected a robust Azure AI Search integration system with configurable data sources (here, a MongoDB connection), indexes, and indexers, implementing high-watermark change detection for efficient document synchronization.
• Designed a comprehensive search index schema with optimized field mappings for title, content, labels, and metadata, enabling advanced search capabilities including semantic search and faceted navigation.
• Developed and led the end-to-end architecture design for the chatbot system and its components, ensuring integration, scalability, and maintainability, and thoroughly documented the architecture and functionality of each component for knowledge sharing and future development.
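The high-watermark change detection used for document synchronization can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function name, the `updated_at` field, and the in-memory document list (standing in for a MongoDB collection) are all assumptions.

```python
from datetime import datetime, timezone

# Hypothetical sketch of high-watermark change detection: only
# documents modified after the last recorded watermark are re-indexed,
# and the watermark advances to the newest timestamp seen.

def sync_changed_docs(docs, watermark):
    """Return (documents newer than watermark, new watermark).

    `docs` is an iterable of dicts with an `updated_at` datetime;
    `watermark` is the timestamp of the last successful sync.
    """
    changed = [d for d in docs if d["updated_at"] > watermark]
    new_watermark = max((d["updated_at"] for d in changed), default=watermark)
    return changed, new_watermark

if __name__ == "__main__":
    last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
    docs = [
        {"id": "a", "updated_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
        {"id": "b", "updated_at": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    ]
    changed, wm = sync_changed_docs(docs, last_sync)
    # Only doc "b" is re-indexed; the watermark moves to its timestamp.
```

The appeal of this pattern is that each sync run only touches documents changed since the previous run, so re-indexing cost scales with churn rather than collection size.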
• Developed and enhanced customized Python packages for processing vehicle test data (gigabyte scale) using PySpark and Delta tables, configured to run as Kubernetes jobs.
• Enhanced a customized Python package for scheduling Kubernetes PySpark jobs in Azure Kubernetes Services (AKS).
• Resolved out-of-memory errors in PySpark Kubernetes jobs by implementing flexible resource allocation within the Kubernetes cluster.
• Improved and extended the Azure DevOps CI/CD pipeline for internal Python packages, including automated tests and security checks.
• Expanded and optimized the existing Data Warehouse architecture using Data Vault 2.0.
• Built and maintained Docker images with the Azure CI/CD pipeline in Azure Container Registry, and expanded Kubernetes Helm charts.
• Implemented granularized Trino access policies for an established reporting mart.
• Deployed, maintained, and upgraded both production and non-production environments in Azure using Bicep (IaC).
• Developed and maintained Databricks pipelines, actively engaging customer support to address emerging issues related to this platform.
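Resolving out-of-memory errors in PySpark jobs on Kubernetes typically comes down to tuning executor memory, overhead, and CPU request/limit settings. The snippet below shows the kinds of Spark properties involved; the concrete values are placeholders, not the project's actual configuration.

```python
# Illustrative Spark-on-Kubernetes resource settings for avoiding
# executor out-of-memory pod kills; values are example assumptions.
spark_conf = {
    # JVM heap per executor; raise this when tasks spill or OOM.
    "spark.executor.memory": "6g",
    # Off-heap headroom for Python workers, shuffle buffers, etc.
    # (Kubernetes kills the pod when heap + overhead exceeds its limit).
    "spark.executor.memoryOverhead": "2g",
    # CPU request and limit can differ, letting pods burst when the
    # node has spare capacity.
    "spark.kubernetes.executor.request.cores": "1",
    "spark.kubernetes.executor.limit.cores": "2",
    # Scale executors with the workload instead of pinning a fixed,
    # oversized fleet; shuffle tracking is required on Kubernetes,
    # which has no external shuffle service.
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}
```

Such a dict would typically be applied via `SparkSession.builder.config(...)` or baked into the job spec that the scheduling package submits to AKS.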
• Developed an IT audit dashboard using Qlik Sense, established the ETL pipeline to extract the necessary data from internal audit tools, and created the required Hive tables.
• Conceptualized and developed a dashboard for monitoring data quality across various processes, detecting outliers, and creating the needed ETL and datasets.
• Developed a Google Looker dashboard for visualizing logging data, extended the existing data ingestion point using FastAPI to write into Google BigQuery, and maintained its CI/CD pipeline in OpenShift.
• Actively communicated and engaged with stakeholders to meet their requirements.
• Created documentation for onboarding new team members.
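The outlier detection behind a data-quality dashboard can be as simple as an interquartile-range check on each monitored metric. The sketch below uses the common 1.5 × IQR rule as an illustrative assumption; the project's actual method is not specified above.

```python
import statistics

# Hypothetical sketch of an outlier check for a data-quality dataset:
# flag values outside [Q1 - k*IQR, Q3 + k*IQR], with k = 1.5 as a
# conventional default.

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]
```

In a pipeline like the one above, a check of this kind would run per metric in the ETL step, with the flagged rows written to the dataset the dashboard visualizes.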