Updated: 11.09.2025


Premium customer
100% available
Data Engineer
Krefeld, Germany
Worldwide
B.Sc. Information Systems
Skills
Java, APIs, Agile Methodology, Artificial Intelligence, Data Analysis, Microsoft Azure, Big Data, Cloud Computing, Information Systems, Computer Programming, Databases, Information Engineering, ETL, Data Mining, DevOps, Fraud Prevention, GitHub, R (programming language), Apache Hive, System Analysis, Python, Machine Learning, Natural Language Processing, NumPy, Scrum, Power BI, Software Development, SQL, Docker Containers, Qlik Sense, Google Cloud, Feature Engineering, Data Science, PyTorch, Deep Learning, Pandas, Scikit-learn, Kubernetes, QlikView, Full-Stack Development
agile, Hive, API, Big Data, Programming, analyzing data, data analysis, Data Analytics, data extraction, Database Systems, deep learning, ETL process, ETL, feature engineering, fraud prevention, GitHub, Data Engineering, Information Systems, Java, machine learning, machine learning algorithms, natural language processing, Numpy, Pandas, PowerBI, Python, Python (Programming Language), Pytorch, Qlik Sense, Qlik, R, SQL, Scikit-Learn, scrum, Software Engineering, Software Development, System Analysis, AI Engineer, Azure, Langchain, OpenAI, Azure AI Studio, LLM
Languages
German (native), English (business fluent), French (good), Spanish (basic)
Project history
• Engineered an enterprise AI chatbot system leveraging Azure OpenAI, PromptFlow, and Azure Cognitive Search to process internal documentation and ticketing data, enhancing support capabilities.
• Developed an automated Confluence & ServiceNow content scraping pipeline to extract, process, and index internal documentation, facilitating AI-powered knowledge retrieval.
• Implemented secure vector search functionality using Azure AI Search and custom embedding models, optimizing document retrieval processes while ensuring data security compliance.
• Built and optimized PromptFlow pipelines for query intent extraction and contextual response generation, establishing a modular flow architecture with real-time monitoring.
• Created a responsive and interactive chat interface with Streamlit, applying custom styling to improve user experience and support real-time interactions.
• Developed a PII detection system leveraging Azure Language Service and Azure Text Analytics to identify and mask personally identifiable information (PII) before persisting data into the database, ensuring compliance and preventing chatbot or indexer access to sensitive data.
• Architected a robust Azure AI Search integration system with configurable data sources (here, a MongoDB connection), indexes, and indexers, implementing high-watermark change detection for efficient document synchronization.
• Designed a comprehensive search index schema with optimized field mappings for title, content, labels, and metadata, enabling advanced search capabilities including semantic search and faceted navigation.
• Developed and led the end-to-end architecture design for the chatbot system and its components, ensuring integration, scalability, and maintainability, and thoroughly documented the architecture and functionality of each component for knowledge sharing and future development.
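The high-watermark change detection used for document synchronization can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function name, the `updated_at` field, and the in-memory document list (standing in for a MongoDB collection) are all assumptions.

```python
from datetime import datetime, timezone

# Hypothetical sketch of high-watermark change detection: only
# documents modified after the last recorded watermark are re-indexed,
# and the watermark advances to the newest timestamp seen.

def sync_changed_docs(docs, watermark):
    """Return (documents newer than watermark, new watermark).

    `docs` is an iterable of dicts with an `updated_at` datetime;
    `watermark` is the timestamp of the last successful sync.
    """
    changed = [d for d in docs if d["updated_at"] > watermark]
    new_watermark = max((d["updated_at"] for d in changed), default=watermark)
    return changed, new_watermark

if __name__ == "__main__":
    last_sync = datetime(2024, 1, 1, tzinfo=timezone.utc)
    docs = [
        {"id": "a", "updated_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
        {"id": "b", "updated_at": datetime(2024, 2, 1, tzinfo=timezone.utc)},
    ]
    changed, wm = sync_changed_docs(docs, last_sync)
    # Only doc "b" is re-indexed; the watermark moves to its timestamp.
```

The appeal of this pattern is that each sync run only touches documents changed since the previous run, so re-indexing cost scales with churn rather than collection size.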
• Developed and enhanced customized Python packages for processing vehicle test data (gigabyte scale) using PySpark and Delta tables, configured to run as Kubernetes jobs.
• Enhanced a customized Python package for scheduling Kubernetes PySpark jobs in Azure Kubernetes Services (AKS).
• Resolved out-of-memory errors in PySpark Kubernetes jobs by implementing flexible resource allocation within the Kubernetes cluster.
• Improved and extended the Azure DevOps CI/CD pipeline for internal Python packages, including automated tests and security checks.
• Expanded and optimized the existing Data Warehouse architecture using Data Vault 2.0.
• Built and maintained Docker images with the Azure CI/CD pipeline in Azure Container Registry, and expanded Kubernetes Helm charts.
• Implemented granularized Trino access policies for an established reporting mart.
• Deployed, maintained, and upgraded both production and non-production environments in Azure using Bicep (IaC).
• Developed and maintained Databricks pipelines, actively engaging customer support to address emerging issues related to this platform.
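Resolving out-of-memory errors in PySpark jobs on Kubernetes typically comes down to tuning executor memory, overhead, and CPU request/limit settings. The snippet below shows the kinds of Spark properties involved; the concrete values are placeholders, not the project's actual configuration.

```python
# Illustrative Spark-on-Kubernetes resource settings for avoiding
# executor out-of-memory pod kills; values are example assumptions.
spark_conf = {
    # JVM heap per executor; raise this when tasks spill or OOM.
    "spark.executor.memory": "6g",
    # Off-heap headroom for Python workers, shuffle buffers, etc.
    # (Kubernetes kills the pod when heap + overhead exceeds its limit).
    "spark.executor.memoryOverhead": "2g",
    # CPU request and limit can differ, letting pods burst when the
    # node has spare capacity.
    "spark.kubernetes.executor.request.cores": "1",
    "spark.kubernetes.executor.limit.cores": "2",
    # Scale executors with the workload instead of pinning a fixed,
    # oversized fleet; shuffle tracking is required on Kubernetes,
    # which has no external shuffle service.
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}
```

Such a dict would typically be applied via `SparkSession.builder.config(...)` or baked into the job spec that the scheduling package submits to AKS.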
• Developed an IT audit dashboard using Qlik Sense, established the ETL pipeline to extract the necessary data from internal audit tools, and created the required Hive tables.
• Conceptualized and developed a dashboard for monitoring data quality across various processes, detecting outliers, and creating the needed ETL and datasets.
• Developed a Google Looker dashboard for visualizing logging data, extended the existing data ingestion point using FastAPI to write into Google BigQuery, and maintained its CI/CD pipeline in OpenShift.
• Actively communicated and engaged with stakeholders to meet their requirements.
• Created documentation for onboarding new team members.
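The outlier detection behind a data-quality dashboard can be as simple as an interquartile-range check on each monitored metric. The sketch below uses the common 1.5 × IQR rule as an illustrative assumption; the project's actual method is not specified above.

```python
import statistics

# Hypothetical sketch of an outlier check for a data-quality dataset:
# flag values outside [Q1 - k*IQR, Q3 + k*IQR], with k = 1.5 as a
# conventional default.

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]
```

In a pipeline like the one above, a check of this kind would run per metric in the ETL step, with the flagged rows written to the dataset the dashboard visualizes.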