Updated: 27.11.2025


Senior Data Engineer
Availability: not available
Düsseldorf, Germany
Düsseldorf +100 km
Diplom-Ingenieur
Skills
Java, Artificial Intelligence, Amazon Web Services, Microsoft Azure, Ubuntu, Cloud Computing, Computer Programming, Data Integration, ETL, Data Vault, Data Warehousing, IBM DB2, DBT (Data Build Tool), Python, PostgreSQL, Azure SQL, Oracle Financials, Red Hat Enterprise Linux, TensorFlow, SQL, Talend, Google Cloud, Azure Data Factory, PyTorch, Large Language Models, Snowflake, Prompt Engineering, Apache Spark, Generative AI, Spark MLlib, Scikit-learn, Apache Kafka, Operating Systems, Data Management, Databricks
Data Platforms & Databases:
- Snowflake Data Cloud
- Databricks Data Lakehouse
- PostgreSQL
- Azure SQL Server
- DB2
- Oracle
Data Integration & Processing:
- Azure Databricks
- Azure Data Factory
- Snowpark for Python
- Apache Spark
- Apache Kafka
- DBT
- Talend Data Management
- Informatica Data Integration
Machine Learning & AI:
- Hugging Face
- LangChain / LangGraph
- PyTorch
- TensorFlow
- Scikit-learn
- Spark MLlib
- Fine-tuning LLMs (LoRA)
- Contrastive Learning
- RAG (Retrieval-Augmented Generation)
- Prompt Engineering
Data Modeling:
- Dimensional Modeling
- Data Vault Modeling
Cloud Platforms:
- Microsoft Azure
- Amazon Web Services
- Google Cloud
Programming Languages:
- Python
- SQL
- Java
- Scala
Operating Systems:
- Red Hat Linux
- Ubuntu Linux
- Windows
- macOS
Languages
German: business fluent
English: business fluent
Project History
RESPONSIBILITIES
- Design and develop data pipelines with Talend Data Management
- Centralization and consolidation of large amounts of data from various sources
- Setup and development in the area of data warehousing
- Collection, analysis, preparation, and integration of large amounts of data from various databases, primarily Oracle and SQL Server
- Conduct a Proof-of-Concept (PoC) to design Retrieval-Augmented Generation (RAG) agents for customer service applications, leveraging the LangChain and LangGraph frameworks (see the first sketch after this list)
- Provide capabilities for fast reporting and analytics solutions
- As part of the PoC:
  - Integrate Hugging Face models using the transformers, datasets, and PEFT libraries to support a modular and extensible architecture
  - Fine-tune the LaBSE sentence transformer model using LoRA adapters and contrastive learning techniques, achieving enhanced semantic relevance in response generation (see the second sketch after this list)
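For illustration, a minimal sketch of the RAG agent pattern from the PoC, assuming a two-node LangGraph graph (retrieve, then generate). The search_knowledge_base and call_llm helpers are hypothetical placeholders standing in for the project's vector-store retriever and chat model.

```python
from typing import List, TypedDict

from langgraph.graph import StateGraph, END


class AgentState(TypedDict):
    question: str       # incoming customer-service question
    context: List[str]  # retrieved passages
    answer: str         # generated response


def search_knowledge_base(question: str, top_k: int = 4) -> List[str]:
    # Placeholder retrieval step; the PoC would query a vector store here.
    return ["Password resets are available in the self-service portal."][:top_k]


def call_llm(prompt: str) -> str:
    # Placeholder generation step; the PoC would call the chat model here.
    return f"(answer generated from a prompt of {len(prompt)} characters)"


def retrieve(state: AgentState) -> dict:
    return {"context": search_knowledge_base(state["question"])}


def generate(state: AgentState) -> dict:
    prompt = (
        "Answer the customer question using only the context below.\n\n"
        "Context:\n" + "\n".join(state["context"]) +
        "\n\nQuestion: " + state["question"]
    )
    return {"answer": call_llm(prompt)}


graph = StateGraph(AgentState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", END)
agent = graph.compile()

print(agent.invoke({"question": "How do I reset my password?"})["answer"])
```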
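A second sketch outlines the LoRA-based contrastive fine-tuning, assuming the public sentence-transformers/LaBSE checkpoint, illustrative hyperparameters, and an in-batch InfoNCE-style loss as a stand-in for the project's actual contrastive objective.

```python
import torch
import torch.nn.functional as F
from peft import LoraConfig, get_peft_model
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/LaBSE")
base_model = AutoModel.from_pretrained("sentence-transformers/LaBSE")

# Attach LoRA adapters to the BERT-style attention projections.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                      target_modules=["query", "value"])
model = get_peft_model(base_model, lora_cfg)


def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch)
    # Mean pooling over token embeddings, then L2-normalize.
    mask = batch["attention_mask"].unsqueeze(-1)
    pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
    return F.normalize(pooled, dim=-1)


def contrastive_loss(queries, positives, temperature=0.05):
    # In-batch negatives: each query should match only its own positive.
    q, p = embed(queries), embed(positives)
    logits = q @ p.T / temperature
    labels = torch.arange(len(queries))
    return F.cross_entropy(logits, labels)


optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = contrastive_loss(["how do I reset my password"],
                        ["password resets are handled in the self-service portal"])
loss.backward()
optimizer.step()
```

In-batch negatives keep the example self-contained; an actual training loop would iterate over batches of query/positive pairs drawn from the project data.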
RESPONSIBILITIES
- Review of the existing AWS data pipeline, which consists of:
  - Data Build Tool (DBT Core)
  - AWS S3 Bucket
  - AWS Managed Workflows for Apache Airflow (MWAA)
  - AWS Lambda Function
  - AWS CodePipeline
  - AWS Athena
- Implementation of data transformations using Data Build Tool (DBT Core)
- Optimization and scaling of the existing AWS data pipeline
- Creation of a concept for data provisioning from the phone system and Salesforce
- Integration of the new data sources, the phone system and Salesforce, into the AWS data pipeline
- Pipeline orchestration via AWS Managed Workflows for Apache Airflow (MWAA), as shown in the sketch after this list
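For illustration, a minimal MWAA-compatible Airflow DAG that orchestrates the dbt runs; the dbt project path, schedule, and two-task layout are assumptions, not the project's actual DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Assumed location of the dbt project inside the MWAA deployment bundle.
DBT_DIR = "/usr/local/airflow/dags/dbt_project"

with DAG(
    dag_id="dbt_transformations",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Run the dbt models, then validate them with dbt tests.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command=f"dbt run --project-dir {DBT_DIR} --profiles-dir {DBT_DIR}",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command=f"dbt test --project-dir {DBT_DIR} --profiles-dir {DBT_DIR}",
    )
    dbt_run >> dbt_test
```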
RESPONSIBILITIES
- Design and implement a new data platform on Snowflake Data Cloud within the Microsoft Azure cloud platform
- Integrate source data from SAP ERP/SAP BW, SQL Server, PostgreSQL, and Oracle using Azure Databricks, Azure Data Factory, and Snowpark for Python (see the first sketch after this list)
- Develop a Data Vault 2.0 data model and implement it using Data Build Tool (DBT) on Snowflake Data Cloud
- Conduct a Proof-of-Concept (PoC) to design a Retrieval-Augmented Generation (RAG) system leveraging Snowflake Cortex for intelligent data retrieval and summarization (see the second sketch after this list)
- Orchestrate workflows with Apache Airflow
- Build CI/CD data pipelines on GitLab for full automation of testing and deployment
- Provision and manage infrastructure in Azure Cloud and Snowflake Data Cloud with Terraform
- Manage metadata and data governance using the OpenMetadata catalog
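For illustration, a minimal Snowpark for Python sketch of the ingestion and cleansing step; connection parameters, table names, and columns are placeholders, not the project's actual objects.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder connection parameters; in practice these come from a secret store.
connection_parameters = {
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}
session = Session.builder.configs(connection_parameters).create()

# Read a staged source table (for example, data replicated from SQL Server) ...
orders = session.table("RAW.SQLSERVER_ORDERS")

# ... apply a light cleansing transformation and persist it for the Data Vault load.
cleansed = (
    orders
    .filter(col("ORDER_STATUS").is_not_null())
    .select("ORDER_ID", "CUSTOMER_ID", "ORDER_DATE", "ORDER_STATUS")
)
cleansed.write.save_as_table("STAGING.ORDERS_CLEANSED", mode="overwrite")
```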
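And a sketch of the Cortex-based RAG flow from the PoC: retrieve candidate passages with plain SQL, then let SNOWFLAKE.CORTEX.COMPLETE generate a grounded answer. Table, column, and model names are illustrative, and the parameter binding in session.sql assumes a recent Snowpark version.

```python
from snowflake.snowpark import Session

# Placeholder connection parameters, as in the previous sketch.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<database>", "schema": "<schema>",
}).create()

question = "Which suppliers had delayed deliveries last quarter?"

# Assumed retrieval step: fetch candidate passages from a documents table
# (in the PoC this would be a ranked similarity or keyword search).
rows = session.sql("SELECT DOC_TEXT FROM ANALYTICS.KNOWLEDGE_DOCS LIMIT 5").collect()
context = "\n".join(row["DOC_TEXT"] for row in rows)

# Generate an answer constrained to the retrieved context.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
answer = session.sql(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', ?) AS ANSWER",
    params=[prompt],
).collect()[0]["ANSWER"]
print(answer)
```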