Interim Site Reliability Engineer (m/f/d)

Düsseldorf, Germany | On-site | Freelance | Start 12/2025 | Duration 6 months

Posted by

Michael Page

Contact person

Kai Gehrmann

Project ID

2944232

APIs, Artificial Intelligence, Amazon Web Services, Architecture, Automation, Microsoft Azure, Process Optimization, Cloud Computing, Community Management, Continuous Delivery, Continuous Integration, DevOps, Incident Response, Engineering, Sales, GitHub, ITIL, Identity Management, Incident Management, Node.js, Public Key Infrastructure, Cloud Services, Transformation Management, Ansible, Prometheus, Service Management, Software Development, System Design, Topology, Data / Record Logging, Testing, Google Cloud, Performance Testing, System Availability, Grafana, Spring Boot, Cost Optimization, Infrastructure as Code (IaC), GitLab CI, Kubernetes, Coaching and Mentoring, Puppet, Terraform, Docker, ELK Stack, Jenkins, Golang, Programming Languages

Description

Hello, we are currently looking for a Site Reliability Engineer / DevOps / Operations Engineer (m/f/d) for our client, on a contractor agreement (nearshore).

Client Details

Start Date: ASAP
Duration: 6 months
Workload: Full-time (5 days/week)
Location: Remote with quarterly onsite in Düsseldorf
Industry: Sales
Project Language: English

Hourly Rate: max. €67

Description

Project Overview

We are seeking an experienced Site Reliability Engineer for our client. The ideal candidate has a strong foundation in software development and has transitioned into infrastructure and operations, with a passion for scaling, automation, and reliability of cloud-native systems. Previous experience as a Site Reliability Engineer is a must.

Ideal Candidate Background

Software Engineering Foundation: Preferably the candidate started their career in software development, establishing a solid foundation in coding, system design, and software lifecycle management. This background provides a deep understanding of the development process and the importance of operational efficiency and system reliability.
Transition to Infrastructure and Operations: After gaining valuable experience in software engineering, the candidate transitioned into infrastructure and operations. This move was driven by an interest in scaling, automating, and improving the reliability of cloud-native applications and systems.

Profile

Technical Skills and Experience

Cloud-Native Applications: Proficient in deploying, managing, and scaling applications in a cloud-native environment. This includes using containerization technologies like Docker and orchestrators such as Kubernetes to manage containerized applications across various environments.
Kubernetes Experience: Extensive experience with Kubernetes, including setting up clusters, deploying applications, managing stateful and stateless workloads, implementing autoscaling, and ensuring high availability. Familiarity with Kubernetes ecosystem tools (e.g., Helm, Kustomize) and practices is essential.
Hyperscaler Expertise: Strong experience with at least one major cloud services provider, preferably AWS, but also open to experience with Azure or Google Cloud Platform. This includes managing cloud resources, implementing security best practices, and leveraging cloud-native services for operational efficiency.
Infrastructure as Code (IaC): Skilled in using IaC tools such as Terraform, Ansible, Chef, or Puppet to automate the provisioning and management of infrastructure, ensuring consistency and compliance.
Continuous Integration/Continuous Deployment (CI/CD): Experienced in setting up and managing CI/CD pipelines using tools like Jenkins, GitLab CI, or GitHub Actions to automate testing and deployment processes.
Monitoring and Logging: Proficient in implementing monitoring and logging solutions (e.g., Prometheus, Grafana, ELK stack) to ensure proactive issue identification and resolution.
Programming Languages/Frameworks: Familiarity with at least one of the following: Node.js, Golang, or Java Spring Boot, for effective automation, tooling, and incident response.

Operational Skills

On-Call Duties: Willingness to participate in an on-call rotation, normally 18/7 and in rare cases 24/7, with an understanding of the critical role on-call plays in maintaining system reliability and performance.
Incident Management: Capable of quickly diagnosing and resolving issues, minimizing downtime, and learning from incidents to prevent future occurrences.
Cost Optimization: Ability to monitor, analyze, and optimize cloud resources for cost efficiency without compromising performance or security.

Soft Skills

Good English Communication Skills: Excellent verbal and written communication skills, capable of effectively collaborating with team members, stakeholders, and clients.
Teamwork and Collaboration: Ability to work well within a team, share knowledge, and contribute to a positive working environment.
Continuous Improvement: A strong desire for continuous learning and improvement, staying up-to-date with the latest technologies and best practices.
Problem-Solving: Strong analytical and problem-solving skills, with a proactive approach to identifying and addressing challenges.
High Adaptability: Exceptional adaptability is required for collaborating with multiple teams, quickly learning new technologies, and adjusting to changing project demands.

Nice to have:

Aim42 or any other architecture improvement method
Testing Automation (Integration, Unit, Functional)
FinOps
Data & AI
Hypermedia / API / REST
Team Topologies / Macro Architecture
Mentoring / Coaching
Community Management & Developer Relations
Performance Testing
Chaos Engineering
Knowledge in Public Key Infrastructure
Identity and Access Management
ISO 25010
SLIs, SLOs, Error Budgets & SLAs
Service Management (ITIL)
