Updated: 01.07.2024


100% available
Big Data Specialist, Big Data System Engineer
Unterschleißheim, Germany
Germany +8
B.Sc. Information Science and Engineering

Skills
Linux, MapR, Big Data & Hadoop, Apache Spark, Hive, GCP, Kafka Streams, Cassandra, Postgres DB, JupyterLab, Red Hat OpenShift, HAProxy, PgBouncer, Patroni, Redis, Dataproc, Google BigQuery, Podman, Docker, Kubernetes, Helm deployments
MapR, HPE Ezmeral Data Fabric, Hadoop, GCP, Terraform, Spark, Hive, Airflow, OpenShift, Spark Streaming, Flume, data lineage, SAS, Big Data, data access, Apache Drill, Dremio, Trino, data centers, Puppet, Cassandra, Kafka, Postgres, Redis, OEM, Icinga, IT audit, Fortify, system administration, cloud, OpenStack, Linux, Red Hat Linux 6 and 7, SELinux, backup, caching, Varnish, Nginx, Grafana, Juniper, Cisco, VLAN, user management, LDAP, virtualization, KVM, SDN, Windows servers, Windows Server 2003/2008, control panel, database, DNS, Nagios, MRTG, Windows, microcomputer, Computer Science, client-server, ASP
Languages
German: basic knowledge
English: business fluent
Project history
- Design and implementation of a large-scale, highly available, fault-tolerant Apache Airflow platform. The implementation runs Airflow with different executor types on both bare metal/virtual machines and OpenShift (Kubernetes). Containers are built from source with Buildah, based on the ubi9-micro image; Podman and systemd services run the containers and services on the bare-metal/virtual-machine nodes. The setup also covers disaster recovery scenarios, and Helm charts are used for the OpenShift deployment (a minimal DAG sketch follows this project's bullets).
- Highly available, disaster-recoverable, and load-balanced Postgres setup using PgBouncer, HAProxy, and Patroni, with backup and recovery via Barman, on both bare metal and Kubernetes (see the client connection sketch below).
- Highly available Redis setup using Sentinel (see the Sentinel client sketch below).
- S3-compatible storage for Airflow remote logging.
- Monitoring of the complete stack with Prometheus (see the exporter sketch below).
- Support for the migration of workflows from Control-M, Automic, and Tivoli WS to Airflow.
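The sketch below is a minimal, hypothetical Airflow 2.x (2.4+) DAG of the kind scheduled on such a platform; the DAG id, tasks, and schedule are illustrative and not taken from the actual project. The comments also show the standard Airflow configuration keys that enable remote logging to S3-compatible storage, as referenced in the bullets above.

```python
# Minimal illustrative DAG; dag_id, schedule, and task names are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Remote logging to S3-compatible storage is configured outside the DAG,
# e.g. via environment variables on schedulers/workers (Airflow 2.x keys):
#   AIRFLOW__LOGGING__REMOTE_LOGGING=True
#   AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER=s3://<bucket>/airflow-logs
#   AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID=<s3_connection_id>

with DAG(
    dag_id="example_pipeline",        # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    tags=["example"],
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")
    extract >> load                   # simple linear dependency
```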
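As a rough illustration of the Postgres entry point described above, this client-side sketch connects through a hypothetical PgBouncer/HAProxy endpoint in front of the Patroni-managed primary; host name, port, and credentials are assumptions, not project details.

```python
import psycopg2

# Hypothetical endpoint: PgBouncer/HAProxy in front of the Patroni-elected
# primary; the application only ever sees this single address and is not
# affected by failovers behind it.
conn = psycopg2.connect(
    host="pg-ha.example.internal",  # hypothetical service name / VIP
    port=6432,                      # common PgBouncer port
    dbname="appdb",
    user="app",
    password="secret",
    connect_timeout=5,
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])
conn.close()
```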
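For the Sentinel-based Redis setup, a client typically discovers the current master through the Sentinels rather than using a fixed address. A minimal redis-py sketch, with hypothetical Sentinel hosts and master name:

```python
from redis.sentinel import Sentinel

# Hypothetical Sentinel endpoints and master name.
sentinel = Sentinel(
    [("sentinel1", 26379), ("sentinel2", 26379), ("sentinel3", 26379)],
    socket_timeout=0.5,
)
master = sentinel.master_for("mymaster", socket_timeout=0.5)   # read/write
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)   # read-only

master.set("healthcheck", "ok")
print(replica.get("healthcheck"))
```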
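Prometheus monitoring of such a stack would normally rely on the standard exporters (node exporter, StatsD/Airflow metrics, Postgres and Redis exporters); the sketch below only illustrates exposing an additional application-level metric with the Python prometheus_client library. The metric name and port are hypothetical.

```python
import random
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical custom metric, scraped by Prometheus from /metrics.
PENDING_JOBS = Gauge("app_pending_jobs", "Number of jobs waiting to run")

if __name__ == "__main__":
    start_http_server(8000)                      # expose /metrics on port 8000
    while True:
        PENDING_JOBS.set(random.randint(0, 10))  # placeholder value
        time.sleep(15)
```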
- Design and implementation of Cassandra clusters in three different environments. The production cluster is a multi-datacenter setup (primary and secondary clusters) to safeguard against datacenter disaster scenarios. Clusters were tuned based on workload (see the client sketch after this project's bullets).
- Reaper was set up to perform Cassandra repair jobs.
- Prometheus was installed, and dashboards were configured to show metrics collected from the Cassandra nodes and the application.
- The complete setup was automated with Ansible.
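A minimal client-side sketch of how an application might talk to such a multi-datacenter Cassandra cluster, pinning requests to the local datacenter and using LOCAL_QUORUM so a node failure does not break availability; contact points, datacenter name, and keyspace are hypothetical:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import EXEC_PROFILE_DEFAULT, Cluster, ExecutionProfile
from cassandra.policies import DCAwareRoundRobinPolicy

# Hypothetical contact points, datacenter name, and keyspace.
profile = ExecutionProfile(
    load_balancing_policy=DCAwareRoundRobinPolicy(local_dc="dc-primary"),
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,  # quorum within the local DC
)
cluster = Cluster(
    contact_points=["cass-1", "cass-2", "cass-3"],
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("app_keyspace")
row = session.execute("SELECT release_version FROM system.local").one()
print(row.release_version)
cluster.shutdown()
```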
- Leading a cross-functional team including Ops and Dev members.
- Part of the core team for the DWH (Oracle, Hadoop) migration to GCP.
- Design and setup of GCP projects via Terraform.
- Major upgrades and patching of MapR clusters, fully automated with Ansible.
- Provided training (application tuning) for the DATA organization.
- Data restructuring to achieve better application (Spark, Hive) performance (see the repartitioning sketch after this list).
- Apache Airflow installation, configuration, updates, and maintenance on OpenShift via Ansible automation.
- Spark Streaming application development to replace Apache Flume (see the Structured Streaming sketch after this list).
- Installation and configuration of DataHub (data catalog), including in-house development of a data lineage generator for different data sources (Oracle, Hive, SAS).
- Delegation of duties and tasks within the team.
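Data restructuring for Spark/Hive performance is typically about partitioning and file sizing. The following PySpark sketch (hypothetical database, table, and column names) rewrites a table partitioned by a commonly filtered column so Spark and Hive can prune partitions and avoid small-file overhead:

```python
from pyspark.sql import SparkSession

# Illustrative only: table and column names are hypothetical.
spark = (
    SparkSession.builder.appName("restructure-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

df = spark.table("raw_db.events")        # hypothetical source table

(
    df.repartition("event_date")         # co-locate rows per partition value
    .write.mode("overwrite")
    .partitionBy("event_date")           # Hive-style partitioning
    .format("parquet")
    .saveAsTable("curated_db.events")    # hypothetical target table
)
```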
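A rough sketch of the Spark Structured Streaming pattern commonly used when replacing a Flume pipeline: read from Kafka, write to the data lake with checkpointing. Broker addresses, topic, and paths are hypothetical, and the spark-sql-kafka connector matching the cluster's Spark version is assumed to be available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("flume-replacement-sketch").getOrCreate()

# Hypothetical Kafka brokers and topic.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
    .option("subscribe", "ingest-topic")
    .option("startingOffsets", "latest")
    .load()
    .select(col("key").cast("string"), col("value").cast("string"), "timestamp")
)

query = (
    events.writeStream.format("parquet")
    .option("path", "/data/landing/ingest")            # hypothetical target
    .option("checkpointLocation", "/data/checkpoints/ingest")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```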
Certificates
Google Cloud Platform Professional Data Engineer
Google Cloud, 2023
Google Cloud Platform Certified Associate Engineer
Google Cloud, 2022
Databricks Certified Data Engineer Associate
Databricks, 2022