DevOps - Abu Dhabi, Abu Dhabi, United Arab Emirates - Full Time
Job Title: Lead DevOps Engineer (Azure, Terraform)
Employment Type: Full-time
Location: Abu Dhabi (UAE)
Note: This role requires flexibility to relocate to Abu Dhabi (UAE) for onsite client requirements.
About the Role:
NorthBay, a leading AWS Premier Partner, is seeking a highly skilled Lead DevOps Engineer (Azure, Terraform) to join its growing cloud and AI engineering team. This role is ideal for candidates with a strong foundation in cloud DevOps practices and a passion for implementing scalable MLOps solutions.
Key Responsibilities:
Design, implement, and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, or Azure DevOps
Develop and maintain Infrastructure-as-Code using Terraform
Manage and scale container orchestration environments using Kubernetes, including large production-grade clusters
Ensure cloud infrastructure is optimized, secure, and monitored effectively
Collaborate with data science teams to support ML model deployment and operationalization
Implement MLOps best practices, including model versioning, deployment strategies (e.g., blue-green), monitoring (data drift, concept drift), and experiment tracking (e.g., MLflow)
Build and maintain automated ML pipelines to streamline model lifecycle management
Required Skills:
8 to 12 years of experience in DevOps and/or MLOps roles
Proficiency in CI/CD tools: Jenkins, GitHub Actions, Azure DevOps
Strong expertise in Terraform, including managing and scaling infrastructure across large environments
Hands-on experience with Kubernetes in large clusters, including workload distribution, autoscaling, and cluster monitoring
Strong understanding of containerization technologies (Docker) and microservices architecture
Solid grasp of cloud networking, security best practices, and observability
Scripting proficiency in Bash and Python
Preferred Skills:
Experience with MLflow, TFX, Kubeflow, or SageMaker Pipelines
Knowledge of model performance monitoring and ML system reliability
Familiarity with AWS MLOps stack or equivalent tools on Azure/GCP