DevOps Engineer
Job Details
About the Company
With operational hubs across Europe, Asia, and LATAM, and headquarters in San Francisco, US, the company employs over 1,000 skilled professionals. Operating in more than 20 countries, ALLSTARSIT provides experienced specialists across verticals including AI, cybersecurity, healthcare, fintech, telecom, and media.
About the Project
Windward is the leading Maritime AI™ company, offering a powerful platform for risk management and maritime domain awareness. Our technology enables governments, financial institutions, and shipping and energy companies to make smarter, faster decisions by predicting maritime events before they happen. We continue to grow, innovate, and lead in our field.
We are looking for a DevOps Engineer with a strong understanding of MLOps — or an ML Engineer with a passion for DevOps — to join our R&D team. You’ll work closely with our data science and engineering teams to support the full machine learning lifecycle, from infrastructure and automation to deployment and monitoring of production models.
Required skills:
- DevOps background with experience in MLOps, or ML Engineer with strong DevOps skills
- Expertise in AWS services, especially around data and compute (S3, EKS, Lambda, SageMaker, etc.)
- Strong hands-on experience with Kubernetes in production
- Solid understanding of Terraform and infrastructure as code principles
- Proficiency in Python development for ML workflows
- Familiarity with ML lifecycle tools (MLflow, Kubeflow, etc.)
- Experience building and maintaining CI/CD pipelines
- Passion for clean code, automation, and scalable infrastructure
Nice to have:
- Experience with observability and monitoring (Prometheus, Grafana, etc.)
- Knowledge of container security and compliance practices
- Familiarity with GPU workloads, data lakes, or distributed ML training
- Experience with and understanding of GenAI technologies (Bedrock, LangGraph, LangFuse), and the ability to leverage them to boost productivity
Scope of work:
- Design, build, and maintain scalable, secure, and reliable ML infrastructure on AWS
- Work with ML engineers to deploy and monitor models in production
- Automate training, testing, deployment, and monitoring workflows (CI/CD)
- Manage infrastructure as code using Terraform
- Operate and optimize Kubernetes clusters for ML workloads (e.g., GPU support, autoscaling)
- Build Python tooling for data pipelines, experimentation, and model versioning
- Monitor and optimize ML systems for performance, reliability, and cost