MLOps Engineer

AbsenceSoft • Full-time • Remote (United States) • 1w ago

We’re hiring an MLOps Engineer to lead the development of scalable, reliable, and secure infrastructure for training, deploying, and monitoring machine learning models. In this role, you will create and manage end-to-end pipelines and services that support both traditional ML workflows and modern LLM/agentic systems. You’ll partner with data scientists, data engineers, and AI software engineers to streamline model lifecycle management and ensure robust, observable, and governed AI operations.

We’re looking for engineers who love to automate, simplify, and scale complex ML/AI systems—and who understand the unique operational demands of intelligent applications in regulated, high-availability environments.

Who We Are

AbsenceSoft is elevating the leave and accommodations experience and is looking to hire amazing people like you! We create user-friendly, secure, and compliant technology that empowers employers to bring humanity, certainty and efficiency to the leave and accommodations experience. Made by HR Professionals for HR Professionals, we're proud of where we've been and excited about where we're headed. We value creative, innovative people who are passionate about their work and who believe there is always a better way.

Leading With Our Core Values

Make a Difference.

We are inspired to make an impact through our hard work, talent and passion. We push ourselves each day to better serve our teams, our clients, and our community.

Team First.

We are driven by team spirit, not by self-interest. We value collaboration and approach our work with humility and a desire to win together.

Own it.

If we say it, we mean it. We follow through on our commitments, step up to deliver, and grow from our successes and failures.

Everyone Matters.

No matter your background or experience, everyone's voice holds value here.

What You’ll Do

Design and maintain CI/CD pipelines for model training, validation, deployment, and monitoring.
Automate infrastructure provisioning and container orchestration using tools like Terraform, Kubernetes, and Docker.
Build reusable components for training and inference workflows, including feature stores and pipeline templates.
Implement observability solutions for models in production (e.g., drift detection, performance logging, alerting).
Collaborate with data scientists to support LLM fine-tuning, RLHF training, and experiment tracking at scale.
Enable secure and scalable serving of models and AI agents via REST/gRPC APIs or streaming interfaces.
Optimize model versioning, artifact management, and auditability to comply with regulatory requirements.
Establish testing and validation frameworks for model reproducibility and rollback procedures.
Support real-time inference pipelines for agentic workflows and vector-based RAG systems.
Champion Responsible AI principles through guardrails, explainability tooling, and compliance alignment.
Work within a highly compliant environment and help maintain company controls and security within your role.
Other duties as assigned.

What’ll Set You Up for Success

Required Skills:

5+ years of experience in DevOps, DataOps, or MLOps roles in enterprise-grade environments.
Proficient in cloud infrastructure (AWS, Azure, GCP) and container orchestration (Kubernetes, ECS, GKE, AKS).
Strong programming and scripting skills (Python, Bash, YAML) and deep familiarity with GitOps practices.
Hands-on experience with MLOps platforms such as MLflow, Kubeflow, Metaflow, or SageMaker Pipelines.
Knowledge of CI/CD tools (e.g., GitHub Actions, Jenkins, Argo, CircleCI) for ML deployment automation.
Understanding of LLM training pipelines and deployment architectures, including RLHF and PEFT workflows.
Familiarity with vector DB integration, RAG frameworks, and secure model serving at scale.
Solid understanding of model governance, observability, lineage, and incident response protocols.
Strong communication skills and ability to collaborate cross-functionally in fast-moving teams.

Nice to Have:

MLOps Platforms: MLflow, SageMaker Pipelines, Kubeflow, Metaflow, ClearML
Infrastructure: Terraform, Docker, Kubernetes, Helm, ArgoCD, GitHub Actions
Monitoring & Logging: Prometheus, Grafana, Datadog, Sentry, EvidentlyAI
Data Tools: Airflow, dbt, Snowflake, Feature Store (Feast, Tecton)
Vector Search & RAG: FAISS, Pinecone, Weaviate, LangChain, LlamaIndex
