[Remote] Site Reliability and DevOps Engineering Lead

Remote, Full-time

Note: This is a remote position open to candidates in the USA. Merative provides trusted clinical decision support solutions through its Micromedex platform. The company is seeking a highly skilled Platform Reliability & DevOps Engineering Lead to keep the platform highly available, performant, scalable, and secure, and to drive the platform reliability and DevOps strategy.

Responsibilities

Lead, mentor, and grow Platform / DevOps engineers
Build a high-performing Platform team
Drive accountability for platform reliability and delivery outcomes
Lead vendors to deliver capabilities in production
Ensure platform capabilities accelerate product delivery and remove bottlenecks
Define and enforce platform engineering standards and DevOps practices across all teams and vendors
Lead capacity planning, performance optimization, and cost efficiency
Define operational standards, runbooks, and reliability practices
Be accountable for platform reliability outcomes at the enterprise and product level
Act as technical authority across platform, reliability, and delivery
Define platform strategy and roadmap
Govern delivery across internal teams and vendors
Own SLIs, SLOs, and error budgets
Lead resilience engineering, observability, and failure design
Drive proactive risk reduction and continuous improvement
Own incident management frameworks and continuous improvement
Own end-to-end pipeline architecture and release automation
Standardize, secure, and fully automate pipelines
Drive continuous integration, delivery, and validation practices
Lead Sev1 response, escalation, and recovery
Own RCA and drive systemic fixes (not point fixes)
Embed AI into monitoring, risk prediction, and CI/CD optimization
Drive automation to reduce operational toil and improve decision-making

Skills

Bachelor's degree in Computer Science, Engineering, or a related field
6-10 years of hands-on experience in software operations, DevOps, and Site Reliability Engineering, including managing large-scale, mission-critical systems
Clear and confident communication skills with ability to lead teams and collaborate effectively across engineering, product, and architecture teams
Proven track record ensuring high availability and performance in production environments, with expertise in fault-tolerant, distributed system design
Excellent understanding of modern software delivery pipelines and DevOps practices, including CI/CD, configuration management, and version control (Git)
Exceptional problem-solving skills, with experience diagnosing complex system issues under pressure and driving them to resolution
Strong proficiency in at least one programming or scripting language (e.g., Python, Bash, or Java) for automation and tool integration
Self-driven and proactive, with a passion for automating manual processes and continuously improving systems to enhance reliability and team productivity
Proven experience releasing into and running mission-critical, high-availability SaaS platforms
Experience technically leading a Platform team and influencing stakeholders and vendors
Stakeholder engagement across Product, Architecture, and Operations
Deep expertise in Site Reliability Engineering (SLI/SLO, error budgets, incident management)
DevOps operating models and platform engineering (engineering transformation)
CI/CD architecture and release automation
Cloud, Systems & Infrastructure (DB2, Oracle, Infinispan, OpenLiberty)
Automation-first engineering with proven usage of AI (self-healing, triage)
Java application platforms and runtimes (performance tuning, troubleshooting, production operations)
Strong experience with Cloud platforms (Azure preferred)
Distributed systems and fault-tolerant architectures
Performance Tuning and Scaling
Database optimisation (DB2, Oracle, PostgreSQL)
Multi-region / active-active environments
Monitoring, logging, tracing frameworks
Experience embedding reliability practices into the SDLC
Hands-on with DB2, Oracle, Infinispan, OpenLiberty, Azure
Infrastructure as Code (Terraform or similar)
Containerisation and orchestration (Docker/Kubernetes)

Benefits

Remote first / work from home culture
Flexible vacation to help you rest, recharge, and connect with loved ones
Paid leave benefits
Health, dental, and vision insurance
401k retirement savings plan
Infertility benefits
Tuition reimbursement, life insurance, EAP – and more!

Company Overview

Merative is an IT services company that offers products to improve decision-making and performance. It is a sub-organization of Francisco Partners. Founded in 2022, it is headquartered in Ann Arbor, Michigan, USA, and has a workforce of 1001-5000 employees. Its website is https://www.merative.com.

