Engineering Manager, AI & Data Infrastructure

Decagon

Software Engineering, Other Engineering, Data Science

San Francisco, CA, USA · New York, NY, USA

USD 280k-430k / year + Equity

Posted on Apr 24, 2026

Location

San Francisco; New York City

Employment Type

Full time

Location Type

On-site

Department

Engineering

Compensation

  • Base Salary: $280K – $430K

  • Offers Equity

About Decagon

Decagon is the leading conversational AI platform empowering every brand to deliver concierge customer experiences.

Our technology enables industry-defining enterprises like Avis Budget Group, Block’s Cash App and Square, Chime, Oura Health, and Hunter Douglas to deploy AI agents that power personalized, deeply satisfying interactions across voice, chat, email, SMS, and every other channel.

We’re building a future where customer experience moves beyond support tickets and hold music toward faster resolutions, richer conversations, and deeper relationships. We’re proud to be backed by world-class investors who share that vision, including a16z, Accel, Bain Capital Ventures, Coatue, and Index Ventures, along with many others.

We’re an in-office company, driven by a shared commitment to excellence and velocity. Our values — Just Get It Done, Invent What Customers Want, Winner’s Mindset, and The Polymath Principle — shape how we work and grow as a team.

About the Team

The Infrastructure team builds and operates the foundations that power Decagon: platform, model inference, compute, data, and developer experience. We partner closely with product, research, and applied AI teams to deliver high-scale, low-latency systems with clear SLOs and great developer ergonomics.

We organize around two focus areas:

  • Platform: The foundational cloud stack — networking, compute, storage, security, and infrastructure-as-code — to ensure reliability, scale, and cost efficiency. CI/CD, paved paths, and core services that make shipping fast, safe, and consistent across teams.

  • ML & Data: Streaming/batch data platforms powering analytics/BI and customer-facing telemetry, including for customer-managed and on-prem environments. Realtime databases that enable low-latency agents. GPU and model-serving platforms for LLM inference with multi-provider routing.

Our mission is to deliver magical support experiences — AI agents working alongside humans to resolve issues quickly and accurately.

About the Role

We're looking for a hands-on Engineering Manager to lead the AI & Data Infrastructure team. This is a deeply technical player/coach role that sits at the core of how Decagon's agents think, respond, and learn. You'll lead the team responsible for the data and inference systems that every agent interaction depends on — from the streaming and batch pipelines that power analytics and customer-facing telemetry, to the realtime databases that back low-latency agent behavior, to the GPU and model-serving platforms that route LLM inference across multiple providers.

You'll stay close to the code and systems — reviewing designs, participating in incident response, and contributing directly when it helps the team move faster. You'll also lead by example on AI-assisted engineering, setting the standard for how the team uses AI coding tools to ship higher-quality work more quickly.

You'll hire and develop a high-performing team while partnering closely with Research, Product Engineering, Platform, and customer-facing teams to make shipping fast and safe — across our primary cloud as well as the single-tenant and on-prem environments we operate for regulated enterprise customers. Success requires strong people leadership, crisp execution across concurrent enterprise and research commitments, and the technical depth to make sound architectural calls under real constraints.

In this role, you will

  • Build, lead, and develop a high-performing team of data and ML infrastructure engineers, including hiring, coaching, and performance management.

  • Own the technical strategy and roadmap for Decagon's AI & Data Infrastructure — streaming/batch data, realtime databases, and the GPU and model-serving stack powering LLM inference.

  • Stay hands-on: review designs and PRs with depth, lead architecture for hard problems, and contribute code when the team needs it.

  • Drive architecture for high-throughput data systems and low-latency inference, including multi-provider LLM routing and CDC pipelines at scale.

  • Set reliability, quality, and cost standards — data freshness SLOs, inference latency and availability, GPU and analytical cost discipline — and build an operating cadence that keeps the platform healthy as we scale.

  • Invest in developer and analyst experience — paved paths for producing and consuming data, and evals and observability for inference.

  • Raise the bar on AI-assisted engineering: define how your team uses AI coding tools to ship faster with higher quality, and build the workflows and guardrails that make this durable.

  • Partner with Research, Product Engineering, Platform, and customer-facing teams to deliver data and inference capabilities on aggressive timelines, including for enterprise deployments.

Your background looks something like this

  • 2+ years of engineering management experience leading high-performing data, ML, or infrastructure teams, with a strong IC background before that.

  • Deep technical expertise in streaming/batch processing, analytical databases, or model-serving; you're comfortable dropping into the codebase and shipping a PR.

  • Hands-on experience operating large-scale data systems (Kafka, ClickHouse/Snowflake/BigQuery, Postgres at scale) and/or production model-serving infrastructure on GPUs.

  • Familiarity with cloud platforms (AWS, GCP, or Azure), Kubernetes, and infrastructure-as-code.

  • A track record of delivering multi-quarter data or ML infrastructure initiatives through ambiguity.

  • A strong point of view on AI-assisted engineering — you use the tools yourself and have opinions on where they work.

  • Deep care for engineering craft, operational excellence, and cost discipline.

Even better if you have

  • Experience operating LLM inference infrastructure in production — GPU capacity planning, multi-provider routing, and inference evals.

  • Experience with realtime analytics engines (ClickHouse, Pinot, Druid) and CDC pipelines at scale.

  • Experience delivering data and ML systems into single-tenant, on-prem, or air-gapped enterprise environments.

  • Experience building internal tooling or agents that use LLMs to accelerate engineering work.

  • Background in security and compliance frameworks (SOC 2, PCI DSS, FedRAMP, or similar).

Compensation

$280,000 - $430,000 base salary, plus equity

Benefits

We proudly offer the following benefits for our full-time employees:

  • Take-what-you-need vacation policy (subject to local requirements; UK employees receive 25 days of statutory leave)

  • Medical, Dental, and Vision benefits for you and your family

  • Life Insurance and Disability Benefits

  • Retirement Plan (e.g., 401(k), pension)

  • Parental Leave

  • Fertility and family building benefits through Carrot

  • Daily lunches and snacks in the office to keep you at your best

These benefits are described in more detail in Decagon’s policies, may vary by location, and can change at any time according to applicable compensation and benefits plans.
