Lead AI Engineer

Airkit

Software Engineering, Data Science

Mexico City, Mexico

Posted on May 13, 2026

Description

Lead AI Engineer (Mexico City), Data Solutions Org

Hybrid

We are looking for a Lead AI Engineer to drive the development of next-generation AI and ML systems at Salesforce.

This role owns the design and evolution of intelligent decisioning systems and expands into building a broader agent flywheel (a system of self-improving feedback loops that continuously evaluate, optimize, and evolve agent performance).

This role sits on the applied side but requires strong data and systems engineering depth — you will build not just models and agents, but the data pipelines, evaluation loops, and lightweight system scaffolding that allow them to continuously improve in production.

You will build production-grade ML models, embed them into agent workflows, and define how agents learn from real-world outcomes. This is a hands-on, high-impact role focused on shipping systems that directly influence agent performance, efficiency, revenue, and customer experience.

What You’ll Do

1) Build the Agent Flywheel

  • Design and implement feedback loops that enable agents and ML models to self-improve over time

  • Develop systems for:

    • Outcome tracking (e.g., engagement, conversions, resolution quality)

    • Agent evaluation (LLM + deterministic + human-in-the-loop signals)

    • Iterative optimization (prompting, policies, model selection, fine-tuning)

  • Build pipelines that collect and structure agent traces (inputs, tool usage, intermediate steps, outputs) into high-quality training and evaluation datasets

  • Close the loop from production signals → evaluation → model/prompt improvements

2) Develop Production ML & Agent Systems

  • Build and deploy application-specific ML models (classification, ranking, forecasting, recommendation, etc.)

  • Design and implement AI agents that combine:

    • LLM reasoning

    • Tool/API usage

    • ML-based decisioning layers

  • Implement reusable agent patterns (multi-step reasoning, tool orchestration, structured outputs) within application workflows

  • Integrate ML and agent capabilities into decisioning systems that drive business outcomes

3) Data & Pipeline Engineering

  • Design and build scalable data pipelines (batch and near real-time) that power training, evaluation, and inference workflows

  • Develop pipelines that transform raw interaction data into features, labels, and evaluation datasets

  • Pair model pipelines with data pipelines to enable continuous retraining and evaluation loops

  • Ensure data quality, consistency, and availability across systems

  • Work with large-scale structured and unstructured data to support both ML and LLM systems
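As a rough sketch of the feature/label transformation described above, the snippet below turns raw interaction events into training rows. The event schema and feature names are illustrative placeholders, not the team's actual data model; in production this logic would live in a batch or streaming pipeline rather than a single function.

```python
def build_examples(events: list[dict]) -> list[tuple[dict, int]]:
    """Transform raw interaction events into (features, label) training rows.

    Each event is assumed to look like:
      {"user_id": ..., "n_turns": ..., "used_tool": ..., "resolved": ...}
    Feature and label names here are hypothetical.
    """
    rows = []
    for e in events:
        features = {
            "n_turns": e["n_turns"],          # conversation length
            "used_tool": int(e["used_tool"]),  # whether the agent invoked a tool
        }
        label = int(e["resolved"])  # e.g. a model predicting resolution
        rows.append((features, label))
    return rows
```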

4) Evaluation, Experimentation & Optimization

  • Build offline and online evaluation frameworks for agent and ML model performance

  • Develop evaluation datasets, golden traces, and regression-style test sets for agent behavior

  • Design and run A/B experiments to measure impact on business outcomes

  • Define and monitor key metrics (quality, containment, revenue impact, latency, etc.)

  • Use production traces and evaluation signals to drive continuous optimization (prompting, model selection, feature improvements, fine-tuning)
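The regression-style testing mentioned above can be sketched as a simple gate: run the agent over golden cases distilled from high-quality production traces and block deployment if the pass rate drops. This is an illustrative sketch under simplifying assumptions; a deterministic equality check stands in for the LLM and human-in-the-loop evaluators the role actually calls for, and all names are hypothetical.

```python
def regression_eval(agent_fn, golden_cases, min_pass_rate: float = 0.9):
    """Run an agent over golden (input, expected_output) cases.

    Returns the pass rate and whether it clears the deployment gate.
    golden_cases is assumed non-empty.
    """
    passed = sum(1 for inp, expected in golden_cases if agent_fn(inp) == expected)
    rate = passed / len(golden_cases)
    return rate, rate >= min_pass_rate
```

Run on every prompt, policy, or model change, this catches behavioral regressions before they reach production, which is what makes rapid iteration safe.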

5) Architecture & Applied Systems Design

  • Develop hybrid systems that blend:

    • Deterministic logic

    • Model-based scoring

    • LLM-driven generation

  • Collaborate with platform teams to leverage shared infrastructure (model serving, evaluation tooling, observability), while building application-specific layers on top

  • Design systems that scale with increasing agent complexity and data volume

6) Platform & API Development

  • Build scalable Python services and APIs powering agent workflows

  • Contribute to shared infrastructure for model serving, evaluation, and experimentation

  • Ensure reliability, observability, and performance of deployed systems

Qualifications

Core Requirements

  • 6+ years of experience in AI/ML engineering, applied data science, or closely related roles

  • Strong hands-on experience in Python for production systems

  • Proven track record building and deploying production-grade ML models

  • Strong experience with data pipeline development (ETL/ELT, batch or streaming)

  • Experience designing and building AI agents or agent-like systems

  • Strong experience with API development and backend services

  • Experience with ML lifecycle tooling (training, evaluation, deployment, monitoring)

Data & Systems Expertise

  • Experience building reliable data pipelines that support ML or AI systems in production

  • Familiarity with:

    • Data processing frameworks (e.g., Spark or equivalent)

    • Data orchestration tools (e.g., Airflow, Dagster, etc.)

    • Data warehousing solutions (e.g., Snowflake, BigQuery, etc.)

  • Understanding of data quality, lineage, and reproducibility in ML systems

Agent & LLM Experience

  • Experience building or working with LLM-powered systems (prompting, orchestration, evaluation)

  • Familiarity with agent frameworks and tool-using agents

  • Experience working with agent traces, evaluation datasets, or iterative improvement loops is strongly preferred

Modeling & Systems Thinking

  • Strong understanding of:

    • Supervised learning (classification, regression, ranking)

    • Evaluation methodologies (offline + online)

    • Experimentation (A/B testing, causal inference basics)

  • Ability to design systems that combine:

    • ML models

    • LLMs

    • Business logic

Engineering & Production Skills

  • Experience deploying models/services in production environments

  • Familiarity with:

    • Model serving architectures

    • Data pipelines

    • Monitoring and observability

  • Ability to write clean, scalable, maintainable code

Preferred Qualifications

  • Experience building model-driven agent improvement systems (e.g., scoring, gating, auto-optimization)

  • Experience with reinforcement learning, bandits, or iterative optimization systems

  • Exposure to agent evaluation tools (e.g., LangSmith, Braintrust, or similar)

  • Experience with large-scale experimentation platforms

  • Familiarity with enterprise SaaS or CRM domains

What Success Looks Like

  • Agents and production-grade ML models measurably improve over time via automated feedback loops

  • Well-structured data and evaluation pipelines continuously feed the agent flywheel

  • Clear lift in key business metrics (e.g., engagement, conversion, revenue impact)

  • Robust evaluation systems that enable rapid iteration and safe deployment