Description
Lead AI Engineer (Mexico City) Data Solutions Org
Hybrid
We are looking for a Lead AI Engineer to drive the development of next-generation AI and ML systems at Salesforce.
This role owns the design and evolution of intelligent decisioning systems and expands into building a broader agent flywheel (a system of self-improving feedback loops that continuously evaluate, optimize, and evolve agent performance).
This role sits on the applied side but requires strong data and systems engineering depth: you will build not just models and agents, but also the data pipelines, evaluation loops, and lightweight system scaffolding that allow them to improve continuously in production.
You will build production-grade ML models, embed them into agent workflows, and define how agents learn from real-world outcomes. This is a hands-on, high-impact role focused on shipping systems that directly influence agent performance, efficiency, revenue, and customer experience.
What You’ll Do
1) Build the Agent Flywheel
Design and implement feedback loops that enable agents and ML models to self-improve over time
Develop systems for:
Outcome tracking (e.g., engagement, conversions, resolution quality)
Agent evaluation (LLM-based, deterministic, and human-in-the-loop signals)
Iterative optimization (prompting, policies, model selection, fine-tuning)
Build pipelines that collect and structure agent traces (inputs, tool usage, intermediate steps, outputs) into high-quality training and evaluation datasets
Close the loop from production signals → evaluation → model/prompt improvements
2) Develop Production ML & Agent Systems
Build and deploy application-specific ML models (classification, ranking, forecasting, recommendation, etc.)
Design and implement AI agents that combine:
LLM reasoning
Tool/API usage
ML-based decisioning layers
Implement reusable agent patterns (multi-step reasoning, tool orchestration, structured outputs) within application workflows
Integrate ML and agent capabilities into decisioning systems that drive business outcomes
3) Data & Pipeline Engineering
Design and build scalable data pipelines (batch and near real-time) that power training, evaluation, and inference workflows
Develop pipelines that transform raw interaction data into features, labels, and evaluation datasets
Couple model pipelines with data pipelines to enable continuous retraining and evaluation loops
Ensure data quality, consistency, and availability across systems
Work with large-scale structured and unstructured data to support both ML and LLM systems
4) Evaluation, Experimentation & Optimization
Build offline and online evaluation frameworks for agent and ML model performance
Develop evaluation datasets, golden traces, and regression-style test sets for agent behavior
Design and run A/B experiments to measure impact on business outcomes
Define and monitor key metrics (quality, containment, revenue impact, latency, etc.)
Use production traces and evaluation signals to drive continuous optimization (prompting, model selection, feature improvements, fine-tuning)
5) Architecture & Applied Systems Design
Develop hybrid systems that blend:
Deterministic logic
Model-based scoring
LLM-driven generation
Collaborate with platform teams to leverage shared infrastructure (model serving, evaluation tooling, observability), while building application-specific layers on top
Design systems that scale with increasing agent complexity and data volume
6) Platform & API Development
Build scalable Python services and APIs powering agent workflows
Contribute to shared infrastructure for model serving, evaluation, and experimentation
Ensure reliability, observability, and performance of deployed systems
Qualifications
Core Requirements
6+ years of experience in AI/ML engineering, applied data science, or closely related roles
Strong hands-on experience in Python for production systems
Proven track record building and deploying production-grade ML models
Strong experience with data pipeline development (ETL/ELT, batch or streaming)
Experience designing and building AI agents or agent-like systems
Strong experience with API development and backend services
Experience with ML lifecycle tooling (training, evaluation, deployment, monitoring)
Data & Systems Expertise
Experience building reliable data pipelines that support ML or AI systems in production
Familiarity with:
Data processing frameworks (e.g., Spark or equivalent)
Data orchestration tools (e.g., Airflow, Dagster)
Data warehousing solutions (e.g., Snowflake, BigQuery)
Understanding of data quality, lineage, and reproducibility in ML systems
Agent & LLM Experience
Experience building or working with LLM-powered systems (prompting, orchestration, evaluation)
Familiarity with agent frameworks and tool-using agents
Experience working with agent traces, evaluation datasets, or iterative improvement loops is strongly preferred
Modeling & Systems Thinking
Strong understanding of:
Supervised learning (classification, regression, ranking)
Evaluation methodologies (offline and online)
Experimentation (A/B testing, causal inference basics)
Ability to design systems that combine:
ML models
LLMs
Business logic
Engineering & Production Skills
Experience deploying models/services in production environments
Familiarity with:
Model serving architectures
Data pipelines
Monitoring and observability
Ability to write clean, scalable, maintainable code
Preferred Qualifications
Experience building model-driven agent improvement systems (e.g., scoring, gating, auto-optimization)
Experience with reinforcement learning, bandits, or iterative optimization systems
Exposure to agent evaluation tools (e.g., LangSmith, Braintrust, or similar)
Experience with large-scale experimentation platforms
Familiarity with enterprise SaaS or CRM domains
What Success Looks Like
Agents and production-grade ML models measurably improve over time via automated feedback loops
Well-structured data and evaluation pipelines continuously feed the agent flywheel
Clear lift in key business metrics (e.g., engagement, conversion, revenue impact)
Robust evaluation systems that enable rapid iteration and safe deployment
