Description
Einstein products & platform democratizes AI and transforms the way our Salesforce Ohana builds trusted machine learning and AI products - in days instead of months. It augments the Salesforce Platform with the ability to easily create, deploy, and manage Generative AI and Predictive AI applications across all clouds. We achieve this vision by providing unified, configuration-driven, and fully orchestrated machine learning APIs, customer-facing declarative interfaces, and various microservices for the entire machine learning lifecycle, including Data, Training, Predictions/Scoring, Orchestration, Model Management, Model Storage, Experimentation, etc.
We are already producing over a billion predictions per day and training thousands of models per day, along with tens of different large language models, serving thousands of customers. We are enabling customers' usage of leading large language models (LLMs), both internally and externally developed, so they can leverage them in their Salesforce use cases. Along with the power of Data Cloud, this platform gives customers an unparalleled advantage for quickly integrating AI into their applications and processes.
We are looking for Engineering leaders to help take us to the next level and build a platform that scales to hundreds of thousands of customers and hundreds of billions of predictions per day and works on bleeding-edge technologies in model training, model inferencing, and Generative AI.
The ideal candidate will be:
Technical - We don't expect you to be the most technical person on your team, but there is a pretty high minimum bar that you must pass to be useful to the team, and help influence the team to make the right technical decisions.
A Leader - You are a natural leader, who can mentor and coach engineers on the team to be able to handle bigger challenges, find fulfillment in their work, and execute on the product growth goals through collaboration to do the best work of their lives.
Experienced - We will need you to bring that experience. We want the best people who spend large portions of their time thinking about how to design large scale distributed Machine Learning services.
Responsibilities:
Working with SageMaker, TensorFlow, PyTorch, Triton, Spark, or equivalent large-scale distributed Machine Learning technologies on a modern containerized deployment stack using Kubernetes, Spinnaker, and other technologies
Experience building Big Data services on AWS, GCP or other public cloud substrates
Eat, sleep, and breathe services. You have experience balancing live-site management, feature delivery, and retirement of technical debt
Partner with Product Managers, Architects and Data Scientists to understand customer requirements, and help translate requirements to working software
Own the technology for fully orchestrated machine learning APIs for Einstein Platform
Contribute to the long-range plan, and help drive the microservices architectures for machine learning
Design, develop, debug, and operate resilient distributed systems that run across thousands of compute nodes in multiple datacenters
Participate in the team's on-call rotation to address complex problems in real time and keep services operational and highly available
Create and enforce processes that ensure quality of work, and drive engineering excellence
Exhibit a customer-first mentality while making decisions, and be responsible and accountable for the output of the team
Partner with vendors like AWS and with Data Science teams to pick the best-fit libraries and compute to deliver cost-effective and scalable model hosting and tuning/training capabilities
Core Qualifications:
BS, MS, or PhD in computer science or a related field, or equivalent work experience
5+ years of hands-on experience with big data, machine learning, and microservices architectures
Track record of leading highly impactful projects from conception to finish
Expertise in JVM based languages (Java, Scala) and Python
Experience leading/working in teams that have built and run machine learning services, such as for training and inference, at scale for predictive and generative models
Experience with open source projects such as Spark, Kafka, Feast, Iceberg
Experience building software on AWS cloud services such as OpenSearch, DynamoDB, EMR, and S3
Preferred Qualifications:
Experience working in machine learning, and technologies such as Amazon SageMaker and Google Cloud ML
Experience building or leading teams that have built and run real-time data applications in production
For roles in San Francisco and Los Angeles: Pursuant to the San Francisco Fair Chance Ordinance and the Los Angeles Fair Chance Initiative for Hiring, Salesforce will consider for employment qualified applicants with arrest and conviction records.
