Senior Data Scientist

Cloudera

Data Science
Costa Rica · Remote
Posted on Mar 25, 2026

Business Area:

IT

Seniority Level:

Mid-Senior level

Job Description:

At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.

At Cloudera, we believe that data can make what is impossible today, possible tomorrow. We empower organizations to transform complex data into clear and actionable outcomes. Join us in our mission to harness the power of data.

We are seeking a talented and curious Senior Data Scientist to join our fast-paced, data-driven organization. In this role, you will design and deliver AI-powered systems and applications that accelerate decision-making and enhance operational excellence.

You will combine strong statistical foundations, advanced programming expertise, and modern Generative AI techniques to build scalable, production-ready solutions. This is a builder-focused role. You will move beyond analysis to develop internal copilots, AI-enabled workflows, and reusable platform components that embed intelligence directly into business processes.

Our work empowers leadership and operational teams by creating measurable, AI-enabled capabilities. We seek a thoughtful and pragmatic innovator who is enthusiastic about GenAI, disciplined experimentation, and building durable internal AI infrastructure.

To succeed in this role, you will demonstrate technical depth, intellectual curiosity, and a strong builder mindset:

  • Data Science & Machine Learning Expertise: Proficiency in Python (or R) for data preparation, feature engineering, statistical modeling, and machine learning. Experience with core data science libraries (e.g., Pandas, NumPy, scikit-learn) and a solid understanding of supervised and unsupervised learning methods.

  • SQL & Data Fluency: Strong understanding of relational databases and the ability to quickly learn new schemas and data environments. Comfortable writing efficient, production-grade SQL to support modeling, experimentation, and AI-enabled applications.

  • Generative AI & LLM Engineering: Hands-on experience working with large language models (LLMs) and modern AI tooling. This includes prompt design, structured output generation, retrieval-augmented generation (RAG), evaluation strategies, and workflow automation. Ability to translate GenAI capabilities into reliable, enterprise-ready solutions that integrate with existing systems and data sources.

  • AI Application Development: Experience rapidly prototyping and iterating on internal applications, copilots, or AI-enabled workflow tools. Comfortable evolving prototypes into maintainable, production-grade solutions. Familiarity with modern development frameworks (e.g., Streamlit, Gradio, FastAPI, or similar) is beneficial.

  • Platform-Oriented Thinking: Demonstrated ability to design reusable components such as shared prompt libraries, retrieval pipelines, evaluation frameworks, and standardized integration patterns that enable scalable AI adoption.

  • Strong Mathematical and Statistical Foundation: Deep understanding of probability, statistical inference, experimentation, and quantitative reasoning to ensure model robustness and reliability.

  • Collaborative Development Experience: Experience working in collaborative environments such as Cloudera Data Science Workbench, Jupyter, Zeppelin, or similar platforms.

  • GitHub Proficiency: Experience using version control to support collaboration, code review, documentation, and long-term maintainability.

  • Exceptional Communication Skills: Ability to translate complex business challenges into technical solutions and clearly communicate findings, trade-offs, and recommendations to both technical and non-technical stakeholders.

You will apply rigorous analytical thinking and modern AI capabilities to design, build, and scale high-impact solutions.

As a Senior Data Scientist, you will:

  • Design, develop, and deploy GenAI-powered internal applications, copilots, and workflow accelerators.

  • Build reusable AI components, including retrieval pipelines, structured prompting patterns, orchestration workflows, and evaluation harnesses.

  • Develop and maintain statistical and machine learning models to support automation, optimization, forecasting, and classification use cases.

  • Design retrieval strategies that connect LLMs to trusted internal knowledge sources, ensuring grounded and reliable outputs.

  • Implement evaluation and validation frameworks to measure quality, accuracy, and consistency of AI-driven systems.

  • Partner cross-functionally to identify high-value opportunities for AI enablement across the organization.

  • Create reusable datasets, feature pipelines, and experimentation frameworks to support iterative development.

  • Document methodologies, assumptions, and implementation details to ensure transparency and reproducibility.

  • Uphold high standards for quality, reliability, and responsible AI practices.

  • Contribute to peer review processes to ensure technical rigor and maintainability.

We are excited if you have (Required Experience):

  • 5+ years of relevant experience in Data Science, Machine Learning, or AI-focused roles.

  • Demonstrated experience applying machine learning techniques in production or enterprise environments.

  • Hands-on experience building applications or workflows powered by large language models (LLMs).

  • Evidence of a builder mindset through shipped AI tools, internal platforms, or automation solutions.

  • Strong curiosity for emerging AI technologies and the ability to evaluate and adopt them responsibly.

  • Academic background in a quantitative discipline such as Statistics, Mathematics, Computer Science, Engineering, Economics, or a related field.

You may also have (Preferred Qualifications):

  • Experience designing internal AI platforms or shared enablement frameworks.

  • Familiarity with API-driven architectures and integrating AI capabilities into enterprise systems.

  • Experience with vector databases, embedding models, or semantic retrieval systems.

  • Exposure to responsible AI practices, governance frameworks, or model lifecycle management.

This role is not eligible for immigration sponsorship.

What you can expect from us:

  • Generous PTO Policy

  • Support for work-life balance with Unplugged Days

  • Flexible WFH Policy

  • Mental & Physical Wellness programs

  • Phone and Internet Reimbursement program

  • Access to Continued Career Development

  • Comprehensive Benefits and Competitive Packages

  • Paid Volunteer Time

  • Employee Resource Groups

EEO/VEVRAA
