Member of Technical Staff (Data)
H Company
About H: H exists to push the boundaries of superintelligence with agentic AI. By automating complex, multi-step tasks typically performed by humans, AI agents will help unlock full human potential.
H is hiring the world’s best AI talent, seeking those who are dedicated as much to building safely and responsibly as to advancing disruptive agentic capabilities. We promote a mindset of openness, learning and collaboration, where everyone has something to contribute.
Holistic, Humanist, Humble.
About the Team: The AI Data team advances the performance of Large Language Models (LLMs) and Vision-Language Models (VLMs) through cutting-edge data-centric techniques. From synthetic data generation to model distillation and AI-driven preference alignment, we develop high-quality datasets that enhance model efficiency, reasoning, and adaptability. Our work directly impacts the training and fine-tuning of frontier AI systems, ensuring they learn from richer, more diverse, and better-structured data.
Join us in shaping the future of AI through cutting-edge data optimization. We’re looking for driven individuals who thrive in fast-changing environments, adapt to new research paradigms, and eagerly take on challenges—whether deploying models, inspecting data, or pioneering new synthetic and reinforcement learning data methods.
Key Responsibilities:
Develop and implement cutting-edge data strategies to improve the performance, efficiency, and applicability of LLMs, VLMs and Action Models:
Generate and augment synthetic multimodal datasets, including images, text, and action trajectories, to advance model capabilities in areas like visual question answering (VQA), agent behaviors, and virtual navigation
Apply model distillation techniques to optimize large-scale models for edge deployment, ensuring scalability without compromising performance
Design and iterate on evaluation frameworks to target edge cases and measure model improvements across multiple domains
Lead research into aligning data with human and AI preferences, implementing feedback loops to refine agent decision-making and learning behaviors
Collaborate effectively with cross-functional teams to integrate data-driven solutions into LLM, VLM, and agent systems
Stay at the forefront of breakthroughs in AI data strategies, model distillation, and multimodal learning through active scientific exploration
Requirements:
Technical skills:
Strong, versatile Python programming skills covering parallel computing, system design, large-scale deployments, AWS deployments, and model evaluation
Experience developing and maintaining multimodal data pipelines
Experience in training and deploying LLMs, VLMs, or PyTorch models
Research skills:
MSc or PhD in machine learning, computer vision, natural language processing, or a related field
Deep understanding of training and evaluation paradigms for multimodal models
Soft skills:
Strong communication skills with technical and non-technical staff
Effectiveness in fast-changing environments
Bonuses:
Experience with Agent-specific data pipelines and improvement techniques is a plus
Experience managing efficient multimodal human annotation platforms is a plus
Location:
H's teams are distributed throughout France, the UK, and the US
This role has the potential to be fully remote or hybrid for candidates based in cities where we have an office (currently Paris and London)
The final decision on the working arrangement lies with the hiring manager for each role
What We Offer:
Join the exciting journey of shaping the future of AI, and be part of the early days of one of the hottest AI startups
Collaborate with a fun, dynamic and multicultural team, working alongside world-class AI talent in a highly collaborative environment
Enjoy a competitive salary
Unlock opportunities for professional growth, continuous learning, and career development
If you want to change the status quo in AI, join us.