Deep Learning Engineer
Nanonets
Join Nanonets to push the boundaries of what's possible with deep learning. We're not just implementing models – we're setting new benchmarks in document AI, with our open-source models achieving nearly 1 million downloads on Hugging Face and recognition from global AI leaders.
Backed by $40M+ in total funding including our recent $29M Series B from Accel, alongside Elevation Capital and Y Combinator, we're scaling our deep learning capabilities to serve enterprise clients including Toyota, Boston Scientific, and Bill.com. You'll work on genuinely challenging problems at the intersection of computer vision, NLP, and generative AI.
Here's a quick 1-minute intro video.
About the role
The role can be summed up as building and deploying cutting-edge, generalised deep learning architectures that solve complex business problems, such as converting unstructured data into a structured format, without hand-tuning features or models. You are expected to build state-of-the-art models that are the best in the world at solving these problems, continuously experimenting with new advancements in the field and incorporating them into these architectures.
What we’re looking for
- Strong grasp of machine learning concepts.
- Strong command of the low-level operations involved in building architectures such as Transformers, EfficientNet, ViT, and Faster R-CNN, and experience implementing them in PyTorch/JAX/TensorFlow.
- 1-3 years of experience with the latest semi-supervised, unsupervised, and few-shot deep learning architectures in the NLP/CV domain.
- Strong command of probability and statistics.
- Strong programming skills.
- Have previously shipped something of significance, such as implementing a paper or making significant changes to an existing architecture.
The ideal candidate should have the following skill set:
- Python
- TensorFlow
- Experience building and deploying systems
- Experience with Theano/Torch/Caffe/Keras is useful
- Experience with data warehousing/storage/management would be a plus
- Experience writing production software would be a plus
- The ideal candidate should have developed their own DL architectures, in addition to using open-source ones.
- The ideal candidate would have extensive experience with computer vision applications.
Interesting Projects Other DL Engineers Have Completed
- Setting New Standards: Through our Automation Benchmark, we are defining how AI systems are measured on grounding, reliability, and performance.
- Proven Adoption: Our Nanonets-OCR-S model on Hugging Face already has ~225,000 downloads, validating its global impact and utility.
- Global Recognition: Our research and open-source contributions are recognized by leading voices in AI.
- Enterprise-Ready AI: Our models don’t just output predictions; they provide grounded answers with confidence scores to enable trustworthy decision-making.
- Agentic OCR Systems: Unlike traditional OCR, our models are agentic: capable of reasoning about inputs, adapting to task context, and chaining multiple steps to deliver structured, actionable data.
- VLM + LLM Innovation: From text to vision-language, we are solving alignment, hallucination reduction, and cross-modal understanding at scale, using the latest techniques such as RLHF, PEFT, and advanced fine-tuning to push what’s possible.
Key Responsibilities
- Understand specific customer requirements, and develop and apply SOTA GenAI solutions to their workflows.
- Develop and fine-tune OCR and Vision Language Models (text detection, recognition, entity extraction, layout understanding).
- Build and maintain data pipelines for documents, including cleaning, augmentation, and annotation.
- Implement and evaluate document parsing solutions for invoices, receipts, IDs, contracts, forms, etc.
- Work with LLMs/VLMs to enhance document understanding and enable intelligent reasoning over documents.
- Collaborate with senior engineers to deploy models in production with scalable APIs and workflows.
- Track and improve accuracy, robustness, and latency using appropriate evaluation metrics.
Qualifications
Must-Have:
- 1–3 years of experience in Machine Learning / AI Engineering / Deep Learning.
- Strong programming skills in Python and familiarity with PyTorch or TensorFlow.
- Experience with data preprocessing, training, and evaluation for vision or NLP tasks.
- Experience working with LLMs or multimodal models (e.g., Hugging Face Transformers, Nanonets-OCR-S, Qwen-VL, LLaMA).
- Knowledge of REST APIs, Docker, Git, Kubernetes, and basic cloud deployment (AWS/GCP/Azure).
- Good understanding of ML fundamentals (supervised learning, evaluation metrics, error analysis).
- Basic understanding of agentic AI workflows (document reasoning, confidence scoring, grounding).
- Have previously shipped something of significance, such as implementing a paper or making significant changes to an existing architecture.
- Strong problem-solving and analytical skills.
