
Gen AI/LLM Engineer_5+Yrs


Zorba AI

Hyderabad, Telangana, India · Full-Time · On-site · INR 11–25 LPA
Posted 4 hours ago · Apply by June 14, 2026

Job Description

We are a leading consulting firm operating in the Enterprise Generative AI and Large Language Model (LLM) services sector, delivering production-grade LLM solutions, retrieval-augmented systems, and custom generative AI products for enterprise clients across domains. The team focuses on building secure, scalable, low-latency inference services and automating model lifecycle workflows for on-prem and cloud deployments.

Position: LLM Engineer — On-site (India). We are hiring an experienced LLM engineer to design, fine-tune, and deploy LLM-based solutions that power search, summarization, agents, and domain-specific assistants.

Role & Responsibilities

  • Design, fine-tune, and validate LLMs for production use cases: instruction tuning, supervised fine-tuning, and parameter-efficient tuning (LoRA/adapters).
  • Implement retrieval-augmented generation (RAG) pipelines: embeddings, vector search, chunking, and context assembly for high-recall responses.
  • Optimize inference for latency and cost: quantization, model pruning, batching, and deployment with optimized runtimes (CUDA, Triton, bitsandbytes where applicable).
  • Build backend services and APIs to serve LLM inference and orchestration using containerized deployments (Docker/Kubernetes) and CI/CD pipelines.
  • Collaborate with product, data engineering, and ML teams to integrate LLMs into production flows, monitor model performance, and set up automated retraining/rollbacks.
  • Create reproducible training pipelines, implement evaluation suites, and produce documentation and runbooks for model governance and observability.
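To give candidates a concrete sense of the RAG work described above, here is a minimal sketch of the retrieval step (top-k search over document embeddings). It uses plain NumPy cosine similarity as a stand-in; a production pipeline would use a vector database such as FAISS and learned embeddings rather than the illustrative function below.

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=2):
    """Return indices and cosine scores of the k documents closest to the query.

    query_vec: (d,) embedding of the query.
    doc_matrix: (n, d) matrix of document embeddings.
    """
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    scores = d @ q
    idx = np.argsort(-scores)[:k]  # highest-scoring documents first
    return idx, scores[idx]
```

The retrieved chunks would then be assembled into the prompt context before the generation call.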

Skills & Qualifications

Must-Have

  • 4+ years of hands-on experience working with LLMs or advanced NLP models in production contexts.
  • Proficiency in Python for ML engineering and model development.
  • Experience with PyTorch and Hugging Face Transformers for training and fine-tuning.
  • Practical experience implementing RAG and vector search using tools like FAISS or similar vector databases.
  • Familiarity with LangChain (or equivalent orchestration) and integration with LLM APIs (OpenAI, Anthropic, etc.).
  • Experience containerizing and deploying ML services using Docker; familiarity with Kubernetes is a plus.
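The chunking step mentioned in the RAG requirement above can be illustrated with a short sketch: overlapping character windows so that context is not lost at chunk boundaries. The window sizes here are illustrative defaults, not values prescribed by this posting.

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows for embedding.

    Each chunk is `size` characters; consecutive chunks share `overlap`
    characters so sentences spanning a boundary appear in both chunks.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Real pipelines often chunk on token or sentence boundaries instead of raw characters, but the overlap idea is the same.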

Preferred

  • Experience with inference optimizations: quantization (bitsandbytes), Triton, or GPU-accelerated serving.
  • Exposure to distributed training frameworks (DeepSpeed) and cloud MLOps platforms (SageMaker, Azure ML, GCP AI Platform).
  • Knowledge of monitoring, logging, and model-evaluation frameworks for production LLMs (MLflow, Prometheus, Grafana).
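As a concrete example of the quantization item above, here is a minimal symmetric int8 round-trip in NumPy. It is an illustrative sketch of the idea only; libraries like bitsandbytes use more sophisticated per-block schemes.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: map floats to int8 via one scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale
```

Storing weights as int8 with a single float scale cuts memory roughly 4x versus float32, at the cost of a bounded rounding error per weight.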

Benefits & Culture Highlights

  • Collaborative, engineering-driven culture with strong focus on ownership and rapid iteration.
  • Opportunity to build end-to-end LLM products for enterprise clients and influence architecture decisions.
  • On-site role with hands-on access to GPU infrastructure and cross-functional product teams.

Skills: pytorch, python, docker, cuda, agentic, llm
