Machine Learning Engineer
Job Description
AI / ML Engineer – SLM & RAG Specialist
Location: Trivandrum (Kerala)
Company: NeoITO
Experience: 5+ Years
About the Role
NeoITO is hiring an AI / ML Engineer to build and own an AI-powered Proposal & RFP generation system designed to transform meeting notes into structured, client-ready proposals within minutes.
You will be responsible for designing and managing the core AI layer, including the inference engine, RAG pipeline, embedding models, and compliance validation system.
You will collaborate closely with backend (Node.js) and frontend (React) engineers to deliver a production-ready AI system within a defined delivery timeline.
Key Responsibilities
Model Deployment & Inference
- Deploy and manage Small Language Models (SLMs) on on-premise GPU infrastructure.
- Configure and optimize LLM inference pipelines using frameworks such as vLLM or HuggingFace Transformers.
- Implement token streaming, continuous batching, and optimized sampling strategies for reliable text generation.
- Apply quantization techniques (GPTQ/AWQ) to reduce GPU memory footprint while maintaining inference performance.
- Monitor GPU health and performance metrics, including VRAM usage, latency, and throughput.
Retrieval-Augmented Generation (RAG)
- Design and implement RAG pipelines to enable context-aware proposal generation.
- Build text chunking pipelines and generate embeddings using sentence-transformer models.
- Store and retrieve vector embeddings using PostgreSQL with pgvector.
- Implement semantic similarity search to retrieve relevant historical proposal data.
- Continuously evaluate and optimize retrieval quality and performance.
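For candidates gauging the scope of this work, the chunk-embed-retrieve flow described above can be sketched roughly as follows. This is a minimal illustration, not the production design: the role itself calls for sentence-transformer embeddings stored in PostgreSQL with pgvector, whereas the sketch substitutes a toy word-count vector and an in-memory list so it runs with the standard library alone. The function names (`chunk_text`, `embed`, `cosine`, `top_k`) and the chunk sizes are hypothetical.

```python
import math
from collections import Counter


def chunk_text(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that overlap by `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
```

In a real pipeline, `embed` would presumably call the chosen embedding model and `top_k` would become a similarity query against the vector store; the overlap between chunks is one common way to avoid cutting context at window boundaries.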
AI-Driven Proposal Generation
- Design structured pipelines to generate multi-section proposals including:
  - Executive Summary
  - Project Scope
  - Technical Approach
  - Implementation Timeline
  - Investment Summary
  - Risk Mitigation
- Create section-specific prompts and templates for high-quality generation.
- Implement real-time streaming responses to backend services.
- Support partial regeneration of sections for iterative proposal refinement.
AI Quality, Validation & Compliance
- Develop a validation engine to ensure generated content meets compliance and quality standards.
- Implement rule-based checks including:
  - Client name verification
  - Budget reference validation
  - Section completeness
  - Sensitive data detection
- Support an optional AI-based review layer for deeper quality checks.
- Deliver structured feedback and annotations for use within editing workflows.
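As a rough illustration of what the rule-based layer might look like, the sketch below runs the four checks listed above over a proposal held as a dictionary of section name to text and returns structured findings. Everything beyond the check names and section titles is an assumption made for the example: the `Finding` type, the `validate` signature, the reading of "budget reference validation" as "no quoted dollar amount exceeds the agreed budget", and the use of email addresses as the only sensitive-data pattern.

```python
import re
from dataclasses import dataclass

# Section titles taken from the proposal structure listed in this posting.
REQUIRED_SECTIONS = [
    "Executive Summary", "Project Scope", "Technical Approach",
    "Implementation Timeline", "Investment Summary", "Risk Mitigation",
]
# Illustrative patterns only (assumptions); real rules would be broader.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
AMOUNT_RE = re.compile(r"\$\s?(\d[\d,]*(?:\.\d+)?)")


@dataclass
class Finding:
    rule: str      # which check produced the finding
    message: str   # human-readable annotation for the editing workflow


def validate(sections: dict[str, str], client_name: str, budget: float) -> list[Finding]:
    """Run the rule-based checks and return one Finding per violation."""
    findings = []
    full_text = "\n".join(sections.values())

    # Section completeness: every required section exists and is non-empty.
    for name in REQUIRED_SECTIONS:
        if not sections.get(name, "").strip():
            findings.append(Finding("section_completeness", f"missing or empty: {name}"))

    # Client name verification: the expected name appears somewhere.
    if client_name.lower() not in full_text.lower():
        findings.append(Finding("client_name", f"client name not found: {client_name}"))

    # Budget reference validation (assumed rule): quoted amounts stay within budget.
    for raw in AMOUNT_RE.findall(sections.get("Investment Summary", "")):
        amount = float(raw.replace(",", ""))
        if amount > budget:
            findings.append(Finding(
                "budget_reference",
                f"amount {amount:,.2f} exceeds budget {budget:,.2f}",
            ))

    # Sensitive data detection: flag email addresses as a simple example.
    for match in EMAIL_RE.findall(full_text):
        findings.append(Finding("sensitive_data", f"email address found: {match}"))

    return findings
```

Returning a list of typed findings, rather than a single pass/fail flag, is one way to supply the structured feedback and annotations mentioned above; an optional AI-based review could append its own findings to the same list.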
Prompt Engineering & Model Optimization
- Design and maintain structured prompts for classification, generation, and validation tasks.
- Conduct iterative prompt optimization to improve accuracy, tone, and consistency.
- Maintain prompt versioning and regression testing frameworks.
- Evaluate output quality through structured human evaluation metrics.
Fine-Tuning & Model Improvement
- Lead fine-tuning initiatives to improve model performance over time.
- Prepare and curate training datasets from finalized proposals.
- Implement LoRA / QLoRA fine-tuning strategies for efficient model updates.
- Track experiments and model versions using tools such as MLflow.
Collaboration & Engineering Practices
- Expose AI capabilities via FastAPI services consumed by backend applications.
- Collaborate with backend teams on job orchestration, queue processing, and event streaming.
- Implement unit tests and quality checks for ML pipelines.
- Contribute to containerized deployment environments using Docker.
- Support CI/CD pipelines with automated testing and linting workflows.
Required Skills & Experience
Large Language Models & AI Systems
- Hands-on experience with LLMs or SLMs
- Experience deploying models using vLLM, HuggingFace Transformers, or similar frameworks
- Knowledge of quantization techniques and inference optimization
RAG & Vector Search
- Experience building Retrieval-Augmented Generation pipelines
- Knowledge of vector databases such as pgvector, FAISS, or similar
- Familiarity with embedding models and semantic search
Programming & Frameworks
- Strong Python development experience
- Experience with FastAPI, Pydantic, and PyTorch
- Knowledge of libraries such as sentence-transformers, LangChain, or LlamaIndex
Infrastructure & GPU Systems
- Experience working with GPU-based model deployment
- Familiarity with CUDA environments and GPU monitoring
- Experience deploying applications with Docker on Linux environments
Databases & Storage
- Experience with PostgreSQL
- Familiarity with vector extensions or vector search databases
- Knowledge of object storage solutions such as S3 or MinIO
MLOps & Model Lifecycle
- Experience with LoRA / QLoRA fine-tuning
- Familiarity with experiment tracking tools
- Knowledge of dataset preparation and model evaluation
Nice to Have
- Experience working with Meta Llama models
- Familiarity with document generation systems
- Experience with queue-based ML pipelines
- Exposure to secure enterprise environments requiring strict data governance
- Knowledge of observability tools such as Prometheus
In this role, you will:
- Deliver a fully functional AI proposal generation system running entirely on-premise
- Achieve high-quality, structured proposal outputs
- Ensure stable performance under concurrent usage
- Establish a foundation for continuous model improvement through fine-tuning
Tech Stack
Primary Language: Python
API Framework: FastAPI
LLM Inference: vLLM / Transformers
Embedding Models: Sentence Transformers
Vector Database: PostgreSQL + pgvector
GPU Infrastructure: NVIDIA GPU environments
Containerization: Docker
Monitoring: Prometheus
Testing: Pytest