Machine Learning Engineer

NeoITO

Trivandrum, Kerala, India
Full-Time, On-site
Posted 3 days ago
Apply by June 8, 2026

Job Description

AI / ML Engineer – SLM & RAG Specialist


Location: Trivandrum (Kerala)

Company: NeoITO

Experience: 5+ Years


About the Role


NeoITO is hiring an AI / ML Engineer to build and own an AI-powered Proposal & RFP generation system designed to transform meeting notes into structured, client-ready proposals within minutes.


You will be responsible for designing and managing the core AI layer, including the inference engine, RAG pipeline, embedding models, and compliance validation system.

You will collaborate closely with backend (Node.js) and frontend (React) engineers to deliver a production-ready AI system within a defined delivery timeline.



Key Responsibilities

Model Deployment & Inference

  • Deploy and manage Small Language Models (SLMs) on on-premise GPU infrastructure.
  • Configure and optimize LLM inference pipelines using frameworks such as vLLM or HuggingFace Transformers.
  • Implement token streaming, continuous batching, and optimized sampling strategies for reliable text generation.
  • Apply quantization techniques (GPTQ/AWQ) to reduce GPU memory footprint while maintaining inference performance.
  • Monitor GPU health and performance metrics, including VRAM usage, latency, and throughput.

Retrieval-Augmented Generation (RAG)

  • Design and implement RAG pipelines to enable context-aware proposal generation.
  • Build text chunking pipelines and generate embeddings using sentence-transformer models.
  • Store and retrieve vector embeddings using PostgreSQL with pgvector.
  • Implement semantic similarity search to retrieve relevant historical proposal data.
  • Continuously evaluate and optimize retrieval quality and performance.

AI-Driven Proposal Generation

  • Design structured pipelines to generate multi-section proposals, including:
      • Executive Summary
      • Project Scope
      • Technical Approach
      • Implementation Timeline
      • Investment Summary
      • Risk Mitigation
  • Create section-specific prompts and templates for high-quality generation.
  • Implement real-time streaming responses to backend services.
  • Support partial regeneration of sections for iterative proposal refinement.

AI Quality, Validation & Compliance

  • Develop a validation engine to ensure generated content meets compliance and quality standards.
  • Implement rule-based checks, including:
      • Client name verification
      • Budget reference validation
      • Section completeness
      • Sensitive data detection
  • Support an optional AI-based review layer for deeper quality checks.
  • Deliver structured feedback and annotations for use within editing workflows.

Prompt Engineering & Model Optimization

  • Design and maintain structured prompts for classification, generation, and validation tasks.
  • Conduct iterative prompt optimization to improve accuracy, tone, and consistency.
  • Maintain prompt versioning and regression testing frameworks.
  • Evaluate output quality through structured human evaluation metrics.

Fine-Tuning & Model Improvement

  • Lead fine-tuning initiatives to improve model performance over time.
  • Prepare and curate training datasets from finalized proposals.
  • Implement LoRA / QLoRA fine-tuning strategies for efficient model updates.
  • Track experiments and model versions using tools such as MLflow.

Collaboration & Engineering Practices

  • Expose AI capabilities via FastAPI services consumed by backend applications.
  • Collaborate with backend teams on job orchestration, queue processing, and event streaming.
  • Implement unit tests and quality checks for ML pipelines.
  • Contribute to containerized deployment environments using Docker.
  • Support CI/CD pipelines with automated testing and linting workflows.

Required Skills & Experience


Large Language Models & AI Systems

  • Hands-on experience with LLMs or SLMs
  • Experience deploying models using vLLM, HuggingFace Transformers, or similar frameworks
  • Knowledge of quantization techniques and inference optimization

RAG & Vector Search

  • Experience building Retrieval-Augmented Generation pipelines
  • Knowledge of vector databases such as pgvector, FAISS, or similar
  • Familiarity with embedding models and semantic search

Programming & Frameworks

  • Strong Python development experience
  • Experience with FastAPI, Pydantic, and PyTorch
  • Knowledge of libraries such as sentence-transformers, LangChain, or LlamaIndex

Infrastructure & GPU Systems

  • Experience working with GPU-based model deployment
  • Familiarity with CUDA environments and GPU monitoring
  • Experience deploying applications with Docker on Linux environments

Databases & Storage

  • Experience with PostgreSQL
  • Familiarity with vector extensions or vector search databases
  • Knowledge of object storage solutions such as S3 or MinIO

MLOps & Model Lifecycle

  • Experience with LoRA / QLoRA fine-tuning
  • Familiarity with experiment tracking tools
  • Knowledge of dataset preparation and model evaluation

Nice to Have

  • Experience working with Meta Llama models
  • Familiarity with document generation systems
  • Experience with queue-based ML pipelines
  • Exposure to secure enterprise environments requiring strict data governance
  • Knowledge of observability tools such as Prometheus


In this role, you will:

  • Deliver a fully functional AI proposal generation system running entirely on-premise
  • Achieve high-quality, structured proposal outputs
  • Ensure stable performance under concurrent usage
  • Establish a foundation for continuous model improvement through fine-tuning


Tech Stack


Primary Language: Python

API Framework: FastAPI

LLM Inference: vLLM / Transformers

Embedding Models: Sentence Transformers

Vector Database: PostgreSQL + pgvector

GPU Infrastructure: NVIDIA GPU environments

Containerization: Docker

Monitoring: Prometheus

Testing: Pytest
