AI QA Engineer


TIGI HR

Mumbai • Full-Time • 4–8 years

Posted 3 days ago • Apply by June 11, 2026

Job Description

Job Summary

We are seeking an AI QA Engineer to ensure the quality, accuracy, and performance of our enterprise-grade Natural Language to SQL (NL2SQL) pipeline. You will be responsible for validating a complex, multi-stage AI architecture—including semantic routing, LLM-based disambiguation, and query generation—ensuring it securely and accurately translates user intent into valid queries within the BFSI domain.

Experience: 7+ Years

Location: Gurugram

Work Mode: Hybrid - 3 Days WFO

Employment Type: Full-Time

Key Responsibilities

LLM & Pipeline Evaluation: Design and execute automated evaluations for a 4-stage NL2SQL pipeline using LangSmith. Monitor metrics such as structural F1, execution accuracy, latency, and token cost.
Dataset Management: Create, curate, and maintain benchmark/golden datasets for continuous regression testing of LLM prompts and model outputs.
Search & Retrieval Testing: Validate precision and recall trade-offs in semantic search and schema discovery, ensuring optimal candidate selection for downstream query generation.
Failure Analysis & Debugging: Perform root cause analysis across pipeline stages (routing, disambiguation, query generation, execution), identifying issues such as schema mismatches, type/coercion errors, runtime incompatibilities, and query structure failures.
E2E & API Automation: Develop automated test scripts using Python (Pytest) for backend API testing and Playwright for the React frontend, validating end-to-end user workflows.
Observability & Debugging: Utilize Grafana and structured JSONL logs to identify pipeline bottlenecks, LLM hallucinations, or prompt degradation.
Compliance & Security: Ensure the AI pipeline meets strict BFSI data security standards by validating execution safety mechanisms (e.g., runtime capability probing, injection prevention). Design validation rules and guardrails for AI pipelines to prevent invalid query generation and runtime failures.

Required Skills

AI/LLM Testing: Experience testing LLM applications, RAG (Retrieval-Augmented Generation) pipelines, or NLP models. Familiarity with AI evaluation frameworks (e.g., LangSmith, DeepEval, or similar).
Languages: Strong proficiency in Python 3.12+ (crucial for integrating with the existing AI backend and Pytest suite). Secondary experience with JavaScript/TypeScript.
Test Automation: Expertise in REST API testing and, optionally, UI automation using Playwright.
Data & Search: Understanding of Vector Databases (e.g., Milvus, Pinecone) and semantic search concepts (embeddings, hybrid search).
Data & SQL Validation: Solid understanding of SQL and data validation techniques to verify correctness of complex query outputs.
Tools & Infrastructure: Git, Docker, CI/CD pipelines, and observability tools (Prometheus/Grafana).

Education

BE / BTech / MCA / BSc in Computer Science, Data Science, or a related field.

Nice to Have

Familiarity with Graph Databases (Neo4j) and LangGraph orchestration.
Experience evaluating foundation LLMs from providers such as OpenAI, Anthropic, and Google.
Prior exposure to query languages such as SQL or PURE, or to any functional programming language.
Experience testing workflows across multiple services or pipelines, with an understanding of failure handling, retries, and system reliability concepts.
Experience in the Banking, Financial Services, or Insurance (BFSI) domains.
Understanding of data security, compliance, and enterprise database schemas.