
AI QA Engineer


TIGI HR

Mumbai | Full-Time | 4–8 years
Posted 3 days ago | Apply by June 11, 2026

Job Description

Job Summary

We are seeking an AI QA Engineer to ensure the quality, accuracy, and performance of our enterprise-grade Natural Language to SQL (NL2SQL) pipeline. You will validate a complex, multi-stage AI architecture (semantic routing, LLM-based disambiguation, and query generation) and ensure that it securely and accurately translates user intent into valid queries within the BFSI domain.


Experience: 7+ Years

Location: Gurugram

Work Mode: Hybrid (3 days work from office)

Employment Type: Full-Time


Key Responsibilities

  • LLM & Pipeline Evaluation: Design and execute automated evaluations for a 4-stage NL2SQL pipeline using LangSmith. Monitor metrics such as structural F1, execution accuracy, latency, and token cost.
  • Dataset Management: Create, curate, and maintain benchmark/golden datasets for continuous regression testing of LLM prompts and model outputs.
  • Search & Retrieval Testing: Validate precision and recall trade-offs in semantic search and schema discovery, ensuring optimal candidate selection for downstream query generation.
  • Failure Analysis & Debugging: Perform root cause analysis across pipeline stages (routing, disambiguation, query generation, execution), identifying issues such as schema mismatches, type/coercion errors, runtime incompatibilities, and query structure failures.
  • E2E & API Automation: Develop automated test scripts using Python (Pytest) for backend API testing and Playwright for the React frontend, validating end-to-end user workflows.
  • Observability & Debugging: Utilize Grafana and structured JSONL logs to identify pipeline bottlenecks, LLM hallucinations, or prompt degradation.
  • Compliance & Security: Ensure the AI pipeline meets strict BFSI data security standards by validating execution safety mechanisms (e.g., runtime capability probing, injection prevention). Design validation rules and guardrails that prevent invalid query generation and runtime failures.

Required Skills

  • AI/LLM Testing: Experience testing LLM applications, RAG (Retrieval-Augmented Generation) pipelines, or NLP models. Familiarity with AI evaluation frameworks (e.g., LangSmith, DeepEval, or similar).
  • Languages: Strong proficiency in Python 3.12+ (crucial for integrating with the existing AI backend and Pytest suite). Secondary experience with JavaScript/TypeScript.
  • Test Automation: Expertise in API testing (REST) and optional UI automation using Playwright.
  • Data & Search: Understanding of Vector Databases (e.g., Milvus, Pinecone) and semantic search concepts (embeddings, hybrid search).
  • Data & SQL Validation: Solid understanding of SQL and data validation techniques to verify correctness of complex query outputs.
  • Tools & Infrastructure: Git, Docker, CI/CD pipelines, and observability tools (Prometheus/Grafana).

Education

  • BE / BTech / MCA / BSc in Computer Science, Data Science, or a related field.

Nice to Have

  • Familiarity with Graph Databases (Neo4j) and LangGraph orchestration.
  • Experience evaluating foundation LLMs (OpenAI, Anthropic, Google).
  • Prior exposure to query languages such as SQL or PURE, or to any functional programming language.
  • Experience testing workflows across multiple services or pipelines, with an understanding of failure handling, retries, and system reliability concepts.
  • Experience in the Banking, Financial Services, or Insurance (BFSI) domains.
  • Understanding of data security, compliance, and enterprise database schemas.
