AI Data Engineer - Hyderabad
HYrEzy Tech Solutions
India, Telangana, Hyderabad
Full-Time
On-site
INR 18–32 LPA
Posted 10 hours ago • Apply by June 7, 2026
Job Description
Role: AI Data Engineer
We are seeking a hardcore, hands-on AI Data Engineer to build the high-performance data infrastructure required to power autonomous AI agents. You won't just be moving data from A to B; you will be architecting Dynamic Context Windows, managing Real-time Semantic Indexes, and building Self-Cleaning Data Pipelines that feed our "Super Employee" agents.
Job Details
- Location: Rai Durg, Hyderabad
- Work mode: Hybrid (3 days per week from office)
- Experience: 5–8 years (minimum 5 years as an AI Data Engineer)
- Mandatory skills: DVC (Data Version Control) and Airflow; Apache Spark, Flink, and Kafka; advanced Python for AI logic plus Rust (or C++); vector database mastery, including configuration of HNSW indexes, scalar quantization, and metadata filtering strategies
- Budget: INR 18–32 LPA
- Qualification: Bachelor of Engineering / Bachelor of Technology (B.E./B.Tech.)
- Interview process: 2–3 technical rounds
- Notice period: we are prioritizing immediate/early joiners (maximum 15–30 days); notice periods above 30 days will be automatically rejected
- All mandatory technical skills must be clearly highlighted within the project descriptions in your resume, not just listed in the Skills or Roles & Responsibilities sections.
Key Responsibilities
- Vector & Graph ETL: Design and maintain pipelines that transform unstructured data (PDFs, emails, logs, chats) into optimized embeddings for Vector Databases (Pinecone, Weaviate, Milvus).
- Semantic Data Modeling: Engineer data structures that optimize for Retrieval-Augmented Generation (RAG), ensuring agents find the "needle in the haystack" in milliseconds.
- Knowledge Graph Construction: Build and scale Knowledge Graphs (Neo4j) to represent complex relationships in our trading and support data that standard vector search misses.
- Automated Data Labeling & Synthetic Data: Implement pipelines using LLMs to auto-label datasets or generate synthetic edge cases for agent training and evaluation.
- Stream Processing for Agents: Build real-time data "listeners" (Kafka/Flink) that feed live context to agents, allowing them to react to market or support events as they happen.
- Data Reliability & "Drift" Detection: Build monitoring for "Embedding Drift", identifying when the statistical distribution of your data changes and the agent's "knowledge" becomes stale.
Required Skills & Experience
- Vector Database Mastery: Expert-level configuration of HNSW indexes, scalar quantization, and metadata filtering strategies within Pinecone, Milvus, or Qdrant.
- Advanced Python & Rust: Proficiency in Python for AI logic and Rust (or C++) for high-performance data processing and custom embedding functions.
- Big Data Ecosystem: Hands-on experience with Apache Spark, Flink, and Kafka in a high-throughput environment (Trading/FinTech preferred).
- LLM Data Tooling: Deep experience with Unstructured.io, LlamaIndex, or LangChain for document parsing and chunking strategy optimization.
- MLOps & DataOps: Mastery of DVC (Data Version Control) and Airflow/Prefect for managing complex, non-linear AI data workflows.
- Embedding Models: Understanding of how to fine-tune embedding models (e.g., BGE, Cohere, or OpenAI) to better represent domain-specific (Trading) terminology.
- Chunking Strategy Architect: You don't just "split text." You implement Semantic Chunking and Parent-Child retrieval strategies to maximize LLM context relevance.
- Cold/Warm/Hot Storage Strategy: Managing cost and latency by tiering data between Vector DBs (Hot), SQL/NoSQL (Warm), and S3/Data Lakes (Cold).
- Privacy & Redaction Pipelines: Building automated PII (Personally Identifiable Information) redaction into the ingestion layer to ensure agents never "see" or "leak" sensitive user data.
What We Offer
- Opportunity to lead transformative initiatives, modernizing legacy systems and shaping the future of trading technology.
- Work with cutting-edge technologies in a dynamic, fast-paced environment.
- Competitive compensation, professional growth opportunities, and the chance to work with industry-leading experts.
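To make the "Embedding Drift" responsibility above concrete, here is a minimal sketch of one common approach: compare the centroid of a reference embedding batch against a recent batch and flag the index as stale when the cosine distance between centroids exceeds a threshold. All names (`centroid_drift`, `is_stale`, the 0.05 threshold) are illustrative assumptions, not part of any specific library or this team's stack.

```python
# Minimal sketch of "embedding drift" detection via centroid cosine distance.
# Hypothetical helper names; thresholds would be tuned per corpus in practice.
import math

def _centroid(batch):
    """Component-wise mean of a list of equal-length embedding vectors."""
    dim = len(batch[0])
    return [sum(vec[i] for vec in batch) / len(batch) for i in range(dim)]

def centroid_drift(reference, current):
    """Cosine distance between the centroids of two embedding batches.

    0.0 means the batches point the same way on average; values approaching
    1.0 (or above, for opposed centroids) signal a distribution shift.
    """
    a, b = _centroid(reference), _centroid(current)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def is_stale(reference, current, threshold=0.05):
    """Flag the agent's knowledge as stale when drift exceeds the threshold."""
    return centroid_drift(reference, current) > threshold
```

In production one would more likely track per-dimension statistics or run a two-sample test (e.g. a population-stability index) rather than a single centroid, and schedule the check from Airflow alongside the ingestion DAG; this sketch only illustrates the core idea of comparing the statistical distribution of old and new embeddings.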