Lead Assistant Manager
Actively Reviewing the Applications
EXL
India, Uttar Pradesh, Noida
Full-Time
On-site
Posted 4 days ago
Apply by June 14, 2026
Job Description
Job Description: Data Annotator

Job Summary
We are seeking a detail-oriented and highly skilled Data Annotator to support the development of AI and Machine Learning (ML) models by preparing, labeling, and curating large-scale datasets. The ideal candidate will possess a strong understanding of annotation techniques, quality assurance for labeled data, and practical exposure to cloud-based tools (with a strong emphasis on AWS SageMaker Ground Truth, GCP Data Labeling, and Azure ML Data Labeling). This role is pivotal in ensuring the integrity, scalability, and accuracy of the data pipelines that power advanced AI systems.
The Data Annotator will collaborate closely with Data Scientists, Machine Learning Engineers, Cloud Architects, and Product Teams to deliver high-quality labeled datasets optimized for supervised learning, natural language processing (NLP), computer vision, and speech recognition models.

Key Responsibilities

Data Annotation & Labeling
Perform manual and semi-automated labeling of datasets across multiple modalities including text, audio, images, and video.
Create high-quality annotations for:
    Text/NLP: Named Entity Recognition (NER), sentiment analysis, intent classification, part-of-speech tagging, conversation structuring, and chatbot training datasets.
    Computer Vision: Bounding boxes, polygons, segmentation masks, key points, object tracking in videos, and OCR annotation.
    Speech/Audio: Transcription, speaker diarization, phoneme tagging, emotion labeling, and acoustic event detection.
Conduct multi-tier annotation validation and apply inter-annotator agreement processes to ensure labeling accuracy.

AWS & Cloud-Based Annotation
Leverage AWS SageMaker Ground Truth for scalable data labeling workflows including automated data labeling with active learning.
Implement quality control (QC) mechanisms in SageMaker Ground Truth such as audit labels, annotation consolidation, and annotation jobs monitoring.
Integrate annotated datasets into AWS S3, ensuring optimal storage structures and lifecycle policies.
Work with AWS Glue, Athena, and QuickSight for dataset validation, analysis, and reporting.
Exposure to GCP Data Labeling Services and Azure ML Data Labeling tools for multi-cloud environments (good to have).
Collaborate with Cloud Engineers to automate annotation workflows using Lambda functions, Step Functions, and event-driven pipelines.

Data Management & Quality Assurance
Perform data preprocessing: cleaning, normalization, anonymization (especially for PII data), and augmentation.
Apply data quality checks to maintain dataset balance, reduce bias, and enhance representativeness.
Document annotation guidelines, taxonomy structures, and ontology mapping for consistent labeling practices.
Ensure compliance with security and privacy standards (GDPR, HIPAA, SOC2, ISO 27001) while working with sensitive datasets.

Collaboration & Continuous Improvement
Collaborate with ML Engineers and Data Scientists to refine annotation requirements based on evolving model performance.
Participate in regular feedback loops with AI developers to improve annotation accuracy and dataset utility.
Contribute to the design of annotation ontologies and label taxonomies for domain-specific projects (e.g., healthcare, finance, retail, manufacturing).
Stay updated on emerging annotation tools, AI-assisted labeling platforms, and best practices.

Required Skills & Competencies

Core Skills
Proven expertise in data annotation for AI/ML applications across text, image, and speech datasets.
Strong proficiency with AWS Cloud services, especially SageMaker Ground Truth, S3, and Glue.
Familiarity with annotation platforms and tools (Labelbox, Supervisely, CVAT, Prodigy, Doccano).
Knowledge of Python/SQL scripting for dataset preparation and automation.
Basic understanding of machine learning concepts (classification, object detection, NLP pipelines).
Familiarity with big data tools (Apache Spark, Databricks – nice to have).

Domain Knowledge
Text/NLP: Language models, chatbot training, intent recognition.
Computer Vision: Object detection, OCR, autonomous systems labeling.
Audio/Speech: Transcription guidelines, phoneme labeling, acoustic datasets.
Understanding of industry datasets (healthcare records, retail data, insurance documents, call center logs).

Cloud Expertise
AWS (Priority): SageMaker Ground Truth, S3, Glue, Athena, QuickSight, IAM for role-based access control.
GCP (Good to Have): Vertex AI, AutoML, Data Labeling.
Azure (Good to Have): Azure ML Data Labeling, Azure Blob Storage, Azure Cognitive Services.

Qualifications
Bachelor’s degree in Computer Science, Data Science, Information Technology, or related field.
2–5 years of experience in data annotation, data labeling, or dataset preparation for AI/ML projects.
Hands-on experience with AWS annotation workflows and multi-modal datasets.
Certification in AWS Machine Learning Specialty or AWS Data Analytics Specialty (preferred).
Exposure to annotation in regulated industries (healthcare, finance, retail, government projects) is a plus.

Performance Metrics
Annotation Quality: Accuracy and consistency of labeled data.
Efficiency: Volume of annotations completed within SLA.
Cloud Integration: Seamless delivery of datasets into AWS pipelines.
Error Reduction: Continuous improvement of data validation and annotation accuracy.
Collaboration: Effective communication with Data Science and Cloud Engineering teams.

Growth Path
Senior Data Annotator / Annotation Lead → managing teams of annotators.
Data Quality Analyst → leading data validation and audit processes.
ML Data Engineer → transitioning into dataset pipeline development roles.
AI/ML Specialist on AWS → specializing in automation and scaling of annotation pipelines.
Required Skills
Communication
Machine Learning
Engineering
Quality Control
Quality Assurance
Reporting
Automation
Compliance
Monitoring
Cleaning
Python
Object Detection
Apache Spark
SQL
Training
AWS
Access Control
Audit
IAM
Natural Language Processing
Computer Vision
Spark
Healthcare
Speech Recognition
Sentiment Analysis
Azure
AutoML
Databricks
Chatbot
Data Analytics
Data Science
Continuous Improvement
Lambda
Apache
NLP
Cloud services
Analytics
Information Technology
Data Management
Data quality
Validation
HIPAA
Athena
Cloud Engineering
Scripting
Cloud integration
Cognitive
Blob
Data annotation
Segmentation
OCR
Natural language
Big Data
Images
Entity
Consolidation
Acoustic
Preprocessing
Data validation
SQL scripting
Transcription
Mapping
GDPR
Pipeline development
Data pipelines
Annotation
AWS Cloud
Event-driven
Normalization
Data labeling
Machine learning concepts
ISO 27001
Ontology
Cloud environments
SageMaker
Privacy
Speech
Recognition
Vertex
Object tracking
Detection
AWS Glue
AWS S3
Cognitive services
Labeling
Azure Cognitive Services
Event detection
Taxonomy
Storage
ISO
Audio
Annotations
Classification
Big Data tools
Loops
AWS SageMaker
Azure ML
Glue
AI/ML
Azure Blob Storage
Computer Science
Basic Understanding
Named entity recognition
Blob storage
Quality Analyst
Step Functions
Vertex AI
Advanced AI
QuickSight
Active Learning
AWS Cloud Services