Full-Stack AI Developer (Multilingual Voice & Singing System | Open Source | Work-for-Hire)

Bengaluru, Karnataka, India

2 months ago

Applicants: 0

Salary Not Disclosed

6 days left to apply

Job Description

Job Opening: Full-Stack AI Developer (Multilingual Voice & Singing System | Open Source | Offline | Work-for-Hire)

Budget: USD 1,000 (fixed)
Deadline: 2 weeks
Ownership: 100% exclusive to Giramille

We are hiring an experienced Full-Stack AI Developer to build a complete offline multilingual voice and singing system capable of cloning voices, translating, and re-singing songs and spoken lines in 33 languages or more, while maintaining natural tone, emotion, pronunciation, rhythm, and melody. The final software must run entirely on a local computer, without any internet connection, including both the front-end and back-end components. All models and tools used must be MIT open source or equivalent.

Project Requirements:

The developer must create a local executable program (Windows or Linux) able to:

1. Clone original voices (spoken and sung).
2. Translate and re-sing songs in at least 33 languages, with natural and clean sound, not robotic.
3. Provide two English variants: British English and American English.
4. Display the full lyrics text and allow editing of any word, then regenerate that section automatically with the corrected pronunciation.
5. Export the results in .wav, .mp3, and .srt formats (for subtitles).
6. Operate completely offline, except for translation, which will use the Google Translate API (licensed and paid by us).
7. Train and clone our own character voices (both spoken and sung) locally, using short audio samples, ideally under two minutes of voice data per character.

Recommended Tech Stack (MIT or Equivalent License):

The Contractor may develop the code entirely from scratch or adapt MIT-licensed open-source frameworks. All components must allow unrestricted commercial use, modification, and redistribution. The Clients' listed tools are suggestions only; the Contractor is fully responsible for the chosen implementation and final functionality of the system.

Voice Cloning:
- OpenVoice (MIT): github.com/myshell-ai/OpenVoice
- RVC (Retrieval-Based Voice Conversion): github.com/RVC-Project/Retrieval-based-Voice-Conversion

Singing Generation:
- DiffSinger (Microsoft Research Asia): github.com/Microsoft/DiffSinger
- CosyVoice 2 (Tencent AI): github.com/FunAudioLLM/CosyVoice

Transcription:
- WhisperX (local model): github.com/m-bain/whisperX

Translation:
- Google Translate API (official paid API only): cloud.google.com/translate

Phonetic Alignment:
- Custom Python scripts integrated with WhisperX

Interface (GUI):
- Flask / Electron / PyQt: local interface, simple and intuitive

Mixing and Rendering:
- FFmpeg and Audacity (batch mode)

Front-End Delivery Process:

1. The developer must first deliver a functional front-end prototype with basic buttons and layout (Upload, Generate, Export, Edit Word, etc.).
2. After approval, Giramille will provide the official design assets (backgrounds, buttons, icons, and color palette).
3. The developer will integrate those assets into the GUI for the final branded version.

Payment Terms:

- Total payment: USD 1,000 (fixed amount), paid via PayPal.
- Contract Type: Work-for-Hire
- Payment will be made only after full delivery, with no advance payments.

The system must be 100% functional and fully validated before payment is released, meaning:

- It must sing fluently in 33 languages or more.
- It must sound natural and expressive, with clean audio that is not robotic.
- It must respect melody, rhythm, accent, and pronunciation in each language.
- It must include both US and UK English versions.
- It must allow word-level editing and automatic regeneration.
- It must operate fully offline (except for translation via Google API).

Note: Partial or non-functional deliveries will not be accepted or paid.

Technical Requirements:

- Proven experience with Python, PyTorch, Docker, and CUDA GPU acceleration.
- Previous experience with AI audio models (RVC, DiffSinger, CosyVoice, WhisperX).
- Ability to integrate multiple AI models into one unified offline pipeline.
- Experience building local GUIs (Flask / Electron / PyQt).
- Deep understanding of phonetic synthesis and multilingual voice generation.
- Strong sense of usability and design for clean, intuitive interfaces.
- The developer must deliver both the front-end and back-end of the system, including all source code, libraries, and dependencies, ensuring the project remains fully open and editable for future internal development.

References:

For reference, the system we want to build should achieve a level of multilingual naturalness similar to Netflix's multilingual dubbing experience, Suno AI, and the Mouse Quiz multilingual singing videos available on YouTube, where songs and voices are perfectly synchronized across multiple languages, preserving tone, melody, and expression.

Note: Before signing the official contract, the finalist must complete a short technical challenge to validate technical proficiency and ensure alignment with the project's requirements. Candidates who successfully complete the challenge will then proceed to the official Work-for-Hire agreement and full project assignment.

While the Clients may provide suggestions regarding tools, frameworks, platforms, or open-source codebases (MIT or equivalent) that could facilitate the development process, these references are purely advisory. The Contractor remains fully and solely responsible for selecting the most appropriate technical approach and ensuring that the final system is functional, stable, and compliant with all project requirements, whether built using suggested open-source materials or entirely original code. All risks and decisions related to implementation, compatibility, and integration are the sole responsibility of the Contractor, provided that the final product meets the technical and functional standards defined by the Clients.

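Illustration for candidates: the .srt export named in the project requirements could be sketched as below. This is a minimal sketch, not a prescribed implementation. It assumes the alignment stage yields one start and end time in seconds per lyric line; the `Segment` class and the function names are hypothetical and do not come from any of the listed tools.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One lyric line with timing in seconds (hypothetical structure)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as SRT text: index, time range, lyric, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)
```

The returned string can be written to disk with `open(path, "w", encoding="utf-8")`, so that lyrics in all target languages keep their characters. After a word is edited, only the affected segment's text needs to change before the file is re-exported.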
#AI #VoiceCloning #SingingAI #FullStackDeveloper #Python #DeepLearning #OfflineAI #DiffSinger #CosyVoice #RVC #OpenVoice #AudioAI #Localization #Multilingual #MusicTech #WorkForHire #Giramille