Sensory, Inc.
Job Description
Company: Sensory Inc.
Location: Remote (US or non-US based)
Department: Emerging Technologies / Engineering

About Sensory
At Sensory, we are pioneers in embedded AI, wake words, speech-to-text, and on-device biometric technologies. We build on-device solutions that make consumer products smarter, safer, and more human, without sacrificing privacy. Operating at the intersection of deep learning and edge computing, our technologies power billions of devices globally. We move at the speed of AI, and we expect our team to actively leverage the latest advancements in large language models and agentic workflows to build the future of voice. We are a relatively small but profitable AI company with remote employees; we hire talented people wherever they want to live.

The Role
We are seeking a highly technical, creative, hands-on Emerging Speech Technologist to join our elite engineering team. In this role, you will be a self-starter working at the bleeding edge of speech technology, optimizing and deploying advanced speech-to-text (STT) algorithms on highly resource-constrained hardware architectures.

This is not a traditional software engineering role. We expect you to natively integrate agentic AI into your daily development cycle: you will leverage tools such as Claude Code, Cursor, and AWS Bedrock to accelerate your coding, and harness LLMs to pioneer new methods for synthetic data generation and automated data validation. You will also lead data-driven research and benchmarking to validate novel approaches and optimize our technology for real-world deployment on devices strictly constrained by RAM, processing power, and battery life. If you are passionate about pushing the boundaries of embedded voice AI and building low-latency systems that improve on the industry's best, we want you on our team.
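To make the edge-deployment focus above concrete, here is a toy sketch of the symmetric 8-bit quantization idea behind the model compression used to fit speech models into tight RAM budgets. This is an illustrative pure-Python example, not Sensory's tooling; real deployments would rely on the quantization support built into frameworks such as TensorFlow Lite/LiteRT or ONNX Runtime.

```python
# Toy sketch of symmetric per-tensor int8 quantization, the core idea
# behind shrinking 32-bit float model weights 4x for edge deployment.

def quantize_int8(weights):
    """Map float weights to int8 codes using a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.82, -1.27, 0.003, 0.54, -0.91]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(codes)    # int8 codes, stored in 1 byte each instead of 4
print(max_err)  # worst-case rounding error, bounded by the scale
```

The trade-off this sketch exposes is exactly the one named in the role: each weight drops from 4 bytes to 1, at the cost of a bounded rounding error that benchmarking must show is acceptable for recognition accuracy.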
Key Responsibilities
- Edge-Optimized Speech AI: Architect, develop, and deploy speech technology algorithms (focusing on STT) tailored specifically for small compute platforms, including DSPs and NPUs.
- Model Experimentation & Deployment: Evaluate, compress, and deploy variations of new speech models using frameworks such as NVIDIA NeMo, ONNX, and TensorFlow Lite/LiteRT.
- Research & Real-World Optimization: Execute robust data-driven research and benchmarking to validate new models, ensuring maximum performance of voice technologies in environments with extremely limited RAM, processing power, and battery.
- Agentic Development & SOTA Innovation: Actively use AI-assisted coding and agentic workflows (Claude Code, Cursor, AWS Bedrock) to accelerate software engineering, optimize low-level code, and streamline unit testing and debugging. Leverage agentic AI to research and develop emerging state-of-the-art (SOTA) technologies, discover novel training methods, test and validate performance, address architectural weaknesses, and uncover new techniques for optimizing code across environments.
- Synthetic Data Pipelines: Leverage large language models (LLMs) to generate high-quality synthetic text and speech data, and establish rigorous, automated methodologies for data checking, curation, and validation.
- Cross-Platform Engineering: Write highly efficient, production-ready code in Python, C/C++, and Kotlin to bridge the gap between acoustic research and embedded hardware deployment.
- Innovation and Creative Thinking: Stay ahead of the curve on emerging voice AI trends (monitoring advancements from industry peers) to rapidly prototype new approaches for Sensory's technology stack.

Critical Knowledge & Qualifications
- Education: A Master's or PhD in Machine Learning, Computer Science, or Software Engineering (or a closely related technical field). Candidates with BS degrees from top schools will be considered.
- Experience: 2 to 10 years of post-graduate industry experience, with a proven track record of bridging advanced academic research with production-grade engineering.
- Small Platform Engineering: Deep, proven experience in software engineering for resource-constrained environments (DSPs, NPUs, edge devices). Experience with development boards is extremely important.
- Speech Technology: Extensive hands-on knowledge of STT algorithms, acoustic modeling, and deploying AI models via NeMo, ONNX, and TensorFlow Lite.
- AI-Native Workflow: Demonstrated experience and high comfort level using agentic AI tools (e.g., Claude Code, Cursor, Bedrock) t
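As a small illustration of the benchmarking and validation work this posting emphasizes, STT evaluation commonly revolves around word error rate (WER), the word-level edit distance between a reference transcript and a model's hypothesis, divided by the reference length. The helper below is a hypothetical, self-contained sketch of that standard metric, not Sensory's actual evaluation pipeline.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits to turn the first i ref words into the first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[-1][-1] / max(len(ref), 1)

# One deletion ("the") and one substitution ("lights" -> "light")
# against a 5-word reference: WER = 2 / 5.
print(wer("turn on the kitchen lights", "turn on kitchen light"))  # 0.4
```

Metrics like this are what make the "data-driven benchmarking" responsibility measurable: a compressed on-device model is accepted only if its WER on held-out audio stays within an agreed margin of the uncompressed baseline.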