Gen AI Engineer

AMIRA LEARNING INC

7d•Remote

About The Position

Amira Learning accelerates literacy outcomes by delivering the latest reading science and neuroscience with AI. As the leader in third-generation edtech, Amira listens to students read out loud, assesses mastery, helps teachers supplement instruction, and delivers 1:1 tutoring. Validated by independent university and SEA efficacy research, Amira is the only AI literacy platform proven to achieve gains surpassing 1:1 human tutoring, consistently delivering effect sizes over 0.4. Rooted in over thirty years of research, Amira is the first, foremost, and only proven Intelligent Assistant for teachers and AI Reading Tutor for students.

The platform serves as a school district’s Intelligent Growth Engine, driving instructional coherence by unifying assessment, instruction, and tutoring around the chosen curriculum. Unlike any other edtech tool, Amira continuously identifies each student’s skill gaps and collaborates with teachers to build lesson plans aligned with district curricula, pulling directly from the district’s high-quality instructional materials. Teachers can finally differentiate instruction with evidence and ease, and students get the 1:1 practice they specifically need, whether they are excelling or working below grade level.

Trusted by more than 2,000 districts and working in partnership with twelve state education agencies, Amira is helping 3.5 million students worldwide become motivated and masterful readers.

Job Summary

A self-motivated, results-oriented A-player who operates at a fast pace and takes pride in delivering high-quality, trustworthy AI systems
An experienced generative AI practitioner with deep expertise in LLM-based system design, including prompt engineering, RAG architectures, fine-tuning, and evaluation
Highly focused on accuracy and reliability, with a clear understanding that hallucinations and misinformation erode user trust and with concrete strategies to prevent them
Proficient in building and maintaining end-to-end production ML pipelines, from data preparation through deployment and monitoring
A strong collaborator who works effectively across engineering, product, and customer-facing teams in a fully remote environment

Requirements

2+ years of hands-on experience building and deploying LLM-based systems in production
Deep familiarity with RAG architectures: embedding models, vector databases, retrieval strategies, and response grounding
Demonstrated experience with evaluation and benchmarking of LLM outputs, including hallucination mitigation, confidence filtering, output validation, and fallback strategies
Practical experience with prompt engineering, prompt chaining, and/or agent orchestration frameworks (LangChain, LlamaIndex, or similar)
Proficiency in Python and experience working with LLM APIs (Anthropic, OpenAI, open-source models, etc.)
Experience building and maintaining ML or data pipelines in AWS (Lambda, S3, RDS, etc.) or similar cloud infrastructure
Degree in computer science or a related technical field, or equivalent practical experience

Nice To Haves

Experience fine-tuning foundation models or running RLHF / preference-based feedback loops for domain-specific improvement
Experience in education SaaS or with education-sector customers (districts, schools, state agencies)
Familiarity with Salesforce or similar CRM platforms and their API/data ecosystems
Experience with evaluation tooling, custom eval harnesses, or LLM-as-judge approaches
Background working with conversational AI, chatbots, or customer-facing generative AI features
Proven ability to operate in a fast-paced, goal-oriented startup environment and manage multiple concurrent workstreams

Responsibilities

Design, build, and continuously improve LLM-powered systems across Amira's product and operations, from internal tools to customer-facing features
Own RAG pipelines end-to-end: document ingestion, chunking strategy, embedding selection, retrieval tuning, and response synthesis
Develop and enforce guardrails, grounding strategies, and confidence thresholds to mitigate hallucination and ensure output reliability
Architect prompt chains and agent workflows that are robust, maintainable, and cost-effective at scale
Design and operate evaluation frameworks to measure system accuracy, helpfulness, hallucination rate, and task completion across generative AI features
Fine-tune and adapt foundation models for domain-specific tasks, including data curation, training pipeline setup, and performance benchmarking
Implement automated and human-in-the-loop review processes to catch and correct problematic outputs
Monitor production traffic, identify failure modes, and iterate rapidly on retrieval, prompting, and generation strategies
Integrate LLM-powered features with internal systems and third-party platforms (e.g., Salesforce and other CRM tools) via APIs, connectors, and data sync workflows
Contribute to shared ML infrastructure and tooling used across Amira's AI systems
Help explore and implement solutions that make generative AI economically viable within the budget constraints typical of public schools and education SaaS
Partner with learning design, content, product, and customer success teams to ensure AI systems are grounded in accurate, up-to-date domain knowledge
Translate business needs into well-scoped generative AI solutions and communicate tradeoffs clearly to non-technical stakeholders