AI QA Engineer II/III

IntelePeer • Dania Beach, FL

About The Position

IntelePeer delivers rapidly deployable communications solutions for an always-connected world. Our Conversational AI Platform instantly improves your customers’ communications experience. IntelePeer provides industry-leading time-to-value with Agentic AI solutions that work seamlessly with your infrastructure. Our no-code templates, low-code, co-creation, and developer API options provide you with simple, easy-to-use tools that can be utilized by anyone.

Job Summary: The AI QA Engineer II/III leads the end-to-end quality validation of AI models and conversational agents to ensure they deliver accurate, safe, and contextually relevant user experiences. This role analyzes conversational requirements to design specialized LLM prompt testing, adversarial edge cases, and dialogue flow evaluations that safeguard against hallucinations and bias. By integrating real-world conversation analytics with functional and regression testing, the engineer monitors model performance and intent recognition to ensure reliability. Through meticulous documentation of testing objectives and results, the role ensures that every conversational interaction meets established quality standards and user expectations before deployment.

Requirements

2-4 years of experience in QA, LLM prompt testing, and conversational AI for Level II; 4-7 years of experience for Level III
Bachelor’s degree in Computer Science, Data Science, Linguistics, Cognitive Science, HCI, or a related AI-adjacent field
Solid understanding of LLM behavior (hallucination patterns, determinism, prompt sensitivity)
Progressive experience testing APIs, conversational UX, and/or machine learning
Strong analytical skills for pattern recognition in model outputs
Clear, concise communication skills
Ability to work in a fast-paced environment and adapt to change
Strong initiative; self-motivated, proactive, and resourceful
Team player who is willing to go above and beyond to help others
Sedentary work lifting no more than 10 pounds.
Occasional lifting, carrying, and standing.
Frequent hand/eye coordination to operate office equipment.
Vision sufficient to read computer screens, reports, and related department documents.
Dexterity to operate computer keyboards and other related office equipment.
Endurance sufficient to sit and work at a computer for extended periods of time.
Frequent speech communication and hearing.

Nice To Haves

Coursework or projects involving NLP, LLM prompt engineering, model evaluation or automated testing frameworks.

Responsibilities

Build prompt‑based test cases to evaluate LLM outputs for correctness, stability, safety, and non‑functional targets (latency, determinism, cost).
Execute scripted model tests to validate agent behaviors across intents, flows, and edge cases.
Maintain voice‑to‑voice and API regression tests to detect model drift or unintended degradation.
Use Smart Analytics/Analysis Agents to review real user interactions and identify issues (misclassification, routing errors, hallucinations).
Summarize patterns and provide example‑driven insights to internal teams.
Apply HITL policies for low‑confidence model predictions, unknown intents, or out‑of‑scope cases.
Document CX and accuracy impact for model improvement cycles.
Publish evaluation results in TestRail with structured evidence.
Use the nuanced pass taxonomy (not binary pass/fail) to communicate model readiness.

Benefits

Unlimited Vacation for exempt employees
Paid Holidays
Competitive medical, dental & vision insurance for employees and their dependents
401(k) Retirement Plan
Stock Options
Company-paid life insurance
Health & Flexible Savings Accounts
Cell phone, gym, and internet reimbursement
Paid Parental Leave
Tuition Reimbursement
Employee Assistance Program (EAP)
Free snacks (Denver and/or Fort Lauderdale)
Fun events (virtual and in-person)