About The Position

We are sharing a specialized part-time consulting opportunity for Bengali-English bilingual professionals experienced in AI safety evaluation, red team testing, adversarial review, vulnerability classification, and structured feedback on sensitive text-based AI outputs. The work is remote and spans current and upcoming projects in bilingual red team testing, conversational model assessment, misuse-risk review, and vulnerability annotation. Selected professionals will test AI systems using structured adversarial scenarios, identify safety weaknesses, classify risks, and produce clear English-language evaluation artifacts across English and Bengali contexts.

Requirements

  • Native-level fluency in both English and Bengali.
  • Prior experience in AI red teaming, adversarial testing, cybersecurity, trust and safety, socio-technical risk review, or conversational AI evaluation.
  • Ability to think adversarially while staying structured, careful, and methodical.
  • Experience using frameworks, benchmarks, or rubrics rather than unstructured testing alone.
  • Strong written communication skills and ability to explain safety findings clearly.
  • Comfort reviewing text-based content involving sensitive topics under clear guidelines.
  • Adaptability across project types, safety categories, and evaluation workflows.

Nice To Haves

  • Experience with adversarial ML concepts, jailbreak datasets, prompt injection, RLHF/DPO attack patterns, or model behavior testing.
  • Cybersecurity experience such as penetration testing, exploit analysis, reverse engineering, or security assessment.
  • Socio-technical risk experience involving harassment, misinformation, abuse analysis, bias testing, or conversational AI safety.
  • Creative probing background, including psychology, acting, writing, role-play design, or unconventional adversarial thinking.
  • Experience producing reproducible reports, labeled datasets, structured risk notes, or benchmark-style evaluation artifacts.

Responsibilities

Bilingual AI Safety & Red Team Testing:

  • Review English and Bengali AI outputs for safety, reliability, bias, misinformation, and harmful-behavior risks.
  • Stress-test conversational AI models and agents using structured adversarial scenarios.
  • Evaluate model behavior across multi-turn conversations, sensitive topics, and edge-case prompts.
  • Identify vulnerabilities that require stronger safety controls, clearer refusals, or improved response quality.

Vulnerability Classification & Risk Review:

  • Annotate failures, classify vulnerabilities, and flag recurring safety patterns.
  • Apply taxonomies, benchmarks, and project-specific playbooks to keep testing consistent.
  • Assess misuse cases, bias exploitation, prompt-injection scenarios, and socio-technical risk patterns at a high level.
  • Generate high-quality human evaluation data through careful review and structured judgment.

Reproducible Documentation & Evaluation Artifacts:

  • Produce clear reports, datasets, test cases, and written summaries that support model improvement.
  • Document findings reproducibly so results can be reviewed, compared, and acted upon.
  • Explain risks clearly for both technical and non-technical audiences.
  • Maintain accuracy, consistency, and strong attention to detail across submitted evaluations.

Benefits

  • Competitive hourly compensation
  • Flexible scheduling
  • Remote structure
  • Weekly payments
© 2026 Teal Labs, Inc