Decision Intelligence Engineer - Next Best Action

Humana
$129,300 - $177,800 • Remote

About The Position

Humana is seeking a skilled Decision Intelligence Engineer to design, train, and improve the reinforcement learning policy at the core of its Next Best Action platform. This hands-on, research-oriented role involves designing and evaluating decision-making algorithms and instrumenting training pipelines. The engineer will collaborate with data and platform engineers to ensure the system operates correctly within clinical eligibility rules and program-specific objectives.

Requirements

  • 8+ years of software engineering or quantitative research experience building and operating large-scale production systems, with emphasis on data-intensive platforms, recommendation systems, optimization engines, or simulation frameworks serving millions of users.
  • 3+ years of hands-on experience implementing reinforcement learning, operations research methods, or simulation-driven decision systems in production.
  • Relevant backgrounds include policy gradient and value-based RL (PPO, A3C, DQN, CQL), stochastic dynamic programming, discrete-event simulation, or large-scale combinatorial or constrained optimization.
  • Deep familiarity with Markov Decision Processes, Bellman-equation-based value estimation, reward or objective shaping, exploration-exploitation tradeoffs, and constraint formulation in real-world decision systems.
  • Demonstrated ability to diagnose failure modes in learned or optimized policies: instability, poor credit assignment across long horizons, and distributional shift across large populations.
  • Proficiency in Python 3.x.
  • Experience with PyTorch or TensorFlow for policy network or learned model implementation.
  • Experience with Ray RLlib or equivalent distributed computation frameworks for large-scale training or optimization.
  • Experience with Databricks, PySpark, and Delta Lake for large-scale ML or data pipelines processing tens of millions of records.
  • Experience with MLflow for experiment tracking, model registry, and artifact management.
  • Experience shipping systems that operate reliably under production load, beyond research or prototype work.

Nice To Haves

  • Experience with multi-agent RL frameworks (PettingZoo or equivalent) or multi-agent simulation and coordination methods.
  • Familiarity with operations research methods applicable to constrained sequential decisioning: linear programming, mixed-integer programming, Lagrangian relaxation, or constraint programming.
  • Experience operating decision or optimization systems in regulated domains (healthcare, finance, or insurance) where member safety, auditability, and explainability are requirements.
  • Experience building simulation environments using Gymnasium, SimPy, AnyLogic, or equivalent frameworks for policy evaluation and backtesting.
  • Familiarity with event-driven feedback loops and how disposition signals feed retraining or re-optimization pipelines.
  • OpenTelemetry instrumentation experience for ML or optimization pipeline observability.

Responsibilities

  • Design, implement, and evaluate algorithms suited to long-horizon, sparse-reward sequential decision-making in healthcare, including reinforcement learning methods (PPO, A3C, DQN, CQL, Decision Transformer), dynamic programming, and constrained optimization.
  • Frame member decisioning problems as Markov Decision Processes (MDPs) or Partially Observable MDPs, defining state representations, action spaces, transition dynamics, and reward structures.
  • Apply Bellman-equation-based value estimation, reward shaping, and constraint formulations to encode clinical eligibility, suppression rules, and program-specific objectives.
  • Manage exploration-exploitation tradeoffs appropriate for a production healthcare environment.
  • Model member journey dynamics using tools from stochastic processes, simulation, or probabilistic graphical models.
  • Build simulation and backtesting environments (discrete-event simulation, Monte Carlo methods) to evaluate policy or decision quality before production promotion.
  • Diagnose and remediate failure modes specific to learned or optimized policies, such as policy collapse, credit assignment errors, distributional shift, and constraint violations.
  • Define performance threshold criteria and automated evaluation gates within the nightly Databricks training workflow.
  • Instrument training and optimization runs with MLflow tracking.
  • Own the nightly Databricks training workflow, including feature engineering, distributed training (Ray RLlib), and batch scoring.
  • Collaborate with the Data Engineering team to ensure training inputs, reward signals, and feature pipelines are reproducible and auditable.
  • Write production-quality PySpark feature engineering jobs and maintain data lineage through Databricks Unity Catalog.
  • Manage model artifacts, versioning, and lifecycle in the MLflow Model Registry, ensuring rollback capability.
  • Apply multi-agent reinforcement learning (MARL) concepts where member household or population-level coordination is required.
  • Implement constraint handling to enforce hard business rules (member caps, cooldown periods, clinical eligibility) within the optimization objective.
  • Collaborate with rules engine stakeholders to ensure eligibility guards and policy priorities are correctly aligned.
  • Partner with decision engine and rules engine teams to integrate model outputs cleanly with the real-time decisioning hot path.
  • Collaborate with platform architects to define feedback loop contracts for disposition outcomes.
  • Document model behavior, known limitations, and failure modes for clinical and compliance stakeholders.
  • Use AI-assisted engineering tools for scaffolding, testing, and documentation, ensuring core model logic remains human-authored and peer-reviewed.

Benefits

  • Medical benefits
  • Dental benefits
  • Vision benefits
  • 401(k) retirement savings plan
  • Time off (including paid time off, company and personal holidays, volunteer time off, paid parental and caregiver leave)
  • Short-term disability
  • Long-term disability
  • Life insurance
  • Bonus incentive plan