Software Development Engineer – Agentic AI

Trellix • Frisco, TX

About The Position

Join our team at Trellix, where you'll lead the design and development of a generative AI platform that powers AI capabilities across the entire Trellix security portfolio. This isn't prototype work. You'll build and operate production agentic systems deployed in federal environments, enabling autonomous SOC workflows, multi-agent orchestration, and AI integration across our security products. We're looking for a highly skilled Software Development Engineer with a passion for building robust, scalable, and secure AI solutions that operate at real-world scale.

Requirements

5+ years of professional experience in Python application software development, with demonstrated experience building and operating production AI or platform systems.
Linux proficiency required
Excellent development and debugging skills in Python
Strong grasp of data structures and design patterns
Proficiency with REST and async Web APIs
CI/CD pipeline experience (GitHub Actions or equivalent)
Strong written communication for design docs and async collaboration
Ability to operate with autonomy in a fast-moving, ambiguous environment
Hands-on experience with Large Language Models (LLMs) in production
LangChain experience required, including building and operating stateful multi-agent workflows
Experience with prompt orchestration and chain composition
Familiarity with agentic AI concepts and patterns (ReAct, chain-of-thought, tool use, Deep Agents)
Experience deploying and operating vLLM for self-hosted inference
Familiarity with MCP (Model Context Protocol) for agentic tool integration
FastAPI (preferred)
Node.js / TypeScript for tooling and API integration layers (preferred)
Postgres (preferred)
Knowledge Graphs, including NebulaGraph or equivalent (preferred)
Vector Databases, including Qdrant or equivalent (preferred)
Embedding pipeline experience including chunking strategies and retrieval tuning (preferred)
Gunicorn or Uvicorn
OpenTelemetry (OTEL) instrumentation
Redis (preferred)
Langfuse or LangSmith for agent observability (preferred)
Kubernetes (preferred)
AWS: RDS, EKS, ElastiCache, Bedrock (preferred)
Shell scripting (preferred)

Nice To Haves

Working knowledge of threat detection, EDR telemetry, SOC workflows, or SIEM platforms strongly preferred
Understanding of Security Information and Event Management (SIEM) and incident response a plus

Responsibilities

Lead the design and development of our generative AI platform, driving core functionality, agentic workflows, and platform-level features from concept through production.
Take full ownership of features and functions, from initial design and development through rigorous testing, automation, and ongoing operational health.
Build, iterate on, and harden multi-agent pipelines, including tool use, inter-agent coordination, and autonomous decision workflows for security operations.
Ensure solutions are delivered on time, within budget, and to the highest quality standards, meeting project goals and customer commitments.
Proactively implement best practices to ensure applications are highly resilient, secure, and performant, with particular attention to the sensitivity of security operations data.
Design and implement instrumentation using OpenTelemetry, contribute to operational dashboards, and surface platform health and usage insights to engineering leadership and stakeholders.
Analyze feature requirements and produce detailed design documentation, architectural decision records, and async-friendly technical specs.
Own production issues end-to-end, including triage, root cause analysis, post-mortems, and SLA commitments, for a platform operating in high-stakes environments.