About The Position

Beacon AI is building an AI platform to enhance aviation safety, efficiency, and capability. The company is investor-backed, holds multiple Department of Defense contracts and airline partnerships, and operates in small, agile teams that own their work and innovate rapidly. The role involves shipping LLM-powered product features end-to-end: designing retrieval and tool-calling flows, developing services, implementing evaluations and guardrails, and monitoring production performance. Collaboration with ML/infra and product teams is key, with an emphasis on reliability in a safety-critical domain. The company is hiring across multiple seniority levels: senior engineers own features, while staff engineers own systems and technical direction.

Requirements

  • Shipped LLM apps: You’ve put LLM features in front of users and improved them with data.
  • Strong builder: Comfortable writing production code, tests, and docs. You keep things simple and observable.
  • RAG and tool-calling depth: You understand embeddings, chunking, vector-search tradeoffs, and function calling.
  • Quality mindset: You design evals, define success metrics, and iterate based on evidence.
  • Cost and latency aware: You track p95 latency, meet SLAs, and reduce cost without hurting quality.
  • Clear communicator: You explain tradeoffs and align partners across product, infra, and security.
  • Must be a U.S. Person (U.S. citizen, lawful permanent resident / Green Card holder, or individual granted asylum or refugee status).
  • All work must be performed in the United States.

Nice To Haves

  • Experience with Bedrock, OpenSearch Serverless, pgvector, Pinecone, or Weaviate.
  • Prompt versioning, guardrails, and provider routing in production.
  • Multimodal work with time series or video.
  • Familiarity with GPU inference, Triton, or TensorRT-LLM.
  • Aviation or other safety-critical domain exposure.
  • DevOps basics for CI/CD, IaC, and secure secrets handling.

Responsibilities

  • Build user-facing LLM features end-to-end.
  • Design and implement retrieval-augmented generation and tool-calling flows using frameworks like LangChain or equivalent primitives.
  • Deliver robust JSON and schema-bound outputs with validation, retries, and fallbacks (a minimal sketch follows this list).
  • Add function calling to integrate with internal tools, search, routing, and data services.
  • Own the service layer by shipping APIs and workers in Python or TypeScript with clear contracts, streaming, and backoff.
  • Add caching, request shaping, prompt templates, and context packing to control latency and cost.
  • Integrate with Amazon Bedrock, OpenAI, Anthropic, or self-hosted endpoints as needed.
  • Collaborate with infrastructure teammates to develop chunking, embeddings, and indexing capabilities for documents, time series, and multimedia.
  • Choose and tune vector backends such as OpenSearch, pgvector, or Pinecone.
  • Keep knowledge bases fresh with data syncs from S3, Aurora, DynamoDB, and external sources.
  • Create offline evals and golden sets for prompts, retrievers, and tools (a retrieval eval sketch follows this list).
  • Stand up online metrics for task success, hallucination rate, retrieval precision/recall, p95 latency, and cost per request.
  • Run A/B tests and prompt/version rollouts with guardrails and canaries.
  • Implement content and policy checks, PII detection and redaction, access controls, and auditing.
  • Design human-in-the-loop paths for sensitive actions.
  • Handle aviation data with care and follow internal security standards.
  • Add tracing, logs, and dashboards for model calls, token usage, errors, and saturation.
  • Debug tricky failures across retrieval, prompts, tools, and providers.
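
For illustration of the schema-bound output bullet above, here is a minimal sketch in Python, assuming pydantic v2 for validation. The call_llm function and the RouteDecision schema are hypothetical placeholders for whichever provider client (Bedrock, OpenAI, Anthropic, or a self-hosted endpoint) and output contract a given feature actually uses.

    import json
    from pydantic import BaseModel, ValidationError

    class RouteDecision(BaseModel):
        # Hypothetical output schema the model is asked to fill in.
        destination: str
        confidence: float

    def call_llm(prompt: str) -> str:
        """Placeholder for a real provider call that returns raw model text."""
        raise NotImplementedError

    def get_route_decision(prompt: str, max_retries: int = 2) -> RouteDecision:
        """Request JSON, validate it against the schema, retry on malformed
        output, and fall back to a conservative default when retries run out."""
        schema = json.dumps(RouteDecision.model_json_schema())
        for _attempt in range(max_retries + 1):
            raw = call_llm(f"{prompt}\n\nRespond with JSON matching this schema: {schema}")
            try:
                return RouteDecision.model_validate_json(raw)
            except ValidationError:
                continue  # in production, log the failure and attempt count before re-prompting
        # Fallback path: return a safe default rather than surfacing a parse error to the user.
        return RouteDecision(destination="unknown", confidence=0.0)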
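
In the same spirit, a minimal sketch of an offline retrieval eval over a golden set, matching the evaluation bullets above; the golden-set format and the retrieve callable are assumptions for illustration, not an existing internal interface.

    from typing import Callable

    def retrieval_precision_recall(
        golden_set: list[dict],                     # [{"query": ..., "relevant_ids": [...]}, ...]
        retrieve: Callable[[str, int], list[str]],  # returns ranked doc ids for a query
        k: int = 5,
    ) -> tuple[float, float]:
        """Average precision@k and recall@k over the golden queries."""
        precisions, recalls = [], []
        for example in golden_set:
            relevant = set(example["relevant_ids"])
            retrieved = retrieve(example["query"], k)
            hits = len(relevant & set(retrieved))
            precisions.append(hits / max(len(retrieved), 1))
            recalls.append(hits / max(len(relevant), 1))
        n = max(len(golden_set), 1)
        return sum(precisions) / n, sum(recalls) / n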

Benefits

  • 100% of employee medical premiums covered
  • 25% of dependent medical premiums covered
  • 3 weeks PTO
  • 13+ paid company holidays
  • Monthly phone stipend
  • Monthly wellness benefits
  • 401(k) offered