TalTeam • Posted about 2 months ago
Full-time • Mid Level
Onsite • San Jose, CA
101-250 employees

What you'll do (Responsibilities)

  • Own the technical roadmap for Verilog/RTL‐focused LLM capabilities, from model selection and adaptation to evaluation, deployment, and continuous improvement.
  • Lead a hands‐on team of applied scientists/engineers: set direction, unblock technically, review designs/code, and raise the bar on experimentation velocity and reliability.
  • Fine‐tune and customize models using state‐of‐the‐art techniques (LoRA/QLoRA, PEFT, instruction tuning, preference optimization/RLAIF), backed by robust HDL‐specific evals: compile‐/lint‐/simulate‐based pass rates, pass@k for code generation, constrained decoding to enforce syntax, and "does‐it‐synthesize" checks.
  • Design privacy‐first ML pipelines on AWS: run training/customization and hosting on Amazon Bedrock (including Anthropic models) where appropriate, and on SageMaker (or EKS + KServe/Triton/DJL) for bespoke training needs. Store artifacts in S3 with KMS CMKs; use isolated VPC subnets and PrivateLink (including Bedrock VPC endpoints), IAM least privilege, CloudTrail auditing, and Secrets Manager for credentials. Enforce encryption in transit and at rest, data minimization, and no public egress for customer/RTL corpora.
  • Stand up dependable model serving: Bedrock model invocation where it fits, and/or low‐latency self-hosted inference (vLLM/TensorRT‐LLM), autoscaling, and canary/blue-green rollouts.
  • Build an evaluation culture: automated regression suites that run HDL compilers/simulators, measure behavioral fidelity, and detect hallucinations/constraint violations; model cards and experiment tracking (MLflow/Weights & Biases).
  • Partner deeply with hardware design, CAD/EDA, Security, and Legal to source/prepare datasets (anonymization, redaction, licensing), define acceptance gates, and meet compliance requirements.
  • Drive productization: integrate LLMs with internal developer tools (IDEs/plug‐ins, code review bots, CI), retrieval (RAG) over internal HDL repos/specs, and safe tool‐use/function-calling.
  • Mentor & uplevel: coach ICs on LLM best practices, reproducible training, critical paper reading, and building secure‐by‐default systems.

Qualifications

  • 10+ years total engineering experience with 5+ years in ML/AI or large‐scale distributed systems; 3+ years working directly with transformers/LLMs.
  • Proven track record shipping LLM‐powered features in production and leading ambiguous, cross‐functional initiatives at Staff level.
  • Deep hands‐on skill with PyTorch, Hugging Face Transformers/PEFT/TRL, distributed training (DeepSpeed/FSDP), parameter‐efficient and quantized fine‐tuning (LoRA/QLoRA), and constrained/grammar‐guided decoding.
  • AWS expertise to design and defend secure enterprise deployments, including Amazon Bedrock (model selection, Anthropic model usage, model customization, Guardrails, Knowledge Bases, Bedrock runtime APIs, VPC endpoints), SageMaker (Training, Inference, Pipelines), S3, EC2/EKS/ECR, VPC/Subnets/Security Groups, IAM, KMS, PrivateLink, CloudWatch/CloudTrail, Step Functions, Batch, and Secrets Manager.
  • Strong software engineering fundamentals: testing, CI/CD, observability, performance tuning; Python a must (bonus for Go/Java/C++).
  • Demonstrated ability to set technical vision and influence across teams; excellent written and verbal communication for execs and engineers.
  • Familiarity with Verilog/SystemVerilog/RTL workflows: lint, synthesis, timing closure, simulation, formal, test benches, and EDA tools (Synopsys/Cadence/Mentor).
  • Experience integrating static analysis/AST‐aware tokenization for code models or grammar-constrained decoding.
  • RAG at scale over code/specs (vector stores, chunking strategies) and tool use/function calling for code transformation.
  • Inference optimization: TensorRT‐LLM, KV‐cache optimization, speculative decoding; throughput/latency trade‐offs at batch and token levels.
  • Model governance/safety in the enterprise: model cards, red‐teaming, secure eval data handling; exposure to SOC2/ISO 27001/NIST frameworks.
  • Data anonymization, DLP scanning, and code de‐identification to protect IP.
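
For readers unfamiliar with the pass@k metric named in the responsibilities, the following is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k)/C(n, k), where n samples are generated per problem and c of them pass. It is an illustration only, not a description of this team's evaluation stack. The function names and the idea that "pass" means a sample clears compile, lint, and simulation checks are assumptions made for the example.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed, k: sampling budget.
    Here "passed" is assumed to mean the generated HDL cleared whatever
    compile/lint/simulate checks the eval harness runs (hypothetical).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k failing samples: every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, each given as an (n, c) pair."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples per problem, with 3 and 0 passing.
    print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

Averaging per-problem estimates, rather than pooling all samples, keeps problems with many passing samples from masking problems where the model never succeeds.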
© 2024 Teal Labs, Inc