Senior Cloud Infrastructure Engineer

Soft Tech Consulting, Rockville, MD
Remote

About The Position

Soft Tech Consulting, Inc. is seeking a Senior Cloud Infrastructure Engineer to join its program delivering innovative cloud solutions for a government client. The program designs, implements, and manages enterprise infrastructure, both on-prem and cloud, for hosting applications and solutions. In this role, the Senior Cloud Infrastructure Engineer will be a valuable member of a DevOps team that designs and implements cloud infrastructure and automation. The position applies deep technical knowledge to support a custom ChatGPT model that helps the intramural community address scientific research challenges. Strong problem-solving skills and a collaborative approach will be instrumental in guiding the team toward scalable, efficient solutions that meet evolving needs. The company invites individuals who are passionate about crafting transformative solutions and who thrive in a fast-paced, collaborative environment to join its team.

Requirements

  • US CITIZEN OR GREEN CARD HOLDER
  • MUST BE ABLE TO OBTAIN PUBLIC TRUST
  • BS/BA (or equivalent)
  • Minimum of 10 years of related experience
  • Excellent written and verbal communication skills
  • Strong troubleshooting skills
  • Minimum of 10 years’ experience as a cloud engineer with cloud and enterprise infrastructure technologies in a medium to large enterprise
  • Hands-on experience in system administration, automation frameworks, patch management, monitoring, certificate management, and data protection and backup approaches.
  • Hands-on Azure experience to include: Implementing and supporting Azure AI Foundry, serverless compute, vector databases/search platforms, and knowledge integration architectures (RAG patterns preferred).
  • Hands-on Azure experience to include: Implementing and supporting RBAC, identity management, and conditional access policies via Microsoft Entra ID (formerly Azure AD).
  • Hands-on Azure experience to include: Monitoring and performance tuning using Azure Monitor, Log Analytics, and Alerts.
  • Hands-on Azure experience to include: Deploying, managing, and troubleshooting network components such as Load Balancers, Virtual Networks (VNets), API Gateways, Firewalls, and Route Tables.
  • Experience designing and building CI/CD pipelines
  • Infrastructure-as-code experience, including ARM templates and/or Terraform, writing custom modules from scratch, and helping guide and contribute to a large and growing codebase

Nice To Haves

  • Experience managing and integrating AI models and tools such as ChatGPT, Gemini, and DALL-E.
  • Experience working in a life-sciences-oriented environment
  • Writing code (PowerShell, Python, Ruby, etc.) from scratch to solve problems
  • Experience using Git to manage shared software configuration codebases

Responsibilities

  • Work closely with all relevant stakeholders to design, secure, and implement Azure cloud infrastructure solutions for multi-modal LLM and RAG-based architectures, including Azure AI Foundry, vector indexing, knowledge source integration, and distributed cloud AI services.
  • Support evolving features such as agentic capabilities, Model Context Protocol (MCP) integrations, and enterprise knowledge connectivity.
  • Use automation and codify best practices to enhance scalability, resiliency, and operational efficiency across cloud infrastructure.
  • Focus on streamlining system administration, reducing manual effort, and improving system reliability through scripting, configuration management, and modern ops methodologies.
  • Use second-order thinking to identify the short-, medium-, and long-term consequences of architecture decisions in order to surface risks, understand impact, meet requirements, and continue to drive customer value.
  • Draw on experience supporting application development lifecycles to recommend and engineer CI/CD pipelines that deploy code, conduct security scans, and perform application health checks.
  • Monitor system performance, reliability, and cost, including logging, telemetry, incident response, and cloud resource governance.
  • Participate in audits by providing artifacts that address NIST SP 800-53 Rev. 5 controls and support the requirement to maintain an Authority to Operate (ATO).

Benefits

  • Medical
  • Dental
  • Vision
  • 401K
  • Short Term Disability
  • Long Term Disability
  • Life Insurance
  • PTO
  • Paid Holidays