Senior DevOps Engineer

Radwell International
Willingboro Township, NJ
$145,000 - $175,000

About The Position

We’re building the next generation of platform engineering and AIOps capabilities to power Radwell’s digital ecosystem. As a Senior DevOps Engineer, you’ll be a hands-on technical leader who sets the standards for CI/CD, infrastructure as code, observability, and MLOps, while partnering closely with IT Operations, Security, Data Engineering, and Product teams.

Requirements

  • Experience mentoring distributed teams and partnering across time zones.
  • Prior collaboration with Security/IT Ops on incident response, tabletop exercises, and compliance audits.
  • Proven AIOps implementations (anomaly detection, correlation/RCA, forecasting, and automated remediation) using platform features and/or bespoke ML.
  • Deep experience with AWS and/or Azure, Kubernetes (AKS/EKS) or ECS, container registries, and service meshes.
  • Expertise in Terraform or CloudFormation, GitHub Actions or Azure DevOps, and environment promotion strategies.
  • Hands-on experience with observability stacks and OpenTelemetry (e.g., Prometheus/Grafana, ELK/OpenSearch, Datadog, Splunk, Azure Monitor, CloudWatch/X-Ray).
  • Solid familiarity with MLOps tooling (e.g., MLflow, SageMaker, Azure ML, Databricks), feature stores, model registries, and testing/rollback strategies.
  • Strong grasp of security & compliance in pipelines and infra: IAM, KMS, secrets, SAST/DAST/SCA, policy-as-code.
  • Background in event streaming (Kafka/MSK/EventBridge), API gateways, and zero trust networking (mTLS, boundary controls).
  • Familiarity with LLMOps (prompt/version control, grounding evals, token/cost telemetry) and RAG production patterns.
  • Experience hardening eCommerce and revenue-critical flows (search, pricing, invoicing) at scale.
  • Bachelor’s degree in Information Technology, Computer Science, Business, or related field preferred.
  • High school diploma or equivalent required.
  • 8+ years in DevOps/SRE/Platform Engineering, including 3+ years leading standards or mentoring engineers.

Responsibilities

  • As a senior individual contributor (IC) and technical mentor, build and lead a Platform Engineering & DevOps function responsible for:
      ◦ Standardized CI/CD pipelines (e.g., GitHub Actions, Azure DevOps) for apps, APIs, and ML workloads.
      ◦ Infrastructure-as-Code for cloud resources (AWS/Azure), Kubernetes/ECS, databases, and data/AI infrastructure (repeatable, versioned, and enforced with policy-as-code).
      ◦ Secure, compliant, and repeatable environments for development, testing, staging, and production (secrets management, identity & access, network policies, artifact signing).
  • Design and implement an AIOps strategy that uses AI/ML to operate Radwell’s digital ecosystem:
      ◦ Intelligent monitoring for web, ERP, CRM, AI services, and integrations.
      ◦ Anomaly detection, proactive incident prevention, and noise-reduced alerting.
      ◦ Automated root-cause analysis and self-healing workflows for critical paths.
  • Partner with IT Operations and Security to build a unified observability stack (logs, metrics, traces, events) that feeds AIOps and SRE practices.
  • Establish ML CI/CD patterns: automated training, validation, security gates, model/package versioning, and canary/blue‑green rollouts for batch and online serving.
  • Stand up model registry, feature store, and drift/quality guardrails (data contracts, statistical monitoring, hallucination/grounding metrics as applicable).
  • Engineer reproducible pipelines for data prep, training, evaluation, deployment, and rollback; integrate with experiment tracking and cost/usage telemetry.
  • Collaborate with Data Engineering to productionize feature delivery and ensure lineage, governance, and privacy compliance are baked into pipelines.
  • Define SLIs/SLOs across critical user journeys; drive error budgets and reliability backlogs.
  • Reduce alert fatigue through correlation, deduplication, and ML-based signal enrichment; lower MTTD, MTTR, and change failure rate.
  • Champion FinOps (cost visibility and efficiency) across compute, storage, and data/AI workloads.
  • Document standards, publish internal runbooks/playbooks, and enable teams through training and code examples.

Benefits

  • Radwell offers a comprehensive benefits package including health, dental, and vision coverage.
  • The Company provides company-sponsored short-term and long-term disability benefits, as well as $50,000 in life insurance.
  • These benefits, along with additional voluntary benefits, are available to all regular full-time employees beginning on the first day of employment.
  • All employees are automatically enrolled at 3% into the Company’s 401(k) Plan on the first of the month following 90 days of continuous employment.
  • Employees are eligible for paid Company holidays and 15 days of PTO annually; PTO begins accruing on the first day of employment and may be used immediately upon joining the team.
© 2024 Teal Labs, Inc