Senior Infrastructure/DevOps Engineer, Fintech

Lazer

Remote

About The Position

Lazer is a world-class digital product studio composed of 180+ senior engineers and designers with backgrounds from companies like Apple, Google, Coinbase, and more. With our product experience, we have designed, engineered, and grown products from $0 to $200M in revenue.

Clients seek out our help because we have the talent to deeply understand their needs and provide industry, technical, or product insights that are uniquely valuable to their efforts. Our clients range from early-stage startups and venture studios to recognizable retail brands and exciting enterprises. Some of our notable clients include Google, Shopify, Coinbase, Alchemy, Hinge, OVO, Polymarket, and more.

We are a remote-first organization headquartered in Toronto, Ontario, with employees worldwide. We believe in providing the best experience possible for all Lazerites by fostering a strong community through regular events, company vacations, competitive compensation, unlimited PTO, and more! Join Lazer and help us solve problems and build the next generation of products!

Requirements

Minimum of 5 years of dedicated experience in DevOps, Infrastructure, or SRE roles.
Expert with Docker, Kubernetes (k8s), and Terraform/Pulumi.
Deep, proven expertise in either AWS or GCP infrastructure, with the ability to quickly grasp and transition to other cloud providers.
Strong ability to write clean, maintainable code for automation in Go, Python, or Node.js.
Demonstrable experience implementing and maintaining modern cloud security controls and meeting key compliance standards (SOC 2, PIPEDA, HIPAA, and/or GDPR).
Proven ability to quickly onboard, diagnose problems, and propose and implement solutions with minimal oversight.
Experience working as a consultant or freelancer, with the ability to understand and communicate effectively with both technical and non-technical stakeholders.

Nice To Haves

Production AI/agent experience: Hands-on experience running LLM or agent systems in production, including how they fail differently from deterministic services: nondeterministic outputs that break conventional testing and alerting, runaway token and inference costs, and partial failures in multi-step chains.
AI observability and cost control: Tracing multi-step agent runs, treating token cost, latency, and output quality as first-class metrics, and keeping inference spend in check with budgets, rate limiting, and caching (Langfuse, LangSmith, Arize, or similar).
The infrastructure AI systems run on: Model gateways and provider routing with failover (LiteLLM, Bedrock, Vertex), durable execution for long-running multi-step workflows (Temporal, Step Functions, Inngest), eval and regression pipelines for prompt or model changes, and the retrieval, vector-store, and context plumbing these systems depend on (including MCP). Experience with vector databases and GPU/TPU compute where relevant.
Domain experience in fintech or crypto/web3 environments.
Crypto/web3 infrastructure: running nodes (Ethereum, Solana, or others), indexing solutions (The Graph, custom indexers), or RPC infrastructure.
Payment processing, ledger architecture, or financial transaction systems, and meeting compliance requirements in regulated environments.
High-volume, mission-critical systems: real-time data flows, WebSocket feeds, payment rails, or distributed architectures handling millions of transactions.
AWS or GCP cloud certifications are a plus, not mandatory.
Advanced monitoring (Prometheus, Datadog) or logging experience.

Responsibilities

Quickly implement and adapt infrastructure using Terraform, Pulumi, or other major IaC tools.
Docker is critical. Deeply understand how to design, build, and optimize secure, multi-stage Dockerfiles.
Design, build, and manage robust CI/CD pipelines to automate testing, building, and deployment across environments.
Provision and manage foundational cloud services on AWS or GCP. Deep expertise in one of the two providers is required and should be transferable to the other.
Expertise in at least one major container platform: EKS, GKE, ECS, Fargate, or Cloud Run. (Kubernetes is highly valued, particularly EKS or GKE.)
Know when to use load balancers, VPNs for secure connectivity, and private VPCs for isolation. Apply subnetting, routing, VPC peering, and NAT gateways to build secure systems.
Manage object storage: S3 (AWS) or Cloud Storage (GCP).
Manage relational databases: RDS (AWS) or Cloud SQL (GCP).
Deploy event-driven components using AWS Lambda, GCP Cloud Functions, or equivalents.
Set up and operate CDNs and message queues.
Protect PII; apply encryption, secrets management, network firewalls, and web application firewalls (AWS WAF, GCP Cloud Armor) following security best practices.
Write high-quality automation and tooling in Go, Python, Node.js, or Bash for client-specific operational challenges.
Ensure robust monitoring and high system uptime.