We are seeking a skilled and proactive Cloud Platform Engineer to design, deploy, and operate Azure-native infrastructure powering InvoiceCloud's enterprise SaaS platform. You will own AKS clusters and supporting services, implement infrastructure as code with Terraform/Ansible, and optimize Azure networking (VNets, NSGs, load balancers, VPNs) for scale and resiliency. The role integrates Cloudflare (DNS, CDN, WAF, Zero Trust) to secure and accelerate edge traffic and partners closely with application teams running Windows/IIS and modern and legacy .NET workloads. You will enhance CI/CD pipelines, establish robust monitoring/alerting with Azure Monitor and Log Analytics, and uphold governance (e.g., RBAC, policies) across environments. As part of the on-call rotation, you will lead incident response, drive root cause analysis, and implement preventive automation to improve reliability over time.

Success Profile
At InvoiceCloud, success is anchored in our core competencies (Results Driven, Takes Ownership, Drives Efficiency, and Innovative), which guide how every employee delivers impact.

Results Driven
Deliver reliable AKS platforms and Azure networking that meet availability, performance, and security objectives; validate outcomes with actionable SLOs and dashboards.
Execute changes through CI/CD with rigorous testing and phased rollouts to minimize risk while meeting delivery timelines.
Lead incident response to rapid resolution, capture learnings, and implement durable fixes that measurably reduce repeat issues.

Takes Ownership
Own end-to-end IaC (Terraform modules, Ansible playbooks) and AKS baselines; maintain versioned patterns that teams can consume with confidence.
Anticipate platform needs (capacity, security posture, cost) and drive proposals that balance reliability, performance, and spend.
Partner with dev/security/ops to align on standards (RBAC, policies, WAF/Zero Trust rules) and ensure consistent, compliant environments.

Drives Efficiency
Standardize and templatize Terraform for reusable, secure-by-default provisioning; automate day-2 ops (scaling, patching, backups).
Improve CI/CD throughput with environment promotions, automated validations, and safe rollbacks for .NET/IIS and containerized apps.
Optimize observability (Azure Monitor/Log Analytics; optional Prometheus/Grafana/New Relic) to cut MTTR and reduce alert noise.

Innovative
Evaluate and implement Azure/AKS and Cloudflare capabilities (e.g., Zero Trust, WAF tuning, ingress add-ons) that enhance resilience and security.
AI/Automation: Use GenAI assistants for runbook drafts and postmortem synthesis; apply policy-as-code and scripted remediations to auto-heal common failure modes and accelerate root cause analysis.
Pilot observability and performance-testing improvements that inform capacity planning and right-sizing decisions.

Job Type
Full-time
Career Level
Mid Level
Industry
Credit Intermediation and Related Activities
Education Level
No Education Listed
Number of Employees
251-500 employees