Senior System Engineering - Engineering Operations

AT&T, Plano, TX
$160,000 - $215,800, Onsite

About The Position

This role involves leading the response to production issues, ensuring minimal downtime and adherence to SLAs. The Senior System Engineer will build alerting, monitoring, and dashboards for proactive problem identification, and will use strong analytical and technical skills to diagnose and resolve complex production issues, focusing on immediate impact mitigation and automating recovery processes. The role also includes working with development teams on long-term solutions, creating and maintaining system documentation, and developing scripts and automation tools.

A key aspect is identifying non-functional requirements such as reliability, performance, and scalability, and ensuring they are met before production deployment. The engineer will monitor application performance using tools like Dynatrace and AppDynamics, identify bottlenecks, and optimize application performance. Responsibilities also include defining SLIs/SLOs and error budgets, and working with teams to document failure patterns and implement remediations for application resilience. Capacity planning, participating in security assessments, responding to security incidents, and collaborating with Release Management on production changes are expected. The role requires supporting application releases and deployments, ensuring controlled rollouts with minimal impact. Proactive problem detection, trend analysis, and providing metrics and status reports to leadership are crucial. Strong communication skills are essential, as is knowledge transfer with Product Development teams.

The position requires 24x7 on-call support for various applications, including J2EE apps, Salesforce, Salesforce Marketing Cloud, and MuleSoft, using an SRE approach. Experience with Java EE apps, ERP, CRM apps, web application architecture and development, and various observability tools is necessary. Proficiency in integration technologies, API Gateways, MuleSoft, WebLogic, Object-Oriented Programming languages (Java, J2EE, JavaScript, Spring), automation tools (Python, Shell), containerization (Docker, Kubernetes), cloud services (Azure), DevOps practices (CI/CD, Git, Jenkins), network protocols, load balancing, security principles, SQL queries, and Linux shell scripting is required.

Requirements

  • Bachelor’s degree, or foreign equivalent degree in Computer Engineering, Computer Science, or Information Technology
  • Three (3) years of experience in the job offered or three (3) years of experience in a related occupation supporting large-scale applications in production with an Engineering approach (SRE), including Java EE apps, ERP, and CRM apps, in an operations capacity
  • Architecting and developing web applications
  • Using observability tools including Dynatrace, AppDynamics, Splunk, ELK, MuleSoft Anypoint, Quantum Metric, and Catchpoint to create alerts, dashboards, reports, and synthetic monitoring
  • Understanding and working experience with integration technologies and API Gateways, MuleSoft, WebLogic
  • Utilizing Object-Oriented Programming languages (Java, J2EE technologies, JavaScript) and frameworks (Spring)
  • Using automation tools and scripting languages (Python, Shell)
  • Utilizing containerization (Docker, Kubernetes) and cloud services (Azure)
  • Employing DevOps practices and tools (CI/CD pipelines, Git, Jenkins)
  • Applying network protocols, load balancing, and security principles
  • Utilizing database SQL queries
  • Building Linux shell scripts on demand

Responsibilities

  • Lead the response to production issues, ranging from identifying and troubleshooting problems to implementing immediate fixes.
  • Ensure minimal downtime and adherence to service level agreements (SLAs).
  • Build alerting, monitoring and dashboards that identify problems proactively.
  • Utilize strong analytical, technical and functional skills to diagnose and resolve complex issues within production environments, with a focus on immediate impact mitigation; automate recovery processes and routine maintenance tasks to improve system reliability and efficiency.
  • Work with dev teams to implement long-term solutions to prevent recurrence of incidents.
  • Create and maintain documentation for system architecture, configuration, deployment procedures, and troubleshooting guides.
  • Develop and maintain scripts and automation tools to streamline operations, deployment processes, and repetitive tasks.
  • Identify non-functional requirements (such as reliability, performance, scalability, and application logging for observability) and acceptance criteria during design and development, and ensure that these are met before moving to production.
  • Monitor application performance using tools such as Dynatrace, AppDynamics and ELK.
  • Identify bottlenecks and work with dev teams to optimize the performance of applications through code improvements, configuration tuning, and resource optimization.
  • Define SLIs/SLOs and error budgets, with a focus on automation.
  • Work with dev/architect/quality engineering teams to identify and document patterns of failures as lessons learnt from incidents, and follow up to implement remediations that make the application resilient.
  • Monitor system usage patterns and perform capacity planning to ensure scalability and reliability of applications and services.
  • Participate in security assessments and implement security best practices to safeguard applications and data.
  • Respond promptly to security incidents and vulnerabilities.
  • Work with Release Management on upcoming production changes to identify and mitigate risks.
  • Collaborate with development teams to manage and support application releases and deployments.
  • Ensure changes are rolled out in a controlled manner with minimal impact on production services.
  • Perform proactive problem detection, trend and pattern analysis, impact assessment, and functional analysis of problems.
  • Provide metrics and status reports and review with leadership and stakeholder communities; establish processes surrounding metrics gathering, reporting and communication.
  • Provide prompt visibility and status of escalated issues, incidents and outages to leadership, business partners and other key stakeholders.
  • Work closely with Product Development teams to ensure knowledge transfer on system changes well before those changes are operationalized.
  • Provide 24x7 on-call support for agent-facing applications: home-grown J2EE apps as well as SaaS platform apps (Salesforce, Salesforce Marketing Cloud and MuleSoft).
  • Support large-scale applications in production with an Engineering approach (SRE), including Java EE apps, ERP, and CRM apps, in an operations capacity.
  • Architect and develop web applications.
  • Use observability tools including Dynatrace, AppDynamics, Splunk, ELK, MuleSoft Anypoint, Quantum Metric, and Catchpoint to create alerts, dashboards, reports, and synthetic monitoring.
  • Work with integration technologies, API Gateways, MuleSoft, and WebLogic.
  • Utilize Object-Oriented Programming languages (Java, J2EE technologies, JavaScript) and frameworks (Spring).
  • Use automation tools and scripting languages (Python, Shell).
  • Utilize containerization (Docker, Kubernetes) and cloud services (Azure).
  • Employ DevOps practices and tools (CI/CD pipelines, Git, Jenkins).
  • Apply network protocols, load balancing, and security principles.
  • Utilize database SQL queries.
  • Build Linux shell scripts on demand.

Benefits

  • Medical/Dental/Vision coverage
  • 401(k) plan
  • Tuition reimbursement program
  • Paid Time Off and Holidays (based on date of hire, at least 23 days of vacation each year and 9 company-designated holidays)
  • Paid Parental Leave
  • Paid Caregiver Leave
  • Additional sick leave beyond what state and local law require may be available but is unprotected
  • Adoption Reimbursement
  • Disability Benefits (short term and long term)
  • Life and Accidental Death Insurance
  • Supplemental benefit programs: critical illness/accident hospital indemnity/group legal
  • Employee Assistance Programs (EAP)
  • Extensive employee wellness programs
  • Employee discounts up to 50% off on eligible AT&T mobility plans and accessories, AT&T internet (and fiber where available) and AT&T phone