About The Position

Key Responsibilities

Application Database Administration (MySQL & Oracle)

  • Monitor and optimize performance using traditional and AI-assisted tools
  • Perform backup, recovery, and disaster recovery planning
  • Manage security, replication, and high availability
  • Troubleshoot and resolve database issues proactively

Containerization & Kubernetes

  • Deploy and manage containerized workloads in Kubernetes
  • Monitor cluster health, scaling, and resource utilization
  • Troubleshoot orchestration and networking issues
  • Manage Helm charts and CI/CD integrations

AI-Driven Operations & Automation

  • Leverage Generative AI and machine learning tools to automate incident triage and root cause analysis, summarize logs and detect anomalies across systems, and generate runbooks, scripts, and knowledge base articles
  • Build and maintain AI-assisted operational workflows (e.g., chatbots for support, ticket summarization)
  • Use AI tools for capacity planning, predictive scaling, and performance forecasting
  • Integrate AI into monitoring platforms (AIOps) for proactive alerting and remediation
  • Apply prompt engineering techniques to optimize AI outputs for operational use cases
  • Evaluate and adopt AI tools responsibly with focus on security, accuracy, and governance

Application Server Management (Tomcat)

  • Install, configure, and maintain Apache Tomcat environments
  • Deploy and support Java-based applications
  • Tune JVM performance and optimize application throughput
  • Monitor and troubleshoot production incidents

Monitoring, Automation & Reliability

  • Implement monitoring tools (Prometheus, Grafana, ELK)
  • Set up intelligent alerting and anomaly detection (AI-enhanced where possible)
  • Automate tasks using scripting and AI-generated code
  • Participate in incident response and perform root cause analysis

Infrastructure & Security

  • Ensure system security, patching, and compliance
  • Support cloud/on-prem environments (AWS, Azure, OCI)
  • Maintain infrastructure documentation and runbooks (AI-assisted documentation encouraged)

Required Qualifications

  • 5+ years of experience in production support, Site Reliability Engineering, or DevOps
  • Strong experience with MySQL and/or Oracle, Apache Tomcat, and Kubernetes
  • Linux/Unix system administration expertise
  • Scripting skills (Bash, Python, etc.)

AI & Automation Skills

  • Experience using Generative AI tools (e.g., ChatGPT, GitHub Copilot) for troubleshooting, script generation, and documentation
  • Understanding of AIOps concepts (AI for IT Operations)

Practical AI Use Cases

  • Log analysis and anomaly detection using AI tools
  • AI-assisted ticket triage and prioritization
  • Automated RCA (Root Cause Analysis) using AI insights
  • Intelligent alert correlation and noise reduction

Technical AI Skills (Preferred)

  • Basic understanding of Machine Learning concepts and NLP (Natural Language Processing)
  • Experience integrating AI APIs into workflows
  • Familiarity with vector databases or embeddings (nice to have)

Preferred Qualifications

  • Cloud experience (AWS, Azure, OCI)
  • CI/CD tools (Jenkins, GitHub Actions)
  • Infrastructure as Code (Terraform, Ansible)
  • Docker and microservices architecture
  • Exposure to AI governance and responsible AI practices

Key Skills

  • Strong troubleshooting and analytical thinking
  • Performance tuning and optimization
  • Automation mindset with AI-first approach
  • Communication and collaboration
  • Ability to work in high-pressure production environments

Nice to Have

  • Certifications: Kubernetes (CKA/CKAD), cloud certifications, AI/ML certifications

Pay Transparency Laws in some locations require disclosure of compensation and/or benefits-related information. For this position, actual salaries will vary and may be above or below the range based on various factors including but not limited to location, experience, and performance. In addition to base pay, this position, based on business need, may be eligible for a bonus or incentive.
In addition, Conduent provides a variety of benefits to employees including health insurance coverage, voluntary dental and vision programs, life and disability insurance, a retirement savings plan, paid holidays, and paid time off (PTO) or vacation and/or sick time. The estimated salary range for this role is $92,092 - $119,600.

Requirements

  • 5+ years of experience in production support, Site Reliability Engineering, or DevOps
  • Strong experience with MySQL and/or Oracle, Apache Tomcat, and Kubernetes
  • Linux/Unix system administration expertise
  • Scripting skills (Bash, Python, etc.)
  • Experience using Generative AI tools (e.g., ChatGPT, GitHub Copilot) for troubleshooting, script generation, and documentation
  • Understanding of AIOps concepts (AI for IT Operations)
  • Log analysis and anomaly detection using AI tools
  • AI-assisted ticket triage and prioritization
  • Automated RCA (Root Cause Analysis) using AI insights
  • Intelligent alert correlation and noise reduction
  • Basic understanding of Machine Learning concepts and NLP (Natural Language Processing)
  • Experience integrating AI APIs into workflows

Nice To Haves

  • Familiarity with vector databases or embeddings
  • Cloud experience (AWS, Azure, OCI)
  • CI/CD tools (Jenkins, GitHub Actions)
  • Infrastructure as Code (Terraform, Ansible)
  • Docker and microservices architecture
  • Exposure to AI governance and responsible AI practices
  • Certifications: Kubernetes (CKA/CKAD), cloud certifications, AI/ML certifications

Responsibilities

  • Application Database Administration (MySQL & Oracle)
  • Monitor and optimize performance using traditional and AI-assisted tools
  • Perform backup, recovery, and disaster recovery planning
  • Manage security, replication, and high availability
  • Troubleshoot and resolve database issues proactively
  • Deploy and manage containerized workloads in Kubernetes
  • Monitor cluster health, scaling, and resource utilization
  • Troubleshoot orchestration and networking issues
  • Manage Helm charts and CI/CD integrations
  • Leverage Generative AI and machine learning tools to automate incident triage and root cause analysis, summarize logs and detect anomalies across systems, and generate runbooks, scripts, and knowledge base articles
  • Build and maintain AI-assisted operational workflows (e.g., chatbots for support, ticket summarization)
  • Use AI tools for capacity planning, predictive scaling, and performance forecasting
  • Integrate AI into monitoring platforms (AIOps) for proactive alerting and remediation
  • Apply prompt engineering techniques to optimize AI outputs for operational use cases
  • Evaluate and adopt AI tools responsibly with focus on security, accuracy, and governance
  • Install, configure, and maintain Apache Tomcat environments
  • Deploy and support Java-based applications
  • Tune JVM performance and optimize application throughput
  • Monitor and troubleshoot production incidents
  • Implement monitoring tools (Prometheus, Grafana, ELK)
  • Set up intelligent alerting and anomaly detection (AI-enhanced where possible)
  • Automate tasks using scripting and AI-generated code
  • Participate in incident response and perform root cause analysis
  • Ensure system security, patching, and compliance
  • Support cloud/on-prem environments (AWS, Azure, OCI)
  • Maintain infrastructure documentation and runbooks (AI-assisted documentation encouraged)

Benefits

  • health insurance coverage
  • voluntary dental and vision programs
  • life and disability insurance
  • a retirement savings plan
  • paid holidays
  • paid time off (PTO) or vacation and/or sick time

What This Job Offers

Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed
Number of Employees: 5,001-10,000 employees
