Platform Engineering Group Lead

Lawrence Berkeley National Laboratory, Berkeley, CA
Hybrid

About The Position

The Platform Engineering Group Lead reports to the Systems and Software Department Head and owns the technical direction and operational excellence of the platforms that underpin ESnet’s Software Engineering and Operations functions. This role is accountable for platform architecture and lifecycle management, including standards definition, solution design, implementation, service availability, reliability engineering, and continuous improvement. The Group Lead also manages the platform engineering team responsible for operating and evolving core IT services, evaluating emerging technologies, performing product and vendor assessments, and developing roadmap plans for future platform capabilities.

The role requires demonstrated experience designing, deploying, and supporting enterprise-scale platform services across computing and systems infrastructure, container orchestration, CI/CD toolchains, cloud environments, source code management, project and work management systems, collaboration and knowledge platforms, certificate and secrets management, enterprise software, and database services. Success in this position depends on advanced technical program and project management skills, including the ability to execute across multiple concurrent initiatives with shifting priorities while maintaining strong delivery discipline. The Group Lead must be able to translate complex technical tradeoffs into clear, actionable guidance for scientific and business stakeholders and Laboratory management, supported by strong written and verbal communication.

Requirements

  • Advanced degree in Computer Science, Engineering, Business, or a related discipline, or an equivalent combination of education and relevant professional experience.
  • Minimum of three (3) years of experience in an IT leadership capacity, including responsibility for planning and leading enterprise IT systems implementations.
  • Demonstrated expertise in enterprise-scale systems analysis, large systems development, and information management, with the ability to apply this knowledge to platform architecture, implementation, and sustained operations.
  • Proven ability to establish and enforce technical standards, operational practices, and lifecycle management for mission-critical computing services.
  • Ability to coordinate and lead unit activities end to end, including balancing competing user demands and resolving resource conflicts while meeting service objectives.
  • Ability to develop project schedules and budget estimates, and consistently deliver against schedules and operational commitments.
  • Ability to operate under conceptual guidance, with evaluation based primarily on outcomes, and to exercise broad latitude in planning, prioritizing, and scheduling work within managerial and policy constraints.
  • Demonstrated ability to define a clear strategic vision grounded in data-driven analysis (e.g., service health, operational risk, capacity trends, delivery metrics), and translate that vision into executable, staged implementation plans that maintain stability of existing services.
  • Experience contributing to, and potentially shaping, organization-wide hardware and software planning to support evolving enterprise requirements.
  • Ability to deploy staff, tooling, and operational processes to achieve availability, reliability, performance, and recovery targets across production, development, and QA environments.
  • Proven ability to communicate complex technical concepts, tradeoffs, and risk in clear terms to technical staff, peer leaders, non-technical stakeholders, and senior management.
  • Demonstrated skill in consensus building, team building, and negotiation across diverse stakeholder groups and customer communities.
  • Experience working with vendors and external oversight organizations (as applicable) in the acquisition, installation, integration, and validation of major platforms and applications.
  • Ability to represent the department in cross-organizational interactions, including making operational commitments and coordinating staffing and funding decisions in support of defined goals.
  • Demonstrated leadership experience supervising teams of technical professionals, including recruitment, hiring, onboarding, training, coaching, and performance management.
  • Experience leading professional, technical, production, and administrative staff to meet organizational objectives, including project management, resource allocation, and operational planning.
  • Direct experience managing budgets and resources to support both ongoing operations and planned improvement initiatives.
  • Ability to manage through subordinate leaders (e.g., supervisors, group leaders, project managers) where applicable, ensuring consistent delivery of platform services and customer support across a broad user base.
  • Ability to provide leadership continuity and serve in an acting capacity for the Department Head when required.
  • Significant experience as a hands-on technical implementer responsible for mission-critical infrastructure and services, including high availability, reliability engineering, performance, operational recovery, and support for production, development, and QA systems.
  • Demonstrated experience in two or more operational engineering disciplines, such as performance tuning and service optimization; capacity planning and forecasting; availability monitoring, metrics, and operational reporting; configuration management and change control; and process automation and operational tooling.
  • Experience managing technologies relevant to modern platform engineering, including CI/CD systems, container orchestration platforms, cybersecurity technologies, and private/public cloud infrastructure.
  • Demonstrated software development experience using programming languages such as Java or Python, sufficient to guide automation, tooling, integrations, or platform-adjacent application development.
  • Ability to independently formulate, organize, document, and present proposals, including clear articulation of cost/benefit tradeoffs, implementation risk, and operational impact for multiple audiences.

Responsibilities

  • Provide senior-level engineering leadership for ESnet’s platform engineering capabilities across the full service lifecycle: architecture, design, implementation, deployment, and sustained operations.
  • Own portfolio planning and execution for platform initiatives that enable Software Engineering and Operations, including requirements definition, prioritization, delivery planning, implementation oversight, and technical documentation.
  • Establish the direction and manage the efforts of a team of technical professionals; manage operational resources and budgets to meet availability, reliability, and service objectives.
  • Under conceptual guidance, define and drive the strategic direction for ESnet’s enterprise information systems portfolio, serving as the primary technical lead for mission-critical operations and collaboration platforms (e.g., ServiceNow, Jira, Slack, Google Workspace integrations) and related enterprise services.
  • Recruit, onboard, mentor, and develop team members; establish role expectations, growth plans, and effective operating practices.
  • Partner with a diverse customer base to capture current and future requirements, and communicate technical decisions, risks, and tradeoffs clearly to both technical and non-technical stakeholders.
  • Participate as a member of the Division Leadership Team, collaborating with Department Heads, Group Leads, and CS Area and SND Division leadership to align platform strategy with organizational objectives.
  • Produce clear technical communications, including internal/external presentations and written materials on platform strategy, architecture, operations, and service support.
  • Foster an inclusive, compliant, and high-performing work environment that supports innovation, continuous improvement, and adherence to DOE and Laboratory policies.

Benefits

  • Exceptional health and retirement benefits, including pension or 401(k)-style plans
  • Opportunities to grow in your career - check out our Tuition Assistance Program
  • A culture where you’ll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we also have a Winter Holiday Shutdown every year.
  • Parental bonding leave (for both mothers and fathers)
  • Pet insurance