Business Value
- Deployment Process
- Secret Management Risks
- AWS Infrastructure Security & Governance
- Environment & Configuration Management
- Infrastructure Performance, Scaling & Optimization
Key Responsibilities
- Own and enforce zero-downtime deployment strategies across all production workloads.
- Design and enforce custom Helm rollback and deployment patterns with automated validation.
- Review and refactor existing Terraform modules to reduce excessive granularity and simplify dependency chains.
- Implement strict CI/CD governance in GitLab and GitHub Actions, including mandatory PR approvals, automated security checks (Snyk/SonarQube), and enforced policy gates.
- Migrate secret management away from the legacy init-container pattern to AWS Secrets Manager or a similar service.
- Standardize environment drift detection across pre-prod and production via HCP Terraform (formerly Terraform Cloud) and automated config audits.
- Define and maintain infrastructure runbooks, operational SOPs, and DR playbooks.
- Mentor junior engineers, define DevOps KPIs, and participate in postmortems.
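As an illustration of the "Helm rollback with automated validation" responsibility, the following is a minimal sketch, not the team's actual pattern. It assumes Helm 3, where `--atomic` rolls a release back if the upgrade fails and `--wait` blocks until resources report ready. The function names, release name, chart path, and namespace are hypothetical.

```python
import subprocess
from typing import List


def helm_upgrade_command(release: str, chart: str, namespace: str,
                         timeout: str = "5m") -> List[str]:
    """Build a `helm upgrade` command that rolls back automatically on failure.

    Assumes Helm 3 flag names; the arguments are placeholders for illustration.
    """
    return [
        "helm", "upgrade", release, chart,
        "--install",               # install the release if it does not exist yet
        "--namespace", namespace,
        "--atomic",                # roll back the release if the upgrade fails
        "--wait",                  # wait until resources report ready
        "--timeout", timeout,      # give up (and roll back) after this long
    ]


def deploy(release: str, chart: str, namespace: str) -> bool:
    """Run the upgrade and report success; requires the `helm` CLI on PATH."""
    result = subprocess.run(
        helm_upgrade_command(release, chart, namespace), check=False
    )
    return result.returncode == 0
```

A pipeline step could call `deploy("web", "./charts/web", "prod")` and fail the job when it returns `False`. Readiness probes on the workloads are what make `--wait` a meaningful validation step.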
Requirements
- Minimum of 3 years of experience in a DevOps, SRE, or Infrastructure Engineering role.
- Solid understanding of Terraform and experience maintaining reusable module libraries.
- Hands-on experience managing workloads on Kubernetes (preferably EKS).
- Working knowledge of CI/CD and release tooling such as GitHub Actions and Helm.
- Familiarity with AWS cloud services, including networking, RDS/Aurora (PostgreSQL), and container security.
- Competence in observability tooling, especially Datadog dashboards and alert configurations.
- Strong operational mindset with attention to detail in release processes and deployment integrity.
Desirable Experience
- Exposure to GitOps tooling such as ArgoCD or FluxCD.
- Experience developing or integrating Kubernetes operators.
- Familiarity with service-level indicators (SLIs), service-level objectives (SLOs), and structured alerting.
Tools and Expectations
- Terraform / HCP Terraform - Core to infrastructure provisioning. Required to build, refactor, and maintain reusable infrastructure modules across environments, enforce naming/tagging standards, and leverage state management for drift detection and rollback.
- GitHub / GitLab / GitHub Actions - Central to CI/CD workflows. Expected to enforce secure release procedures, set up integration with code quality tools, and prevent direct changes to critical branches.
- Helm - Used for Kubernetes application packaging and deployment. Must implement pre/post deployment logic, rollback plans, and chart lifecycle automation.
- EKS / Kubernetes - Platform for hosting applications. The engineer must manage node pools, service networking, security contexts, and namespace segmentation.
- AWS Services (Amazon RDS/Aurora, VPC, IAM) - Backend for infrastructure workloads. Expected to configure VPC isolation and IAM boundaries, and to use private access wherever possible when connecting to PostgreSQL on RDS/Aurora.
- Secrets Manager / Kubernetes Secrets / CSI Driver - Secret handling is critical. Migrate the legacy init-container pattern to scoped access through Secrets Manager sync or CSI injection.
- Datadog - Observability backbone. Responsible for building actionable metrics, tracking SLOs, and managing alert noise to reduce operational fatigue.
- Cloudflare - Interface layer. Use Terraform to define DNS entries and WAF rules, and validate exposure configuration per environment.
- Snyk / SonarQube / Wiz - Code and container security enforcement. Ensure pipeline integration catches vulnerabilities and provides immediate feedback to development.
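To make the drift-detection expectation above concrete, here is a minimal sketch of one possible approach, not a prescribed implementation. It relies on `terraform plan -detailed-exitcode`, which exits with 0 when the plan is empty, 1 on error, and 2 when the plan contains changes. The function names and status labels are hypothetical, and a non-empty plan can also mean unapplied configuration changes rather than drift.

```python
import subprocess


def classify_plan_exit_code(code: int) -> str:
    """Map the exit code of `terraform plan -detailed-exitcode` to a status label."""
    if code == 0:
        return "in_sync"            # plan succeeded with no changes
    if code == 2:
        return "changes_detected"   # plan succeeded and found differences
    return "error"                  # exit code 1 (or anything unexpected)


def check_environment(workdir: str) -> str:
    """Run a plan in `workdir` (already initialized) and classify the result.

    Requires the `terraform` CLI on PATH; the directory layout is an assumption.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        check=False,
    )
    return classify_plan_exit_code(result.returncode)
```

A scheduled CI job could run `check_environment` against each environment directory and raise an alert whenever the result is not `"in_sync"`.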
Operations Manager - WPP Open China
Location: Wuxi
Type: Full-time
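Business Value
- Deployment Process
- Secret Management
- AWS Infrastructure Security & Governance
- Environment & Configuration Management
- Infrastructure Performance, Scaling & Optimization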
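Key Responsibilities
- Own and enforce zero-downtime deployment strategies for all production workloads.
- Design and implement custom Helm rollback and deployment patterns, including automated validation.
- Review and refactor existing Terraform modules to reduce excessive granularity and simplify dependency chains.
- Implement strict CI/CD governance in GitLab and GitHub Actions, including mandatory PR approvals, automated security checks (Snyk/SonarQube), and enforced policy gates.
- Migrate secret management to AWS Secrets Manager or a similar tool.
- Standardize drift detection across pre-production and production environments via Terraform Cloud and automated configuration audits.
- Define and maintain infrastructure runbooks, SOPs, and DR playbooks.
- Mentor junior engineers, define DevOps KPIs, and participate in incident postmortems.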
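Requirements
- At least 3 years of experience in DevOps, SRE, or infrastructure roles.
- Solid knowledge of Terraform and experience maintaining reusable module libraries.
- Hands-on experience managing workloads on Kubernetes (preferably EKS).
- Familiarity with CI/CD tools such as GitHub Actions and Helm.
- Understanding of AWS cloud services, including networking, RDS/Aurora (PostgreSQL), and container security.
- Proficiency with observability tooling, especially Datadog dashboards and alert configuration.
- Strong operational mindset, with attention to detail in release processes and deployment integrity.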
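Desirable Experience
- Exposure to GitOps tooling such as ArgoCD or FluxCD.
- Experience developing or integrating Kubernetes operators.
- Familiarity with service-level indicators (SLIs), service-level objectives (SLOs), and structured alerting.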
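Tools and Expectations
- Terraform / HCP Terraform: Core to infrastructure provisioning. Build, refactor, and maintain reusable infrastructure modules across environments, enforce naming/tagging standards, and use state management for drift detection and rollback.
- GitHub / GitLab / GitHub Actions: Core to CI/CD workflows. Ensure secure release procedures, set up integration with code quality tools, and prevent direct changes to critical branches.
- Helm: Used for Kubernetes application packaging and deployment. Implement pre/post deployment logic, rollback plans, and chart lifecycle automation.
- EKS / Kubernetes: Application hosting platform. Manage node pools, service networking, security contexts, and namespace isolation.
- AWS Services (Amazon RDS/Aurora, VPC, IAM): Backend for infrastructure workloads. Configure VPC isolation and IAM boundaries, and use private access wherever possible when connecting to PostgreSQL on RDS/Aurora.
- Secrets Manager / Kubernetes Secrets / CSI Driver: Secret handling is critical. Migrate the legacy init-container pattern to scoped access through Secrets Manager sync or CSI injection.
- Datadog: Monitoring backbone. Build actionable metrics, track SLOs, and manage alert noise to reduce operational fatigue.
- Cloudflare: Interface layer. Use Terraform to define DNS entries and WAF rules, and validate exposure configuration per environment.
- Snyk / SonarQube / Wiz: Code and container security enforcement. Ensure pipeline integration catches vulnerabilities and gives development immediate feedback.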