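🚀 CI/CD Automated Deployment Module Operations Manual

An end-to-end R&D automation system based on OpenClaw + Claude Code

📋 Version: V1.0.0 📅 Created: 2026-03-15 👥 Audience: DevOps engineers, system administrators 🔒 Document type: Operations manual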

1. System Overview

System availability: 99.9%
Deployment time: <10 min
API response time (p95): 200 ms
Test coverage: 80%+
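
To make the 99.9% availability figure concrete, the sketch below converts an availability percentage into a downtime budget. The function name `downtime_budget_minutes` and the default 30-day window are illustrative assumptions, not part of the system described here.

```shell
# Convert an availability target (in percent) into allowed downtime in minutes.
# Usage: downtime_budget_minutes <availability_percent> [window_days]
downtime_budget_minutes() {
  local availability="$1"
  local days="${2:-30}"   # assumed default window: 30 days
  awk -v a="$availability" -v d="$days" \
    'BEGIN { printf "%.1f\n", (100 - a) / 100 * d * 24 * 60 }'
}
```

For example, `downtime_budget_minutes 99.9` prints 43.2, meaning roughly 43 minutes of downtime per 30 days.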

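1.1 System Components

The CI/CD automated deployment module consists of the following core components:

Component | Function | Port | Description
Jenkins Master | CI/CD orchestration | 8080 | Controller node for continuous integration and deployment
Docker Registry | Image registry | 5000 | Container image storage and management
Kubernetes Cluster | Container orchestration | 6443 | Platform for deploying and running applications
Prometheus | Metrics monitoring | 9090 | System monitoring and metrics collection
Grafana | Visualization | 3000 | Dashboards and alerting for monitoring data
ELK Stack | Log management | 9200/5601 | Log collection and analysis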

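1.2 Deployment Architecture

Load Balancer -> Jenkins Master -> K8S Cluster -> Application

ℹ️ Note: The system uses a microservice architecture. Each component is deployed and scaled independently, which supports high availability and elastic scaling.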

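2. Environment Preparation

2.1 Hardware Requirements

Component | CPU | Memory | Disk | Network
Jenkins Master | 4+ cores | 8 GB+ | 100 GB SSD | 1 Gbps
K8S Master | 4+ cores | 8 GB+ | 100 GB SSD | 1 Gbps
K8S Worker | 8+ cores | 16 GB+ | 200 GB SSD | 1 Gbps

2.2 Software Requirements

2.3 System Initialization Script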

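#!/bin/bash
# sysctl_optimization.sh
# Note: each run appends to /etc/sysctl.conf, so run this script only once per host.

# Raise file descriptor and inotify watch limits
echo "fs.file-max=655360" >> /etc/sysctl.conf
echo "fs.inotify.max_user_watches=524288" >> /etc/sysctl.conf

# Network tuning
echo "net.core.somaxconn=65535" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog=65535" >> /etc/sysctl.conf

# Memory tuning
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo "vm.swappiness=10" >> /etc/sysctl.conf

# Apply the settings
sysctl -p
⚠️ Warning: Tune these kernel parameters to match the actual hardware, and validate them in a test environment before applying them to production.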
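
Because the script appends with `>>`, running it twice leaves duplicate entries. One possible way to make the step repeatable is a small helper that replaces an existing key instead of appending. This is a minimal sketch: the function name `set_sysctl_param` and the drop-in file name used in the usage note are assumptions, not part of the original script.

```shell
# Set key=value in a sysctl-style file, replacing any existing entry for that key.
# Usage: set_sysctl_param <file> <key> <value>
set_sysctl_param() {
  local file="$1" key="$2" value="$3"
  local tmp
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line except existing assignments of this key.
  awk -v k="$key" -F= \
    '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
    "$file" > "$tmp"
  # Append the new value, then write the result back in place.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

A call such as `set_sysctl_param /etc/sysctl.d/99-cicd.conf vm.swappiness 10` (hypothetical file name) followed by `sysctl --system` would then apply the value without accumulating duplicates.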

3. System Installation and Deployment

3.1 Jenkins Installation

#!/bin/bash
# install_jenkins.sh

# Create the Jenkins data directories
mkdir -p /opt/jenkins/{data,plugins,workspace}
# The official image runs as the jenkins user (UID 1000); grant it ownership
# instead of making the tree world-writable with chmod 777
chown -R 1000:1000 /opt/jenkins

# Run the Jenkins container
docker run -d \
  --name jenkins \
  --restart unless-stopped \
  -p 8080:8080 \
  -p 50000:50000 \
  -v /opt/jenkins/data:/var/jenkins_home \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e JAVA_OPTS="-Xmx2048m -Xms512m" \
  jenkins/jenkins:lts-jdk17

# Show the initial admin password
docker exec jenkins cat /var/jenkins_home/secrets/initialAdminPassword
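
Jenkins needs some time to initialize after the container starts, so reading the initial password immediately can fail. A generic retry helper is one way to wait for readiness. This is a sketch: the `retry` function is an illustrative helper, and the readiness check in the usage note assumes the login page returns HTTP 200 once startup has completed.

```shell
# Run a command until it succeeds or the attempt limit is reached.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `retry 30 5 curl -fsS -o /dev/null http://localhost:8080/login` would poll for up to about 2.5 minutes before giving up.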

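3.2 Harbor Installation

#!/bin/bash
# install_harbor.sh

HARBOR_VERSION="v2.9.0"
wget "https://github.com/goharbor/harbor/releases/download/${HARBOR_VERSION}/harbor-offline-installer-${HARBOR_VERSION}.tgz"
tar xvf "harbor-offline-installer-${HARBOR_VERSION}.tgz"
cd harbor

# Create harbor.yml from the bundled template and edit it before installing
# (at minimum: hostname, TLS settings, admin password)
cp harbor.yml.tmpl harbor.yml
./install.sh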

3.3 KubeSphere Installation

#!/bin/bash
# install_kubesphere.sh

# Install with Helm
# Note: confirm the chart name and version against the KubeSphere documentation for your release
helm repo add kubesphere https://charts.kubesphere.io/main
helm repo update

helm install kubesphere kubesphere/kubesphere \
  --namespace kubesphere-system \
  --create-namespace \
  --version 3.4.0

4. Jenkins Pipeline Configuration

4.1 Basic Pipeline Template

pipeline {
    agent any
    
    environment {
        REGISTRY_URL = 'harbor.yourdomain.com'
        REGISTRY_CREDENTIALS = 'harbor-credentials'
        KUBECONFIG_CREDENTIALS = 'kubeconfig'
        APP_NAME = 'myapp'
    }
    
    options {
        timestamps()
        timeout(time: 1, unit: 'HOURS')
        disableConcurrentBuilds()
    }
    
    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }
        
        stage('Build') {
            steps {
                // Install dependencies from the lockfile before building
                sh 'npm ci'
                sh 'npm run build'
            }
        }
        
        stage('Test') {
            steps {
                sh 'npm test'
            }
        }
        
        stage('Deploy') {
            steps {
                withKubeConfig([credentialsId: KUBECONFIG_CREDENTIALS]) {
                    sh 'kubectl apply -f k8s/'
                }
            }
        }
    }
}
✅ Best practice: Use declarative Pipeline syntax to keep the configuration concise and maintainable, and manage sensitive values through Jenkins Credentials.
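
The template defines `REGISTRY_URL` and `APP_NAME` but does not show how an image reference is built from them. The helper below is one possible convention, given as a sketch: the function name `image_ref` and the `<build>-<short commit>` tag format are assumptions, not something the template prescribes.

```shell
# Compose an image reference such as harbor.yourdomain.com/myapp:42-1a2b3c4.
# Usage: image_ref <registry_url> <app_name> <build_number> <git_commit>
image_ref() {
  local registry="$1" app="$2" build="$3" commit="$4"
  local short
  # Use the first 7 characters of the commit hash as a short revision.
  short="$(printf '%s' "$commit" | cut -c1-7)"
  printf '%s/%s:%s-%s\n' "$registry" "$app" "$build" "$short"
}
```

Inside a Jenkins `sh` step this could be called as `IMAGE=$(image_ref "$REGISTRY_URL" "$APP_NAME" "$BUILD_NUMBER" "$GIT_COMMIT")`, followed by `docker build -t "$IMAGE" .` and `docker push "$IMAGE"`. A unique tag per build makes rollbacks traceable in a way a mutable tag does not.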

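5. Docker Containerization

5.1 Example Dockerfile

# Multi-stage build to reduce the final image size
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages are copied into the final image
RUN npm prune --omit=dev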

FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]

5.2 Docker Compose Configuration

version: '3.8'
services:
  jenkins:
    image: jenkins/jenkins:lts-jdk17
    ports:
      - "8080:8080"
    volumes:
      - jenkins_data:/var/jenkins_home
  
  prometheus:
    image: prom/prometheus:v2.47.0
    ports:
      - "9090:9090"
  
  grafana:
    image: grafana/grafana:10.1.0
    ports:
      - "3000:3000"

volumes:
  jenkins_data:

6. Kubernetes/KubeSphere Deployment

6.1 Deployment Configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
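  template:
    metadata:
      # Pod labels must match spec.selector.matchLabels, or the API server rejects the Deployment
      labels:
        app: myapp
    spec: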
      containers:
      - name: myapp
        image: harbor.yourdomain.com/myapp:latest
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "1024Mi"

6.2 HPA Autoscaling

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  # Must be in the same namespace as the target Deployment
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
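
To see how this configuration translates into replica counts, the sketch below applies the basic scaling formula from the Kubernetes documentation, desired = ceil(current × currentUtilization / targetUtilization), and then clamps the result to the min/max bounds. It is a simplification: the function name `hpa_desired_replicas` is illustrative, and the real controller also applies a tolerance band and stabilization windows that are not modeled here.

```shell
# Estimate the replica count an HPA would request for a CPU utilization target.
# Usage: hpa_desired_replicas <current_replicas> <current_cpu_pct> <target_cpu_pct> <min> <max>
hpa_desired_replicas() {
  local current="$1" usage="$2" target="$3" min="$4" max="$5"
  # Integer ceiling of current * usage / target.
  local desired=$(( (current * usage + target - 1) / target ))
  # Clamp to the configured bounds.
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  printf '%s\n' "$desired"
}
```

With the values above, `hpa_desired_replicas 3 140 70 3 20` prints 6: three pods running at 140% of their CPU request against a 70% target doubles the replica count.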

7. Monitoring and Alerting

7.1 Prometheus Configuration

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod

7.2 Alerting Rules

groups:
  - name: application_alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
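
The `HighErrorRate` expression divides the 5xx request rate by the total request rate and fires when the ratio stays above 0.05 for five minutes. The sketch below reproduces only the ratio comparison, which can help when checking counts by hand. The function name `error_rate_exceeds` is an assumption, and the `for: 5m` duration is not modeled.

```shell
# Return 0 (true) when errors/total is strictly greater than the threshold.
# Usage: error_rate_exceeds <errors_5xx> <total_requests> [threshold]
error_rate_exceeds() {
  local errors="$1" total="$2" threshold="${3:-0.05}"
  awk -v e="$errors" -v t="$total" -v th="$threshold" \
    'BEGIN { if (t <= 0) exit 1; exit !((e / t) > th) }'
}
```

For example, `error_rate_exceeds 6 100` succeeds (6% is above the threshold), while `error_rate_exceeds 5 100` does not, because the rule uses a strict `>` comparison.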
📊 Recommended Grafana dashboards:
  • Kubernetes Cluster: ID 6417
  • Jenkins Overview: ID 9964
  • Docker Container: ID 8933

8. Routine Operations

8.1 Jenkins Operations Commands

# Restart Jenkins
docker restart jenkins

# Follow the Jenkins logs
docker logs -f jenkins

# Back up the Jenkins data
tar -czf jenkins-backup.tar.gz /opt/jenkins/data

8.2 Kubernetes Operations Commands

# Check cluster status
kubectl get nodes
kubectl get pods -A

# Check application status
kubectl get deployments -n production

# Follow logs
kubectl logs -f deployment/myapp -n production

# Restart the Deployment
kubectl rollout restart deployment/myapp -n production

8.3 Scheduled Maintenance Tasks

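9. Troubleshooting Guide

9.1 Pod Fails to Start

# Inspect the pod status
kubectl describe pod <pod-name> -n <namespace>

# List events sorted by timestamp
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# View logs from the previous container instance
kubectl logs <pod-name> -c <container-name> -n <namespace> --previous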

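9.2 Service Unreachable

# Check the Endpoints
kubectl get endpoints <service-name> -n <namespace>

# Test connectivity from inside the cluster
kubectl run test --rm -it --restart=Never \
  --image=busybox \
  -- wget -O- http://<service-name>.<namespace>.svc.cluster.local:<port>
🔍 Common issues:
  • ImagePullBackOff: check the image name and registry credentials
  • CrashLoopBackOff: check the container logs to locate the error
  • Pending: check resource quotas and node capacity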
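
The status-to-check mapping above can be encoded as a small lookup for use in triage scripts. This is a sketch only: the function name `pod_status_hint` is hypothetical, and the fallback branch is an assumption added for statuses the list does not cover.

```shell
# Map a pod status reason to the first check suggested in the list above.
# Usage: pod_status_hint <status>
pod_status_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check the image name and registry credentials" ;;
    CrashLoopBackOff)
      echo "check the container logs, including --previous" ;;
    Pending)
      echo "check resource quotas and node capacity" ;;
    *)
      # Assumed fallback for statuses not covered by the list.
      echo "run kubectl describe pod for details" ;;
  esac
}
```

The status string can be read from the STATUS column of `kubectl get pods -n <namespace>` and passed in as the argument.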

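10. Backup and Recovery

10.1 Jenkins Backup

#!/bin/bash
BACKUP_DIR="/backup/jenkins"
DATE=$(date +%Y%m%d_%H%M%S)

# Make sure the backup directory exists before writing the archive
mkdir -p "${BACKUP_DIR}"
tar -czf "${BACKUP_DIR}/jenkins-home-${DATE}.tar.gz" \
  -C /opt/jenkins/data .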
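
A backup is only useful if the archive is readable and old archives do not fill the disk. The sketch below extends the script above with an integrity check and simple retention. The function name `backup_dir` and the default of keeping 7 archives are assumptions, and the destination is assumed to lie outside the source directory.

```shell
# Archive a directory, verify the archive, and keep only the newest N archives.
# Usage: backup_dir <source_dir> <backup_dir> [keep_count]
backup_dir() {
  local src="$1" dest="$2" keep="${3:-7}"   # assumed default: keep 7 archives
  local stamp archive
  stamp="$(date +%Y%m%d_%H%M%S)"
  archive="${dest}/jenkins-home-${stamp}.tar.gz"
  mkdir -p "$dest"
  tar -czf "$archive" -C "$src" . || return 1
  # Verify the archive can be listed before trusting it.
  tar -tzf "$archive" > /dev/null || return 1
  # Retention: delete everything except the newest $keep archives.
  ls -1t "$dest"/jenkins-home-*.tar.gz | tail -n +"$((keep + 1))" |
    while IFS= read -r old; do
      rm -f -- "$old"
    done
  printf '%s\n' "$archive"
}
```

For example, `backup_dir /opt/jenkins/data /backup/jenkins 14` would write a new archive, print its path, and remove all but the 14 most recent ones.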

10.2 Etcd Backup

#!/bin/bash
etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

10.3 Recovery Procedure

  1. Stop the affected services
  2. Restore the backup data
  3. Fix file permissions
  4. Restart the services
  5. Verify functionality
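
Applied to the Jenkins container from section 3.1, the five steps above might look like the sketch below. It is illustrative rather than a tested recovery tool: the function names `run_step` and `restore_jenkins`, the `DRY_RUN` switch, and the `config.xml` check used as verification are all assumptions. UID 1000 is the jenkins user in the official image.

```shell
# Run a command, or only print it when DRY_RUN=1.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Restore a Jenkins home archive into the data directory.
# Usage: restore_jenkins <archive.tar.gz> [data_dir]
restore_jenkins() {
  local archive="$1" data_dir="${2:-/opt/jenkins/data}"
  run_step docker stop jenkins || return 1                 # 1. stop the service
  run_step tar -xzf "$archive" -C "$data_dir" || return 1  # 2. restore the backup data
  run_step chown -R 1000:1000 "$data_dir" || return 1      # 3. fix ownership
  run_step docker start jenkins || return 1                # 4. restart the service
  # 5. basic verification (assumed check): the main config file exists
  run_step docker exec jenkins test -f /var/jenkins_home/config.xml
}
```

Running it first with `DRY_RUN=1` prints the commands without executing them, which is a cheap way to review the plan before touching production data. Note that extraction overwrites existing files but does not remove files absent from the archive.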

11. Security Hardening

11.1 RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cicd-deployer
  namespace: production
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
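    # kubectl apply reads the object and then creates or patches it, so "update" alone is not enough
    verbs: ["get", "list", "create", "update", "patch"]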

11.2 Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress

11.3 Image Security Scanning

# Trivy scan
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image myapp:latest

12. Performance Optimization

12.1 Jenkins Optimization

12.2 Kubernetes Optimization

12.3 Docker Optimization

📎 Appendix

A. Command Quick Reference

# Jenkins
docker restart jenkins
docker logs -f jenkins

# Kubernetes
kubectl get pods -A
kubectl describe pod <name>
kubectl logs -f <pod>

# Docker
docker ps -a
docker images
# Caution: deletes stopped containers, unused networks, and all unused images
docker system prune -a

B. Support Contacts

Role | Contact | Response time
DevOps team | devops@yourdomain.com | Within 2 hours during business hours
On-call (urgent) | oncall@yourdomain.com | 24x7, within 30 minutes