1. System Overview
1.1 System Components
This CI/CD automated deployment module comprises the following core components:
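| Component | Function | Port | Description |
| --- | --- | --- | --- |
| Jenkins Master | CI/CD orchestration | 8080 | Primary controller node for continuous integration and deployment |
| Docker Registry | Image registry | 5000 | Container image storage and management |
| Kubernetes Cluster | Container orchestration | 6443 | Application deployment and runtime platform |
| Prometheus | Metrics monitoring | 9090 | System monitoring and data collection |
| Grafana | Visualization | 3000 | Monitoring dashboards and alerting |
| ELK Stack | Log management | 9200/5601 | Log collection and analysis |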
1.2 Deployment Architecture
Load Balancer
      ↓
Jenkins Master → K8S Cluster → Application
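ℹ️ Note: The system uses a microservice architecture. Each component is deployed and scaled independently to support high availability and elastic scaling.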
2. Environment Preparation
2.1 Hardware Requirements
| Component | CPU | Memory | Disk | Network |
| --- | --- | --- | --- | --- |
| Jenkins Master | 4+ cores | 8 GB+ | 100 GB SSD | 1 Gbps |
| K8S Master | 4+ cores | 8 GB+ | 100 GB SSD | 1 Gbps |
| K8S Worker | 8+ cores | 16 GB+ | 200 GB SSD | 1 Gbps |
2.2 Software Requirements
- Operating system: Ubuntu 22.04 LTS / CentOS 7.9+
- Docker: 24.0+
- Kubernetes: 1.28+
- Helm: 3.13+
- Java: 17 (Jenkins)
- Python: 3.10+
- Node.js: 18+
2.3 System Initialization Script
#!/bin/bash
# sysctl_optimization.sh
# Raise file descriptor and inotify limits
echo "fs.file-max=655360" >> /etc/sysctl.conf
echo "fs.inotify.max_user_watches=524288" >> /etc/sysctl.conf
# Network tuning
echo "net.core.somaxconn=65535" >> /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog=65535" >> /etc/sysctl.conf
# Memory tuning
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo "vm.swappiness=10" >> /etc/sysctl.conf
# Apply the settings
sysctl -p
⚠️ Warning: Tune these kernel parameters to match the actual hardware. Validate them in a test environment before applying them to production.
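Because the script in 2.3 appends with `>>`, running it twice leaves duplicate entries in the file. The following is a minimal sketch of an idempotent alternative, useful when the same values are trialled repeatedly in a test environment. The function name `set_sysctl_param` and the drop-in path in the usage note are assumptions for illustration, not part of the original setup.

```shell
#!/bin/bash
# Sketch: idempotently set key=value in a sysctl-style config file.
# set_sysctl_param is a hypothetical helper name.
set_sysctl_param() {
    local file="$1" key="$2" value="$3"
    local tmp
    touch "$file"
    tmp="$(mktemp)"
    # Copy every line except an existing assignment for this key
    # (compare the trimmed text before "=" with the key).
    awk -F= -v k="$key" '{
        name = $1
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        if (name != k) print
    }' "$file" > "$tmp"
    # Append the new assignment, then write back in place to keep file permissions.
    echo "${key}=${value}" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

A possible invocation, assuming a drop-in file is preferred over `/etc/sysctl.conf`: `set_sysctl_param /etc/sysctl.d/99-cicd.conf vm.swappiness 10`, followed by `sysctl --system` to load it.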
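3. System Installation and Deployment
3.1 Jenkins Installation
#!/bin/bash
# install_jenkins.sh
# Create the Jenkins data directories
mkdir -p /opt/jenkins/{data,plugins,workspace}
# The official image runs as UID/GID 1000; grant ownership instead of using chmod 777
chown -R 1000:1000 /opt/jenkins
# Run the Jenkins container
# Note: mounting docker.sock gives the container control of the host Docker daemon
docker run -d \
  --name jenkins \
  --restart unless-stopped \
  -p 8080:8080 \
  -p 50000:50000 \
  -v /opt/jenkins/data:/var/jenkins_home \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e JAVA_OPTS="-Xmx2048m -Xms512m" \
  jenkins/jenkins:lts-jdk17
# Print the initial admin password (available once the first startup has completed)
docker exec jenkins cat /var/jenkins_home/secrets/initialAdminPassword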
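The password file is written during Jenkins' first startup, so reading it immediately after `docker run -d` can fail. A minimal sketch of a polling helper follows. The function name `wait_for_jenkins_password` and its retry and delay defaults are assumptions, not part of the original script.

```shell
#!/bin/bash
# Sketch: poll until Jenkins has written its initial admin password, then print it.
# wait_for_jenkins_password and its defaults are hypothetical.
wait_for_jenkins_password() {
    local container="${1:-jenkins}" retries="${2:-30}" delay="${3:-5}"
    local secret=/var/jenkins_home/secrets/initialAdminPassword
    local i
    for ((i = 1; i <= retries; i++)); do
        # "test -f" inside the container succeeds once the file exists.
        if docker exec "$container" test -f "$secret" 2>/dev/null; then
            docker exec "$container" cat "$secret"
            return 0
        fi
        sleep "$delay"
    done
    echo "Timed out waiting for $secret in container $container" >&2
    return 1
}
```

Calling `wait_for_jenkins_password jenkins` in place of the final `docker exec` line would retry for up to 30 attempts, 5 seconds apart, before giving up.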
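3.2 Harbor Installation
#!/bin/bash
# install_harbor.sh
HARBOR_VERSION="v2.9.0"
wget https://github.com/goharbor/harbor/releases/download/${HARBOR_VERSION}/harbor-offline-installer-${HARBOR_VERSION}.tgz
tar xvf harbor-offline-installer-${HARBOR_VERSION}.tgz
cd harbor
# Create the configuration file from the bundled template, then edit it
# (at minimum the hostname) before running the installer
cp harbor.yml.tmpl harbor.yml
./install.sh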
3.3 KubeSphere Installation
#!/bin/bash
# install_kubesphere.sh
# Install with Helm
helm repo add kubesphere https://charts.kubesphere.io/main
helm repo update
helm install kubesphere kubesphere/kubesphere \
  --namespace kubesphere-system \
  --create-namespace \
  --version 3.4.0
4. Jenkins Pipeline Configuration
4.1 Basic Pipeline Template
pipeline {
    agent any
    environment {
        REGISTRY_URL = 'harbor.yourdomain.com'
        REGISTRY_CREDENTIALS = 'harbor-credentials'
        KUBECONFIG_CREDENTIALS = 'kubeconfig'
        APP_NAME = 'myapp'
    }
    options {
        timestamps()
        timeout(time: 1, unit: 'HOURS')
        disableConcurrentBuilds()
    }
    stages {
        stage('Checkout') {
            steps {
                checkout scm
            }
        }
        stage('Build') {
            steps {
                // Install dependencies before building
                sh 'npm ci'
                sh 'npm run build'
            }
        }
        stage('Test') {
            steps {
                sh 'npm test'
            }
        }
        stage('Deploy') {
            steps {
                // withKubeConfig is provided by the Kubernetes CLI plugin
                withKubeConfig([credentialsId: KUBECONFIG_CREDENTIALS]) {
                    sh 'kubectl apply -f k8s/'
                }
            }
        }
    }
}
✅ Best practice: Use declarative Pipeline syntax to keep the configuration concise and maintainable, and manage sensitive values through Jenkins Credentials.
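One way a pipeline `sh` step might consume a bound credential is sketched below: the password is passed to `docker login` on stdin instead of on the command line. The variable names `REGISTRY_USER` and `REGISTRY_PASS` and the function `registry_login` are hypothetical. They stand for whatever names a credentials binding would export in your job.

```shell
#!/bin/bash
# Sketch: log in to the registry without putting the password on the command line.
# REGISTRY_USER / REGISTRY_PASS are hypothetical names for values injected by a
# Jenkins credentials binding; registry_login is a hypothetical helper.
registry_login() {
    local registry="$1"
    if [ -z "${REGISTRY_USER:-}" ] || [ -z "${REGISTRY_PASS:-}" ]; then
        echo "REGISTRY_USER and REGISTRY_PASS must be set" >&2
        return 1
    fi
    # --password-stdin keeps the secret out of the process list and build log.
    printf '%s' "$REGISTRY_PASS" | docker login "$registry" \
        --username "$REGISTRY_USER" --password-stdin
}
```

With the template above, the call would be `registry_login "$REGISTRY_URL"`.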
5. Docker Containerization
5.1 Example Dockerfile
# Multi-stage build to reduce the final image size
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: copy only the build output and dependencies
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
USER node
EXPOSE 3000
CMD ["node", "dist/index.js"]
5.2 Docker Compose Configuration
version: '3.8'
services:
  jenkins:
    image: jenkins/jenkins:lts-jdk17
    ports:
      - "8080:8080"
    volumes:
      - jenkins_data:/var/jenkins_home
  prometheus:
    image: prom/prometheus:v2.47.0
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana:10.1.0
    ports:
      - "3000:3000"
volumes:
  jenkins_data:
6. Kubernetes/KubeSphere Deployment
6.1 Deployment Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      # Pod labels must match spec.selector.matchLabels
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: harbor.yourdomain.com/myapp:latest
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "1024Mi"
6.2 HPA Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  # Same namespace as the target Deployment
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
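To reason about how this HPA reacts to load, the sketch below works through the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization), clamped to the min/max range. It deliberately ignores the controller's tolerance band and stabilization window. `hpa_desired_replicas` is a hypothetical helper, not a kubectl feature.

```shell
#!/bin/bash
# Sketch of the HPA scaling rule:
#   desired = ceil(current * currentUtil / targetUtil), clamped to [min, max].
# Ignores the controller's tolerance and stabilization behaviour.
# hpa_desired_replicas is a hypothetical helper name.
hpa_desired_replicas() {
    local current="$1" current_util="$2" target_util="$3" min="$4" max="$5"
    # Integer ceiling division: ceil(a / b) == (a + b - 1) / b for positive b.
    local desired=$(( (current * current_util + target_util - 1) / target_util ))
    (( desired < min )) && desired="$min"
    (( desired > max )) && desired="$max"
    echo "$desired"
}
```

With the values from the manifest (target 70%, range 3 to 20), `hpa_desired_replicas 3 140 70 3 20` prints `6`: three pods averaging 140% of their CPU request would be doubled.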
7. Monitoring and Alerting
7.1 Prometheus Configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
7.2 Alerting Rules
groups:
  - name: application_alerts
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
📊 Recommended Grafana dashboards:
- Kubernetes Cluster: ID 6417
- Jenkins Overview: ID 9964
- Docker Container: ID 8933
8. Routine Operations
8.1 Jenkins Operations Commands
# Restart Jenkins
docker restart jenkins
# Follow the Jenkins logs
docker logs -f jenkins
# Back up Jenkins data
tar -czf jenkins-backup.tar.gz /opt/jenkins/data
8.2 Kubernetes Operations Commands
# Check cluster status
kubectl get nodes
kubectl get pods -A
# Check application status
kubectl get deployments -n production
# Follow application logs
kubectl logs -f deployment/myapp -n production
# Restart the Deployment
kubectl rollout restart deployment/myapp -n production
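8.3 Scheduled Maintenance Tasks
- Daily: check system health
- Weekly: clean up old builds and images
- Monthly: apply security updates and patches
- Quarterly: review and tune performance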
9. Troubleshooting Guide
9.1 Pod Fails to Start
# Inspect the Pod status
kubectl describe pod <pod-name> -n <namespace>
# List events, oldest first
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
# View logs from the previous container instance
kubectl logs <pod-name> -c <container-name> -n <namespace> --previous
9.2 Service Unreachable
# Check the Service's Endpoints
kubectl get endpoints <service-name> -n <namespace>
# Test connectivity from inside the cluster
kubectl run test --rm -it --restart=Never \
  --image=busybox \
  -- wget -O- http://<service-name>.<namespace>.svc.cluster.local:<port>
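🔍 Common issues:
- ImagePullBackOff: check the image name and registry credentials
- CrashLoopBackOff: inspect the container logs to locate the error
- Pending: check resource quotas and node capacity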
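The list above can be encoded as a small lookup for use in on-call scripts. This is a minimal sketch: `triage_hint` is a hypothetical helper name, and the `ErrImagePull` alias is an addition here because that status usually precedes `ImagePullBackOff`.

```shell
#!/bin/bash
# Sketch: map a pod status reason to the first check from the common-issues list.
# triage_hint is a hypothetical helper; ErrImagePull is added as a related status.
triage_hint() {
    case "$1" in
        ImagePullBackOff|ErrImagePull)
            echo "Check the image name and registry credentials" ;;
        CrashLoopBackOff)
            echo "Inspect the container logs to locate the error" ;;
        Pending)
            echo "Check resource quotas and node capacity" ;;
        *)
            echo "No hint for '$1'; start with kubectl describe pod" ;;
    esac
}
```

For example, `triage_hint CrashLoopBackOff` prints the log-inspection hint, which pairs with the `kubectl logs --previous` command in 9.1.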
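10. Backup and Recovery
10.1 Jenkins Backup
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/backup/jenkins"
DATE=$(date +%Y%m%d_%H%M%S)
# Create the backup directory if it does not exist
mkdir -p "${BACKUP_DIR}"
# Archive the contents of the Jenkins home directory
tar -czf "${BACKUP_DIR}/jenkins-home-${DATE}.tar.gz" \
  -C /opt/jenkins/data .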
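Timestamped archives accumulate until the disk fills, so a retention step is usually paired with the backup. The sketch below keeps only the newest N archives. `prune_backups` and the keep count are assumptions. The file pattern follows the `jenkins-home-<date>.tar.gz` naming used above.

```shell
#!/bin/bash
# Sketch: keep only the newest N Jenkins backup archives in a directory.
# prune_backups is a hypothetical helper; requires bash 4+ for mapfile.
prune_backups() {
    local dir="$1" keep="$2"
    local -a files
    local f
    # The %Y%m%d_%H%M%S timestamp sorts lexicographically, so a reverse sort
    # lists the newest archive first.
    mapfile -t files < <(find "$dir" -maxdepth 1 -type f \
        -name 'jenkins-home-*.tar.gz' | sort -r)
    # Everything after the first $keep entries is older and gets removed.
    for f in "${files[@]:keep}"; do
        rm -f -- "$f"
    done
}
```

For instance, `prune_backups /backup/jenkins 7` after each run would retain the seven most recent archives and leave unrelated files untouched.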
10.2 Etcd Backup
#!/bin/bash
# Save a snapshot using the v3 API and the kubeadm-default certificate paths
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
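10.3 Recovery Procedure
- Stop the affected services
- Restore the backup data
- Fix file ownership and permissions
- Restart the services
- Verify functionality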
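Applied to the Jenkins container from section 3.1, the five steps above might look like the following sketch. `restore_jenkins`, its arguments, and the `JENKINS_OWNER` override are assumptions. The default owner 1000:1000 is the user the official `jenkins/jenkins` image runs as. The sketch extracts over the existing directory and does not remove files that are absent from the archive.

```shell
#!/bin/bash
# Sketch of the recovery steps for the Jenkins container.
# restore_jenkins, its arguments, and JENKINS_OWNER are hypothetical.
restore_jenkins() {
    local archive="$1" data_dir="${2:-/opt/jenkins/data}" container="${3:-jenkins}"
    [ -f "$archive" ] || { echo "Backup not found: $archive" >&2; return 1; }

    # 1. Stop the service
    docker stop "$container" || return 1
    # 2. Restore the backup data
    mkdir -p "$data_dir"
    tar -xzf "$archive" -C "$data_dir" || return 1
    # 3. Fix ownership so the container user can read and write the data
    chown -R "${JENKINS_OWNER:-1000:1000}" "$data_dir" || return 1
    # 4. Restart the service
    docker start "$container" || return 1
    # 5. Verify: prints "true" when the container is running
    docker inspect -f '{{.State.Running}}' "$container"
}
```

A call such as `restore_jenkins /backup/jenkins/<archive>.tar.gz` uses the defaults from this manual. A fuller verification would also log in to the web UI and check that jobs and plugins are present.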
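11. Security Hardening
11.1 RBAC Configuration
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cicd-deployer
  namespace: production
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    # kubectl apply issues create or patch requests, so both are needed alongside update
    verbs: ["get", "list", "create", "update", "patch"]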
11.2 Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
    - Ingress
11.3 Image Security Scanning
# Scan an image with Trivy
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  aquasec/trivy image myapp:latest
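By default the scan above only reports findings. To make it usable as a pipeline gate, Trivy's `--exit-code` and `--severity` flags can turn findings into a failing exit status. The wrapper below is a minimal sketch: `scan_image` is a hypothetical name, and the HIGH,CRITICAL threshold is an assumption to adjust to your policy.

```shell
#!/bin/bash
# Sketch: fail when Trivy reports findings at or above a severity threshold.
# scan_image is a hypothetical wrapper; the default threshold is an assumption.
scan_image() {
    local image="$1" severity="${2:-HIGH,CRITICAL}"
    # --exit-code 1 makes trivy exit non-zero when matching vulnerabilities exist.
    docker run --rm \
        -v /var/run/docker.sock:/var/run/docker.sock \
        aquasec/trivy image \
        --exit-code 1 --severity "$severity" "$image"
}
```

In a build script, `scan_image myapp:latest || exit 1` would stop the job before the image is pushed.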
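12. Performance Optimization
12.1 Jenkins Optimization
- Use persistent agents to reduce startup time
- Enable build caching to speed up dependency installation
- Run independent tasks in parallel
- Clean up workspaces regularly
12.2 Kubernetes Optimization
- Set resource requests and limits appropriately
- Use node affinity to improve scheduling
- Enable HPA autoscaling
- Use local persistent volumes to improve I/O performance
12.3 Docker Optimization
- Use multi-stage builds to reduce image size
- Order Dockerfile instructions to maximize layer cache reuse
- Use Alpine-based base images
- Remove unused images regularly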
📎 Appendix
A. Command Quick Reference
# Jenkins
docker restart jenkins
docker logs -f jenkins
# Kubernetes
kubectl get pods -A
kubectl describe pod <name>
kubectl logs -f <pod>
# Docker
docker ps -a
docker images
docker system prune -a
B. Support Contacts
| Role | Contact | Response Time |
| --- | --- | --- |
| DevOps team | devops@yourdomain.com | Within 2 hours during business hours |
| Emergency on-call | oncall@yourdomain.com | 24/7, within 30 minutes |