[TOC]

# Jiangsu Electric Power Full-Link Observability Project Implementation Manual
## Infrastructure

### Deploy the nginx ingress controller (skip if already deployed)

- Install with the Helm chart at `/deploy/ingress-nginx`:

  ```bash
  helm install ingress-nginx . -n ingress-nginx
  ```

- If ports 80 and 443 are unavailable, change `controller.hostPort`.
- Set `controller.ingressResource.enable/default` to `true`.
- If the cluster has many nodes, change `controller.kind` to `Deployment` and set a `nodeSelector`; otherwise Ingress traffic may not resolve to the controller.
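Taken together, the overrides above might be collected into a single values file roughly like the sketch below. The key layout follows the bullets; verify it against the chart's own `values.yaml` for your chart version, and note the `nodeSelector` label is a hypothetical example:

```yaml
controller:
  hostPort:
    enabled: true
    ports:
      http: 80      # change these if 80/443 are already taken
      https: 443
  ingressResource:
    enable: true
    default: true
  kind: Deployment          # instead of DaemonSet, for clusters with many nodes
  nodeSelector:
    ingress-ready: "true"   # hypothetical label; match it to your edge nodes
```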
### Deploy the openebs storage provisioner (skip if already deployed)

```bash
kubectl apply -f deploy/openebs/openebs-operator.yaml
```
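Before installing the data stores, it can be worth confirming that the provisioner actually binds volumes. A throwaway claim against `openebs-hostpath` (the hostpath StorageClass the operator creates) is enough; the claim name here is made up:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openebs-smoke-test   # hypothetical name; delete after checking it binds
  namespace: default
spec:
  storageClassName: openebs-hostpath
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```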
## Data Storage

### Deploy clickhouse

- Use `deploy/clickhouse`
- Set `global.storageClass`
- Set `shard` and `replica`
- Set `auth.username` and `auth.password`
- Enable `ingress`
- Set `persistence.storageClass: openebs-hostpath`

```bash
helm install clickhouse . -n observe --create-namespace
```
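The individual overrides listed above could be combined into one values file along these lines. The key names mirror the bullets, but shard/replica settings in particular are named differently across clickhouse chart versions, so treat this as a sketch to check against the chart:

```yaml
global:
  storageClass: openebs-hostpath
shards: 2            # the "shard" setting from the bullets above
replicaCount: 2      # the "replica" setting from the bullets above
auth:
  username: default
  password: "cecf@cestong.com"
ingress:
  enabled: true
persistence:
  storageClass: openebs-hostpath
```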
### Deploy tempo

- Use `deploy/tempo`

- Enable the `otel receiver`:

  ```yaml
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
      http:
        endpoint: "0.0.0.0:4318"
  ```

- Configure `metrics_generator_processors`:

  ```yaml
  metrics_generator_processors:
    - 'service-graphs'
    - 'span-metrics'
  max_search_bytes_per_trace: 0
  ```

- Configure `tempo.metricsGenerator` to point at the deployed `prometheus`:

  ```yaml
  metricsGenerator:
    enabled: true
    remoteWriteUrl: "http://prometheus-server.observe.svc.cluster.local:80/api/v1/write"
  ```

- Configure persistence:

  ```yaml
  persistence:
    enabled: true
    storageClassName: openebs-hostpath
  ```

- Run the deployment:

  ```bash
  helm install tempo . -n observe
  ```
### MySQL (Digital Portal)
## Data Collection

### prometheus

- Use `deploy/prometheus`

- Set `persistentVolume.storageClass`

- Run the deployment:

  ```bash
  helm install prometheus . -n observe
  ```
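Because tempo's metrics generator pushes to Prometheus over remote write (see the tempo section), the receiving side has to accept remote-write requests. With the community Prometheus chart this is typically done via an extra server flag; the exact values key may differ by chart version, so verify before applying:

```yaml
server:
  extraFlags:
    - web.enable-lifecycle                # chart default; kept because extraFlags replaces the default list
    - web.enable-remote-write-receiver    # accept remote_write from tempo's metrics generator
```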
### opentelemetry-collector

- Decide on the deployment `mode`

- Configure the `exporters`:

  ```yaml
  otlp:
    endpoint: "tempo.observe.svc.cluster.local:4317"
    tls:
      insecure: true
  prometheus:
    endpoint: "0.0.0.0:8889"
    #namespace: default
  clickhouse:
    endpoint: "tcp://clickhouse-headless.observe.svc.cluster.local:9000?dial_timeout=10s&compress=lz4"
    database: otel
    username: default
    password: "cecf@cestong.com"
    ttl_days: 10
    #logs_table: otel_logs
    #traces_table: otel_traces
    #metrics_table: otel_metrics
    timeout: 5s
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
  ```

- Enable the `otel receiver`:

  ```yaml
  otlp:
    protocols:
      grpc:
        endpoint: ${MY_POD_IP}:4317
      http:
        endpoint: ${MY_POD_IP}:4318
  ```

- Configure a `pipeline` that sends the `trace` data received via `otel` to `clickhouse` and `tempo (otel)`

- Configure a `pipeline` that sends the `metrics` received via `otel` to `clickhouse` and `prometheus`

- Configure `podAnnotations` so that `prometheus` automatically scrapes the exposed `metrics`:

  ```yaml
  podAnnotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: /metrics
    prometheus.io/port: "8889"
  ```

- Run the deployment:

  ```bash
  helm install otel-collector . -n observe
  ```
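The two pipeline bullets above correspond to a `service.pipelines` block in the collector config along these lines; the `batch` processor is an illustrative assumption, and the exporter names must match the exporters configured earlier:

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, clickhouse]        # otlp -> tempo, plus clickhouse
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus, clickhouse]  # scraped by prometheus, plus clickhouse
```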
## Digital Portal: Data Cleaning Program

### Deployment procedure

1. Deploy mysql
   - git repository: https://git.cestong.com.cn/cecf/cluster-config

   All settings are already in place; deploy directly:

   ```bash
   cd mysql
   helm -n observe install mysql .
   ```

   **Note**: the current configuration exposes the port via NodePort, fixed at **30306**. Customer environments may need a different port; change it here:

   ```txt
   # values.yaml, line 477
   476 nodePorts:
   477   mysql: "30306"
   ```
2. Initialize the SQL schema

   ```bash
   # Fetch the mysql root password
   MYSQL_ROOT_PASSWORD=$(kubectl get secret --namespace observe mysql -o jsonpath="{.data.mysql-root-password}" | base64 -d)
   # Start a mysql client pod in the same namespace
   kubectl run mysql-client --rm --tty -i --restart='Never' --image docker.io/bitnami/mysql:5.7.42-debian-11-r27 --namespace observe --env MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD --command -- bash
   # Log in to mysql
   mysql -h mysql.observe.svc.cluster.local -uroot -p"$MYSQL_ROOT_PASSWORD"
   # Select the database
   use otel;
   # Run the SQL statements
   ```
3. Deploy the data cleaning program
   - git repository: https://git.cestong.com.cn/cecf/datacleaner
   - image: registry.cestong.com:8150/cecf/digit_portal_handler
   - deployment:

   ```bash
   kubectl -n observe apply -f cronjob.yaml
   ```
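Step 2 above pipes the secret through `base64 -d` because Kubernetes stores secret values base64-encoded, and the `jsonpath` output is still encoded. The decoding itself is plain shell; the encoded string below is a made-up stand-in, not the real password:

```shell
# Kubernetes secrets hold base64-encoded bytes; jsonpath returns them still encoded.
ENCODED='c3VwZXJzZWNyZXQ='                    # stand-in for the jsonpath output
MYSQL_ROOT_PASSWORD=$(echo "$ENCODED" | base64 -d)
echo "$MYSQL_ROOT_PASSWORD"                   # prints: supersecret
```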
## Data Visualization

### grafana

- Update `ingress` with the appropriate domain name
- Configure `persistence.storageClassName`
- Set `adminPassword`
- Run the deployment:

  ```bash
  helm install grafana . -n observe
  ```

- Install the `clickhouse` plugin
- Set up the `tempo`, `prometheus`, and `clickhouse` data sources
- Import the `dashboard`s
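The three data sources can also be provisioned through the chart's `datasources` value instead of clicking through the UI. The URLs below assume the in-cluster service names used elsewhere in this manual, the tempo port is the chart default (3100), and the clickhouse `jsonData` keys should be checked against the plugin version in use:

```yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        url: http://prometheus-server.observe.svc.cluster.local
      - name: Tempo
        type: tempo
        url: http://tempo.observe.svc.cluster.local:3100
      - name: ClickHouse
        type: grafana-clickhouse-datasource
        jsonData:
          server: clickhouse-headless.observe.svc.cluster.local
          port: 9000
```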
### observe-front/ui

- Update the domain names in front/ui

- Build and package

- Deploy from `deploy/obui`:

  ```bash
  kubectl apply -f deployment-front.yaml
  kubectl apply -f deployment.yaml
  kubectl apply -f ingress_rewrite.yaml
  kubectl apply -f ingress.yaml
  kubectl apply -f svc-front.yaml
  kubectl apply -f svc.yaml
  ```
## Testing and Verification

### Deploy opentelemetry-demo

- Point `OTEL_COLLECTOR_NAME` at the deployed `opentelemetry-collector`

- Run the deployment:

  ```bash
  helm install otel-demo . -n observe
  ```
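In the opentelemetry-demo chart, `OTEL_COLLECTOR_NAME` can be overridden for all demo services via the chart's environment overrides; the values key varies between chart versions, so treat this as a sketch to verify:

```yaml
default:
  envOverrides:
    - name: OTEL_COLLECTOR_NAME
      value: otel-collector.observe.svc.cluster.local
```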
### Verify the results
## Onboarding Real Traffic

### Configure the java agent

- Add the following Java arguments to the application to be monitored:

  - '-javaagent:/sidecar/agent/opentelemetry-javaagent.jar'
  - '-Dotel.resource.attributes=service.name=item-svc'
  - '-Dotel.traces.exporter=otlp'
  - '-Dotel.metrics.exporter=otlp'
|
|
|
+
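In Kubernetes, those arguments are often injected without rebuilding the image by setting `JAVA_TOOL_OPTIONS` on the container. The endpoint line below is an assumption pointing at the collector deployed earlier in this manual:

```yaml
env:
  - name: JAVA_TOOL_OPTIONS
    value: >-
      -javaagent:/sidecar/agent/opentelemetry-javaagent.jar
      -Dotel.resource.attributes=service.name=item-svc
      -Dotel.traces.exporter=otlp
      -Dotel.metrics.exporter=otlp
      -Dotel.exporter.otlp.endpoint=http://otel-collector.observe.svc.cluster.local:4317
```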
## Access Control

### Create the ingresses

- The ingress for grafana has already been created
- The ingresses for obui/obfront are generated from the manifests under `deploy/obui`

### Configure domain names

- Bind the domain names to the host IPs that `ingress-nginx` is attached to
## Resource Estimate

| Service        | CPU      | Mem | Storage |
| -------------- | -------- | --- | ------- |
| clickhouse     | 4        | 8   | 200G    |
| tempo          | 4        | 8   | 200G    |
| otel-collector | 3        | 6   | 0       |
| prometheus     | 2        | 4   | 100G    |
| grafana        | 2        | 4   | 30G     |
| obui/front     | 2        | 2   | 0       |
| otel-demo      | 4        | 8   | 0       |
| Total          | 21 cores | 40G | 530G    |
## Actual Resource Usage

| Service              | CPU      | Mem | Storage |
| -------------------- | -------- | --- | ------- |
| clickhouse           | 4        | 8   | 200G    |
| clickhouse-zookeeper | 0.25/0.5 |     |         |
| tempo                | 4        | 8   | 200G    |
| otel-collector       | 3        | 6   | 0       |
| prometheus           | 2        | 4   | 100G    |
| grafana              | 2        | 4   | 30G     |
| obui/front           | 2        | 2   | 0       |
| otel-demo            | 4        | 8   | 0       |
| Total                | 21 cores | 40G | 530G    |

Tenant name: observe
### Open Issues

- [x] docker hub address, push access
- [x] ingress class and storageclass names
- [ ] domain forwarding to the ingress; the ingress cannot be created
- [ ] prometheus `mustFromJson`