Service Containerization

March 02, 2021

Since I started on the Korea project in the second half of last year, I have had nothing to write about for half a year... (the Korean side wants everything written up, so it felt like six months as a documentation engineer). The company's containerization of our business services at the start of the year gives me my first note of the new year.
Here I record the process of containerizing the new-retail services and the problems encountered along the way.


Background: the goal is to migrate the new-retail project's currently physically deployed business services onto Tencent Cloud TKE (official docs: https://cloud.tencent.com/document/product/457/11741). A physically deployed set of services is running in production; once the containerized version passes testing, production will be switched over directly, so another set of the physically deployed supporting services has to be stood up for testing.

After taking on the task, I first listed the work the containerization would involve:
[x] Tencent Cloud subnet: retail-nat
[x] TKE cluster: estimate the physical disk currently in use (= number of logs * log size in GB) plus CPU/MEM usage (305G 2C4G*30). Based on the current production footprint, the cluster machines were provisioned as 200 GB disk, 16C32G x 3, forming a 3-node cluster.
[x] Cloud databases: MySQL, Redis, MongoDB. Create the cloud databases and sync data with the data-migration service for testing; when the real cutover happens, production data must be synced again.
[x] Middleware: CKafka, RocketMQ. The middleware is not being containerized for now; CKafka is Tencent Cloud's managed message queue and is simply purchased, while a RocketMQ instance has to be set up for testing.
[x] Physically deployed service: Apollo. Stand up the config system on a new machine for testing.
[x] Image builds: integrate Jenkins to build and push images (the image registry is also Tencent Cloud's registry).
[x] Pod creation: the new-retail business services, 27 of them.
[x] Physically deployed services: nginx plus 3 front-end machines (the pre-release front end is used for testing).

Machine information for the services created above:
retail-tke-test-mysql: 2C4G 50GB 172.XX.XX.47:3306 password ****
retail-tke-test-redis: 8G 172.XX.XX.49:6379 password ****
retail-tke-test-mongodb: 2C4G 100G 172.XX.XX.37:27017 password ****
retail-tke-apollo-mysql: 2C4G 25G 172.XX.XX.18:3306 password ****

apollo-adminservice/apollo-configservice: 2C4G 20G 172.XX.XX.10 / public IP XX.XX.XX
retail-tke-test-kafka: entry tier, 40MB/s, 300GB, 7-day retention, 172.XX.XX.117:9092
retail-tke-test-rocketmq: 2C4G 172.XX.XX.28:9876 / http://public-IP.XX.XX.XX:9888

Building the images:
The service changes required dev and ops working together: one developer handled the repository configuration changes while I handled the Jenkins packaging configuration. After two days of effort all 27 services were building images successfully. We still hit a few small problems along the way, all of which really came down to sloppy habits.

Context: Java projects, three git repositories, multiple projects depending on each other.
1. A repository was missing XML configuration, so a few services were treated as standalone projects. The developers had to fix the XML configuration.

2. During packaging, some services copied their artifact to the designated directory and some did not. The developers had to add the missing copy configuration.
3. A test branch was missing; a dedicated container-test branch was created for container testing, and a few services had been left out.

Apart from the module name and repository, the packaging configuration is basically the same for every service; services with a special startup method can be tweaked individually. Using one of them to walk through the packaging setup:
Create a Maven project in Jenkins.

1. Configure a choice parameter for the branch, so it is easy to switch the branch being packaged (the testing branch differs from the production branch).

Having to type or copy the branch on every build is tedious, hence the setup below.

(Screenshot: selectable branches)


2. Maven build goals: clean install

3. A shell step that builds the image; it really just generates two files: a startup script used to launch the service, and a Dockerfile used to start the container.
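To make this concrete, here is a minimal sketch of that shell step, assuming the base image is the retail-apline-jdk-iptables image mentioned later and that JAVA_OPTS / JARNAME are supplied to the container as environment variables; the module name, tag format, and paths are illustrative, not the exact values from our Jenkins jobs:

# Minimal sketch of the Jenkins "Execute shell" step: generate entrypoint.sh
# and a Dockerfile, then build and push the image (names/paths are examples).
MODULE=lumi-demo-service                                  # hypothetical module name
TAG=$(git rev-parse --short HEAD)-$(date +%Y%m%d%H%M)
IMAGE=XX.tencentcloudcr.com/retail/${MODULE}:${TAG}

cat > entrypoint.sh <<'EOF'
#!/bin/sh
# JAVA_OPTS and JARNAME are injected as environment variables in the pod spec
exec java ${JAVA_OPTS} -Dspring.profiles.active=prod -Dfile.encoding=UTF-8 -jar ${JARNAME}
EOF

cat > Dockerfile <<EOF
FROM XX.tencentcloudcr.com/retail/retail-apline-jdk-iptables:v0.0.0
WORKDIR /data
COPY target/*.jar /data/
COPY entrypoint.sh /data/entrypoint.sh
CMD ["sh", "entrypoint.sh"]
EOF

docker build -t ${IMAGE} .
docker push ${IMAGE}

The real entrypoint also wires in the SkyWalking agent, as the startup command further down shows.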

A note on the base image:
A few services have special requirements. One of them, the e-contract service, needs specific fonts, so a base image with the fonts installed had to be built.
I first tried installing the fonts on an alpine base image, but apk failed to install ttmkfdir, so I switched to CentOS as the base image for this contract service.
Font installation followed this article: https://blog.csdn.net/wlwlwlwl015/article/details/51482065

#sudo docker pull XX.tencentcloudcr.com/retail/centos7-jdk8:latest

#docker images |grep centos7-jdk8

#docker run -it 5e1f5082e270 bash
bash-4.4# ls

install...

#docker cp ./simhei.ttf CONTAINER ID:/usr/share/fonts/chinese
#docker cp ./simsun.ttc CONTAINER ID:/usr/share/fonts/chinese

# docker commit CONTAINER ID XX.tencentcloudcr.com/retail/centos7-jdk8:latest

# docker push XX.tencentcloudcr.com/retail/centos7-jdk8:latest
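For completeness, the `install...` step above was the font installation from the linked article; a minimal sketch of it on CentOS, assuming the ttf/ttc files have already been copied into /usr/share/fonts/chinese (the article additionally uses ttmkfdir/mkfontscale, which is not shown here):

# Inside the centos7-jdk8 container: register the fonts with fontconfig
yum install -y fontconfig
fc-cache -fv /usr/share/fonts/chinese
fc-list :lang=zh          # verify that SimHei / SimSun are now visible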

Uploading the base image from the company's internal Harbor registry to the Tencent Cloud registry:
Since the existing base image is taken from a test machine in the company IDC, it is really just a matter of re-tagging it and pushing it.

# docker login XX.tencentcloudcr.com --username 10*******9 --password XXXXXXXX
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded

# docker tag retail-apline-jdk-iptables:v0.0.0 XX.tencentcloudcr.com/retail/retail-apline-jdk-iptables:v0.0.0

# docker push XX.tencentcloudcr.com/retail/retail-apline-jdk-iptables:v0.0.0
The push refers to repository [XX.tencentcloudcr.com/retail/retail-apline-jdk-iptables]
f5baedc01474: Pushed
ec492e08774f: Pushed
8a25136c0f9e: Pushed
d2b392ab3f1b: Pushed
382bfee21a0d: Mounted from prod/registry-server
3aaca7e95311: Mounted from test/user-center
e8868a909fb5: Mounted from test/user-center
e874f7e70fff: Mounted from test/user-center
6f54606e6969: Mounted from test/user-center
a464c54f93a9: Mounted from test/user-center
v0.0.0: digest: sha256:d48890d9202928c2ca0cb2022068d4a0d2eaaddf02f6b628d145e5343c0c04af size: 2418

Problems encountered while setting up a single-node RocketMQ:
For testing I set up a single node by copying the package from the current production deployment. After adjusting the configuration it failed to start with out-of-memory errors, even though the memory parameters in the config had already been reduced.
The fix was as follows:

Add the mappedFileSizeCommitLog setting: 131989504 (~126 MB) instead of the 1 GB default.

# sh /usr/local/rocketmq-all-4.5.2-bin-release/bin/mqbroker -c /usr/local/rocketmq-all-4.5.2-bin-release/conf/2m-noslave/broker-a.properties -p | grep FileSize
2021-02-25 17\:16\:34 INFO main - mappedFileSizeCommitLog=131989504
2021-02-25 17\:16\:34 INFO main - mappedFileSizeConsumeQueue=6000000
2021-02-25 17\:16\:34 INFO main - mappedFileSizeConsumeQueueExt=50331648


# cat /usr/local/rocketmq-all-4.5.2-bin-release/conf/2m-noslave/broker-a.properties

brokerClusterName=rocketmq-cluster
brokerName=broker-a
brokerId=0
namesrvAddr=172.XX.XX.28:9876
defaultTopicQueueNums=6
autoCreateTopicEnable=true
autoCreateSubscriptionGroup=true
listenPort=10911
deleteWhen=04
fileReservedTime=48
storePathRootDir=/data/rocketmq/store
storePathCommitLog=/data/rocketmq/store/commitlog
storePathConsumeQueue=/data/rocketmq/store/consumequeue
storePathIndex=/data/rocketmq/store/index
storeCheckpoint=/data/rocketmq/store/checkpoint
abortFile=/data/rocketmq/store/abort
brokerRole=ASYNC_MASTER
flushDiskType=ASYNC_FLUSH
messageDelayLevel=1s 5s 10s 30s 1m 2m 3m 4m 5m 6m 7m 8m 9m 10m 20m 30m 1h 2h
maxTransferCountOnMessageInMemory=400
# maximum number of send-message threads
sendMessageThreadPoolNums=32
#large thread numbers
# whether to use a reentrant lock when putting messages
useReentrantLockWhenPutMessage=true

waitTimeMillsInSendQueue=1000
transientStorePoolEnable=true
transientStorePoolSize=5
mappedFileSizeCommitLog=131989504

# nohup sh /usr/local/rocketmq-all-4.5.2-bin-release/bin/mqbroker -c /usr/local/rocketmq-all-4.5.2-bin-release/conf/2m-noslave/broker-a.properties > /dev/null 2>&1 &

Start the web console
# nohup java -Drocketmq.config.namesrvAddr=172.XX.XX.28:9876 -jar /usr/local/rocketmq-ng/rocketmq-console-ng-1.0.0.jar > /dev/null 2>&1 &
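As an aside, the "memory parameters already reduced" mentioned above refers to the broker JVM heap in bin/runbroker.sh; a hedged example of that adjustment for a 2C4G test node (the stock 4.x values are only approximate from memory):

# bin/runbroker.sh ships with a large default heap (roughly -Xms8g -Xmx8g -Xmn4g
# in RocketMQ 4.x); on a small test node it is reduced to something like:
JAVA_OPT="${JAVA_OPT} -server -Xms1g -Xmx1g -Xmn512m"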

When creating pods I initially just substituted the service name and image into a template and started them... but after talking with the developers it turned out that some service ports need to be reachable from outside, some only need cluster-internal communication, and some must have a fixed IP. So a Service needs to be set up when the pod is created, and it is best to generate the Service together with the rest of the configuration.

Pod YAML template

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    k8s-app: lumi-new-retail-logs
    qcloud-app: lumi-new-retail-logs
  name: lumi-new-retail-logs
  namespace: retail
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: lumi-new-retail-logs
      qcloud-app: lumi-new-retail-logs
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        k8s-app: lumi-new-retail-logs
        qcloud-app: lumi-new-retail-logs
    spec:
      containers:
      - env:
        - name: MODEL_NAME
          value: lumi-new-retail-logs
        - name: SW_ADDR
          valueFrom:
            configMapKeyRef:
              key: SW_ADDR
              name: basic-config
              optional: false
        image: XX.tencentcloudcr.com/retail/lumi_new_retail_logs:c2b6f558-202103011715
        imagePullPolicy: IfNotPresent
        name: lumi-new-retail-logs
        resources:
          limits:
            cpu: "2"
            memory: 4Gi
          requests:
            cpu: "1"
            memory: 2Gi
        securityContext:
          privileged: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /opt/skywalking/agent
          name: sk-agent
      dnsPolicy: ClusterFirst
      imagePullSecrets:
      - name: retail-aqara
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /opt/skywalking/agent7
          type: ""
        name: sk-agent
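The Deployment above pulls SW_ADDR from a ConfigMap called basic-config; for reference, a minimal sketch of that ConfigMap (the SkyWalking collector address is a placeholder):

apiVersion: v1
kind: ConfigMap
metadata:
  name: basic-config
  namespace: retail
data:
  SW_ADDR: "172.XX.XX.XX:11800"   # SkyWalking collector address (placeholder)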

Service YAML template

apiVersion: v1
kind: Service
metadata:
  managedFields:
  - apiVersion: v1
    manager: tke-apiserver
  name: lumi-new-retail-logs
  namespace: retail
spec:
  ports:
  - name: 9440-9440-tcp
    port: 9440
    protocol: TCP
    targetPort: 9440
  selector:
    k8s-app: lumi-new-retail-logs
    qcloud-app: lumi-new-retail-logs
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Creating container pods on Tencent Cloud
First create a namespace for the business services, then create the pods under Workloads.

One thing to watch is whether a service needs anything mounted; if so, add a data volume. Here I need to mount the SkyWalking agent.

After the volume is created, remember to add the mount point, and also the environment variables: these are the variables from the shell script in the Jenkins build, passed into the pod as parameters.

Startup parameters:
The variables in the startup command can be hard-coded or all turned into variables; to make pod creation easier I hard-coded some and kept the rest as variables.

java ${JAVA_OPTS} -Dspring.profiles.active=prod -Dfile.encoding=UTF-8 -Djava.awt.headless=true -javaagent:/opt/skywalking/agent/skywalking-agent.jar -Dskywalking.agent.service_name=$MODEL_NAME -Dskywalking.collector.backend_service=$SW_ADDR -jar ${JARNAME}

Also pay attention to the service's access type: whether it only needs cluster-internal access or has to be reachable from outside. If the node machines have public IPs, host-port access is an option.
For our business, container services that only need to talk to each other internally use cluster-internal access (this provides an entry point reachable by other services or containers inside the cluster, supports TCP/UDP, and suits database-type services such as MySQL, keeping the network isolated). Services that need external access use host-port access (this maps a host port to the container, supports TCP/UDP, and can serve a custom upstream LB forwarding to the nodes); for public access a CLB is created automatically.
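For comparison with the ClusterIP template above, a host-port style Service looks roughly like the following; this mirrors the lumi-retail-center NodePort service that shows up in the kubectl output later (port numbers are illustrative, and in practice the TKE console generates this for you):

apiVersion: v1
kind: Service
metadata:
  name: lumi-retail-center
  namespace: retail
spec:
  type: NodePort                  # reachable via <node IP>:30000 from outside the cluster
  ports:
  - name: 9406-9406-tcp
    port: 9406
    protocol: TCP
    targetPort: 9406
    nodePort: 30000
  selector:
    k8s-app: lumi-retail-center
    qcloud-app: lumi-retail-center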

Setting up kubectl on the node machines

[root@retail-tke-slave1 ~]# kubectl  get ns
Unable to connect to the server: dial tcp: lookup cls-nXX.ccs.tencent-cloud.com on 183.XX.XX.XX:53: no such host

[root@retail-tke-slave1 ~]# sudo sed -i '$a 172.XX.XX.128 cls-n128cem7.ccs.tencent-cloud.com' /etc/hosts

[root@retail-tke-slave1 ~]# kubectl get ns
NAME STATUS AGE
default Active 3d15h
kube-node-lease Active 3d15h
kube-public Active 3d15h
kube-system Active 3d15h
retail Active 3d15h
[root@retail-tke-slave1 ~]# kubectl get svc -n etail
No resources found in etail namespace.
[root@retail-tke-slave1 ~]# kubectl get svc -n retail
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
lumi-biz-retail-open ClusterIP 10.1.0.57 <none> 9325/TCP 23h
lumi-new-retail-employee-gateway ClusterIP 10.1.0.34 <none> 9395/TCP 22h
lumi-new-retail-gateway ClusterIP 10.1.0.96 <none> 9360/TCP 22h
lumi-retail-center NodePort 10.1.0.6 <none> 9406:30000/TCP 22h

Viewing logs from the terminal:
(They can also be viewed in the Tencent Cloud TKE console.)

# kubectl  get  pods  -n retail
# kubectl logs -f lumi-new-retail-employee-gateway-7fddf6f6d9-8md6n -n retail

Troubleshooting after starting the services:
Manually replace the jar inside the container to rule out problems with the jar itself.

[root@retail-tke-slave1 ~]# # Received /Users/zhaodan/Downloads/lumi_retail_xxljob_admin-0.0.1.jar
[root@retail-tke-slave1 ~]# ls
lumi_retail_xxljob_admin-0.0.1.jar
[root@retail-tke-slave1 ~]# docker cp ./lumi_retail_xxljob_admin-0.0.1.jar d3cf17ce6770:/data/
[root@retail-tke-slave1 ~]# docker exec -it d3cf17ce6770 /bin/bash
bash-4.4# ls
apollo dev home media proc sbin tmp
bin entrypoint.sh lib mnt root srv usr
data etc lib64 opt run sys var
bash-4.4# cd data/
bash-4.4# ll
bash: ll: command not found
bash-4.4# ls -l
total 41288
-rw-r--r-- 1 root root 274 Mar 1 14:58 Dockerfile
-rw-r--r-- 1 root root 1904 Mar 1 14:58 entrypoint.sh
-rw------- 1 root root 42268422 Mar 1 16:00 lumi_retail_xxljob_admin-0.0.1.jar


[root@retail-tke-slave1 ~]# docker ps|grep xxljob
d3cf17ce6770 aqara.tencentcloudcr.com/retail/lumi_retail_xxljob_admin "sh entrypoint.sh" 24 minutes ago Up 24 minutes k8s_lumi-retail-xxljob-admin_lumi-retail-xxljob-admin-6df78757-z86sh_retail_8a492f04-1311-48de-9323-fb7d98147110_2
781df3652630 ccr.ccs.tencentyun.com/library/pause:latest "/pause" About an hour ago Up About an hour k8s_POD_lumi-retail-xxljob-admin-6df78757-z86sh_retail_8a492f04-1311-48de-9323-fb7d98147110_0
[root@retail-tke-slave1 ~]# docker tag aqara.tencentcloudcr.com/retail/lumi_retail_xxljob_admin aqara.tencentcloudcr.com/retail/lumi_retail_xxljob_admin_test

[root@retail-tke-slave1 ~]# docker commit d3cf17ce6770 aqara.tencentcloudcr.com/retail/lumi_retail_xxljob_admin:20210301
sha256:9fd32ac163fcb3c1ca09501a42022514100b434c060644877e235aa72d293951
[root@retail-tke-slave1 ~]# docker push aqara.tencentcloudcr.com/retail/lumi_retail_xxljob_admin:20210301
The push refers to repository [aqara.tencentcloudcr.com/retail/lumi_retail_xxljob_admin]

After investigation, the xxl-job startup error turned out to be a known bug that can intermittently trigger a NullPointerException: https://github.com/xuxueli/xxl-job/issues/1060


Nginx server block configuration: the domains resolve differently, so the nginx configuration differs too.
(Background: the existing test-environment front end is used for testing, with nginx forwarding traffic to the container pods on Tencent Cloud. The nginx config already contains a set of internal test services, so another server block has to be added for the forwarding.)

The domain http://operation-k8s-test.XX.cn/ resolves directly to the nginx host; its server block uses a dedicated port, so during testing it is accessed as http://domain:port.
The domain https://retail-k8s-test.XX.cn points at the company's public-facing nginx on Tencent Cloud, which forwards to the internal nginx. If every internal server block listens on port 80, traffic only ever reaches the first port-80 server block, so IT had to change the forwarding for retail-k8s-test.XX.cn to port 8081 (any port that does not clash with the existing server blocks works).
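A sketch of the extra server block added for the container test environment, assuming the gateway is reached through a node port on the TKE nodes (listen port, domain, and upstream address are placeholders, not the production values):

# Extra server block for the TKE test environment (sketch)
server {
    listen 8081;                              # must not clash with the existing port-80 server block
    server_name retail-k8s-test.XX.cn;

    location / {
        # forward to the node port exposed by the gateway service on a TKE node
        proxy_pass http://172.XX.XX.10:30000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}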

By watching the gateway service logs I noticed that, before nginx was configured correctly, requests to the test domain were still being forwarded to the internal (non-container) backend services.


Next update: log collection with filebeat + kafka + logstash + elasticsearch + the log center

The logs are collected into the existing logging system, so it is only a matter of installing filebeat on the three nodes and adjusting the logstash configuration. Even without an existing logging system, setting one up is not much trouble.

https://www.elastic.co/cn/downloads/beats/filebeat
Install filebeat (filebeat-7.11.1-x86_64.rpm)

rpm -ivh filebeat-7.11.1-x86_64.rpm

Check the container log path: /var/lib/docker/overlay2/*/merged/

# ls /var/lib/docker/overlay2/*/merged/root/data/logs/lumi-retail-open*.log

Modify the filebeat configuration; one service is shown as an example, and the other services follow the same pattern.

- input_type: log
  enable: true
  paths:
    - /var/lib/docker/overlay2/*/merged/logs/user-center*.log
  encoding: utf-8
  exclude_files: [".gz$"]

  ignore_older: 24h
  scan_frequency: 1s
  max_bytes: 104857600
  multiline.pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}
  multiline.negate: true
  multiline.match: after

  registry_file: /var/lib/filebeat/registry
  fields:
    log_topics: user-center
    env: retail


processors:
  - drop_fields:
      fields: ["@timestamp", "beat.version"]


#----------------------------- Kafka output --------------------------------

output.kafka:
  hosts: ["172.XX.XX.8:9092","172.XX.XX.5:9092","172.XX.XX.13:9092"]
  # message topic selection + partitioning
  topic: '%{[fields][log_topics]}'
  partition.round_robin:
    reachable_only: false
  required_acks: 1
  compression: gzip
  max_message_bytes: 1000000


# mv filebeat.yml /etc/filebeat/filebeat.yml

# systemctl enable filebeat
# systemctl start filebeat

Kafka is the one from the existing logging system, so I will not show its setup here; it is simple anyway.

Logstash is also reused from the existing logging system; just modify the logstash configuration to add the container services' logs. One service is used as the example.
The only thing to watch is that the topic name matches the one in the filebeat config.

# ls /usr/local/logstash-6.5.4/config/kafka-logstash-elaticsearch.conf


input {
  kafka {
    bootstrap_servers => ["172.XX.XX.8:9092,172.XX.XX.5:9092,172.XX.XX.13:9092"]
    topics => ["user-center"]
    session_timeout_ms => "36000"
    auto_offset_reset => "latest"
    consumer_threads => 6
    decorate_events => true
    type => "user-center"
    group_id => "logstash"
    client_id => "client2"
  }
}

filter {
  mutate {
    rename => { "type" => "log_type" }
  }

  json {
    source => "message"
  }

  mutate {
    remove_field => "offset"
    remove_field => "kafka"
    remove_field => "@version"
    remove_field => "input_type"
    remove_field => "fields"
    remove_field => "type"
    remove_field => "agent"
    remove_field => "ecs"
    remove_field => "beat"
  }

  grok {
    match => {"message" => "%{TIMESTAMP_ISO8601:logtime} \[TID:\s*%{GREEDYDATA:tid}\] \[%{GREEDYDATA:thread}\] %{LOGLEVEL:loglevel} %{GREEDYDATA:msg}"}
    remove_field => "message"
  }
}

output {
  # write the processed logs to Elasticsearch
  elasticsearch {
    hosts => ["172.XX.XX.84:9200"]
    index => "logstash-%{[log_type]}-%{+YYYY.MM.dd}"
    template => "/usr/local/logstash-6.5.4/config/es-template.json"
    template_name => "es-template.json"
    template_overwrite => true
    http_compression => true
  }
}

I will skip the Elasticsearch setup steps; they are not difficult.

The last step is to create the log index. The company uses its own in-house log center, which is essentially the same as Kibana, so creating the index works just like creating an index pattern in Kibana.


Next update: container monitoring with Prometheus

The cluster nodes are monitored with Prometheus; for the container pods we still rely on Tencent Cloud's built-in cloud monitoring for alerting (alerting on pods with Prometheus requires filtering the reported metrics and then writing rules, which is still being tuned).

Components used for Prometheus alerting to DingTalk (see the sketch after this list):
prometheus-2.6.0.linux-amd64
prometheus-webhook-dingtalk-0.3.0.linux-amd64
alertmanager-0.15.3.linux-amd64
node_exporter-0.17.0.linux-amd64
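Roughly how these pieces wire together: Prometheus fires alerts to Alertmanager, which calls prometheus-webhook-dingtalk, which posts to the DingTalk robot. A minimal sketch of the Alertmanager side, assuming prometheus-webhook-dingtalk's default listen port of 8060 and a profile named webhook1 configured on its side (both assumptions):

# alertmanager.yml (sketch)
route:
  receiver: dingtalk
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: dingtalk
    webhook_configs:
      # prometheus-webhook-dingtalk exposes /dingtalk/<profile>/send; "webhook1"
      # must match the profile configured in its own config (assumption).
      - url: http://127.0.0.1:8060/dingtalk/webhook1/send
        send_resolved: true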

First install node_exporter on the three nodes

# rz
# Received /Users/zhaodan/Downloads/腾讯云TKE迁移/node_exporter-0.17.0.linux-amd64.tar.gz
# ls
# tar -zxvf node_exporter-0.17.0.linux-amd64.tar.gz -C /usr/local/
# nohup /usr/local/node_exporter-0.17.0.linux-amd64/node_exporter &>/dev/null &
# ps -ef |grep exporter

Modify prometheus.yml to add the container-related jobs; service status is discovered from the registry (Eureka).

  - job_name: 'tke-retail'
    metrics_path: '/actuator/prometheus'
    eureka_sd_configs:
      - server: "http://172.XX.XX.10:30000/eureka"
    relabel_configs:
      - target_label: __meta_eureka_app_instance_country_id
        replacement: bjtxy
      - target_label: __meta_eureka_app_instance_datacenterinfo_name
        replacement: retail
      - source_labels: [__meta_eureka_app_instance_hostname]
        action: replace
        #target_label: "ip"
        target_label: "instance"
      - source_labels: [__meta_eureka_app_name]
        action: replace
        #target_label: "service"
        target_label: "web"
      - source_labels: [__meta_eureka_app_instance_country_id]
        action: replace
        target_label: "region"
      - source_labels: [__meta_eureka_app_instance_datacenterinfo_name]
        action: replace
        target_label: "department"

  - job_name: 'bj-retail-tke-pod'
    metrics_path: '/metrics/cadvisor'
    scrape_interval: 10s
    static_configs:
      - targets:
          - 172.XX.XX.10:10255
          - 172.XX.XX.5:10255
          - 172.XX.XX.6:10255
        labels:
          web: tke-pod
          region: bjtxy
          department: retail

  - job_name: 'bj-retail-tke-node'
    scrape_interval: 30s
    scrape_timeout: 15s
    metrics_path: "/metrics"
    static_configs:
      - targets:
          - 172.XX.XX.10:9100
          - 172.XX.XX.5:9100
          - 172.XX.XX.6:9100
        labels:
          service: tke-node
          region: bjtxy
          department: retail
    relabel_configs:
      - source_labels: [__address__]
        regex: '(.*)\:9100'
        target_label: 'instance'
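Node-level alerting rules are straightforward once the scrape jobs above are in place; a hedged example of a node-down rule loaded via rule_files (the file name, thresholds, and label values are illustrative):

# rules/tke-node.rules.yml (sketch)
groups:
  - name: tke-node
    rules:
      - alert: TKENodeDown
        # the node job above attaches the label service="tke-node"
        expr: up{service="tke-node"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "TKE node {{ $labels.instance }} is down"
          description: "node_exporter on {{ $labels.instance }} has been unreachable for 2 minutes."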