Pods in a k3s cluster on RISC-V cannot ping each other due to a CoreDNS problem

I deployed a k3s cluster on an openEuler + RISC-V development board, but the coredns pod fails to start and pods cannot ping each other.

The k3s cluster has two nodes. The default state of the coredns pod (along with the other kube-system pods) on the master is as follows:

local-path-provisioner-6d44f4f9d7-z5b9c   0/1     CrashLoopBackOff   1089 (3m37s ago)   35d   10.42.0.25   openeuler-riscv64   <none>           <none>
metrics-server-7c55d89d5d-kpj5h           0/1     CrashLoopBackOff   1077 (2m50s ago)   35d   10.42.0.22   openeuler-riscv64   <none>           <none>
helm-install-traefik-crd-hhfn4            0/1     CrashLoopBackOff   820 (119s ago)     35d   10.42.0.21   openeuler-riscv64   <none>           <none>
helm-install-traefik-r6bm8                0/1     CrashLoopBackOff   819 (109s ago)     35d   10.42.0.24   openeuler-riscv64   <none>           <none>
coredns-97b598894-7l5ff                   0/1     CrashLoopBackOff   8 (54s ago)        17m   10.42.0.27   openeuler-riscv64   <none>           <none>

`kubectl describe` shows the following:

[root@openeuler-riscv64 ~]# kubectl get pods -n kube-system
NAME                                      READY   STATUS             RESTARTS          AGE
coredns-97b598894-tqr5v                   0/1     CrashLoopBackOff   9 (15h ago)       15h
metrics-server-7c55d89d5d-kpj5h           0/1     Running            1074 (119s ago)   35d
local-path-provisioner-6d44f4f9d7-z5b9c   0/1     CrashLoopBackOff   1086 (32s ago)    35d
helm-install-traefik-r6bm8                0/1     CrashLoopBackOff   815 (15s ago)     35d
helm-install-traefik-crd-hhfn4            0/1     CrashLoopBackOff   816 (15s ago)     35d
[root@openeuler-riscv64 ~]# kubectl describe pod coredns-97b598894-tqr5v -n kube-system
Name:                      coredns-97b598894-tqr5v
Namespace:                 kube-system
Priority:                  2000000000
Priority Class Name:       system-cluster-critical
Service Account:           coredns
Node:                      k3s-air1/172.20.10.3
Start Time:                Mon, 22 Jan 2024 17:09:59 +0800
Labels:                    k8s-app=kube-dns
                           pod-template-hash=97b598894
Annotations:               <none>
Status:                    Terminating (lasts 2m4s)
Termination Grace Period:  30s
IP:                        10.42.1.26
IPs:
  IP:           10.42.1.26
Controlled By:  ReplicaSet/coredns-97b598894
Containers:
  coredns:
    Container ID:  docker://c5386186c0177f658a96df702607bca0f795185cc7438ae29b1065dca1051cbc
    Image:         carvicsforth/coredns:1.10.1
    Image ID:      docker-pullable://carvicsforth/coredns@sha256:6cd10cf78af68af9bfebc932c22724a64d4ce0e7ff94738aef6b92df7565f4b1
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 22 Jan 2024 17:32:03 +0800
      Finished:     Mon, 22 Jan 2024 17:32:08 +0800
    Ready:          False
    Restart Count:  9
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=2s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /etc/coredns/custom from custom-config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-dbth2 (ro)
Conditions:
  Type               Status
  Initialized        True
  Ready              False
  ContainersReady    False
  PodScheduled       True
  DisruptionTarget   True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  custom-config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns-custom
    Optional:  true
  kube-api-access-dbth2:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               kubernetes.io/os=linux
Tolerations:                  CriticalAddonsOnly op=Exists
                              node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                              node-role.kubernetes.io/master:NoSchedule op=Exists
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  kubernetes.io/hostname:DoNotSchedule when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason     Age                 From               Message
  ----     ------     ----                ----               -------
  Normal   Scheduled  15h                 default-scheduler  Successfully assigned kube-system/coredns-97b598894-tqr5v to k3s-air1
  Normal   Pulled     15h (x3 over 15h)   kubelet            Container image "carvicsforth/coredns:1.10.1" already present on machine
  Normal   Created    15h (x3 over 15h)   kubelet            Created container coredns
  Normal   Started    15h (x3 over 15h)   kubelet            Started container coredns
  Warning  Unhealthy  15h (x14 over 15h)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    15h (x95 over 15h)  kubelet            Back-off restarting failed container coredns in pod coredns-97b598894-tqr5v_kube-system(5f26f744-5697-47c4-a895-7a1dbff23b96)

`kubectl logs` shows:

[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: waiting for Kubernetes API before starting server
[WARNING] plugin/kubernetes: starting server with unsynced Kubernetes API
Listen: listen tcp :53: bind: permission denied

Checking with `ps -ef` showed that coredns was not running as root (binding port 53, a privileged port, requires root or the CAP_NET_BIND_SERVICE capability). So I edited coredns.yaml and made the following changes, which also pin the pod to the master node via nodeAffinity:

+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+              - matchExpressions:
+                  - key: node-role.kubernetes.io/master
+                    operator: Exists
       nodeSelector:
         kubernetes.io/os: linux
......
         securityContext:
+          runAsUser: 0
+          runAsGroup: 0
           allowPrivilegeEscalation: false
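
As a side note, a less privileged alternative would be to keep a non-root user and grant only the capability needed to bind port 53. This is only a sketch (I have not verified it works with the container runtime on this board); if it does not, `runAsUser: 0` as above remains the fallback:

```yaml
# Sketch only: instead of runAsUser: 0, grant the single capability that
# allows binding privileged ports (<1024). Whether this works depends on
# the container runtime honoring the requested capabilities.
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    add:
      - NET_BIND_SERVICE
    drop:
      - all
```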

After restarting the coredns pod, its state is as follows:

coredns-65dc9b694c-xx4pf                  0/1     Running            0                 63m   10.42.0.28   openeuler-riscv64   <none>           <none>
helm-install-traefik-crd-hhfn4            0/1     CrashLoopBackOff   829 (3m56s ago)   35d   10.42.0.21   openeuler-riscv64   <none>           <none>
helm-install-traefik-r6bm8                0/1     CrashLoopBackOff   828 (3m24s ago)   35d   10.42.0.24   openeuler-riscv64   <none>           <none>
local-path-provisioner-6d44f4f9d7-z5b9c   0/1     CrashLoopBackOff   1102 (3m ago)     35d   10.42.0.25   openeuler-riscv64   <none>           <none>
metrics-server-7c55d89d5d-kpj5h           0/1     CrashLoopBackOff   1090 (81s ago)    35d   10.42.0.22   openeuler-riscv64   <none>           <none>

The pod is now Running, but READY is still 0/1. The events shown by `kubectl describe` are:

Events:
  Type     Reason     Age                     From     Message
  ----     ------     ----                    ----     -------
  Warning  Unhealthy  4m17s (x1850 over 64m)  kubelet  Readiness probe failed: HTTP probe failed with statuscode: 503
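
To rule out the probe configuration itself, the readiness endpoint can be queried directly from a cluster node (a sketch; the pod IP below is taken from my `kubectl get pods -o wide` output above and will differ per cluster). CoreDNS's ready plugin serves `/ready` on port 8181 and returns 503 until every plugin, including kubernetes, reports ready:

```shell
# Query the CoreDNS readiness endpoint directly; 10.42.0.28 is the coredns
# pod IP from the output above (yours will differ). A 503 here confirms the
# kubernetes plugin has not synced, matching the kubelet probe failures.
curl -s -o /dev/null -w "%{http_code}\n" http://10.42.0.28:8181/ready
```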

The logs are as follows. Note that 10.43.0.1 is the ClusterIP of the built-in kubernetes Service (the in-cluster API endpoint), so the timeouts below mean pod-to-service traffic as a whole is broken, not just DNS:

[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/kubernetes: Trace[1972775025]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231 (23-Jan-2024 01:36:37.721) (total time: 30001ms):
Trace[1972775025]: ---"Objects listed" error:Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout 30001ms (01:37:07.723)
Trace[1972775025]: [30.001897321s] [30.001897321s] END
[ERROR] plugin/kubernetes: pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:231: Failed to watch *v1.EndpointSlice: failed to list *v1.EndpointSlice: Get "https://10.43.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.43.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.override
[WARNING] No files matching import glob pattern: /etc/coredns/custom/*.server

Pods still cannot ping each other. Has anyone run into a similar problem?
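
In case anyone wants to compare notes, here is a sketch of the checks I plan to run next on each node. It assumes the flannel VXLAN backend that k3s uses by default; I have read that some RISC-V kernels ship without the vxlan module, which would break all pod-to-pod and pod-to-service traffic:

```shell
# Is the vxlan kernel module available? flannel's default backend needs it.
modprobe vxlan 2>/dev/null && echo "vxlan: OK" || echo "vxlan: missing"

# Does the flannel VXLAN interface exist, and which subnet did this node get?
ip -d link show flannel.1 || true
cat /run/flannel/subnet.env || true

# Can the node itself reach the in-cluster API service address?
curl -ks --max-time 5 https://10.43.0.1/version || echo "service network unreachable"
```

If vxlan turns out to be missing, restarting the k3s server with `--flannel-backend=host-gw` might be a workaround (assuming all nodes sit on the same L2 segment).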