
CoreDNS pods are in CrashLoopBackOff or Error state

I am trying to set up the Kubernetes master by issuing:

kubeadm init --pod-network-cidr=192.168.0.0/16

  1. followed by: installing a pod network add-on (Calico)
  2. followed by: master isolation

Problem: the coredns pods are in CrashLoopBackOff or Error status:

# kubectl get pods -n kube-system
NAME                                       READY   STATUS             RESTARTS   AGE
calico-node-lflwx                          2/2     Running            0          2d
coredns-576cbf47c7-nm7gc                   0/1     CrashLoopBackOff   69         2d
coredns-576cbf47c7-nwcnx                   0/1     CrashLoopBackOff   69         2d
etcd-suey.nknwn.local                      1/1     Running            0          2d
kube-apiserver-suey.nknwn.local            1/1     Running            0          2d
kube-controller-manager-suey.nknwn.local   1/1     Running            0          2d
kube-proxy-xkgdr                           1/1     Running            0          2d
kube-scheduler-suey.nknwn.local            1/1     Running            0          2d
# 

I went through Troubleshooting kubeadm - Kubernetes, but my node is not running SELinux and my Docker is up to date.

# docker --version
Docker version 18.06.1-ce, build e68fc7a
# 

kubectl describe output:

# kubectl -n kube-system describe pod coredns-576cbf47c7-nwcnx 
Name:               coredns-576cbf47c7-nwcnx
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               suey.nknwn.local/192.168.86.81
Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        cni.projectcalico.org/podIP: 192.168.0.30/32
Status:             Running
IP:                 192.168.0.30
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://ec65b8f40c38987961e9ed099dfa2e8bb35699a7f370a2cda0e0d522a0b05e79
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 31 Oct 2018 23:28:58 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 31 Oct 2018 23:21:35 -0400
      Finished:     Wed, 31 Oct 2018 23:23:54 -0400
    Ready:          True
    Restart Count:  103
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-xvq8b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xvq8b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                       Message
  ----     ------     ----                    ----                       -------
  Normal   Killing    54m (x10 over 4h19m)    kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Warning  Unhealthy  9m56s (x92 over 4h20m)  kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
  Warning  BackOff    5m4s (x173 over 4h10m)  kubelet, suey.nknwn.local  Back-off restarting failed container
# kubectl -n kube-system describe pod coredns-576cbf47c7-nm7gc 
Name:               coredns-576cbf47c7-nm7gc
Namespace:          kube-system
Priority:           0
PriorityClassName:  <none>
Node:               suey.nknwn.local/192.168.86.81
Start Time:         Sun, 28 Oct 2018 22:39:46 -0400
Labels:             k8s-app=kube-dns
                    pod-template-hash=576cbf47c7
Annotations:        cni.projectcalico.org/podIP: 192.168.0.31/32
Status:             Running
IP:                 192.168.0.31
Controlled By:      ReplicaSet/coredns-576cbf47c7
Containers:
  coredns:
    Container ID:  docker://0f2db8d89a4c439763e7293698d6a027a109bf556b806d232093300952a84359
    Image:         k8s.gcr.io/coredns:1.2.2
    Image ID:      docker-pullable://k8s.gcr.io/coredns@sha256:3e2be1cec87aca0b74b7668bbe8c02964a95a402e45ceb51b2252629d608d03a
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Wed, 31 Oct 2018 23:29:11 -0400
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Wed, 31 Oct 2018 23:21:58 -0400
      Finished:     Wed, 31 Oct 2018 23:24:08 -0400
    Ready:          True
    Restart Count:  102
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-xvq8b (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  coredns-token-xvq8b:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  coredns-token-xvq8b
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     CriticalAddonsOnly
                 node-role.kubernetes.io/master:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                     From                       Message
  ----     ------     ----                    ----                       -------
  Normal   Killing    44m (x12 over 4h18m)    kubelet, suey.nknwn.local  Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
  Warning  BackOff    4m58s (x170 over 4h9m)  kubelet, suey.nknwn.local  Back-off restarting failed container
  Warning  Unhealthy  8s (x102 over 4h19m)    kubelet, suey.nknwn.local  Liveness probe failed: HTTP probe failed with statuscode: 503
# 

kubectl logs output:

# kubectl -n kube-system logs -f coredns-576cbf47c7-nm7gc 
E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974836       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:31:58.974857       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.975493       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.976732       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:32:29.977788       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.976164       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.977415       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:00.978332       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/01 03:33:08 [INFO] SIGTERM: Shutting down servers then terminating
E1101 03:33:31.976864       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.978080       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1101 03:33:31.979156       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
# 

# kubectl -n kube-system log -f coredns-576cbf47c7-gqdgd
.:53
2018/11/05 04:04:13 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:13 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/11/05 04:04:13 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:19 [FATAL] plugin/loop: Seen "HINFO IN 3597544515206064936.6415437575707023337." more than twice, loop detected
# kubectl -n kube-system log -f coredns-576cbf47c7-hhmws
.:53
2018/11/05 04:04:18 [INFO] CoreDNS-1.2.2
2018/11/05 04:04:18 [INFO] linux/amd64, go1.11, eb51e8b
CoreDNS-1.2.2
linux/amd64, go1.11, eb51e8b
2018/11/05 04:04:18 [INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
2018/11/05 04:04:24 [FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected
# 

kubectl describe output (apiserver):

# kubectl -n kube-system describe pod kube-apiserver-suey.nknwn.local
Name:               kube-apiserver-suey.nknwn.local
Namespace:          kube-system
Priority:           2000000000
PriorityClassName:  system-cluster-critical
Node:               suey.nknwn.local/192.168.87.20
Start Time:         Fri, 02 Nov 2018 00:28:44 -0400
Labels:             component=kube-apiserver
                    tier=control-plane
Annotations:        kubernetes.io/config.hash: 2433a531afe72165364aace3b746ea4c
                    kubernetes.io/config.mirror: 2433a531afe72165364aace3b746ea4c
                    kubernetes.io/config.seen: 2018-11-02T00:28:43.795663261-04:00
                    kubernetes.io/config.source: file
                    scheduler.alpha.kubernetes.io/critical-pod: 
Status:             Running
IP:                 192.168.87.20
Containers:
  kube-apiserver:
    Container ID:  docker://659456385a1a859f078d36f4d1b91db9143d228b3bc5b3947a09460a39ce41fc
    Image:         k8s.gcr.io/kube-apiserver:v1.12.2
    Image ID:      docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
    Port:          <none>
    Host Port:     <none>
    Command:
      kube-apiserver
      --authorization-mode=Node,RBAC
      --advertise-address=192.168.87.20
      --allow-privileged=true
      --client-ca-file=/etc/kubernetes/pki/ca.crt
      --enable-admission-plugins=NodeRestriction
      --enable-bootstrap-token-auth=true
      --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
      --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
      --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
      --etcd-servers=https://127.0.0.1:2379
      --insecure-port=0
      --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
      --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
      --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
      --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
      --requestheader-allowed-names=front-proxy-client
      --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
      --requestheader-extra-headers-prefix=X-Remote-Extra-
      --requestheader-group-headers=X-Remote-Group
      --requestheader-username-headers=X-Remote-User
      --secure-port=6443
      --service-account-key-file=/etc/kubernetes/pki/sa.pub
      --service-cluster-ip-range=10.96.0.0/12
      --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
      --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    State:          Running
      Started:      Sun, 04 Nov 2018 22:57:27 -0500
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Nov 2018 20:12:06 -0500
      Finished:     Sun, 04 Nov 2018 22:55:24 -0500
    Ready:          True
    Restart Count:  2
    Requests:
      cpu:        250m
    Liveness:     http-get https://192.168.87.20:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
    Environment:  <none>
    Mounts:
      /etc/ca-certificates from etc-ca-certificates (ro)
      /etc/kubernetes/pki from k8s-certs (ro)
      /etc/ssl/certs from ca-certs (ro)
      /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
      /usr/share/ca-certificates from usr-share-ca-certificates (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  etc-ca-certificates:
    Type:          HostPath (bare Host directory volume)
    Path:          /etc/ca-certificates
    HostPathType:  DirectoryOrCreate
  k8s-certs:
    Type:          HostPath (bare Host directory volume)
    Path:          /etc/kubernetes/pki
    HostPathType:  DirectoryOrCreate
  ca-certs:
    Type:          HostPath (bare Host directory volume)
    Path:          /etc/ssl/certs
    HostPathType:  DirectoryOrCreate
  usr-share-ca-certificates:
    Type:          HostPath (bare Host directory volume)
    Path:          /usr/share/ca-certificates
    HostPathType:  DirectoryOrCreate
  usr-local-share-ca-certificates:
    Type:          HostPath (bare Host directory volume)
    Path:          /usr/local/share/ca-certificates
    HostPathType:  DirectoryOrCreate
QoS Class:         Burstable
Node-Selectors:    <none>
Tolerations:       :NoExecute
Events:            <none>
# 

syslog (host):

Nov  4 22:59:36 suey kubelet[1234]: E1104 22:59:36.139538    1234 pod_workers.go:186] Error syncing pod d8146b7e-de57-11e8-a1e2-ec8eb57434c8 ("coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"), skipping: failed to "StartContainer" for "coredns" with CrashLoopBackOff: "Back-off 40s restarting failed container=coredns pod=coredns-576cbf47c7-hhmws_kube-system(d8146b7e-de57-11e8-a1e2-ec8eb57434c8)"

Please advise.

alexus

This error

[FATAL] plugin/loop: Seen "HINFO IN 6900627972087569316.7905576541070882081." more than twice, loop detected

is raised when CoreDNS detects a loop in the resolver configuration, and this is the intended behavior. You are hitting this issue:

https://github.com/kubernetes/kubeadm/issues/1162

https://github.com/coredns/coredns/issues/2087

Hacky solution: disable the CoreDNS loop detection

Edit the CoreDNS ConfigMap:

kubectl -n kube-system edit configmap coredns

Remove or comment out the line with loop, then save and exit; the relevant part of the Corefile is shown below.
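
For reference, the kubeadm-generated Corefile stanza looks roughly like the sketch below (the exact plugin list may differ slightly with your CoreDNS version); the line to remove or comment out is the one that reads loop:

.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
       pods insecure
       upstream
       fallthrough in-addr.arpa ip6.arpa
    }
    prometheus :9153
    proxy . /etc/resolv.conf
    cache 30
    # loop
    reload
    loadbalance
}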

Then delete the CoreDNS pods so that new ones are created with the new configuration:

kubectl -n kube-system delete pod -l k8s-app=kube-dns
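
You can confirm that the replacement pods come up and stay Running:

kubectl -n kube-system get pods -l k8s-app=kube-dns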

Everything should work fine after that.

Recommended solution: remove the loop from the DNS configuration

First, check whether you are using systemd-resolved. If you are running Ubuntu 18.04, that is most likely the case.

systemctl list-unit-files | grep enabled | grep systemd-resolved
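
If it is enabled, the output should contain a line similar to this (exact spacing varies by distribution):

systemd-resolved.service                  enabled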

If so, check which resolv.conf file your cluster is using as a reference:

ps auxww | grep kubelet

You might see a line like:

/usr/bin/kubelet ... --resolv-conf=/run/systemd/resolve/resolv.conf

The important part is --resolv-conf - this tells us whether the systemd resolv.conf is being used or not.
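
On a kubeadm-provisioned node the flag may also come from the kubelet config file or drop-in rather than the command line itself, so it can be worth checking there too (these paths are the usual kubeadm defaults and may differ on your system):

grep -i resolv /var/lib/kubelet/kubeadm-flags.env /var/lib/kubelet/config.yaml 2>/dev/null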

If it is the systemd resolv.conf, proceed as follows:

Check the content of /run/systemd/resolve/resolv.conf to see whether there is a record like:

nameserver 127.0.0.1

If 127.0.0.1 is there, that is the entry causing the loop: CoreDNS inherits this resolv.conf from the host, so queries it forwards to 127.0.0.1 end up back at CoreDNS itself.

To get rid of it, you should not edit that file directly; instead, look at the other locations from which it is generated.

Check all files under /etc/systemd/network, and if you find a record like

DNS=127.0.0.1

remove that record. Also check /etc/systemd/resolved.conf and do the same if needed. Make sure you have at least one or two DNS servers configured, such as

DNS=1.1.1.1 1.0.0.1
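
In /etc/systemd/resolved.conf these settings live under the [Resolve] section, so after editing the file should look roughly like this (1.1.1.1 and 1.0.0.1 are just example public resolvers):

[Resolve]
DNS=1.1.1.1 1.0.0.1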

After doing all of that, restart the systemd services so your changes take effect:

systemctl restart systemd-networkd systemd-resolved

After that, verify that nameserver 127.0.0.1 no longer appears in the resolv.conf file:

cat /run/systemd/resolve/resolv.conf

Finally, trigger re-creation of the DNS pods:

kubectl -n kube-system delete pod -l k8s-app=kube-dns
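
The new pods should stay Running and their logs should no longer end with the plugin/loop fatal error; a quick check (substitute a pod name from the first command's output):

kubectl -n kube-system get pods -l k8s-app=kube-dns
kubectl -n kube-system logs <coredns-pod-name>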

Summary: the fix is to remove the DNS lookup loop from the host's DNS configuration. The exact steps vary between the different resolv.conf managers/implementations.

Utku Özdemir

Here is some shell hackery that automates Utku's answer:

# remove the loop-causing entries from the DNS config files
sudo find /etc/systemd/network /etc/systemd/resolved.conf -type f \
    -exec sed -i '/^DNS=127\.0\.0\.1/d' {} +

# if necessary, configure some DNS servers (use Cloudflare's public resolvers)
if ! grep -q '^DNS=' /etc/systemd/resolved.conf; then
    sudo sed -i '$aDNS=1.1.1.1 1.0.0.1' /etc/systemd/resolved.conf
fi

# restart systemd services
sudo systemctl restart systemd-networkd systemd-resolved

# force (re-)creation of the dns pods
kubectl -n kube-system delete pod -l k8s-app=kube-dns

rubicks