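Once your application is running, you will inevitably need to debug it. Earlier we described how to use kubectl get pod to retrieve simple status information about your pods. There are, however, several other ways to get more information about your application.
Using kubectl describe pod to fetch details about pods
In this example we use a Deployment to create two pods, similar to the earlier example.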
nginx-dep.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-deployment
    spec:
      selector:
        matchLabels:
          app: nginx
      replicas: 2
      template:
        metadata:
          labels:
            app: nginx
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                memory: "128Mi"
                cpu: "500m"
            ports:
            - containerPort: 80
Create the deployment with the following command:
    $ kubectl create -f https://k8s.io/docs/tasks/debug-application-cluster/nginx-dep.yaml
    deployment "nginx-deployment" created

    $ kubectl get pods
    NAME                                READY     STATUS    RESTARTS   AGE
    nginx-deployment-1006230814-6winp   1/1       Running   0          11s
    nginx-deployment-1006230814-fmgu3   1/1       Running   0          11s
We can retrieve much more information about each of these pods using kubectl describe pod. For example:
    $ kubectl describe pod nginx-deployment-1006230814-6winp
    Name:           nginx-deployment-1006230814-6winp
    Namespace:      default
    Node:           kubernetes-node-wul5/10.240.0.9
    Start Time:     Thu, 24 Mar 2016 01:39:49 +0000
    Labels:         app=nginx,pod-template-hash=1006230814
    Annotations:    kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1956810328","uid":"14e607e7-8ba1-11e7-b5cb-fa16" ...
    Status:         Running
    IP:             10.244.0.6
    Controllers:    ReplicaSet/nginx-deployment-1006230814
    Containers:
      nginx:
        Container ID:   docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
        Image:          nginx
        Image ID:       docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
        Port:           80/TCP
        QoS Tier:
          cpu:      Guaranteed
          memory:   Guaranteed
        Limits:
          cpu:      500m
          memory:   128Mi
        Requests:
          memory:   128Mi
          cpu:      500m
        State:          Running
          Started:      Thu, 24 Mar 2016 01:39:51 +0000
        Ready:          True
        Restart Count:  0
        Environment:    <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from default-token-5kdvl (ro)
    Conditions:
      Type          Status
      Initialized   True
      Ready         True
      PodScheduled  True
    Volumes:
      default-token-4bcbi:
        Type:       Secret (a volume populated by a Secret)
        SecretName: default-token-4bcbi
        Optional:   false
    QoS Class:      Guaranteed
    Node-Selectors: <none>
    Tolerations:    <none>
    Events:
      FirstSeen  LastSeen  Count  From                            SubobjectPath           Type    Reason     Message
      ---------  --------  -----  ----                            -------------           ----    ------     -------
      54s        54s       1      {default-scheduler }                                    Normal  Scheduled  Successfully assigned nginx-deployment-1006230814-6winp to kubernetes-node-wul5
      54s        54s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Pulling    pulling image "nginx"
      53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Pulled     Successfully pulled image "nginx"
      53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Created    Created container with docker id 90315cc9f513
      53s        53s       1      {kubelet kubernetes-node-wul5}  spec.containers{nginx}  Normal  Started    Started container with docker id 90315cc9f513
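Here you can see configuration information about the container(s) and Pod (labels, resource requirements, etc.), as well as status information about the container(s) and Pod (state, readiness, restart count, events, etc.).
The container state is one of Waiting, Running, or Terminated. Depending on the state, additional information is provided; here you can see that for a container in the Running state, the system tells you when the container started.
Ready tells you whether the container passed its last readiness probe. (In this case the container has no readiness probe configured; a container without a readiness probe is assumed to be ready.)
Restart Count tells you how many times the container has been restarted; this information can be useful for detecting crash loops in containers that are configured with a restart policy of 'Always'.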
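The three fields just described (state, readiness, restart count) can also be read programmatically. The following is a minimal Python sketch, assuming a dictionary shaped like one entry of `status.containerStatuses` in the Pod object. The sample values mirror the output above; the `CRASH_LOOP_HINT` threshold and the `summarize` helper are hypothetical names chosen for illustration and are not part of Kubernetes.

```python
# Sample entry shaped like status.containerStatuses[0] of a Pod object.
# Values mirror the describe output above.
container_status = {
    "name": "nginx",
    "ready": True,
    "restartCount": 0,
    "state": {"running": {"startedAt": "2016-03-24T01:39:51Z"}},
}

# Hypothetical threshold for illustration only; Kubernetes defines no such constant.
CRASH_LOOP_HINT = 5


def summarize(status, restart_policy="Always"):
    """Return (state_name, ready, restart_count, possible_crash_loop)."""
    # The state mapping holds exactly one key: waiting, running, or terminated.
    state_name = next(iter(status["state"]))
    restarts = status["restartCount"]
    # Many restarts under restartPolicy Always suggest a crash loop.
    looping = restart_policy == "Always" and restarts >= CRASH_LOOP_HINT
    return state_name, status["ready"], restarts, looping


print(summarize(container_status))
```

A container that keeps restarting would show a growing `restartCount`, and the last element of the tuple would flip to `True` once the assumed threshold is reached.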
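The Conditions section reports Pod-level conditions. The binary Ready condition indicates that the Pod is able to service requests and should be added to the load-balancing pools of all matching Services.
Lastly, you see a log of recent events related to your Pod. The system compresses multiple identical events by showing the first and last time the event was seen and the number of times it was seen. "From" indicates the component that logged the event, "SubobjectPath" tells you which object (for example, a container within the Pod) is being referred to, and "Reason" and "Message" tell you what happened.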
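To make the idea of event compression concrete, here is a small Python sketch that collapses identical events into a first-seen time, a last-seen time, and a count. It illustrates the concept only and is not how Kubernetes implements it; the `compress` function, the tuple layout, and the sample timestamps are all assumptions made for this example.

```python
def compress(events):
    """Collapse identical events into first-seen, last-seen, and count.

    events: iterable of (timestamp_seconds, source, reason, message).
    """
    merged = {}
    for ts, source, reason, message in events:
        key = (source, reason, message)
        if key in merged:
            entry = merged[key]
            entry["first_seen"] = min(entry["first_seen"], ts)
            entry["last_seen"] = max(entry["last_seen"], ts)
            entry["count"] += 1
        else:
            merged[key] = {"first_seen": ts, "last_seen": ts, "count": 1}
    return merged


# Hypothetical raw events: one scheduling failure repeated seven times,
# plus a single image pull.
raw_events = [
    (t, "default-scheduler", "FailedScheduling", "pod failed to fit in any node")
    for t in (0, 10, 20, 30, 40, 50, 60)
]
raw_events.append((5, "kubelet", "Pulling", 'pulling image "nginx"'))

for key, entry in compress(raw_events).items():
    print(key[1], entry)
```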
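Example: debugging Pending Pods
A common scenario that you can detect using events is a Pod that won't fit on any node. For example, the Pod might request more resources than are free on any node, or it might specify a label selector that doesn't match any node. Suppose we created the Deployment above with 5 replicas (instead of 2) and a request of 600 millicores instead of 500, on a four-node cluster where each (virtual) machine has 1 CPU. In that case one of the Pods cannot be scheduled. (Note that because cluster add-on pods such as fluentd and skydns run on every node, if we requested 1000 millicores then none of the Pods could be scheduled.)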
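The arithmetic behind this scenario can be sketched in a few lines of Python. This is a simplified model, not the scheduler's actual algorithm: it considers CPU requests only and assumes every node is identical. The `schedulable_pods` helper and the 100-millicore add-on figure are assumptions for illustration; the text does not say how much CPU the add-on pods consume.

```python
def schedulable_pods(nodes, node_capacity_m, addon_usage_m, request_m, replicas):
    """Return (placed, pending) for identical pods on identical nodes.

    All CPU quantities are in millicores.
    """
    free_m = max(node_capacity_m - addon_usage_m, 0)
    per_node = free_m // request_m  # whole pods that fit on one node
    placed = min(replicas, nodes * per_node)
    return placed, replicas - placed


# Assumed add-on CPU usage per node (hypothetical figure).
ADDON_USAGE_M = 100

# 4 nodes with 1 CPU each, 5 replicas requesting 600m each.
print(schedulable_pods(4, 1000, ADDON_USAGE_M, 600, 5))

# Requesting a full CPU leaves no room next to the add-on pods.
print(schedulable_pods(4, 1000, ADDON_USAGE_M, 1000, 5))
```

Under these assumptions each node has room for exactly one 600-millicore Pod, so four of the five replicas are placed and one stays Pending.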
    $ kubectl get pods
    NAME                                READY     STATUS    RESTARTS   AGE
    nginx-deployment-1006230814-6winp   1/1       Running   0          7m
    nginx-deployment-1006230814-fmgu3   1/1       Running   0          7m
    nginx-deployment-1370807587-6ekbw   1/1       Running   0          1m
    nginx-deployment-1370807587-fg172   0/1       Pending   0          1m
    nginx-deployment-1370807587-fz9sd   0/1       Pending   0          1m
To find out why the nginx-deployment-1370807587-fz9sd pod is not running, we can use kubectl describe pod on the pending Pod and look at its events:
    $ kubectl describe pod nginx-deployment-1370807587-fz9sd
    Name:           nginx-deployment-1370807587-fz9sd
    Namespace:      default
    Node:           /
    Labels:         app=nginx,pod-template-hash=1370807587
    Status:         Pending
    IP:
    Controllers:    ReplicaSet/nginx-deployment-1370807587
    Containers:
      nginx:
        Image:      nginx
        Port:       80/TCP
        QoS Tier:
          memory:   Guaranteed
          cpu:      Guaranteed
        Limits:
          cpu:      1
          memory:   128Mi
        Requests:
          cpu:      1
          memory:   128Mi
        Environment Variables:
    Volumes:
      default-token-4bcbi:
        Type:       Secret (a volume populated by a Secret)
        SecretName: default-token-4bcbi
    Events:
      FirstSeen  LastSeen  Count  From                  SubobjectPath  Type     Reason            Message
      ---------  --------  -----  ----                  -------------  ----     ------            -------
      1m         48s       7      {default-scheduler }                 Warning  FailedScheduling  pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node
        fit failure on node (kubernetes-node-6ta5): Node didn't have enough resource: CPU, requested: 1000, used: 1420, capacity: 2000
        fit failure on node (kubernetes-node-wul5): Node didn't have enough resource: CPU, requested: 1000, used: 1100, capacity: 2000
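Here you can see the event generated by the scheduler saying that the Pod failed to schedule for reason FailedScheduling (and possibly others). The message tells us that no node was able to satisfy the Pod's requirements.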
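The numbers in the FailedScheduling message can be checked mechanically. The Python sketch below parses the per-node lines and computes how far each node falls short. It assumes the exact message wording shown in the output above, which differs between Kubernetes versions; the `FIT_FAILURE` pattern and `shortfalls` helper are illustrative names, not an official parser.

```python
import re

# Matches only the message form shown in the describe output above.
FIT_FAILURE = re.compile(
    r"fit failure on node \((?P<node>[^)]+)\): "
    r"Node didn't have enough resource: (?P<resource>\w+), "
    r"requested: (?P<requested>\d+), used: (?P<used>\d+), "
    r"capacity: (?P<capacity>\d+)"
)

message = (
    "pod (nginx-deployment-1370807587-fz9sd) failed to fit in any node\n"
    "fit failure on node (kubernetes-node-6ta5): Node didn't have enough "
    "resource: CPU, requested: 1000, used: 1420, capacity: 2000\n"
    "fit failure on node (kubernetes-node-wul5): Node didn't have enough "
    "resource: CPU, requested: 1000, used: 1100, capacity: 2000\n"
)


def shortfalls(text):
    """Return (node, resource, missing_amount) for each fit failure."""
    results = []
    for m in FIT_FAILURE.finditer(text):
        # The pod fits only if used + requested <= capacity.
        missing = int(m["used"]) + int(m["requested"]) - int(m["capacity"])
        results.append((m["node"], m["resource"], missing))
    return results


for node, resource, missing in shortfalls(message):
    print(f"{node}: short by {missing} millicores of {resource}")
```

With the values from the event, the first node is short by 420 millicores and the second by 100, so neither can accept the Pod.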
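To correct this situation, you can use kubectl scale to update your Deployment to specify four or fewer replicas. (Alternatively, you can leave the one Pod pending, which is harmless.)
Events such as the ones you saw at the end of kubectl describe pod are persisted in etcd and provide high-level information about what is happening in the cluster. To list all events you can use: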
    kubectl get events
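Remember, however, that events are namespaced. This means that if you are interested in events for some namespaced object (for example, what happened to Pods in namespace my-namespace), you need to explicitly provide a namespace to the command: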
    kubectl get events --namespace=my-namespace
To see events from all namespaces, you can use the --all-namespaces argument.
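When these commands are driven from a script, it can help to build the argument list in one place so the namespace handling stays consistent. The following Python sketch only assembles the command; the `events_command` helper is a hypothetical name, and the flags it emits are the two mentioned above. Actually running the command (for example with `subprocess.run`) would require kubectl and access to a cluster, so that step is left out.

```python
def events_command(namespace=None, all_namespaces=False):
    """Build the kubectl argument list for listing events."""
    cmd = ["kubectl", "get", "events"]
    if all_namespaces:
        # Events from every namespace.
        cmd.append("--all-namespaces")
    elif namespace:
        # Events from one specific namespace.
        cmd.append("--namespace=" + namespace)
    # With neither option, kubectl uses the current default namespace.
    return cmd


print(" ".join(events_command()))
print(" ".join(events_command(namespace="my-namespace")))
print(" ".join(events_command(all_namespaces=True)))
```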
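In addition to kubectl describe pod, another way to get extra information about a pod (beyond what kubectl get pod provides) is to pass the -o yaml output format flag to kubectl get pod. This gives you, in YAML format, even more information than kubectl describe pod: essentially everything the system knows about the Pod. Here you will see things like annotations (key-value metadata without the label restrictions, used internally by Kubernetes system components), the restart policy, ports, and volumes.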
    $ kubectl get pod nginx-deployment-1006230814-6winp -o yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        kubernetes.io/created-by: |
          {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"default","name":"nginx-deployment-1006230814","uid":"4c84c175-f161-11e5-9a78-42010af00005","apiVersion":"extensions","resourceVersion":"133434"}}
      creationTimestamp: 2016-03-24T01:39:50Z
      generateName: nginx-deployment-1006230814-
      labels:
        app: nginx
        pod-template-hash: "1006230814"
      name: nginx-deployment-1006230814-6winp
      namespace: default
      resourceVersion: "133447"
      selfLink: /api/v1/namespaces/default/pods/nginx-deployment-1006230814-6winp
      uid: 4c879808-f161-11e5-9a78-42010af00005
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 500m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        volumeMounts:
        - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
          name: default-token-4bcbi
          readOnly: true
      dnsPolicy: ClusterFirst
      nodeName: kubernetes-node-wul5
      restartPolicy: Always
      securityContext: {}
      serviceAccount: default
      serviceAccountName: default
      terminationGracePeriodSeconds: 30
      volumes:
      - name: default-token-4bcbi
        secret:
          secretName: default-token-4bcbi
    status:
      conditions:
      - lastProbeTime: null
        lastTransitionTime: 2016-03-24T01:39:51Z
        status: "True"
        type: Ready
      containerStatuses:
      - containerID: docker://90315cc9f513c724e9957a4788d3e625a078de84750f244a40f97ae355eb1149
        image: nginx
        imageID: docker://6f62f48c4e55d700cf3eb1b5e33fa051802986b77b874cc351cce539e5163707
        lastState: {}
        name: nginx
        ready: true
        restartCount: 0
        state:
          running:
            startedAt: 2016-03-24T01:39:51Z
      hostIP: 10.240.0.9
      phase: Running
      podIP: 10.244.0.6
      startTime: 2016-03-24T01:39:49Z
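The same object can be requested as JSON (kubectl also accepts -o json), which is convenient for scripts because Python's standard library has a JSON parser but no YAML parser. The sketch below pulls out the fields mentioned in the text: annotations, restart policy, ports, and volumes. The embedded document is a trimmed, hand-written excerpt modeled on the output above, with the annotation value elided, and `pod_extras` is a hypothetical helper name.

```python
import json

# Trimmed excerpt modeled on the Pod object above (annotation value elided).
POD_JSON = """
{
  "metadata": {
    "annotations": {"kubernetes.io/created-by": "(elided)"},
    "labels": {"app": "nginx", "pod-template-hash": "1006230814"}
  },
  "spec": {
    "restartPolicy": "Always",
    "containers": [
      {"name": "nginx", "ports": [{"containerPort": 80, "protocol": "TCP"}]}
    ],
    "volumes": [{"name": "default-token-4bcbi"}]
  }
}
"""


def pod_extras(pod):
    """Collect details that `kubectl get pod` alone does not show."""
    spec = pod["spec"]
    return {
        "annotation_keys": sorted(pod["metadata"].get("annotations", {})),
        "restart_policy": spec.get("restartPolicy"),
        "ports": [
            (c["name"], p["containerPort"], p.get("protocol", "TCP"))
            for c in spec["containers"]
            for p in c.get("ports", [])
        ],
        "volumes": [v["name"] for v in spec.get("volumes", [])],
    }


print(pod_extras(json.loads(POD_JSON)))
```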
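Example: debugging a down or unreachable node
Sometimes when debugging it can be useful to look at the status of a node, for example because you have noticed strange behavior of a Pod running on the node, or to find out why a Pod won't schedule onto the node. As with Pods, you can use kubectl describe node and kubectl get node -o yaml to retrieve detailed information about nodes. For example, here is what you will see if a node is down (disconnected from the network, or kubelet dies and won't restart, etc.). Notice the events that show the node is NotReady, and also notice that the pods are no longer running (they are evicted after five minutes of NotReady status).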
    $ kubectl get nodes
    NAME                   STATUS     AGE       VERSION
    kubernetes-node-861h   NotReady   1h        v1.6.0+fff5156
    kubernetes-node-bols   Ready      1h        v1.6.0+fff5156
    kubernetes-node-st6x   Ready      1h        v1.6.0+fff5156
    kubernetes-node-unaj   Ready      1h        v1.6.0+fff5156

    $ kubectl describe node kubernetes-node-861h
    Name:               kubernetes-node-861h
    Role:
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/hostname=kubernetes-node-861h
    Annotations:        node.alpha.kubernetes.io/ttl=0
                        volumes.kubernetes.io/controller-managed-attach-detach=true
    Taints:             <none>
    CreationTimestamp:  Mon, 04 Sep 2017 17:13:23 +0800
    Phase:
    Conditions:
      Type            Status   LastHeartbeatTime                LastTransitionTime               Reason             Message
      ----            ------   -----------------                ------------------               ------             -------
      OutOfDisk       Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
      MemoryPressure  Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
      DiskPressure    Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
      Ready           Unknown  Fri, 08 Sep 2017 16:04:28 +0800  Fri, 08 Sep 2017 16:20:58 +0800  NodeStatusUnknown  Kubelet stopped posting node status.
    Addresses:          10.240.115.55,104.197.0.26
    Capacity:
      cpu:          2
      hugePages:    0
      memory:       4046788Ki
      pods:         110
    Allocatable:
      cpu:          1500m
      hugePages:    0
      memory:       1479263Ki
      pods:         110
    System Info:
      Machine ID:                 8e025a21a4254e11b028584d9d8b12c4
      System UUID:                349075D1-D169-4F25-9F2A-E886850C47E3
      Boot ID:                    5cd18b37-c5bd-4658-94e0-e436d3f110e0
      Kernel Version:             4.4.0-31-generic
      OS Image:                   Debian GNU/Linux 8 (jessie)
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  docker://1.12.5
      Kubelet Version:            v1.6.9+a3d1dfa6f4335
      Kube-Proxy Version:         v1.6.9+a3d1dfa6f4335
    ExternalID:                   15233045891481496305
    Non-terminated Pods:          (9 in total)
      Namespace  Name  CPU Requests  CPU Limits  Memory Requests  Memory Limits
      ---------  ----  ------------  ----------  ---------------  -------------
    ......
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      CPU Requests  CPU Limits    Memory Requests   Memory Limits
      ------------  ----------    ---------------   -------------
      900m (60%)    2200m (146%)  1009286400 (66%)  5681286400 (375%)
    Events:         <none>

    $ kubectl get node kubernetes-node-861h -o yaml
    apiVersion: v1
    kind: Node
    metadata:
      creationTimestamp: 2015-07-10T21:32:29Z
      labels:
        kubernetes.io/hostname: kubernetes-node-861h
      name: kubernetes-node-861h
      resourceVersion: "757"
      selfLink: /api/v1/nodes/kubernetes-node-861h
      uid: 2a69374e-274b-11e5-a234-42010af0d969
    spec:
      externalID: "15233045891481496305"
      podCIDR: 10.244.0.0/24
      providerID: gce://striped-torus-760/us-central1-b/kubernetes-node-861h
    status:
      addresses:
      - address: 10.240.115.55
        type: InternalIP
      - address: 104.197.0.26
        type: ExternalIP
      capacity:
        cpu: "1"
        memory: 3800808Ki
        pods: "100"
      conditions:
      - lastHeartbeatTime: 2015-07-10T21:34:32Z
        lastTransitionTime: 2015-07-10T21:35:15Z
        reason: Kubelet stopped posting node status.
        status: Unknown
        type: Ready
      nodeInfo:
        bootID: 4e316776-b40d-4f78-a4ea-ab0d73390897
        containerRuntimeVersion: docker://Unknown
        kernelVersion: 3.16.0-0.bpo.4-amd64
        kubeProxyVersion: v0.21.1-185-gffc5a86098dc01
        kubeletVersion: v0.21.1-185-gffc5a86098dc01
        machineID: ""
        osImage: Debian GNU/Linux 7 (wheezy)
        systemUUID: ABE5F6B4-D44B-108B-C46A-24CCE16C8B6E
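The node's health can be read from the `status.conditions` list in the object above. As a rough illustration, the Python sketch below treats a node as ready only when its Ready condition has status "True", so both "False" and "Unknown" count as not ready. The sample dictionary copies the single condition from the YAML output; `is_ready` and `ready_reason` are hypothetical helper names.

```python
# Excerpt of the Node object above, as a Python dictionary.
node = {
    "metadata": {"name": "kubernetes-node-861h"},
    "status": {
        "conditions": [
            {
                "type": "Ready",
                "status": "Unknown",
                "reason": "Kubelet stopped posting node status.",
                "lastHeartbeatTime": "2015-07-10T21:34:32Z",
                "lastTransitionTime": "2015-07-10T21:35:15Z",
            }
        ]
    },
}


def ready_condition(node_obj):
    """Return the Ready condition, or None if the node reports none."""
    for cond in node_obj["status"].get("conditions", []):
        if cond["type"] == "Ready":
            return cond
    return None


def is_ready(node_obj):
    """A node counts as ready only when Ready has status "True"."""
    cond = ready_condition(node_obj)
    return cond is not None and cond["status"] == "True"


def ready_reason(node_obj):
    """Return the reason attached to the Ready condition, if any."""
    cond = ready_condition(node_obj)
    return cond.get("reason") if cond else None


name = node["metadata"]["name"]
print(name, "ready:", is_ready(node), "reason:", ready_reason(node))
```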
Translator: tianshapjq / Original article link