Installing Kubernetes v1.10

2018-07-11
DevOps

I've recently been meaning to clean up some of my notes from OneNote and Youdao and publish them on this blog, starting with the DevOps tooling I worked with for a while. Kubernetes is one of the hottest projects around at the moment; I had drawn up a detailed study plan for it, but day-to-day work keeps pushing that plan back.

What follows is the process of setting up Kubernetes v1.10 with kubeadm on CentOS 7. I had previously deployed v1.5 directly on CentOS 7 via yum install kubernetes; everything started fine, but I got stuck when deploying applications. After some digging it turned out to be a DNS issue: v1.5 apparently uses SkyDNS, which is fiddly to deploy and poorly documented, so I switched to v1.10.

Preparation

Environment: three CentOS 7 machines, with IPs 10.0.0.8, 10.0.0.9 and 10.0.0.10.

1. Configure the hosts file on each node

10.0.0.8  k8smaster
10.0.0.9 k8snode1
10.0.0.10 k8snode2

2. Disable the system firewall on each node

systemctl stop firewalld
systemctl disable firewalld

3. Disable SELinux on each node (the config change takes effect after a reboot; run setenforce 0 to turn it off immediately)

vim /etc/selinux/config

SELINUX=disabled

4. Disable swap on each node

swapoff -a
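Note that swapoff only disables swap until the next reboot; to keep it off permanently, also comment out the swap entry in /etc/fstab. A sketch of that edit, demonstrated on a sample file (on a real node the target is /etc/fstab itself):

```shell
# Create a sample fstab to demonstrate on (use /etc/fstab on a real node)
cat > fstab.sample <<'EOF'
/dev/mapper/centos-root /    xfs  defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
EOF
# Comment out every swap entry so it stays disabled after a reboot
sed -i '/ swap / s/^/#/' fstab.sample
grep swap fstab.sample
```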

5. Create /etc/sysctl.d/k8s.conf on each node

cat << EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF

sysctl -p /etc/sysctl.d/k8s.conf

If you hit this error:

sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory

run modprobe br_netfilter, and add it to rc.local so it loads on boot:

modprobe br_netfilter
echo "modprobe br_netfilter" >> /etc/rc.local

Installing kubeadm

1. First, configure the Aliyun Kubernetes YUM repo on each node

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF

yum -y install epel-release

yum clean all

yum makecache

2. Install kubeadm and the related packages on each node (this pulls whatever version the repo currently serves; you may want to pin versions, e.g. kubelet-1.10.0, so they match the images used later)

yum -y install docker kubelet kubeadm kubectl kubernetes-cni

3. Enable and start the Docker and kubelet services

systemctl enable docker && systemctl start docker

systemctl enable kubelet && systemctl start kubelet

Check that Docker and kubelet.service are using the same cgroup driver:

docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

docker.service lives at:

/usr/lib/systemd/system/docker.service

If they differ, either change Docker to cgroupfs by adding --exec-opt native.cgroupdriver=cgroupfs to docker.service, or change kubelet to match. To switch kubelet to cgroupfs, run:

sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

systemctl daemon-reload
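The comparison above can be scripted. This is only a sketch using hard-coded sample strings; on a real node, substitute the actual output of docker info and the contents of the kubelet drop-in file:

```shell
# Sample values standing in for `docker info | grep -i cgroup` and the
# --cgroup-driver flag found in 10-kubeadm.conf
docker_line='Cgroup Driver: systemd'
kubelet_line='Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"'

# Extract the driver name from each source
docker_driver=${docker_line#*: }
kubelet_driver=$(echo "$kubelet_line" | sed 's/.*--cgroup-driver=\([a-z]*\).*/\1/')

if [ "$docker_driver" != "$kubelet_driver" ]; then
  echo "mismatch: docker=$docker_driver kubelet=$kubelet_driver"
else
  echo "drivers match: $docker_driver"
fi
```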

4. Download the K8S images (on the master node)
Since gcr.io cannot be reached directly, you need a domestic container registry mirror (accelerator).

Set up an Aliyun accelerator:

Log in at https://cr.console.aliyun.com/

Find and click the mirror-accelerator button on the page to see your personal accelerator URL; select the CentOS tab for the setup instructions.

Tip: when using Docker on Aliyun with the accelerator configured, a bad daemon.json can prevent the Docker daemon from starting; the following works around it.

vim /etc/sysconfig/docker

Then set:

OPTIONS='--selinux-enabled --log-driver=journald --registry-mirror=http://xxxx.mirror.aliyuncs.com'

where registry-mirror is your own accelerator URL.

Finally restart the daemon with service docker restart; ps aux | grep docker will then show the mirror in the startup arguments.

5. Pull the K8S images

OK, with the accelerator sorted out, pull the k8s images and retag them with the k8s.gcr.io/ prefix so that kubeadm can find them.

#!/bin/bash
images=(kube-proxy-amd64:v1.10.0 kube-scheduler-amd64:v1.10.0 kube-controller-manager-amd64:v1.10.0 kube-apiserver-amd64:v1.10.0
etcd-amd64:3.1.12 pause-amd64:3.1 kubernetes-dashboard-amd64:v1.8.3 k8s-dns-sidecar-amd64:1.14.8 k8s-dns-kube-dns-amd64:1.14.8
k8s-dns-dnsmasq-nanny-amd64:1.14.8)
for imageName in ${images[@]} ; do
  docker pull keveon/$imageName
  docker tag keveon/$imageName k8s.gcr.io/$imageName
  docker rmi keveon/$imageName
done

The shell script above does three things: pulls each required image, retags it with the name k8s expects, and removes the original tag.
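For illustration, the retagging maps each mirror name onto the official one. A dry run of that mapping, with no Docker required (only a subset of the images shown):

```shell
# Print the tag mapping the pull script applies, without touching Docker
images=(kube-apiserver-amd64:v1.10.0 etcd-amd64:3.1.12 pause-amd64:3.1)
for imageName in "${images[@]}"; do
  echo "keveon/$imageName -> k8s.gcr.io/$imageName"
done
```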

Tip: the image versions must match the kubeadm version you installed, or kubeadm init will time out.

6. Initialize the K8S master

Run the script above, wait for the downloads to finish, then run kubeadm init:

kubeadm init --kubernetes-version=v1.10.0 --pod-network-cidr=10.244.0.0/16

Make a note of the join command it prints:

kubeadm join 10.0.0.8:6443 --token a6vlug.shwfx89vqrofvro7 --discovery-token-ca-cert-hash sha256:f7b5dc65173098b1ae74b06fe5062124528ff873c53ed38761601598ddbf1a58
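The --discovery-token-ca-cert-hash value is the SHA-256 of the cluster CA's public key in DER form, so it can be recomputed at any time. The sketch below demonstrates the pipeline on a throwaway self-signed certificate; on the master, use /etc/kubernetes/pki/ca.crt as the input instead:

```shell
# Generate a throwaway CA cert purely for demonstration
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo-ca.key -out demo-ca.crt \
  -subj "/CN=demo-ca" -days 1 2>/dev/null
# Hash the DER-encoded public key, the way kubeadm pins the CA
openssl x509 -pubkey -in demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'
```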

7. Configure kubectl credentials (on the master node)

# For non-root users
mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

# For root
export KUBECONFIG=/etc/kubernetes/admin.conf

You can also make this permanent via ~/.bash_profile:

echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile

8. Install the flannel network (on the master node)

mkdir -p /etc/cni/net.d/

cat <<EOF> /etc/cni/net.d/10-flannel.conf
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
EOF

mkdir /usr/share/oci-umount/oci-umount.d -p

mkdir /run/flannel/

cat <<EOF> /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.0/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

EOF
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

9. Verify the K8S master (on the master node)

# Check pod status
kubectl get pods --all-namespaces

Wait until every pod is in the Running state before joining the other nodes; otherwise kube-dns, kubernetes-dashboard and the like may be scheduled onto a worker node, causing their installation to fail.
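A quick way to check is to filter the STATUS column of the pod listing. Here is a sketch against sample output; on the master, pipe the real kubectl get pods --all-namespaces instead of the here-string (the pod names below are illustrative, not from the cluster):

```shell
# Sample `kubectl get pods --all-namespaces` output (pod names are made up)
pods='NAMESPACE     NAME                        READY   STATUS    RESTARTS   AGE
kube-system   kube-dns-86f4d74b45-q2x7k   3/3     Running   0          5m
kube-system   kube-proxy-abcde            1/1     Pending   0          5m'
# Count pods whose STATUS column is not Running
not_running=$(echo "$pods" | awk 'NR > 1 && $4 != "Running"' | wc -l)
echo "pods not yet Running: $not_running"
```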

10. Join node1 and node2 to the cluster

[root@k8snode2 ~]# kubeadm join 10.0.0.8:6443 --token a6vlug.shwfx89vqrofvro7 --discovery-token-ca-cert-hash sha256:f7b5dc65173098b1ae74b06fe5062124528ff873c53ed38761601598ddbf1a58
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "10.0.0.8:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.8:6443"
[discovery] Requesting info from "https://10.0.0.8:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.8:6443"
[discovery] Successfully established connection with API Server "10.0.0.8:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

Note: the command run here is the one recorded in step 6.

By default the master node does not take workloads, but if you want an all-in-one k8s environment, you can run the following to let the master act as a node as well:

kubectl taint nodes --all node-role.kubernetes.io/master-

Installing add-ons

Deploy the dashboard (note: the dashboard must be deployed on the master node, or it will error out).

The k8s dashboard supports two access modes: kubeconfig (HTTPS) and token (HTTP). This post covers token access.

Token access requires no login password, which keeps things simple.
1. Download the official dashboard YAML, or a modified version with the rough edges fixed

# Official
https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

# Modified version (includes the heapster add-on YAML and RBAC YAML)
https://github.com/gh-Devin/kubernetes-dashboard

In heapster.yaml, change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/heapster:v1.5.2 (found via Aliyun's image search).

2. Create the pods

[root@k8smaster kubernetes-dashboard]# ls
heapster-rbac.yaml heapster.yaml kubernetes-dashboard-admin.rbac.yaml kubernetes-dashboard.yaml
[root@k8smaster kubernetes-dashboard]# kubectl -n kube-system create -f .
clusterrolebinding.rbac.authorization.k8s.io "heapster" created
serviceaccount "heapster" created
deployment.extensions "heapster" created
service "heapster" created
serviceaccount "kubernetes-dashboard-admin" created
clusterrolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-admin" created
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
rolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard-external" created

3. Check the add-ons' status

kubectl get svc,pod --all-namespaces

4. If something goes wrong, check the logs

kubectl describe pod heapster-6595c54cb9-chmfd --namespace=kube-system

kubectl logs pod/heapster-6595c54cb9-chmfd -n kube-system

5. If the install failed, delete the pods and redeploy

kubectl -n kube-system delete -f .

Problems and solutions

1. Error:

Error from server: error dialing backend: dial tcp 192.168.0.107:10250: getsockopt: no route to host

This requires opening the firewall on each node. Port 8472 is used by flannel (its VXLAN backend actually listens on UDP 8472, so a -p udp rule may be needed as well); 9898 and 6443 are used by minio to reach the master. On CentOS these must be configured, otherwise iptables -L -vn | more shows the INPUT chain's reject-with icmp-host-prohibited counter climbing. Port 10250 is used by kubectl exec; without it you get the "no route to host" error shown above.

iptables -I INPUT -p tcp -m tcp --dport 8472 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 6443 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 9898 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
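The four rules can also be generated in a loop. This sketch only prints the commands as a dry run; drop the echo and run as root to actually apply them:

```shell
# Dry run: print one ACCEPT rule per required port
for port in 8472 6443 9898 10250; do
  echo iptables -I INPUT -p tcp -m tcp --dport "$port" -j ACCEPT
done
```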

2. Error:

[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- Either there is no internet connection, or imagePullPolicy is set to "Never",
so the kubelet cannot pull or find the following control plane images:
- k8s.gcr.io/kube-apiserver-amd64:v1.10.0
- k8s.gcr.io/kube-controller-manager-amd64:v1.10.0
- k8s.gcr.io/kube-scheduler-amd64:v1.10.0
- k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

This was caused by mismatched cgroup drivers between docker.service and 10-kubeadm.conf.

3. Dashboard install error:

[root@k8smaster kubernetes-dashboard]# kubectl get svc,pod --all-namespaces | grep dashboard

kube-system service/kubernetes-dashboard-external NodePort 10.106.103.199 <none> 9090:30090/TCP 52s
kube-system pod/kubernetes-dashboard-5cc6564db9-tp5ws 0/1 CrashLoopBackOff 2 52s

Check the logs:

[root@k8smaster kubernetes-dashboard]# kubectl logs pod/kubernetes-dashboard-5cc6564db9-tp5ws -n kube-system 
2018/04/30 18:23:18 Using in-cluster config to connect to apiserver
2018/04/30 18:23:18 Using service account token for csrf signing
2018/04/30 18:23:18 No request provided. Skipping authorization
2018/04/30 18:23:18 Starting overwatch
2018/04/30 18:23:19 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: getsockopt: no route to host
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

Or:

kubectl describe pod kubernetes-dashboard-5cc6564db9-bjmgr --namespace=kube-system

Then delete the pod:

kubectl delete pod kubernetes-dashboard-5cc6564db9-tp5ws -n kube-system

If you run into trouble during installation, you can wipe the environment and start over with:

kubeadm reset

4. Pod IPs unreachable after installation

This is most likely a firewall issue; the rules need cleaning up.

Solution:

1. Stop the services first

[root@k8smaster ~]#  systemctl stop docker kubelet

2. View the current rules, then flush them

[root@k8smaster ~]#  iptables -L -n

[root@k8smaster ~]# iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat

[root@k8smaster ~]# iptables -L -n

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

[root@k8smaster ~]#

3. Restart iptables and start the services

[root@k8smaster ~]#  systemctl restart iptables

[root@k8smaster ~]# systemctl start docker kubelet

4. Check the firewall rules again and flush once more

[root@k8smaster ~]#  iptables -L -n

[root@k8smaster ~]# iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat

5. Wait a moment, then check the final rules

[root@k8smaster ~]#  iptables -L -n

Chain INPUT (policy ACCEPT)
target prot opt source destination

KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)

target prot opt source destination

KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0

KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */


Chain KUBE-FIREWALL (2 references)

target prot opt source destination

DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000


Chain KUBE-SERVICES (1 references)
target prot opt source destination

[root@k8smaster ~]#

6. Verify
Deploy nginx:

kubectl run nginx --image=nginx --replicas=2 --port=80

Expose the port:

kubectl expose deployment/nginx --type="NodePort" --port 80
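After exposing the deployment, the assigned NodePort can be pulled out of the service listing. A sketch against sample output shaped like kubectl get svc nginx; on the master, pipe the real command instead of the here-string (the values 10.98.84.61 and 30019 are the ones this cluster actually assigned, as the describe output below shows):

```shell
# Sample `kubectl get svc nginx` output
svc='NAME    TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.98.84.61   <none>        80:30019/TCP   1m'
# PORT(S) is column 5: split "80:30019/TCP" on ':' and '/' to get the NodePort
node_port=$(echo "$svc" | awk 'NR == 2 { split($5, p, "[:/]"); print p[2] }')
echo "try: curl http://10.0.0.8:$node_port"
```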

Check the application status:

[root@k8smaster ~]# kubectl describe svc
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.0.0.8:6443
Session Affinity: ClientIP
Events: <none>

Name: nginx
Namespace: default
Labels: run=nginx
Annotations: <none>
Selector: run=nginx
Type: NodePort
IP: 10.98.84.61
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 30019/TCP
Endpoints: 10.244.1.60:80,10.244.1.61:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>

[root@k8smaster ~]# ping 10.244.1.60
PING 10.244.1.60 (10.244.1.60) 56(84) bytes of data.
64 bytes from 10.244.1.60: icmp_seq=1 ttl=63 time=0.941 ms
64 bytes from 10.244.1.60: icmp_seq=2 ttl=63 time=1.01 ms
^C
--- 10.244.1.60 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.941/0.979/1.017/0.038 ms
[root@k8smaster ~]# ping 10.244.1.61
PING 10.244.1.61 (10.244.1.61) 56(84) bytes of data.
64 bytes from 10.244.1.61: icmp_seq=1 ttl=63 time=0.781 ms
64 bytes from 10.244.1.61: icmp_seq=2 ttl=63 time=0.574 ms
^C
--- 10.244.1.61 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.574/0.677/0.781/0.106 ms
[root@k8smaster ~]# ping 10.244.1.61
PING 10.244.1.61 (10.244.1.61) 56(84) bytes of data.
64 bytes from 10.244.1.61: icmp_seq=1 ttl=63 time=0.630 ms
64 bytes from 10.244.1.61: icmp_seq=2 ttl=63 time=0.662 ms
^C

5. Error:

Failed to get system container stats for "/system.slice/

Solution:

Add the following to /etc/systemd/system/kubelet.service:

ExecStart=/usr/bin/kubelet --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

References:

Official Kubernetes documentation

kubernetes-dashboard on GitHub

51CTO: Installing Kubernetes v1.10 with kubeadm, with FAQ

51CTO: Deploying the dashboard in Kubernetes 1.10 and common issues