Installing Kubernetes v1.10

2018-07-11
DevOps

I've recently been meaning to clean up some of my notes from OneNote and Youdao and publish them on this blog, starting with the DevOps tooling I worked with for a while. Kubernetes is one of the hottest projects around at the moment; I had drawn up a detailed study plan for it, but day-to-day work keeps pushing that plan back.

What follows is the process of setting up Kubernetes v1.10 with kubeadm on CentOS 7. I had previously deployed v1.5 directly on CentOS 7 via yum install kubernetes; everything started fine, but I got stuck when deploying applications. After some digging it turned out to be a DNS issue: v1.5 apparently uses SkyDNS, which is fiddly to deploy and poorly documented, so I switched to v1.10.

Preparation

Environment: three CentOS 7 machines, with IPs 10.0.0.8, 10.0.0.9 and 10.0.0.10.

1. Configure the hosts file on each node

10.0.0.8  k8smaster
10.0.0.9 k8snode1
10.0.0.10 k8snode2

2. Disable the system firewall on each node

systemctl stop firewalld
systemctl disable firewalld

3. Disable SELinux on each node (the config change takes effect after a reboot; run setenforce 0 to turn it off immediately)

vim /etc/selinux/config

SELINUX=disabled

4. Disable swap on each node

swapoff -a
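Note that swapoff only disables swap until the next reboot; to keep it off permanently, also comment out the swap entry in /etc/fstab. A sketch of that edit, demonstrated on a sample file (on a real node the target is /etc/fstab itself):

```shell
# Create a sample fstab to demonstrate on (use /etc/fstab on a real node)
cat > fstab.sample <<'EOF'
/dev/mapper/centos-root /    xfs  defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
EOF
# Comment out every swap entry so it stays disabled after a reboot
sed -i '/ swap / s/^/#/' fstab.sample
grep swap fstab.sample
```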

5. Create /etc/sysctl.d/k8s.conf on each node

cat << EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF

sysctl -p /etc/sysctl.d/k8s.conf

If you hit this error:

sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory

run modprobe br_netfilter, and add it to rc.local so it loads on boot:

modprobe br_netfilter
echo "modprobe br_netfilter" >> /etc/rc.local

Installing kubeadm

1. First, configure the Aliyun Kubernetes YUM repo on each node

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF

yum -y install epel-release

yum clean all

yum makecache

2. Install kubeadm and the related packages on each node (this pulls whatever version the repo currently serves; you may want to pin versions, e.g. kubelet-1.10.0, so they match the images used later)

yum -y install docker kubelet kubeadm kubectl kubernetes-cni

3. Enable and start the Docker and kubelet services

systemctl enable docker && systemctl start docker

systemctl enable kubelet && systemctl start kubelet

Check that Docker and kubelet.service are using the same cgroup driver:

docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

docker.service lives at:

/usr/lib/systemd/system/docker.service

If they differ, either change Docker to cgroupfs by adding --exec-opt native.cgroupdriver=cgroupfs to docker.service, or change kubelet to match. To switch kubelet to cgroupfs, run:

sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf

systemctl daemon-reload
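The comparison above can be scripted. This is only a sketch using hard-coded sample strings; on a real node, substitute the actual output of docker info and the contents of the kubelet drop-in file:

```shell
# Sample values standing in for `docker info | grep -i cgroup` and the
# --cgroup-driver flag found in 10-kubeadm.conf
docker_line='Cgroup Driver: systemd'
kubelet_line='Environment="KUBELET_CGROUP_ARGS=--cgroup-driver=cgroupfs"'

# Extract the driver name from each source
docker_driver=${docker_line#*: }
kubelet_driver=$(echo "$kubelet_line" | sed 's/.*--cgroup-driver=\([a-z]*\).*/\1/')

if [ "$docker_driver" != "$kubelet_driver" ]; then
  echo "mismatch: docker=$docker_driver kubelet=$kubelet_driver"
else
  echo "drivers match: $docker_driver"
fi
```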

4. Download the K8S images (on the master node)
Since gcr.io cannot be reached directly, you need a domestic container registry mirror (accelerator).

Set up an Aliyun accelerator:

Log in at https://cr.console.aliyun.com/

Find and click the mirror-accelerator button on the page to see your personal accelerator URL; select the CentOS tab for the setup instructions.

Tip: when using Docker on Aliyun with the accelerator configured, a bad daemon.json can prevent the Docker daemon from starting; the following works around it.

vim /etc/sysconfig/docker

Then set:

OPTIONS='--selinux-enabled --log-driver=journald --registry-mirror=http://xxxx.mirror.aliyuncs.com'

where registry-mirror is your own accelerator URL.

Finally restart the daemon with service docker restart; ps aux | grep docker will then show the mirror in the startup arguments.

5. Pull the K8S images

OK, with the accelerator sorted out, pull the k8s images and retag them with the k8s.gcr.io/ prefix so that kubeadm can find them.

#!/bin/bash
images=(kube-proxy-amd64:v1.10.0 kube-scheduler-amd64:v1.10.0 kube-controller-manager-amd64:v1.10.0 kube-apiserver-amd64:v1.10.0
etcd-amd64:3.1.12 pause-amd64:3.1 kubernetes-dashboard-amd64:v1.8.3 k8s-dns-sidecar-amd64:1.14.8 k8s-dns-kube-dns-amd64:1.14.8
k8s-dns-dnsmasq-nanny-amd64:1.14.8)
for imageName in ${images[@]} ; do
  docker pull keveon/$imageName
  docker tag keveon/$imageName k8s.gcr.io/$imageName
  docker rmi keveon/$imageName
done

The shell script above does three things: pulls each required image, retags it with the name k8s expects, and removes the original tag.
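For illustration, the retagging maps each mirror name onto the official one. A dry run of that mapping, with no Docker required (only a subset of the images shown):

```shell
# Print the tag mapping the pull script applies, without touching Docker
images=(kube-apiserver-amd64:v1.10.0 etcd-amd64:3.1.12 pause-amd64:3.1)
for imageName in "${images[@]}"; do
  echo "keveon/$imageName -> k8s.gcr.io/$imageName"
done
```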

Tip: the image versions must match the kubeadm version you installed, or kubeadm init will time out.

6. Initialize the K8S master

Run the script above, wait for the downloads to finish, then run kubeadm init:

kubeadm init --kubernetes-version=v1.10.0 --pod-network-cidr=10.244.0.0/16

Make a note of the join command it prints:

kubeadm join 10.0.0.8:6443 --token a6vlug.shwfx89vqrofvro7 --discovery-token-ca-cert-hash sha256:f7b5dc65173098b1ae74b06fe5062124528ff873c53ed38761601598ddbf1a58
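The --discovery-token-ca-cert-hash value is the SHA-256 of the cluster CA's public key in DER form, so it can be recomputed at any time. The sketch below demonstrates the pipeline on a throwaway self-signed certificate; on the master, use /etc/kubernetes/pki/ca.crt as the input instead:

```shell
# Generate a throwaway CA cert purely for demonstration
openssl req -x509 -newkey rsa:2048 -nodes -keyout demo-ca.key -out demo-ca.crt \
  -subj "/CN=demo-ca" -days 1 2>/dev/null
# Hash the DER-encoded public key, the way kubeadm pins the CA
openssl x509 -pubkey -in demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex \
  | sed 's/^.* //'
```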

7. Configure kubectl credentials (on the master node)

# For non-root users
mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

# For root
export KUBECONFIG=/etc/kubernetes/admin.conf

You can also make this permanent via ~/.bash_profile:

echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile

8. Install the flannel network (on the master node)

mkdir -p /etc/cni/net.d/

cat <<EOF> /etc/cni/net.d/10-flannel.conf
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
EOF

mkdir /usr/share/oci-umount/oci-umount.d -p

mkdir /run/flannel/

cat <<EOF> /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.0/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

EOF
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml

9. Verify the K8S master (on the master node)

# Check pod status
kubectl get pods --all-namespaces

Wait until every pod is in the Running state before joining the other nodes; otherwise kube-dns, kubernetes-dashboard and the like may be scheduled onto a worker node, causing their installation to fail.
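A quick way to check is to filter the STATUS column of the pod listing. Here is a sketch against sample output; on the master, pipe the real kubectl get pods --all-namespaces instead of the here-string (the pod names below are illustrative, not from the cluster):

```shell
# Sample `kubectl get pods --all-namespaces` output (pod names are made up)
pods='NAMESPACE     NAME                        READY   STATUS    RESTARTS   AGE
kube-system   kube-dns-86f4d74b45-q2x7k   3/3     Running   0          5m
kube-system   kube-proxy-abcde            1/1     Pending   0          5m'
# Count pods whose STATUS column is not Running
not_running=$(echo "$pods" | awk 'NR > 1 && $4 != "Running"' | wc -l)
echo "pods not yet Running: $not_running"
```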

10. Join node1 and node2 to the cluster

[root@k8snode2 ~]# kubeadm join 10.0.0.8:6443 --token a6vlug.shwfx89vqrofvro7 --discovery-token-ca-cert-hash sha256:f7b5dc65173098b1ae74b06fe5062124528ff873c53ed38761601598ddbf1a58
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "10.0.0.8:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.8:6443"
[discovery] Requesting info from "https://10.0.0.8:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.8:6443"
[discovery] Successfully established connection with API Server "10.0.0.8:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

Note: the command run here is the one recorded in step 6.

By default the master node does not take workloads, but if you want an all-in-one k8s environment, you can run the following to let the master act as a node as well:

kubectl taint nodes --all node-role.kubernetes.io/master-

Installing add-ons

Deploy the dashboard (note: the dashboard must be deployed on the master node, or it will error out).

The k8s dashboard supports two access modes: kubeconfig (HTTPS) and token (HTTP). This post covers token access.

Token access requires no login password, which keeps things simple.
1. Download the official dashboard YAML, or a modified version with the rough edges fixed

# Official
https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml

# Modified version (includes the heapster add-on YAML and RBAC YAML)
https://github.com/gh-Devin/kubernetes-dashboard

In heapster.yaml, change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/heapster:v1.5.2 (found via Aliyun's image search).

2. Create the pods

[root@k8smaster kubernetes-dashboard]# ls
heapster-rbac.yaml heapster.yaml kubernetes-dashboard-admin.rbac.yaml kubernetes-dashboard.yaml
[root@k8smaster kubernetes-dashboard]# kubectl -n kube-system create -f .
clusterrolebinding.rbac.authorization.k8s.io "heapster" created
serviceaccount "heapster" created
deployment.extensions "heapster" created
service "heapster" created
serviceaccount "kubernetes-dashboard-admin" created
clusterrolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-admin" created
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
rolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard-external" created

3. Check the add-ons' status

kubectl get svc,pod --all-namespaces

4. If something goes wrong, check the logs

kubectl describe pod heapster-6595c54cb9-chmfd --namespace=kube-system

kubectl logs pod/heapster-6595c54cb9-chmfd -n kube-system

5. If the install failed, delete the pods and redeploy

kubectl -n kube-system delete -f .

Problems and solutions

1. Error:

Error from server: error dialing backend: dial tcp 192.168.0.107:10250: getsockopt: no route to host

This requires opening the firewall on each node. Port 8472 is used by flannel (its VXLAN backend actually listens on UDP 8472, so a -p udp rule may be needed as well); 9898 and 6443 are used by minio to reach the master. On CentOS these must be configured, otherwise iptables -L -vn | more shows the INPUT chain's reject-with icmp-host-prohibited counter climbing. Port 10250 is used by kubectl exec; without it you get the "no route to host" error shown above.

iptables -I INPUT -p tcp -m tcp --dport 8472 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 6443 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 9898 -j ACCEPT
iptables -I INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
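The four rules can also be generated in a loop. This sketch only prints the commands as a dry run; drop the echo and run as root to actually apply them:

```shell
# Dry run: print one ACCEPT rule per required port
for port in 8472 6443 9898 10250; do
  echo iptables -I INPUT -p tcp -m tcp --dport "$port" -j ACCEPT
done
```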

2. Error:

[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
- Either there is no internet connection, or imagePullPolicy is set to "Never",
so the kubelet cannot pull or find the following control plane images:
- k8s.gcr.io/kube-apiserver-amd64:v1.10.0
- k8s.gcr.io/kube-controller-manager-amd64:v1.10.0
- k8s.gcr.io/kube-scheduler-amd64:v1.10.0
- k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

This was caused by mismatched cgroup drivers between docker.service and 10-kubeadm.conf.

3. Dashboard install error:

[root@k8smaster kubernetes-dashboard]# kubectl get svc,pod --all-namespaces | grep dashboard

kube-system service/kubernetes-dashboard-external NodePort 10.106.103.199 <none> 9090:30090/TCP 52s
kube-system pod/kubernetes-dashboard-5cc6564db9-tp5ws 0/1 CrashLoopBackOff 2 52s

Check the logs:

[root@k8smaster kubernetes-dashboard]# kubectl logs pod/kubernetes-dashboard-5cc6564db9-tp5ws -n kube-system 
2018/04/30 18:23:18 Using in-cluster config to connect to apiserver
2018/04/30 18:23:18 Using service account token for csrf signing
2018/04/30 18:23:18 No request provided. Skipping authorization
2018/04/30 18:23:18 Starting overwatch
2018/04/30 18:23:19 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: getsockopt: no route to host
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

Or:

kubectl describe pod kubernetes-dashboard-5cc6564db9-bjmgr --namespace=kube-system

Then delete the pod:

kubectl delete pod kubernetes-dashboard-5cc6564db9-tp5ws -n kube-system

If you run into trouble during installation, you can wipe the environment and start over with:

kubeadm reset

4. Pod IPs unreachable after installation

This is most likely a firewall issue; the rules need cleaning up.

Solution:

1. Stop the services first

[root@k8smaster ~]#  systemctl stop docker kubelet

2. View the current rules, then flush them

[root@k8smaster ~]#  iptables -L -n

[root@k8smaster ~]# iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat

[root@k8smaster ~]# iptables -L -n

Chain INPUT (policy ACCEPT)
target prot opt source destination

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

[root@k8smaster ~]#

3. Restart iptables and start the services

[root@k8smaster ~]#  systemctl restart iptables

[root@k8smaster ~]# systemctl start docker kubelet

4. Check the firewall rules again and flush once more

[root@k8smaster ~]#  iptables -L -n

[root@k8smaster ~]# iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat

5. Wait a moment, then check the final rules

[root@k8smaster ~]#  iptables -L -n

Chain INPUT (policy ACCEPT)
target prot opt source destination

KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target prot opt source destination

Chain OUTPUT (policy ACCEPT)

target prot opt source destination

KUBE-FIREWALL all -- 0.0.0.0/0 0.0.0.0/0

KUBE-SERVICES all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes service portals */


Chain KUBE-FIREWALL (2 references)

target prot opt source destination

DROP all -- 0.0.0.0/0 0.0.0.0/0 /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000


Chain KUBE-SERVICES (1 references)
target prot opt source destination

[root@k8smaster ~]#

6. Verify
Deploy nginx:

kubectl run nginx --image=nginx --replicas=2 --port=80

Expose the port:

kubectl expose deployment/nginx --type="NodePort" --port 80
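After exposing the deployment, the assigned NodePort can be pulled out of the service listing. A sketch against sample output shaped like kubectl get svc nginx; on the master, pipe the real command instead of the here-string (the values 10.98.84.61 and 30019 are the ones this cluster actually assigned, as the describe output below shows):

```shell
# Sample `kubectl get svc nginx` output
svc='NAME    TYPE       CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
nginx   NodePort   10.98.84.61   <none>        80:30019/TCP   1m'
# PORT(S) is column 5: split "80:30019/TCP" on ':' and '/' to get the NodePort
node_port=$(echo "$svc" | awk 'NR == 2 { split($5, p, "[:/]"); print p[2] }')
echo "try: curl http://10.0.0.8:$node_port"
```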

Check the application status:

[root@k8smaster ~]# kubectl describe svc
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 10.0.0.8:6443
Session Affinity: ClientIP
Events: <none>

Name: nginx
Namespace: default
Labels: run=nginx
Annotations: <none>
Selector: run=nginx
Type: NodePort
IP: 10.98.84.61
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 30019/TCP
Endpoints: 10.244.1.60:80,10.244.1.61:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>

[root@k8smaster ~]# ping 10.244.1.60
PING 10.244.1.60 (10.244.1.60) 56(84) bytes of data.
64 bytes from 10.244.1.60: icmp_seq=1 ttl=63 time=0.941 ms
64 bytes from 10.244.1.60: icmp_seq=2 ttl=63 time=1.01 ms
^C
--- 10.244.1.60 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1003ms
rtt min/avg/max/mdev = 0.941/0.979/1.017/0.038 ms
[root@k8smaster ~]# ping 10.244.1.61
PING 10.244.1.61 (10.244.1.61) 56(84) bytes of data.
64 bytes from 10.244.1.61: icmp_seq=1 ttl=63 time=0.781 ms
64 bytes from 10.244.1.61: icmp_seq=2 ttl=63 time=0.574 ms
^C
--- 10.244.1.61 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.574/0.677/0.781/0.106 ms
[root@k8smaster ~]# ping 10.244.1.61
PING 10.244.1.61 (10.244.1.61) 56(84) bytes of data.
64 bytes from 10.244.1.61: icmp_seq=1 ttl=63 time=0.630 ms
64 bytes from 10.244.1.61: icmp_seq=2 ttl=63 time=0.662 ms
^C

5. Error:

Failed to get system container stats for "/system.slice/

Solution:

Add the following to /etc/systemd/system/kubelet.service:

ExecStart=/usr/bin/kubelet --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice

References:

Official Kubernetes documentation

kubernetes-dashboard on GitHub

51CTO: Installing Kubernetes v1.10 with kubeadm, with FAQ

51CTO: Deploying the dashboard in Kubernetes 1.10 and common issues