I've been meaning to clean up some of my notes from OneNote and Youdao and publish them on this blog, starting with the DevOps tooling I worked with for a while. Kubernetes is one of the hottest projects around; I had drawn up a detailed study plan for it, but day-to-day work kept pushing the plan back.

Below is the process of setting up Kubernetes v1.10 on CentOS 7 with kubeadm. I had previously deployed v1.5 directly on CentOS 7 via yum install kubernetes. Everything started fine, but it stalled when distributing applications; after some digging it turned out to be a DNS problem. v1.5 apparently shipped SkyDNS, which is fiddly to deploy and poorly documented, so I switched to v1.10.
Pre-installation Preparation
Environment: three CentOS 7 hosts, with IPs 10.0.0.8, 10.0.0.9, and 10.0.0.10.
1. Configure the hosts file on each node by adding:

```
10.0.0.8 k8smaster
10.0.0.9 k8snode1
10.0.0.10 k8snode2
```
2. Disable the system firewall on each node

```shell
systemctl stop firewalld
systemctl disable firewalld
```
3. Disable SELinux on each node

```shell
vim /etc/selinux/config
# set:
SELINUX=disabled
```
4. Disable swap on each node

```shell
swapoff -a
```
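Note that swapoff -a only lasts until the next reboot; to keep swap off permanently, comment out the swap entry in /etc/fstab as well. A sketch, demonstrated here on a scratch copy (the sample entries are assumptions; apply the same sed to the real /etc/fstab as root after backing it up):

```shell
# Scratch file standing in for /etc/fstab, with a typical root and swap entry
printf '/dev/mapper/centos-root /     xfs  defaults 0 0\n/dev/mapper/centos-swap swap  swap defaults 0 0\n' > /tmp/fstab.demo
# Comment out any line that mounts a swap device, leaving other entries alone
sed -i '/\sswap\s/ s/^\([^#]\)/#\1/' /tmp/fstab.demo
cat /tmp/fstab.demo
```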
- Create the /etc/sysctl.d/k8s.conf file on each node:

```shell
cat << EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
vm.swappiness=0
EOF
sysctl -p /etc/sysctl.d/k8s.conf
```
If you hit this error:

```
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-ip6tables: No such file or directory
sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables: No such file or directory
```

run modprobe br_netfilter and add it to rc.local so the module loads on boot:

```shell
modprobe br_netfilter
echo "modprobe br_netfilter" >> /etc/rc.local
```
Installing kubeadm
1. Configure the Aliyun Kubernetes YUM repo on each node

```shell
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
EOF
yum -y install epel-release
yum clean all
yum makecache
```
2. Install kubeadm and the related packages on each node

```shell
yum -y install docker kubelet kubeadm kubectl kubernetes-cni
```
3. Start the Docker and kubelet services

```shell
systemctl enable docker && systemctl start docker
systemctl enable kubelet && systemctl start kubelet
```

Docker's cgroup driver must match the one kubelet is configured with; compare the two:

```shell
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
```

docker.service itself lives at /usr/lib/systemd/system/docker.service. If Docker is running with --exec-opt native.cgroupdriver=cgroupfs, switch kubelet to cgroupfs as well:

```shell
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
systemctl daemon-reload
```
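The comparison above can be scripted; a small sketch (assuming the 10-kubeadm.conf path used in this post and the "Cgroup Driver:" line that docker info prints):

```shell
# Extract both cgroup drivers and warn on a mismatch
docker_driver=$(docker info 2>/dev/null | awk -F': *' '/Cgroup Driver/ {print $2}')
kubelet_driver=$(grep -o 'cgroup-driver=[a-z]*' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf | cut -d= -f2)
if [ "$docker_driver" != "$kubelet_driver" ]; then
  echo "MISMATCH: docker=$docker_driver kubelet=$kubelet_driver"
else
  echo "cgroup drivers match: $docker_driver"
fi
```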
4. Configure an image registry mirror (Master node)

gcr.io is not directly reachable, so a domestic registry mirror is needed before the images can be pulled. To set up an Aliyun accelerator: log in at https://cr.console.aliyun.com/, click the image-accelerator button to get your personal accelerator URL, and pick the CentOS tab for the setup instructions.

Tip: on Aliyun, Docker with the accelerator configured may fail to start because of daemon.json; in that case the workaround is to put the setting in the Docker sysconfig instead:

```shell
vim /etc/sysconfig/docker
```
5. Download the Kubernetes images

With the accelerator sorted out, pull the images and re-tag them with the k8s.gcr.io/ prefix so kubeadm can find them:

```shell
#!/bin/bash
images=(kube-proxy-amd64:v1.10.0 kube-scheduler-amd64:v1.10.0 kube-controller-manager-amd64:v1.10.0 kube-apiserver-amd64:v1.10.0
etcd-amd64:3.1.12 pause-amd64:3.1 kubernetes-dashboard-amd64:v1.8.3 k8s-dns-sidecar-amd64:1.14.8 k8s-dns-kube-dns-amd64:1.14.8
k8s-dns-dnsmasq-nanny-amd64:1.14.8)
for imageName in ${images[@]} ; do
  docker pull keveon/$imageName
  docker tag keveon/$imageName k8s.gcr.io/$imageName
  docker rmi keveon/$imageName
done
```

The script does three things: pulls every required image, re-tags each one with the name kubeadm's naming convention expects, and removes the original tag.
Tip: the image versions must match the kubeadm version you installed, otherwise kubeadm init will time out.
6. Initialize the Kubernetes Master

Once the script above has finished downloading, run kubeadm init:

```shell
kubeadm init --kubernetes-version=v1.10.0 --pod-network-cidr=10.244.0.0/16
```

Make a note of the kubeadm join command it prints at the end:

```
kubeadm join 10.0.0.8:6443 --token a6vlug.shwfx89vqrofvro7 --discovery-token-ca-cert-hash sha256:f7b5dc65173098b1ae74b06fe5062124528ff873c53ed38761601598ddbf1a58
```
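If you lose that command later, a new token and the CA certificate hash can be regenerated on the master; a sketch using the standard kubeadm/openssl recipe:

```shell
# Create a fresh bootstrap token for kubeadm join
kubeadm token create
# Recompute the discovery-token-ca-cert-hash from the cluster CA certificate
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```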
7. Configure kubectl credentials (Master node)

```shell
# For non-root users
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# For root
export KUBECONFIG=/etc/kubernetes/admin.conf
```

You can also persist the export in ~/.bash_profile:

```shell
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
```
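With the kubeconfig in place, a quick sanity check that kubectl can actually reach the API server:

```shell
kubectl cluster-info
# Only the master should be listed, and it typically shows NotReady
# until the network add-on is installed in the next step
kubectl get nodes
```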
8. Install the flannel network (Master node)

Prepare the CNI config directories and files first. The heredoc contents below are typical minimal values for flannel (the subnet matches the 10.244.0.0/16 pod CIDR passed to kubeadm init); adjust them for your environment:

```shell
mkdir -p /etc/cni/net.d/
cat <<EOF > /etc/cni/net.d/10-flannel.conf
{
  "name": "cbr0",
  "type": "flannel",
  "delegate": {
    "isDefaultGateway": true
  }
}
EOF
mkdir -p /usr/share/oci-umount/oci-umount.d
mkdir -p /run/flannel/
cat <<EOF > /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.1.0/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
EOF
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.9.1/Documentation/kube-flannel.yml
```
9. Verify that the Master came up (Master node)

```shell
# Check pod status
kubectl get pods --all-namespaces
```

Wait until every pod is Running before joining the other nodes; otherwise kube-dns, kubernetes-dashboard, and the like may get scheduled onto the worker nodes and fail to install.
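That wait can be automated with a rough polling sketch (assuming column 4 of the --no-headers output is the STATUS field, which holds for this kubectl version):

```shell
# Loop until no pod reports a status other than Running
while kubectl get pods --all-namespaces --no-headers | awk '{print $4}' | grep -qv '^Running$'; do
  echo "pods still starting, waiting..."
  sleep 5
done
echo "all pods Running"
```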
10. Join node1 and node2 to the cluster

On each node, run the kubeadm join command recorded in step 6:

```
[root@k8snode2 ~]# kubeadm join 10.0.0.8:6443 --token a6vlug.shwfx89vqrofvro7 --discovery-token-ca-cert-hash sha256:f7b5dc65173098b1ae74b06fe5062124528ff873c53ed38761601598ddbf1a58
[preflight] Running pre-flight checks.
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[preflight] Starting the kubelet service
[discovery] Trying to connect to API Server "10.0.0.8:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.8:6443"
[discovery] Requesting info from "https://10.0.0.8:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.8:6443"
[discovery] Successfully established connection with API Server "10.0.0.8:6443"

This node has joined the cluster:
* Certificate signing request was sent to master and a response
  was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.
```
By default the Master node does not run workloads. If you would rather have an all-in-one Kubernetes environment, you can let the Master act as a Node too:

```shell
kubectl taint nodes --all node-role.kubernetes.io/master-
```
Installing Add-ons

Deploy the dashboard (note: the dashboard must be deployed on the master node, otherwise it errors out).
The dashboard supports two access modes: kubeconfig (HTTPS) and token (HTTP). This post covers token access, which needs no login password and is the simplest to set up.
1. Download the official dashboard YAML, or the modified version (which avoids the common pitfalls):

```
# Official
https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
# Modified (includes the heapster and RBAC YAMLs)
https://github.com/gh-Devin/kubernetes-dashboard
```

In heapster.yaml, change the image to registry.cn-hangzhou.aliyuncs.com/google_containers/heapster:v1.5.2 (found via the Aliyun image search).
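That edit can be scripted with sed. The exact image line in heapster.yaml may differ from what the pattern below assumes, so it is demonstrated here on a scratch file rather than the real YAML:

```shell
# Scratch file standing in for heapster.yaml (the real image line may differ)
printf '        image: k8s.gcr.io/heapster-amd64:v1.5.2\n' > /tmp/heapster-image-demo.yaml
# Rewrite whatever heapster image is referenced to the Aliyun mirror
sed -i 's#image: .*heapster.*#image: registry.cn-hangzhou.aliyuncs.com/google_containers/heapster:v1.5.2#' /tmp/heapster-image-demo.yaml
cat /tmp/heapster-image-demo.yaml
```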
2. Create the pods

```
[root@k8smaster kubernetes-dashboard]# ls
heapster-rbac.yaml  heapster.yaml  kubernetes-dashboard-admin.rbac.yaml  kubernetes-dashboard.yaml
[root@k8smaster kubernetes-dashboard]# kubectl -n kube-system create -f .
clusterrolebinding.rbac.authorization.k8s.io "heapster" created
serviceaccount "heapster" created
deployment.extensions "heapster" created
service "heapster" created
serviceaccount "kubernetes-dashboard-admin" created
clusterrolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-admin" created
secret "kubernetes-dashboard-certs" created
serviceaccount "kubernetes-dashboard" created
role.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
rolebinding.rbac.authorization.k8s.io "kubernetes-dashboard-minimal" created
deployment.apps "kubernetes-dashboard" created
service "kubernetes-dashboard-external" created
```
3. Check the add-on status

```shell
kubectl get svc,pod --all-namespaces
```
4. If anything goes wrong, inspect the pod and its logs

```shell
kubectl describe pod heapster-6595c54cb9-chmfd --namespace=kube-system
kubectl logs pod/heapster-6595c54cb9-chmfd -n kube-system
```
5. If the installation failed, delete the resources and redeploy

```shell
kubectl -n kube-system delete -f .
```
Problems and Solutions
1. Error:

```
Error from server: error dialing backend: dial tcp 192.168.0.107:10250: getsockopt: no route to host
```

The firewall on each node needs extra rules. Port 8472 is used by flannel (note that flannel's VXLAN backend actually carries traffic over UDP, so a udp rule may be needed as well); 9898 and 6443 are used by minio to reach the master. On CentOS these must be opened, otherwise iptables -L -vn | more shows the INPUT chain's reject-with icmp-host-prohibited counter climbing steadily. Port 10250 is used by kubectl exec; without it you get "Error from server: error dialing backend: dial tcp 192.168.128.164:10250: getsockopt: no route to host".

```shell
iptables -I INPUT -p tcp -m tcp --dport 8472 -j ACCEPT
```
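A sketch opening all of the ports mentioned above in one go (the UDP rule for 8472 is my own addition for flannel's VXLAN backend, not part of the original command):

```shell
# TCP ports: API server (6443), 9898, and kubelet (10250)
for p in 6443 9898 10250; do
  iptables -I INPUT -p tcp -m tcp --dport "$p" -j ACCEPT
done
# flannel VXLAN traffic travels over UDP on 8472
iptables -I INPUT -p udp -m udp --dport 8472 -j ACCEPT
```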
2. Error during kubeadm init:

```
[controlplane] Wrote Static Pod manifest for component kube-apiserver to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[controlplane] Wrote Static Pod manifest for component kube-controller-manager to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[controlplane] Wrote Static Pod manifest for component kube-scheduler to "/etc/kubernetes/manifests/kube-scheduler.yaml"
[etcd] Wrote Static Pod manifest for a local etcd instance to "/etc/kubernetes/manifests/etcd.yaml"
[init] Waiting for the kubelet to boot up the control plane as Static Pods from directory "/etc/kubernetes/manifests".
[init] This might take a minute or longer if the control plane images have to be pulled.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz' failed with error: Get http://localhost:10255/healthz: dial tcp [::1]:10255: getsockopt: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10255/healthz/syncloop' failed with error: Get http://localhost:10255/healthz/syncloop: dial tcp [::1]:10255: getsockopt: connection refused.
(the two kubelet-check lines repeat several more times)

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    - Either there is no internet connection, or imagePullPolicy is set to "Never",
      so the kubelet cannot pull or find the following control plane images:
        - k8s.gcr.io/kube-apiserver-amd64:v1.10.0
        - k8s.gcr.io/kube-controller-manager-amd64:v1.10.0
        - k8s.gcr.io/kube-scheduler-amd64:v1.10.0
        - k8s.gcr.io/etcd-amd64:3.1.12 (only if no external etcd endpoints are configured)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
```

Cause: the cgroup drivers in docker.service and 10-kubeadm.conf don't match (see step 3 of the kubeadm installation above).
3. Dashboard install error:

```
[root@k8smaster kubernetes-dashboard]# kubectl get svc,pod --all-namespaces | grep dashboard
kube-system   service/kubernetes-dashboard-external   NodePort   10.106.103.199   <none>   9090:30090/TCP   52s
kube-system   pod/kubernetes-dashboard-5cc6564db9-tp5ws   0/1   CrashLoopBackOff   2   52s
```
Check the logs:

```
[root@k8smaster kubernetes-dashboard]# kubectl logs pod/kubernetes-dashboard-5cc6564db9-tp5ws -n kube-system
2018/04/30 18:23:18 Using in-cluster config to connect to apiserver
2018/04/30 18:23:18 Using service account token for csrf signing
2018/04/30 18:23:18 No request provided. Skipping authorization
2018/04/30 18:23:18 Starting overwatch
2018/04/30 18:23:19 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: getsockopt: no route to host
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
```
or describe the pod:

```shell
kubectl describe pod kubernetes-dashboard-5cc6564db9-bjmgr --namespace=kube-system
```
Delete the pod:

```shell
kubectl delete pod kubernetes-dashboard-5cc6564db9-tp5ws -n kube-system
```

If you run into trouble partway through the installation, you can wipe the environment and start over with:

```shell
kubeadm reset
```
4. Pod IPs unreachable after installation

This is almost certainly a firewall problem; the rules need to be cleaned out. Solution:
1. Stop the services first

```
[root@k8smaster ~]# systemctl stop docker kubelet
```
2. List the current rules, then flush them

```
[root@k8smaster ~]# iptables -L -n
[root@k8smaster ~]# iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
[root@k8smaster ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
```
3. Restart iptables and start the services again

```
[root@k8smaster ~]# systemctl restart iptables
[root@k8smaster ~]# systemctl start docker kubelet
```
4. Check the firewall rules again and flush once more

```
[root@k8smaster ~]# iptables -L -n
[root@k8smaster ~]# iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
```
5. Wait a moment and check the final rules

```
[root@k8smaster ~]# iptables -L -n
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
KUBE-FIREWALL  all  --  0.0.0.0/0            0.0.0.0/0
KUBE-SERVICES  all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes service portals */

Chain KUBE-FIREWALL (2 references)
target     prot opt source               destination
DROP       all  --  0.0.0.0/0            0.0.0.0/0            /* kubernetes firewall for dropping marked packets */ mark match 0x8000/0x8000

Chain KUBE-SERVICES (1 references)
target     prot opt source               destination
```
6. Verify

Deploy nginx:

```shell
kubectl run nginx --image=nginx --replicas=2 --port=80
```

Expose the port:

```shell
kubectl expose deployment/nginx --type="NodePort" --port 80
```
Check the application status:

```
[root@k8smaster ~]# kubectl describe svc
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         10.0.0.8:6443
Session Affinity:  ClientIP
Events:            <none>

Name:                     nginx
Namespace:                default
Labels:                   run=nginx
Annotations:              <none>
Selector:                 run=nginx
Type:                     NodePort
IP:                       10.98.84.61
Port:                     <unset>  80/TCP
TargetPort:               80/TCP
NodePort:                 <unset>  30019/TCP
Endpoints:                10.244.1.60:80,10.244.1.61:80
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
```

The pod IPs should now respond:

```shell
[root@k8smaster ~]# ping 10.244.1.60
```
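Beyond ping, the service itself can be exercised through the NodePort (30019 in the describe output above; substitute the port and a node IP from your own cluster):

```shell
# Hit nginx through any node's IP on the NodePort and look for its welcome page
curl -s http://10.0.0.9:30019 | grep -i "welcome to nginx"
```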
5. Error:

```
Failed to get system container stats for "/system.slice/
```

Fix: add the following to /etc/systemd/system/kubelet.service:

```
ExecStart=/usr/bin/kubelet --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
```