Kubernetes Master Node High-Availability Cluster Setup


    Kubernetes High-Availability Cluster

    Kubernetes is composed of the following core components:

    etcd stores the state of the entire cluster;
    apiserver is the single entry point for resource operations and provides authentication, authorization, access control, API registration and discovery;
    controller manager maintains cluster state, e.g. failure detection, auto-scaling and rolling updates;
    scheduler handles resource scheduling, placing Pods on the appropriate machines according to the configured scheduling policies;
    kubelet maintains container lifecycles and manages volumes (CSI) and networking (CNI);
    Container runtime is responsible for image management and for actually running Pods and containers (CRI);
    kube-proxy provides in-cluster service discovery and load balancing for Services.

    In addition to the core components, there are several recommended add-ons, some of which have become hosted CNCF projects:

    CoreDNS provides DNS for the whole cluster;
    Ingress Controller exposes services to the outside world;
    Prometheus provides resource monitoring;
    Dashboard provides a GUI.

    High-Availability Cluster Concept

    As projects grow and traffic increases, a single-master cluster is no longer viable for many companies; high availability has become the norm, and Kubernetes is no exception. Today we walk through deploying a Kubernetes HA cluster. It is actually quite simple, because Kubernetes already does most of the work for us: all we really need to do is make the apiserver highly available.

    etcd storage is usually hosted inside the cluster; in a multi-master setup it scales out into an etcd cluster and provides high availability on its own.

    Only one controller manager and one scheduler are active in the cluster at any time; the other instances remain on standby.

    kubelet and kube-proxy are node-local agents that only serve the node they run on, so they do not need separate high availability.

    Kubernetes already takes care of high availability for these in-cluster components. What we need to do is make the apiserver entry point highly available: put simply, we add HAProxy and register the three master nodes as its backend servers.

    We place a load balancer, nginx or HAProxy, in front of the API servers. Note that clients talk to the load balancer, which forwards requests to the master nodes, rather than to a single API server directly.

    In China there is a vendor called Wise2C (睿云) with a kubeadm-based Kubernetes deployment tool named Breeze. It achieves high availability through keepalived and HAProxy, and it packages both applications as container images, which makes them very convenient to use; these are the wise2c/* images used below.

    High-Availability Cluster Setup

    Host                  Node            Notes
    192.168.168.11/24     k8s-master-0    CentOS 7.7, kernel 5.7.7
    192.168.168.12/24     k8s-master-1    CentOS 7.7, kernel 5.7.7
    192.168.168.13/24     k8s-master-2    CentOS 7.7, kernel 5.7.7

    I. Prepare the Node Environment

    1. Set the hostname on each node

    2. Configure the /etc/hosts file

    3. Install Docker and configure a registry mirror (accelerator)

    4. Configure the Aliyun Kubernetes yum repository, install the Kubernetes components (kubeadm, kubelet, kubectl), and upgrade the kernel

    5. Disable the firewall, optimize kernel parameters, disable SELinux, and set up kubectl tab completion (a condensed command sketch follows the systemctl lines below)

    systemctl start kubelet ; systemctl enable kubelet

    systemctl start docker ; systemctl enable docker
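
    A condensed, hedged sketch of steps 1–5 for one node (Docker installation and the kernel upgrade are omitted; the Aliyun repository URL, package versions, and file paths below are assumptions to adapt to your environment):

    # 1. hostname and hosts entries (use the matching name on each node)
    hostnamectl set-hostname k8s-master-0
    cat >> /etc/hosts <<'EOF'
    192.168.168.11 k8s-master-0
    192.168.168.12 k8s-master-1
    192.168.168.13 k8s-master-2
    EOF

    # 2. firewall, SELinux and swap off (kubeadm preflight expects this)
    systemctl disable --now firewalld
    setenforce 0 && sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
    swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab

    # 3. kernel parameters for bridged traffic
    cat > /etc/sysctl.d/k8s.conf <<'EOF'
    net.bridge.bridge-nf-call-iptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.ipv4.ip_forward = 1
    EOF
    sysctl --system

    # 4. Aliyun Kubernetes repo and components, pinned to the cluster version
    cat > /etc/yum.repos.d/kubernetes.repo <<'EOF'
    [kubernetes]
    name=Kubernetes
    baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
    enabled=1
    gpgcheck=0
    EOF
    yum install -y kubelet-1.18.5 kubeadm-1.18.5 kubectl-1.18.5

    # 5. kubectl tab completion
    yum install -y bash-completion
    echo 'source <(kubectl completion bash)' >> ~/.bashrc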

    II. Upload the Required Master Component Image Files

    Upload and load the images on all nodes. (The kubeadm-basic.images package was built by hand; you will not find it online. It is easy to make yourself: pull the images that match your kubeadm version and docker save -o them into tar files; a sketch is given below.) Import the images:
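
    A hedged sketch of how such a package could be produced on a machine with internet access (the per-image file names are an assumption; the author's archive uses slightly different names such as kubec-con-man.tar):

    mkdir -p kubeadm-basic.images
    for img in $(kubeadm config images list --kubernetes-version v1.18.5); do
        docker pull "$img"
        # e.g. k8s.gcr.io/kube-apiserver:v1.18.5 -> kube-apiserver.tar
        name=$(basename "${img%%:*}")
        docker save -o "kubeadm-basic.images/${name}.tar" "$img"
    done
    tar -zcvf kubeadm-basic.images.tar.gz kubeadm-basic.images/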

    [root@k8s-master-1 cluster]# docker load -i haproxy.tar
    [root@k8s-master-1 cluster]# docker load -i keepalived.tar
    [root@k8s-master-1 cluster]# tar -zxvf kubeadm-basic.images.tar.gz
    kubeadm-basic.images/
    kubeadm-basic.images/coredns.tar
    kubeadm-basic.images/etcd.tar
    kubeadm-basic.images/pause.tar
    kubeadm-basic.images/apiserver.tar
    kubeadm-basic.images/proxy.tar
    kubeadm-basic.images/kubec-con-man.tar
    kubeadm-basic.images/scheduler.tar

    Change into the extracted directory and load every image:

    [root@k8s-master-1 cluster]# cd kubeadm-basic.images
    [root@k8s-master-1 kubeadm-basic.images]# ls
    apiserver.tar  coredns.tar  etcd.tar  kubec-con-man.tar  pause.tar  proxy.tar  scheduler.tar
    [root@k8s-master-1 kubeadm-basic.images]# for i in `ls ./`;do docker load --input $i ;done

    docker images

    III. Configure the HAProxy and keepalived Files

    Edit the HAProxy configuration:

    [root@k8s-master-1 cluster]# tar -zxvf start.keep.tar.gz
    [root@k8s-master-1 cluster]# mv data/ /data
    [root@k8s-master-1 cluster]# cd /data/lb/etc

    vim haproxy.cfg

    backend be_k8s_6443
        mode tcp
        timeout queue 1h
        timeout server 1h
        timeout connect 1h
        log global
        balance roundrobin
        server rancher01 192.168.168.11:6443    # only one real server for now

    vim start-haproxy.sh

    #!/bin/bash
    MasterIP1=192.168.168.11
    MasterIP2=192.168.168.12
    MasterIP3=192.168.168.13
    MasterPort=6443

    docker run -d --restart=always --name HAProxy-K8S -p 6444:6444 \
        -e MasterIP1=$MasterIP1 \
        -e MasterIP2=$MasterIP2 \
        -e MasterIP3=$MasterIP3 \
        -e MasterPort=$MasterPort \
        -v /data/lb/etc/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg \
        wise2c/haproxy-k8s

    ./start-haproxy.sh
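
    A quick, hedged check that the load balancer container came up and the mapped port is listening:

    docker ps --filter name=HAProxy-K8S    # container should be Up
    ss -tnlp | grep 6444                   # host port 6444 published by docker-proxy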

    vim start-keepalived.sh

    #!/bin/bash
    VIRTUAL_IP=192.168.168.100
    INTERFACE=ens33
    NETMASK_BIT=24
    CHECK_PORT=6444
    RID=10
    VRID=160
    MCAST_GROUP=224.0.0.18

    docker run -itd --restart=always --name=Keepalived-K8S \
        --net=host --cap-add=NET_ADMIN \
        -e VIRTUAL_IP=$VIRTUAL_IP \
        -e INTERFACE=$INTERFACE \
        -e CHECK_PORT=$CHECK_PORT \
        -e RID=$RID \
        -e VRID=$VRID \
        -e NETMASK_BIT=$NETMASK_BIT \
        -e MCAST_GROUP=$MCAST_GROUP \
        wise2c/keepalived-k8s

    ./start-keepalived.sh
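
    On whichever node currently holds the keepalived MASTER role, the VIP should be bound to the configured interface. A hedged sanity check (interface name and addresses as set above):

    ip addr show ens33 | grep 192.168.168.100    # VIP present on the active node
    # once the first control plane is initialized (section IV), the VIP should answer:
    curl -k https://192.168.168.100:6444/version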

    IV. Initialize the Primary and Secondary Masters

    Generate a default configuration:

    kubeadm config print init-defaults > kubeadm-config.yaml

    Then edit kubeadm-config.yaml:

    apiVersion: kubeadm.k8s.io/v1beta2
    bootstrapTokens:
    - groups:
      - system:bootstrappers:kubeadm:default-node-token
      token: abcdef.0123456789abcdef
      ttl: 24h0m0s
      usages:
      - signing
      - authentication
    kind: InitConfiguration
    localAPIEndpoint:
      advertiseAddress: 192.168.168.12          # IP address of the current node
      bindPort: 6443
    nodeRegistration:
      criSocket: /var/run/dockershim.sock
      name: k8s-master-1                        # hostname of the current node
      taints:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
    ---
    apiServer:
      timeoutForControlPlane: 4m0s
    apiVersion: kubeadm.k8s.io/v1beta2
    certificatesDir: /etc/kubernetes/pki
    clusterName: kubernetes
    controlPlaneEndpoint: "192.168.168.100:6444"   # empty by default; for an HA cluster set this to the VIP
    controllerManager: {}
    dns:
      type: CoreDNS
    etcd:
      local:
        dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    kubernetesVersion: v1.18.5                  # set to the cluster version
    networking:
      dnsDomain: cluster.local
      serviceSubnet: 10.144.0.0/16              # adjust the service subnet
    scheduler: {}
    ---
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    featureGates:
      SupportIPVSProxyMode: true
    mode: ipvs

    Initialize the cluster:

    On 192.168.168.11 (k8s-master-0):

    kubeadm init --config=kubeadm-config.yaml \
        --upload-certs \
        --pod-network-cidr 10.244.0.0/16 | tee kubeadm-init.log

    Note: when installing Flannel with kubectl create -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml, if the "Network": "10.244.0.0/16" value in the yml differs from --pod-network-cidr, change them to match; otherwise Cluster IP traffic between nodes may not work.

    Output at the end of kubeadm init:

    Your Kubernetes control-plane has initialized successfully!

    To start using your cluster, you need to run the following as a regular user:

      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config

    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
      https://kubernetes.io/docs/concepts/cluster-administration/addons/

    You can now join any number of the control-plane node running the following command on each as root:

      kubeadm join 192.168.168.100:6444 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:a5ef4aa34ae4e9c6cb37826a7c588155a31f5a67e4b554626b2553cb23be1541 \
        --control-plane --certificate-key 7fe6e2e69a306b782cf1cb3d5b1a317b23ebd13c15009faf5deb5ccd2aea77a3

    Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
    As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
    "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

    Then you can join any number of worker nodes by running the following on each as root:

    kubeadm join 192.168.168.100:6444 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:a5ef4aa34ae4e9c6cb37826a7c588155a31f5a67e4b554626b2553cb23be1541

    Create the kubeconfig file:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config

    Join the other master nodes to the cluster:

    kubeadm join 192.168.168.100:6444 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:a5ef4aa34ae4e9c6cb37826a7c588155a31f5a67e4b554626b2553cb23be1541 \
        --control-plane --certificate-key 7fe6e2e69a306b782cf1cb3d5b1a317b23ebd13c15009faf5deb5ccd2aea77a3

    This node has joined the cluster and a new control plane instance was created:

    * Certificate signing request was sent to apiserver and approval was received.
    * The Kubelet was informed of the new secure connection details.
    * Control plane (master) label and taint were applied to the new node.
    * The Kubernetes control plane instances scaled up.
    * A new etcd member was added to the local/stacked etcd cluster.

    To start administering your cluster from this node, you need to run the following as a regular user:

      mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config

    Run 'kubectl get nodes' to see this node join the cluster.

    Create the kubeconfig file on each joined master as well:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
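
    At this point a quick, hedged sanity check should show all three control-plane nodes registered; they will report NotReady until the network plugin is installed in the next section:

    kubectl get nodes                           # k8s-master-0/1/2 listed, NotReady for now
    kubectl get pods -n kube-system -o wide     # apiserver, controller-manager, scheduler and etcd per master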

    V. Install the flannel Plugin

    Pull the image: docker pull quay.io/coreos/flannel:v0.12.0-amd64

    wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

    Apply flannel: kubectl apply -f kube-flannel.yml

    Check the pod status; when everything is Running, the installation succeeded:

    kubectl get pod -n kube-system
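
    To watch just the flannel pods, a hedged shortcut (the app=flannel label is the one used by the kube-flannel.yml manifest above):

    kubectl get pods -n kube-system -l app=flannel -o wide
    kubectl get nodes        # nodes should switch to Ready once flannel is up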

    VI. Reload HAProxy

    The earlier haproxy.cfg listed only one backend server; now add the other Master nodes as well:

    vim /data/lb/etc/haproxy.cfg

    backend be_k8s_6443
        mode tcp
        timeout queue 1h
        timeout server 1h
        timeout connect 1h
        log global
        balance roundrobin
        server rancher01 192.168.168.11:6443
        server rancher02 192.168.168.12:6443
        server rancher03 192.168.168.13:6443

    docker rm -f HAProxy-K8S

    sh /data/lb/start-haproxy.sh
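
    Each master runs its own HAProxy container, so presumably the same edit and restart are needed on every node. A hedged check that the API keeps answering through the VIP after the reload (the kubeconfig created earlier already points at the controlPlaneEndpoint 192.168.168.100:6444):

    docker logs --tail 20 HAProxy-K8S    # container restarted cleanly with the new config
    kubectl get nodes                    # requests still served through the VIP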

    How to fix the flannel error:

    Error message: Error registering network: failed to acquire lease: node "k8s-master-1" pod cidr not assigned

    (The original post shows screenshots of the error and of the flannel Pod log here.)

    Cause of the error / how to fix it:

    1. The --pod-network-cidr 10.244.0.0/16 parameter was not added when running kubeadm init.

    As noted in section IV, the "Network": "10.244.0.0/16" value in kube-flannel.yml must match --pod-network-cidr; otherwise Cluster IP traffic between nodes may not work.

    2. kube-controller-manager did not allocate a Pod CIDR to the newly joined node.

    Edit the static pod manifest /etc/kubernetes/manifests/kube-controller-manager.yaml on the master and add the two flags below (the guide I originally followed did not include them, hence the error). The cluster-cidr must match the network in kube-flannel.yml and the clusterCIDR in the kube-proxy configuration.

    --allocate-node-cidrs=true --cluster-cidr=10.244.0.0/16

    - command:
      - kube-controller-manager
      - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
      - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
      - --bind-address=127.0.0.1
      - --client-ca-file=/etc/kubernetes/pki/ca.crt
      - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
      - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
      - --controllers=*,bootstrapsigner,tokencleaner
      - --kubeconfig=/etc/kubernetes/controller-manager.conf
      - --leader-elect=true
      - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
      - --root-ca-file=/etc/kubernetes/pki/ca.crt
      - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
      - --use-service-account-credentials=true
      - --allocate-node-cidrs=true
      - --cluster-cidr=10.244.0.0/16
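
    kubelet restarts the static pod automatically when the manifest changes. A hedged way to confirm that every node now has a Pod CIDR assigned:

    kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR
    # or, per node:
    kubectl describe node k8s-master-1 | grep -i podcidr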

    If you have not rolled back to a snapshot, use method 2, then delete the three failing flannel pods so they are recreated automatically. (kubectl does not expand a kube-flannel-* wildcard; deleting by label, e.g. kubectl delete pod -n kube-system -l app=flannel, has the same effect.)

    If you did restore the snapshot, simply add the --pod-network-cidr=10.244.0.0/16 parameter when running kubeadm init.
