Regardless of whether an instance of IRIS is in the cloud or not, high availability and disaster recovery are always important considerations. While IKO already allows for the use of NodeSelectors to enforce scheduling of IRISCluster nodes across multiple zones, multi-region Kubernetes clusters are generally not recommended, or even supported, in the major CSPs' managed Kubernetes solutions. However, when discussing HA and DR for IRIS, we may want to have an async member in a completely separate region, or even in a different cloud provider altogether. With additional options added during the development of IKO 3.8 (specifically 3.8.4+), it's now possible to create this kind of multi-region HA architecture.
In this post we will:
- Create two Kubernetes clusters in entirely separate regions.
- Configure both Kubernetes clusters to allow for bi-directional communication.
- Configure internal LoadBalancers to connect our IRISCluster nodes.
- Create a mirror pair and a DR async IKO IRISCluster, one in each Kubernetes cluster.
- Verify the mirror status of each node and prove that a simple global is accessible on the DR async node.
Previous Work
It should be noted that we have previously experimented with several tools for creating remote DR async members through IKO on Kubernetes.
1) We attempted cross-cluster communication schemes using Calico/Tigera and Istio. In both cases the solutions were overly complicated and did not end up meeting our success criteria.
2) We used external LoadBalancers to expose mirror members to each other over the internet. While effective, the solution we will cover here allows for much greater security, and doesn't expose our IRIS instances to the internet at large in any capacity.
Prerequisites
In order to replicate the following work, you will need an AWS account with permissions to create EKS clusters and IAM roles, the eksctl, kubectl, and helm command-line tools, IKO 3.8.4+, and an IRIS license key and container images.
Creating and Installing Dependencies on EKS Kubernetes Clusters
It's fairly straightforward to create EKS clusters through the eksctl tool.
- First, create two files, cluster1.yaml and cluster2.yaml:
cluster1.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: sferguso-cross1
  region: us-east-1
  version: "1.27"
kubernetesNetworkConfig:
  ipFamily: "IPv4"
  serviceIPv4CIDR: "10.100.0.0/16"
vpc:
  cidr: "192.168.0.0/16"
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 5
    volumeSize: 80
cluster2.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: sferguso-cross2
  region: us-west-2
  version: "1.27"
kubernetesNetworkConfig:
  ipFamily: "IPv4"
  serviceIPv4CIDR: "10.200.0.0/16"
vpc:
  cidr: "172.16.0.0/16"
nodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 5
    volumeSize: 80
- Second, use eksctl to create the clusters. (This will take some time.)
eksctl create cluster -f cluster1.yaml
eksctl create cluster -f cluster2.yaml
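eksctl writes a kubeconfig context for each new cluster. Later steps target the clusters with --context <east> and --context <west>; those are placeholder aliases, so you may want to list the auto-generated context names and rename them to something shorter:
# List the contexts eksctl added to your kubeconfig
kubectl config get-contexts

# Optionally rename the long auto-generated names to the aliases used below
kubectl config rename-context <full_east_context_name> east
kubectl config rename-context <full_west_context_name> west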
Note that in the above configurations the regions are us-east-1 and us-west-2. Each Kubernetes cluster will span 2 Availability Zones by default (which are generally their own independent data centers), and the two clusters sit in entirely separate regions for the greatest possible fault tolerance.
Note as well that the CIDRs, both for the kubernetesNetworkConfigs and the VPCs, are all unique. This allows us to be certain exactly where traffic is originating from. At the very least, the VPC CIDRs must not overlap, for the peering steps later on.
- Next, we need to install our storage CSI driver.
If your AWS permissions are wide enough, it's possible to simply run the following commands to install the EBS CSI driver onto the two k8s clusters:
eksctl utils associate-iam-oidc-provider --region=us-east-1 --cluster=sferguso-cross1 --approve
eksctl create iamserviceaccount --name ebs-csi-controller-sa --namespace kube-system --cluster sferguso-cross1 --role-name AmazonEKS_EBS_CSI_DriverRole_sferguso-cross1 --role-only --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy --approve
eksctl create addon --name aws-ebs-csi-driver --cluster sferguso-cross1 --service-account-role-arn arn:aws:iam::<Your_Account_ID>:role/AmazonEKS_EBS_CSI_DriverRole_sferguso-cross1 --force
eksctl utils associate-iam-oidc-provider --region=us-west-2 --cluster=sferguso-cross2 --approve
eksctl create iamserviceaccount --name ebs-csi-controller-sa --namespace kube-system --cluster sferguso-cross2 --region us-west-2 --role-name AmazonEKS_EBS_CSI_DriverRole_sferguso-cross2 --role-only --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy --approve
eksctl create addon --name aws-ebs-csi-driver --cluster sferguso-cross2 --region=us-west-2 --service-account-role-arn arn:aws:iam::<Your_Account_ID>:role/AmazonEKS_EBS_CSI_DriverRole_sferguso-cross2 --force
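As an optional sanity check, you can confirm the addon installed and its pods are running; the label selector below assumes the driver's default labels:
# Check addon status for each cluster
eksctl get addon --name aws-ebs-csi-driver --cluster sferguso-cross1
eksctl get addon --name aws-ebs-csi-driver --cluster sferguso-cross2 --region us-west-2

# Controller and node pods should be Running
kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver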
If you do not have IAM permissions for these commands (true for all ISC AWS accounts), you can create the IAM Roles by hand as well. This requires you to make two new roles (AmazonEKS_EBS_CSI_DriverRole_sferguso-cross1 and AmazonEKS_EBS_CSI_DriverRole_sferguso-cross2) in the IAM Console, each with the AWS-managed AmazonEBSCSIDriverPolicy attached and a trust policy that looks like the following:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<Your_Account_ID>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<EKS_OIDC>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<EKS_OIDC>:sub": "system:serviceaccount:kube-system:ebs-csi-controller-sa",
          "oidc.eks.<region>.amazonaws.com/id/<EKS_OIDC>:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
Note that you can find the <EKS_OIDC> in the AWS Console in the Overview tab of your EKS Cluster, and <Your_Account_ID> in the top right dropdown of the Console.
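If you prefer the CLI, both values can also be pulled with standard AWS CLI commands:
# OIDC issuer URL; the trailing segment after /id/ is <EKS_OIDC>
aws eks describe-cluster --name sferguso-cross1 --region us-east-1 \
  --query "cluster.identity.oidc.issuer" --output text

# Your 12-digit <Your_Account_ID>
aws sts get-caller-identity --query Account --output text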
- Finally, install IKO by following the instructions at https://docs.intersystems.com/components/csp/docbook/DocBook.UI.Page.cls?KEY=AIKO36#AIKO36_install
Important: Before running helm install, navigate to the IKO chart's values.yaml and make sure to set useFQDN to false. The rest of the exercise will not work without this setting:
useFQDN: false
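Since useFQDN is a top-level chart value, you should also be able to override it at install time instead of editing values.yaml; the chart path below is illustrative, so use the one from your IKO distribution:
helm install intersystems <path_to_iko_chart>/iris-operator --set useFQDN=false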
Create a Peering Connection between the two EKS-created VPCs
Follow instructions at https://docs.aws.amazon.com/vpc/latest/peering/create-vpc-peering-connection.html#same-account-different-region
Additionally, follow the instructions at https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-routing.html to create routes so traffic can move through the peering connection.
This should result in an ingress route in every route table like the following:

| Destination | Target |
| --- | --- |
| ${cidr_block_othercluster} | ${vpc_peering_id} |
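If you'd rather script the peering setup, the equivalent AWS CLI calls look like this (all IDs are placeholders for your own; the CIDRs match the cluster configs above):
# Request the peering connection from the us-east-1 VPC to the us-west-2 VPC
aws ec2 create-vpc-peering-connection --region us-east-1 \
  --vpc-id <east_vpc_id> --peer-vpc-id <west_vpc_id> --peer-region us-west-2

# Accept it from the us-west-2 side
aws ec2 accept-vpc-peering-connection --region us-west-2 \
  --vpc-peering-connection-id <peering_id>

# Route each VPC's traffic for the other VPC's CIDR through the peering connection
aws ec2 create-route --region us-east-1 --route-table-id <east_rtb_id> \
  --destination-cidr-block 172.16.0.0/16 --vpc-peering-connection-id <peering_id>
aws ec2 create-route --region us-west-2 --route-table-id <west_rtb_id> \
  --destination-cidr-block 192.168.0.0/16 --vpc-peering-connection-id <peering_id>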
Create Internal LoadBalancers
Now that the two EKS clusters are connected, we need to expose K8s services that will associate themselves with our IKO-defined IRISClusters. These internal load balancers have DNS names that are reachable only from within the two VPCs created by EKS (and anything peered with them).
In the Cluster you will use to create an IKO mirror pair, apply the following:
mirror-svcs.yaml
apiVersion: v1
kind: Service
metadata:
  name: primary-svc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - port: 1972
      protocol: TCP
      name: superserver
    - port: 2188
      protocol: TCP
      name: iscagent
  selector:
    statefulset.kubernetes.io/pod-name: crossover-data-0-0
---
apiVersion: v1
kind: Service
metadata:
  name: backup-svc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - port: 1972
      protocol: TCP
      name: superserver
    - port: 2188
      protocol: TCP
      name: iscagent
  selector:
    statefulset.kubernetes.io/pod-name: crossover-data-0-1
In the Cluster you will use to create an IKO async member, apply the following:
async-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: async-svc
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  ports:
    - port: 1972
      protocol: TCP
      name: superserver
    - port: 2188
      protocol: TCP
      name: iscagent
  selector:
    statefulset.kubernetes.io/pod-name: crossover-data-0-100
You can apply these services with the following commands (note that I've chosen my us-east-1 cluster to contain my mirror pair, and us-west-2 to contain my DR async member):
kubectl --context <east> apply -f mirror-svcs.yaml
kubectl --context <west> apply -f async-svc.yaml
- Once these internal LoadBalancer services come up, create ExternalName services that map cross-cluster.
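The externalName values in the manifests below are the DNS names of the internal NLBs we just created; yours will differ. One way to fetch them, using the same placeholder context aliases:
# DNS name of the internal NLB fronting the async member (west cluster);
# repeat with --context <east> for primary-svc and backup-svc
kubectl --context <west> get svc async-svc \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'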
In the Cluster you will use to create an IKO mirror pair, apply the following:
mirror-cross-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: crossover-data-0-100
spec:
  type: ExternalName
  externalName: "a74fcb8ce93d94781ae4e0ea9d4d1c41-a8ec6876e5f7d747.elb.us-west-2.amazonaws.com"
  ports:
    - port: 1972
      name: "superserver"
    - port: 2188
      name: "iscagent"
In the Cluster you will use to create an IKO async member, apply the following:
async-cross-svcs.yaml
apiVersion: v1
kind: Service
metadata:
  name: crossover-data-0-1
spec:
  type: ExternalName
  externalName: "a59d4f5a0c102479a9d48854da1f0d87-b27eeccb1044a9da.elb.us-east-1.amazonaws.com"
  ports:
    - port: 1972
      name: "superserver"
    - port: 2188
      name: "iscagent"
---
apiVersion: v1
kind: Service
metadata:
  name: crossover-data-0-0
spec:
  type: ExternalName
  externalName: "a59d4f5a0c102479a9d48854da1f0d87-b27eeccb1044a9da.elb.us-east-1.amazonaws.com"
  ports:
    - port: 1972
      name: "superserver"
    - port: 2188
      name: "iscagent"
Again, you can apply these with commands of the same style:
kubectl --context <east> apply -f mirror-cross-svc.yaml
kubectl --context <west> apply -f async-cross-svcs.yaml
Once all these services are running, we're ready to create the matching IKO IRISClusters. Note that the ExternalName services use the expected naming scheme of the IRISClusters we'll create in a moment. This is why disabling FQDN was so important when we initially installed IKO: without that setting off, IKO will not create a routable mirroring configuration for the cluster using these services. In this way, the failover members (primary and backup) route to the async member over the ExternalName service (which references the associated internal LoadBalancer in the other VPC), and vice-versa.
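As an optional sanity check that the cross-cluster names resolve before creating the IRISClusters (the throwaway pod name and busybox image here are arbitrary):
# From the east cluster, the async member's name should resolve to the west NLB
kubectl --context <east> run dnscheck --rm -it --image=busybox --restart=Never \
  -- nslookup crossover-data-0-100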
Create Mirrored IRISCluster and Async IRISCluster
For the following, I created an AWS ECR repository (sferguso-cross) to push my IRIS images to. EKS automatically has pull access to any ECR repository in the same account. You can either do the same, or use an IKO imagePullSecret to source IRIS images from containers.intersystems.com.
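If you take the ECR route, pushing an image looks roughly like this; the repository name and tag match those used below, and <local_iris_image> stands in for whatever IRIS image you have locally:
# Authenticate Docker to your ECR registry
aws ecr get-login-password --region us-east-1 | docker login \
  --username AWS --password-stdin <Your_Account_ID>.dkr.ecr.us-east-1.amazonaws.com

# Tag and push the IRIS image
docker tag <local_iris_image> \
  <Your_Account_ID>.dkr.ecr.us-east-1.amazonaws.com/sferguso-cross:iris-2024.1.0L.217.0
docker push <Your_Account_ID>.dkr.ecr.us-east-1.amazonaws.com/sferguso-cross:iris-2024.1.0L.217.0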
In the Cluster for the IKO mirror pair, apply the following:
iris-mirror.yaml
apiVersion: intersystems.com/v1alpha1
kind: IrisCluster
metadata:
  name: crossover
spec:
  licenseKeySecret:
    name: iris-key-secret
  configSource:
    name: my-iriscluster-config
  tls:
    common:
      secret:
        secretName: common-certs
  topology:
    data:
      image: <Your_Account_ID>.dkr.ecr.us-east-1.amazonaws.com/sferguso-cross:iris-2024.1.0L.217.0
      compatibilityVersion: "2024.1.0"
      mirrored: true
In the Cluster for the async member, apply the following. Note that mirrorMap: drasync designates this cluster's data node as a DR async, and start: 100 offsets the member numbering so that the resulting pod name (crossover-data-0-100) matches the services we created earlier:
iris-async.yaml
apiVersion: intersystems.com/v1alpha1
kind: IrisCluster
metadata:
  name: crossover
spec:
  licenseKeySecret:
    name: iris-key-secret
  configSource:
    name: my-iriscluster-config
  tls:
    common:
      secret:
        secretName: common-certs
  topology:
    data:
      image: <Your_Account_ID>.dkr.ecr.us-east-1.amazonaws.com/sferguso-cross:iris-2024.1.0L.217.0
      compatibilityVersion: "2024.1.0"
      mirrored: true
      mirrorMap: drasync
      start: 100
And as usual, apply these to the clusters through kubectl:
kubectl --context <east> apply -f iris-mirror.yaml
kubectl --context <west> apply -f iris-async.yaml
Verification
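Before exec-ing into pods, you can confirm both IrisClusters came up; since IKO registers a CRD, they're queryable like any other Kubernetes resource:
kubectl --context <east> get iriscluster crossover
kubectl --context <west> get iriscluster crossover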
In the Cluster for the IKO mirror pair
$ kubectl exec crossover-data-0-0 -- iris qlist
IRIS^/usr/irissys^2024.1.0L.208.0^running, since Thu Jan 18 16:43:48 2024^iris.cpf^1972^0^0^ok^IRIS^Failover^Primary^/irissys/data/IRIS/^1972
$ kubectl exec crossover-data-0-1 -- iris qlist
IRIS^/usr/irissys^2024.1.0L.208.0^running, since Thu Jan 18 16:44:27 2024^iris.cpf^1972^0^0^ok^IRIS^Failover^Backup^/irissys/data/IRIS/^1972
$ kubectl exec -it crossover-data-0-0 -- iris session IRIS -UIRISCLUSTER
IRISCLUSTER>s ^result="Success!"
$ kubectl exec -it crossover-data-0-1 -- iris session IRIS -UIRISCLUSTER
IRISCLUSTER>w ^result
Success!
In the Cluster for the Async Member
$ kubectl exec crossover-data-0-100 -- iris qlist
IRIS^/usr/irissys^2024.1.0L.208.0^running, since Thu Jan 18 17:07:23 2024^iris.cpf^1972^0^0^ok^IRIS^Disaster Recovery^Connected^/irissys/data/IRIS/^1972
$ kubectl exec -it crossover-data-0-100 -- iris session IRIS -UIRISCLUSTER
IRISCLUSTER>w ^result
Success!
Huzzah! We officially have an Async DR node that is running on the opposite coast from its other mirror members.