icl

Deploying ICL cluster to AWS

Prerequisites

Deploy ICL cluster

./scripts/deploy/aws.sh

Delete ICL cluster

./scripts/deploy/aws.sh --delete

Advanced scenarios

Control node console

The following command starts an ephemeral control node in a Docker container and starts a new Bash session:

./scripts/deploy/aws.sh --console

The Kubernetes context is configured in that control node, so you can use kubectl, helm and so on in that Bash session.

HTTP and HTTPS proxies

The script uses the following environment variables if they are set:

Note that the proxy is used only by the script itself; it is not used inside the cluster.

To run the console with a transparent proxy in a sidecar container:

./scripts/deploy/aws.sh --start-proxy
./scripts/deploy/aws.sh --console
./scripts/deploy/aws.sh --stop-proxy
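
If the console session ends with an error, the proxy sidecar may be left running. The sketch below wraps the three commands above so that `--stop-proxy` is always attempted. The `AWS_SH` variable and the `console_with_proxy` function are hypothetical names used for illustration; they are not part of the ICL scripts.

```shell
# Hypothetical wrapper; AWS_SH and console_with_proxy are illustrative names.
AWS_SH="${AWS_SH:-./scripts/deploy/aws.sh}"

console_with_proxy() {
  # Give up early if the proxy cannot be started.
  "$AWS_SH" --start-proxy || return 1

  # Run the console and remember its exit status.
  status=0
  "$AWS_SH" --console || status=$?

  # Always try to stop the proxy, even if the console exited with an error.
  "$AWS_SH" --stop-proxy || return 1
  return "$status"
}
```

Calling `console_with_proxy` from the repository root then replaces the three separate invocations.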

Cluster authentication

When you create an Amazon EKS cluster, the IAM principal that creates the cluster is automatically granted system:masters permissions. To grant additional IAM principals the ability to interact with your cluster, edit the kube-system/aws-auth ConfigMap.

Example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::980842202052:role/main-eks-node-group-20230119212622560000000001
      groups:
        - system:bootstrappers
        - system:nodes
      username: system:node:

    - rolearn: arn:aws:iam::980842202052:role/AWSReservedSSO_AWSAdministratorAccess_bf7da1573ba8f7c9
      username: system:node:
      groups:
        - system:masters
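
To add another role, one option is to generate the entry and paste it under `mapRoles` while editing the ConfigMap. The helper below is a minimal sketch: the function name `aws_auth_role_entry`, its argument order, and the role ARN and username in the usage note are assumptions for illustration, not part of the ICL scripts.

```shell
# Hypothetical helper, not part of the ICL scripts.
# Usage: aws_auth_role_entry <role_arn> <username> <group> [group...]
aws_auth_role_entry() {
  if [ "$#" -lt 3 ]; then
    echo "usage: aws_auth_role_entry <role_arn> <username> <group> [group...]" >&2
    return 1
  fi
  role_arn="$1"
  username="$2"
  shift 2

  # Indentation matches the entries under "mapRoles: |" in the example above.
  printf '    - rolearn: %s\n' "$role_arn"
  printf '      username: %s\n' "$username"
  printf '      groups:\n'
  for group in "$@"; do
    printf '        - %s\n' "$group"
  done
}
```

For example, `aws_auth_role_entry arn:aws:iam::111122223333:role/example-admin admin system:masters` prints an entry for a placeholder role. It can then be pasted into the ConfigMap opened with `kubectl -n kube-system edit configmap aws-auth` in the control node console.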

See also:

DNS

To access the cluster endpoints you need to configure an external DNS and set up the following records:

{ingress_domain}. 300 IN CNAME {ingress_nginx_elb}.
*.{ingress_domain}. 300 IN CNAME {ingress_domain}.

# Optional, Ray client endpoint uses a dedicated AWS ELB.
ray-api.{ingress_domain}. 300 IN CNAME {ray_elb}.

# Optional, ClearML requires its own subdomain.
*.clearml.{ingress_domain}. 300 IN CNAME {ingress_domain}.

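Where {ingress_domain} is the cluster ingress domain, {ingress_nginx_elb} is the DNS name of the AWS ELB that serves ingress-nginx, and {ray_elb} is the DNS name of the dedicated AWS ELB for the Ray client endpoint.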
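
The records above can be produced from the placeholder values with a small shell helper, which avoids typos when the domain appears several times. This is a minimal sketch: the function name `icl_dns_records` and its argument order are assumptions for illustration, and the optional ClearML record is left out for brevity.

```shell
# Hypothetical helper, not part of the ICL scripts.
# Usage: icl_dns_records <ingress_domain> <ingress_nginx_elb> [ray_elb]
icl_dns_records() {
  domain="${1:-}"
  nginx_elb="${2:-}"
  ray_elb="${3:-}"

  if [ -z "$domain" ] || [ -z "$nginx_elb" ]; then
    echo "usage: icl_dns_records <ingress_domain> <ingress_nginx_elb> [ray_elb]" >&2
    return 1
  fi

  # Main ingress record and the wildcard for its subdomains.
  printf '%s. 300 IN CNAME %s.\n' "$domain" "$nginx_elb"
  printf '*.%s. 300 IN CNAME %s.\n' "$domain" "$domain"

  # Optional record for the Ray client endpoint, emitted only if an ELB is given.
  if [ -n "$ray_elb" ]; then
    printf 'ray-api.%s. 300 IN CNAME %s.\n' "$domain" "$ray_elb"
  fi
}
```

The output is in the same zone-file style as the records above and can be adapted to whatever format the DNS provider expects.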

Custom instance type

You can override the default instance type, for example to use GPU instances:

export ICL_AWS_INSTANCE_TYPE="g4dn.xlarge"

GPU software and drivers

To install the GPU driver and the Kubernetes plugin, specify the GPU type:

export GPU_TYPE="nvidia"

For AWS, only "nvidia" and "" (empty value) are currently supported. An empty value means no GPU.
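
A quick check before deploying can catch an unsupported value early. The function below is a minimal sketch based only on the two values listed above; the name `check_gpu_type` is hypothetical, and the deploy script may perform its own validation.

```shell
# Hypothetical pre-flight check, not part of the ICL scripts.
check_gpu_type() {
  case "${GPU_TYPE:-}" in
    ""|nvidia)
      # Empty (no GPU) and "nvidia" are the values documented for AWS.
      return 0
      ;;
    *)
      echo "unsupported GPU_TYPE for AWS: ${GPU_TYPE}" >&2
      return 1
      ;;
  esac
}
```

It could be chained with the deploy command, for example `check_gpu_type && ./scripts/deploy/aws.sh`.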

Tests

Replace {ingress_domain} with the cluster ingress domain.

# Optional, use only when ICL endpoints are accessible via HTTP proxy
./scripts/deploy/aws.sh --start-proxy

./scripts/deploy/aws.sh --console

# On control node execute
export ICL_INGRESS_DOMAIN={ingress_domain}
export ICL_RAY_ENDPOINT=ray-api.{ingress_domain}:80
./scripts/ccn/test.sh

# Optional, use only when ICL endpoints are accessible via HTTP proxy
./scripts/deploy/aws.sh --stop-proxy