Quickstart - SkyShift with Ray#
You’re on your way to simplifying and enhancing job management and scheduling on Ray clusters. Let’s dive into how you can add and manage clusters and submit jobs efficiently.
In this guide, we’ll cover the following topics:
Adding and removing Ray Clusters from SkyShift.
Creating and running SkyShift jobs on Ray.
Prerequisites#
- Open Ports:
  - The following ports need to be open on the target cluster for communication:
    - RAY_CLIENT_PORT: 10001
    - RAY_JOBS_PORT: 8265
    - RAY_DASHBOARD_PORT: 8265
    - RAY_NODES_PORT: 6379
  - If Ray needs to be installed, port 22 must also be open for SSH access.
- Connection details for the Ray cluster (a sketch of how these might appear in rayconf.yaml follows this list):
  - If Ray needs to be installed:
    - ssh_key_path: Path to the SSH private key.
    - username: Username for the SSH connection.
    - host: Hostname or IP address of the target cluster.
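For reference, here is a minimal sketch of how these connection details might appear in the rayconf.yaml configuration file used later in this guide. The per-cluster layout and the example key path, username, and host are illustrative assumptions; check your SkyShift installation for the exact schema.

# Hypothetical rayconf.yaml entry; layout and values are assumptions, not the official schema.
raycluster1:                      # cluster name passed to `skyctl create cluster`
  ssh_key_path: ~/.ssh/id_rsa     # path to the SSH private key
  username: ubuntu                # username for the SSH connection
  host: 192.0.2.10                # hostname or IP address of the target cluster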
Automatic Ray Installation#
If Ray is not installed on the target cluster, SkyShift can automatically install it using the ray_install.sh script. The script installs Miniconda, creates a new Conda environment, and installs Ray. This ensures that the necessary components are in place for job submission and management without requiring manual intervention from the user.
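The script itself ships with SkyShift; the commands below are only a rough sketch of that flow (download Miniconda, create a Conda environment, install Ray) so you can see what the automation does on the remote host. The environment name, Python version, and paths are illustrative assumptions, not taken from ray_install.sh.

# Rough sketch of the automated install flow; NOT the actual ray_install.sh.
# Environment name, Python version, and paths are illustrative assumptions.
wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh
bash /tmp/miniconda.sh -b -p "$HOME/miniconda3"     # unattended Miniconda install
source "$HOME/miniconda3/etc/profile.d/conda.sh"    # make `conda` available in this shell
conda create -y -n ray python=3.10                  # new Conda environment for Ray
conda activate ray
pip install -U "ray[default]"                       # Ray with dashboard and job submission extras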
Adding and Removing Clusters#
Attaching a Remote Ray Cluster#
With SkyShift, integrating a remote Ray cluster into your workflow is straightforward. Start by ensuring the name of the cluster in the rayconf.yaml configuration file matches the name you intend to use. Here’s how you can attach it to SkyShift:
skyctl create cluster <rayclustername> --manager ray
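For example, if the cluster entry in rayconf.yaml is named raycluster1 (the name used in the sample output below):

skyctl create cluster raycluster1 --manager ray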
Now you’re ready to deploy jobs to your Ray cluster through SkyShift!
Checking Cluster Status#
To view the status of your configured clusters, simply run:
> skyctl get clusters
You’ll see an output similar to the following, providing a snapshot of your clusters’ resources and their status:
NAME          MANAGER    RESOURCES                          STATUS
raycluster1   ray        cpus: 520.0/600.0                  READY
                         memory: 1235171.0/3868184.0 MiB
                         P100: 8.0/8.0
raycluster2   ray        cpus: 420.0/600.0                  READY
                         memory: 1268431.0/2664184.0 MiB
cluster3      k8s        cpus: 1.83/2.0                     READY
                         memory: 6035.6/7954.6 MiB
Detaching a Cluster#
If you need to remove a cluster from SkyShift, the process is just as simple:
skyctl delete cluster <rayclustername>
After detaching, you can verify the status of the remaining clusters with skyctl get clusters to see the updated list.
Submitting Jobs#
Submitting jobs through SkyShift lets you leverage the powerful scheduling and management capabilities of Ray, SLURM, or Kubernetes, with the added benefit of SkyShift’s own scheduling across clusters.
Creating a SkyShift Job#
Here’s an example SkyShift job definition:
kind: Job
metadata:
  name: example-job
  labels:
    app: nginx
spec:
  replicas: 2
  image: nginx:1.14.2
  resources:
    cpus: 0.5
    memory: 128
  ports:
    - 80
  restartPolicy: Always
To deploy this job, use the skyctl apply command with the job definition file:
skyctl apply -f <path-to-your-job-file>.yaml
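For example, assuming the definition above is saved as example-job.yaml (an illustrative filename):

skyctl apply -f example-job.yaml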
Alternative Job Creation Methods#
SkyShift also supports job creation via our Python API and the SkyShift job CLI, offering you flexibility in how you manage your deployments. For instance, to create a job using the CLI:
skyctl create job example-job --image nginx:1.14.2 --replicas 2 --cpus 0.5 --memory 128 --port 80 --labels app=nginx
Because this job requests 0.5 CPUs and 128 MiB of memory, it will be scheduled on one of the Ray clusters: the Kubernetes cluster (cluster3) has only 0.17 CPUs available (2.0 total minus 1.83 in use), which is not enough to satisfy the request.
Monitoring Your Job#
To check the status of your jobs and ensure they’re running as expected:
> skyctl get jobs
NAME          CLUSTER       REPLICAS    RESOURCES            NAMESPACE    STATUS
example-job   raycluster1   2/2         cpus: 0.5            default      RUNNING
                                        memory: 128.0 MiB
You’ll see details about each job, including the cluster it’s running on, resources allocated, and its current status.
Now that you’re equipped with the basics of managing clusters and jobs in SkyShift using Ray, you can start harnessing the full potential of your Ray clusters. SkyShift is designed to make your computational tasks easier, more efficient, and scalable. Happy computing!