serverless_gpu package
Submodules
serverless_gpu.compute module
GPU compute type definitions and utilities.
This module defines the available GPU types for serverless compute and provides utilities for working with GPU configurations and resource allocation.
- class serverless_gpu.compute.DisplayGpuType(value)[source]
Bases: Enum
Mapping from GPU type to display name.
- a10 = 'A10'
- h100_80gb = '8xH100'
- class serverless_gpu.compute.GPUType(value)[source]
Bases: Enum
Enumeration of available GPU types for serverless compute.
This enum defines the GPU types that can be used for distributed computing on the serverless GPU platform.
- H100
NVIDIA H100 80GB GPU instances.
- A10
NVIDIA A10 GPU instances.
- A10 = 'a10'
- H100 = 'h100_80gb'
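To illustrate how the two enums relate, here is a minimal re-creation of the documented members (the real definitions live in serverless_gpu.compute; this sketch only mirrors the values listed above):

```python
from enum import Enum

# Re-creation of the documented enums for illustration only.
class GPUType(Enum):
    A10 = 'a10'
    H100 = 'h100_80gb'

class DisplayGpuType(Enum):
    a10 = 'A10'
    h100_80gb = '8xH100'

# A raw string such as 'h100_80gb' coerces back to a GPUType member,
# and the member's value keys into DisplayGpuType for the display label.
gpu = GPUType('h100_80gb')
label = DisplayGpuType[gpu.value].value  # '8xH100'
```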
serverless_gpu.launcher module
Main launcher module for distributed serverless GPU compute.
This module provides the core functionality for launching and managing distributed functions on serverless GPU infrastructure. It includes:
The distributed decorator for executing functions on remote GPU resources
Workload submission and monitoring capabilities
Integration with Databricks jobs API and MLflow for tracking
Environment synchronization and dependency management
Support for multi-GPU and multi-node distributed workloads
The main entry point is the distributed function which can be used as a decorator or called directly to execute functions on serverless GPU compute resources.
- serverless_gpu.launcher.distributed(gpus, gpu_type=None, remote=False, run_async=False)[source]
Decorator to launch a function on remote GPUs or local GPUs.
Remote GPUs are GPUs that are not attached to your notebook but that you have access to; local GPUs are the GPUs attached to your notebook.
- Parameters:
gpus (int, optional) – Number of GPUs to use. Must be 1, 2, 4, 8 or a multiple of 8 for remote GPUs.
gpu_type (Optional[Union[GPUType, str]], optional) – The GPU type to use. Defaults to None. Required if remote is True.
remote (bool) – Whether to run the function on remote GPUs. Defaults to False.
run_async (bool) – Whether to run the function asynchronously. Defaults to False.
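The GPU-count and gpu_type rules above can be sketched as a small validator. This is a hypothetical illustration of the documented constraints, not the library's own code; validate_distributed_args is an invented name:

```python
# Hypothetical check mirroring the documented rules for distributed():
# remote workloads accept 1, 2, 4, 8, or any multiple of 8 GPUs, and
# gpu_type is required when remote=True.
def validate_distributed_args(gpus, gpu_type=None, remote=False):
    if remote:
        if gpu_type is None:
            raise ValueError("gpu_type is required when remote=True")
        if gpus not in (1, 2, 4, 8) and gpus % 8 != 0:
            raise ValueError("gpus must be 1, 2, 4, 8, or a multiple of 8")
    return True

# Intended usage shape per the docs (requires the platform to run):
# @distributed(gpus=8, gpu_type='h100_80gb', remote=True)
# def train(): ...
```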
serverless_gpu.runtime module
Runtime utilities for distributed serverless GPU compute.
This module provides runtime utilities for managing distributed environments in serverless GPU compute. It includes functions for:
Getting distributed configuration parameters (rank, world size, etc.)
Environment variable management for distributed setup
Integration with PyTorch distributed backend
Node and process rank management
The functions in this module are adapted from MosaicML’s Composer library to work with the serverless GPU compute environment.
Note: This code is derived from https://github.com/mosaicml/composer.git@dc13fb0
- exception serverless_gpu.runtime.MissingEnvironmentError[source]
Bases: Exception
Raised when a required environment variable is missing.
- serverless_gpu.runtime.get_global_rank(group=None)[source]
Returns the global rank of the current process within the given process group, which lies in [0, group.WORLD_SIZE - 1].
- Parameters:
group (ProcessGroup, optional) – The process group. If None, the value of the RANK environment variable is returned.
- Returns:
The global rank in the input process group.
- Return type:
int
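The environment-variable fallback described above can be sketched as follows. This is an assumption-laden illustration, not the library's implementation; the real function also consults the PyTorch distributed backend when a group is supplied:

```python
import os

# Sketch of the documented fallback: with no process group, the global
# rank comes from the RANK environment variable.
def get_global_rank_sketch(group=None):
    if group is None:
        return int(os.environ["RANK"])
    return group.rank()  # assumption: a ProcessGroup-like object with rank()

os.environ["RANK"] = "3"
rank = get_global_rank_sketch()  # 3
```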
- serverless_gpu.runtime.get_local_rank()[source]
Returns the local rank of the current process, which lies in [0, LOCAL_WORLD_SIZE - 1].
- Returns:
The local rank.
- Return type:
int
- serverless_gpu.runtime.get_local_world_size()[source]
Returns the local world size, i.e., the number of processes on the current node.
- Returns:
The local world size.
- Return type:
int
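The relationship between local rank and local world size can be shown with a small sketch, assuming the torchrun-style LOCAL_RANK and LOCAL_WORLD_SIZE environment variables (an assumption; the library may source these differently):

```python
import os

# Illustrative sketches of the two accessors, assuming torchrun-style
# environment variables are set for each process.
def get_local_rank_sketch():
    return int(os.environ["LOCAL_RANK"])

def get_local_world_size_sketch():
    return int(os.environ["LOCAL_WORLD_SIZE"])

os.environ.update({"LOCAL_RANK": "1", "LOCAL_WORLD_SIZE": "8"})
# Every process's local rank falls within the local world size.
assert 0 <= get_local_rank_sketch() < get_local_world_size_sketch()
```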
serverless_gpu.ray module
Ray integration for distributed serverless GPU compute.
This module provides integration with Ray for distributed computing on serverless GPU infrastructure. It includes:
Ray cluster setup and management on distributed GPU nodes
Integration with serverless GPU launcher for Ray workloads
Utilities for Ray head node detection and connection management
Support for Ray distributed training and inference patterns
The module enables users to run Ray-based distributed workloads on serverless GPU compute resources seamlessly.
- serverless_gpu.ray.ray_launch(gpus, gpu_type=None, remote=False, run_async=False)[source]
Experimental decorator to launch a function on a Ray cluster.
- Parameters:
gpus (int) – Number of GPUs to launch Ray on.
gpu_type (Optional[Union[GPUType, str]]) – The GPU type to use. Defaults to None. Required if remote is True.
remote (bool) – Whether to use remote GPUs. Defaults to False.
run_async (bool) – Whether to run the function asynchronously. Defaults to False.
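The head-node detection mentioned above can be sketched under a common convention: the process with node rank 0 hosts the Ray head, and workers connect to its address. The NODE_RANK variable and both function names here are assumptions for illustration, not the library's API:

```python
import os

# Assumed convention: node rank 0 hosts the Ray head node.
def is_head_node():
    return int(os.environ.get("NODE_RANK", "0")) == 0

# Workers would join the cluster at "<head_host>:<port>"
# (e.g. via `ray start --address=...`).
def ray_address(head_host, port=6379):
    return f"{head_host}:{port}"

os.environ.setdefault("NODE_RANK", "0")
```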