serverless_gpu.runtime
Runtime utilities for distributed serverless GPU compute.
This module provides utilities for managing the distributed environment in serverless GPU compute. It includes functions for:
- Getting distributed configuration parameters (rank, world size, etc.)
- Environment variable management for distributed setup
- Integration with the PyTorch distributed backend
- Node and process rank management
The functions in this module are adapted from MosaicML’s Composer library to work with the serverless GPU compute environment.
Note: This code is derived from https://github.com/mosaicml/composer.git@dc13fb0
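Under the usual torchrun-style launcher conventions, the quantities managed by this module relate to one another as sketched below. The environment variable names and the sample values are assumptions for illustration, not guaranteed by this library:

```python
import os

# Typical torchrun-style environment for process 5 of an 8-GPU job
# spread over 2 nodes (4 GPUs each) -- values are illustrative only.
env = {
    "RANK": "5",              # global rank across all nodes
    "WORLD_SIZE": "8",        # total number of processes in the run
    "LOCAL_RANK": "1",        # rank within this node
    "LOCAL_WORLD_SIZE": "4",  # processes per node
    "NODE_RANK": "1",         # which node this process runs on
}

# Invariant: global rank = node_rank * local_world_size + local_rank
rank = int(env["NODE_RANK"]) * int(env["LOCAL_WORLD_SIZE"]) + int(env["LOCAL_RANK"])
assert rank == int(env["RANK"])
```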
Functions
get_global_rank(group=None)
Returns the global rank of the current process in the input PG, which is on [0; group.WORLD_SIZE - 1].
get_local_rank()
Returns the local rank for the current process, which is on [0; LOCAL_WORLD_SIZE - 1].
get_local_world_size()
Returns the local world size, which is the number of processes for the current node.
get_node_rank()
Returns the node rank.
get_world_size()
Returns the world size, which is the number of processes participating in this training run.
- serverless_gpu.runtime.get_global_rank(group=None)[source]
Returns the global rank of the current process in the input PG, which is on [0; group.WORLD_SIZE - 1].
- Parameters
group (ProcessGroup, optional) – The process group. If None, the value of the RANK environment variable is returned.
- Returns
The global rank in the input process group.
- Return type
int
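The env-var fallback described above can be sketched as follows. This is a minimal approximation, not the library's implementation; the default of 0 when RANK is unset is an assumption:

```python
import os

def get_global_rank(group=None):
    # Sketch: with no process group, fall back to the RANK environment
    # variable, defaulting to 0 for single-process runs (assumption).
    if group is None:
        return int(os.environ.get("RANK", 0))
    # With an explicit group, defer to the PyTorch distributed backend.
    import torch.distributed as dist
    return dist.get_rank(group)
```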
- serverless_gpu.runtime.get_local_rank()[source]
Returns the local rank for the current process, which is on [0; LOCAL_WORLD_SIZE - 1].
- Returns
The local rank.
- Return type
int
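A minimal sketch of how a local rank lookup typically works. Reading the LOCAL_RANK environment variable is the torchrun convention and is an assumption here, not necessarily what this library does internally:

```python
import os

def get_local_rank():
    # Sketch: launchers in the torchrun style export LOCAL_RANK for
    # each process; default to 0 for single-process runs (assumption).
    return int(os.environ.get("LOCAL_RANK", 0))
```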
- serverless_gpu.runtime.get_local_world_size()[source]
Returns the local world size, which is the number of processes for the current node.
- Returns
The local world size.
- Return type
int
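The local world size can be sketched the same way. The LOCAL_WORLD_SIZE variable name follows the torchrun convention and is an assumption; the library may derive this value differently:

```python
import os

def get_local_world_size():
    # Sketch: number of processes on the current node, as exported by
    # a torchrun-style launcher; default to 1 (assumption).
    return int(os.environ.get("LOCAL_WORLD_SIZE", 1))
```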