Overview
What is Serverless GPU API?
Serverless GPU API is a lightweight, intuitive library for launching multi-GPU workloads from Databricks notebooks onto managed Serverless GPU compute. It’s designed to make distributed computing on Databricks simple and accessible.
Key Features
Easy Integration: Works seamlessly with Databricks notebooks
Multi-GPU Support: Efficiently utilize multiple GPUs for your workloads
Flexible Configuration: Customizable compute resources and runtime settings
Comprehensive Logging: Built-in logging and monitoring capabilities
Architecture
Distributed execution (@distributed)
The @distributed decorator captures the GPU count, the GPU type, and a reference to your callable. Those values are baked into a DistributedFunction wrapper.
When you call my_fn.distributed(...), the Serverless GPU API serializes the wrapped function and its validated arguments with cloudpickle into a per-run directory that also holds an auto-generated _air.py entrypoint.
The local notebook environment (site-packages and user site-packages) is snapshotted and staged to DBFS via the env manager API. This snapshot is rehydrated on the workers at launch time so that your pip environment is available remotely.
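You can think of the snapshot-and-rehydrate step as an archive round trip. This is a sketch under assumptions. The source does not describe the snapshot format or the env manager API, so snapshot_dirs, rehydrate, and the env0, env1, ... root names are hypothetical. A local gzip-compressed tarball stands in for the copy staged to DBFS. In practice, the input directories would be the notebook's site-packages and user site-packages.

```python
import tarfile
from pathlib import Path
from typing import Iterable, List


def snapshot_dirs(dirs: Iterable[Path], archive: Path) -> Path:
    """Hypothetical: pack each directory into one gzip-compressed tarball as env0, env1, ..."""
    with tarfile.open(archive, "w:gz") as tar:
        for i, d in enumerate(dirs):
            # tar.add recurses into directories by default.
            tar.add(str(d), arcname=f"env{i}")
    return archive


def rehydrate(archive: Path, dest: Path) -> List[Path]:
    """Hypothetical worker-side step: unpack the snapshot and return the restored roots."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Python releases that ship extraction filters get the "data" filter,
        # which rejects unsafe members such as absolute paths or escaping links.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(p for p in dest.iterdir() if p.is_dir())
```

The archive is a copy taken at a single moment. Files added to the source directories after it is created never reach the worker, which is consistent with the guidance to run %pip installs before calling .distributed.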
_execute_local spawns multiple processes with torchrun-style arguments and then aggregates their outputs with collect_outputs_or_raise.
Set the following inside the decorator: the GPU count and the GPU type.
Set the following outside the decorator (before calling .distributed): any %pip installs that affect the captured environment, and the input arguments passed when invoking .distributed.
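The following minimal sketch illustrates the capture-and-serialize steps and makes concrete the split between decorator-time settings (GPU count and type) and call-time inputs. It is not the library's implementation. The names distributed, DistributedFunctionSketch, and load_and_call, along with the payload.pkl file name, are hypothetical, and "H100" is only an example value. To run without extra packages, the sketch uses the standard-library pickle module rather than cloudpickle.

```python
import inspect
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable


@dataclass(frozen=True)
class DistributedFunctionSketch:
    """Hypothetical wrapper holding the callable plus the decorator-time GPU settings."""
    fn: Callable[..., Any]
    gpus: int
    gpu_type: str

    def distributed(self, *args: Any, **kwargs: Any) -> Path:
        # Validate call-time arguments first; Signature.bind raises TypeError on a mismatch.
        inspect.signature(self.fn).bind(*args, **kwargs)
        # One fresh directory per run (hypothetical layout; left in place, not cleaned up).
        run_dir = Path(tempfile.mkdtemp(prefix="gpu-run-"))
        payload = {"fn": self.fn, "args": args, "kwargs": kwargs,
                   "gpus": self.gpus, "gpu_type": self.gpu_type}
        # Stdlib pickle stores functions by reference (module + qualified name);
        # cloudpickle can serialize notebook-defined functions by value instead.
        with open(run_dir / "payload.pkl", "wb") as f:
            pickle.dump(payload, f)
        return run_dir


def distributed(gpus: int, gpu_type: str) -> Callable[[Callable[..., Any]], DistributedFunctionSketch]:
    """Hypothetical decorator factory: GPU count and type are fixed at decoration time."""
    def wrap(fn: Callable[..., Any]) -> DistributedFunctionSketch:
        return DistributedFunctionSketch(fn, gpus, gpu_type)
    return wrap


def load_and_call(run_dir: Path) -> Any:
    """What a worker-side entrypoint might do with the staged payload."""
    with open(run_dir / "payload.pkl", "rb") as f:
        payload = pickle.load(f)
    return payload["fn"](*payload["args"], **payload["kwargs"])


def scaled_sum(x: int, y: int, scale: int = 1) -> int:
    return (x + y) * scale


# Applied as a plain call instead of @ syntax: stdlib pickle looks the function
# up by name, so the module-level name must still refer to the original function.
remote_sum = distributed(gpus=8, gpu_type="H100")(scaled_sum)
run_dir = remote_sum.distributed(2, 3, scale=10)
print(load_and_call(run_dir))  # 50
```

The GPU settings live on the wrapper and are reused by every call, while the arguments change from call to call. This mirrors the split described above between what goes inside the decorator and what is passed to .distributed. With cloudpickle, plain @ syntax would be expected to work as well, because functions defined in __main__ (such as notebook cells) are serialized by value.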
Use Cases
Serverless GPU API is ideal for:
Machine learning model training at scale
Distributed data processing
GPU-accelerated computations
Research and experimentation workflows
Distributed Execution Details
When running in distributed mode:
The function is serialized and distributed across the specified number of GPUs
Each GPU runs a copy of the function with the same parameters
The environment is synchronized across all nodes
Results are collected and returned from all GPUs
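The per-GPU behavior listed above can be simulated on a single machine with ordinary processes. This is a minimal sketch, not the library's _execute_local or collect_outputs_or_raise. WORKER, launch_local, collect_or_raise, and the FAIL_RANK variable are hypothetical, and no GPU is used. The only borrowed convention is the set of environment variables torchrun exports to each worker: RANK, LOCAL_RANK, WORLD_SIZE, LOCAL_WORLD_SIZE, MASTER_ADDR, and MASTER_PORT. Every rank runs the same program with the same parameters, and the results come back as one entry per rank.

```python
import json
import os
import subprocess
import sys
from typing import Dict, List, Optional

# Hypothetical worker program standing in for the serialized function.
# Every rank runs identical code; only the RANK environment variable differs.
WORKER = """
import json, os
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
if os.environ.get("FAIL_RANK") == str(rank):
    raise SystemExit(f"simulated failure on rank {rank}")
print(json.dumps({"rank": rank, "world_size": world_size, "value": rank * rank}))
"""


def launch_local(nprocs: int, extra_env: Optional[Dict[str, str]] = None) -> List[subprocess.Popen]:
    """Start one process per simulated GPU with torchrun-style environment variables."""
    procs = []
    for rank in range(nprocs):
        env = os.environ.copy()
        env.update({
            "RANK": str(rank),
            "LOCAL_RANK": str(rank),        # single node, so local rank == global rank
            "WORLD_SIZE": str(nprocs),
            "LOCAL_WORLD_SIZE": str(nprocs),
            "MASTER_ADDR": "127.0.0.1",
            "MASTER_PORT": "29500",         # unused by this toy worker
        })
        env.update(extra_env or {})
        procs.append(subprocess.Popen(
            [sys.executable, "-c", WORKER], env=env,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True))
    return procs


def collect_or_raise(procs: List[subprocess.Popen]) -> List[dict]:
    """Wait for every rank; return results in rank order, or raise if any rank failed."""
    results, errors = [], []
    for rank, proc in enumerate(procs):
        out, err = proc.communicate()
        if proc.returncode != 0:
            errors.append(f"rank {rank} exited with code {proc.returncode}: {err.strip()}")
        else:
            results.append(json.loads(out))
    if errors:
        raise RuntimeError("; ".join(errors))
    return results


# One dict per rank, in rank order.
print(collect_or_raise(launch_local(4)))
```

This sketch waits for every rank to finish before reporting failures. A production launcher would more likely stop the remaining ranks as soon as one fails.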
Limitations
Pip environment size is limited to 15GB.
We do not support Ray Serve APIs.
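The 15GB limit applies to the pip environment that gets snapshotted (site-packages and user site-packages). Checking its size in the notebook before launching may help avoid a failed run. This is a minimal sketch, and it assumes that on-disk file size approximates what the service measures. The names local_env_dirs, dir_size, check_env_size, and LIMIT_BYTES are hypothetical. The source does not say whether GB is decimal or binary, so the sketch reads the limit as 15 * 10**9 bytes.

```python
import os
import site
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: "15GB" read as 15 * 10**9 bytes, the smaller (more conservative)
# of the decimal and binary interpretations.
LIMIT_BYTES = 15 * 10**9


def local_env_dirs() -> List[Path]:
    """Existing site-packages and user site-packages directories, deduplicated."""
    # getattr guards against old virtualenv builds whose site module lacks getsitepackages.
    system = getattr(site, "getsitepackages", lambda: [])()
    candidates = list(system) + [site.getusersitepackages()]
    dirs: List[Path] = []
    for c in candidates:
        p = Path(c)
        if p.is_dir() and p not in dirs:
            dirs.append(p)
    return dirs


def dir_size(path: Path) -> int:
    """Total bytes of regular files under path; symlinks are neither followed nor counted."""
    total = 0
    for root, _subdirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.islink(fp):
                continue
            try:
                total += os.path.getsize(fp)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def check_env_size(dirs: Iterable[Path], limit: int = LIMIT_BYTES) -> Tuple[int, bool]:
    """Return (total bytes, whether the total fits within limit)."""
    total = sum(dir_size(d) for d in dirs)
    return total, total <= limit


total, fits = check_env_size(local_env_dirs())
print(f"{total / 1e9:.2f} GB across site-packages; within limit: {fits}")
```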
Troubleshooting
See the Databricks Serverless GPU troubleshooting guide for fixes to the most common errors.