Overview

What is Serverless GPU API?

Serverless GPU API is a lightweight, intuitive library for launching multi-GPU workloads from Databricks notebooks onto managed Serverless GPU compute. It is designed to make distributed computing on Databricks simple and accessible.

Key Features

  • Easy Integration: Works seamlessly with Databricks notebooks

  • Multi-GPU Support: Efficiently utilize multiple GPUs for your workloads

  • Flexible Configuration: Customizable compute resources and runtime settings

  • Comprehensive Logging: Built-in logging and monitoring capabilities

Architecture

Distributed execution (@distributed)

  • The @distributed decorator captures the GPU count, the GPU type, and a reference to your callable, and stores these values in a DistributedFunction wrapper.

  • When you call my_fn.distributed(...), the Serverless GPU API uses cloudpickle to serialize the wrapped function and its validated arguments into a per-run directory, which also holds an auto-generated _air.py entrypoint.

  • The local notebook environment (site-packages and user site-packages) is snapshotted and staged to DBFS via the env manager API. At launch time the snapshot is rehydrated on the workers, so your pip environment is also available remotely.

  • _execute_local spawns multiple processes with torchrun-style arguments and then aggregates outputs with collect_outputs_or_raise.

  • Set inside the decorator: the GPU count and the GPU type.

  • Set outside the decorator: run any %pip installs before calling .distributed so they are included in the captured environment, and pass the function's input arguments when you invoke .distributed.
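
The flow above can be illustrated with a toy, single-machine sketch. Everything in it is hypothetical: distributed and DistributedFunction here are local re-creations written for illustration, not the library's real classes. Standard-library pickle stands in for cloudpickle, and a sequential loop stands in for one process per GPU. The sketch shows the key design point: the decorator only records configuration, and work happens only when .distributed(...) is called.

```python
import os
import pickle
import tempfile
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class DistributedFunction:
    """Hypothetical stand-in for the wrapper returned by the decorator."""
    fn: Callable[..., Any]
    gpus: int
    gpu_type: str  # only recorded here; the toy never schedules real GPUs

    def distributed(self, *args: Any, **kwargs: Any) -> List[Any]:
        outputs: List[Any] = []
        failures: List[Tuple[int, Exception]] = []
        with tempfile.TemporaryDirectory() as run_dir:
            # Stage the call arguments in a per-run directory. Stdlib pickle stores
            # functions only by reference, so this toy pickles the arguments alone.
            args_path = os.path.join(run_dir, "args.pkl")
            with open(args_path, "wb") as f:
                pickle.dump((args, kwargs), f)

            # Sequential loop standing in for one worker process per GPU.
            for rank in range(self.gpus):
                with open(args_path, "rb") as f:
                    call_args, call_kwargs = pickle.load(f)
                try:
                    outputs.append(self.fn(*call_args, **call_kwargs))
                except Exception as exc:
                    failures.append((rank, exc))

        # In the spirit of "collect outputs or raise": all outputs, or an error.
        if failures:
            rank, exc = failures[0]
            raise RuntimeError(f"rank {rank} failed: {exc!r}") from exc
        return outputs


def distributed(gpus: int, gpu_type: str) -> Callable[[Callable[..., Any]], DistributedFunction]:
    """Hypothetical decorator: captures configuration only; nothing runs yet."""
    def wrap(fn: Callable[..., Any]) -> DistributedFunction:
        return DistributedFunction(fn=fn, gpus=gpus, gpu_type=gpu_type)
    return wrap


@distributed(gpus=2, gpu_type="A10")  # "A10" is an illustrative label
def scale(values, factor=2):
    return [v * factor for v in values]


results = scale.distributed([1, 2, 3], factor=10)
print(results)  # [[10, 20, 30], [10, 20, 30]]
```

Because stdlib pickle records a function only by its module and name, it cannot ship the body of a decorated notebook function to another machine; a by-value serializer such as cloudpickle avoids that limitation, which is presumably why the real flow uses it.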

Use Cases

Serverless GPU API is ideal for:

  • Machine learning model training at scale

  • Distributed data processing

  • GPU-accelerated computations

  • Research and experimentation workflows

Distributed Execution Details

When running in distributed mode:

  • The function is serialized and distributed across the specified number of GPUs

  • Each GPU runs a copy of the function with the same parameters

  • The environment is synchronized across all nodes

  • Results are collected and returned from all GPUs
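
A minimal sketch of the per-rank pattern described above, assuming torchrun-style environment variables (RANK, LOCAL_RANK, WORLD_SIZE) identify each copy. The worker script, the launch helper, and the sharding rule are hypothetical, and they use plain CPU processes started with subprocess rather than GPU workers. Every rank receives the same parameters and uses its rank to choose a shard; the parent collects results in rank order and raises if any rank exits with an error.

```python
import json
import os
import subprocess
import sys
from typing import Any, Dict, List

# Hypothetical worker: identical code and parameters for every rank; only the
# torchrun-style environment variables differ between copies.
WORKER_SOURCE = """
import json, os, sys
params = json.loads(sys.argv[1])
rank = int(os.environ["RANK"])
world_size = int(os.environ["WORLD_SIZE"])
# Each rank takes every world_size-th element, starting at its own rank.
shard = params["data"][rank::world_size]
print(json.dumps({"rank": rank, "partial_sum": sum(shard)}))
"""


def launch(world_size: int, params: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start one process per rank, then gather their JSON results in rank order."""
    procs = []
    for rank in range(world_size):
        # Single-node assumption: LOCAL_RANK equals RANK.
        env = dict(os.environ, RANK=str(rank), LOCAL_RANK=str(rank), WORLD_SIZE=str(world_size))
        procs.append(subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE, json.dumps(params)],
            env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True,
        ))

    results = []
    for rank, proc in enumerate(procs):
        out, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError(f"rank {rank} exited with {proc.returncode}: {err.strip()}")
        results.append(json.loads(out))
    return results


results = launch(world_size=4, params={"data": list(range(10))})
total = sum(r["partial_sum"] for r in results)
print(total)  # 45
```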

Limitations

  • The pip environment size is limited to 15 GB.

  • We do not support Ray Serve APIs.
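
Before launching, you can roughly check your environment against the 15 GB limit. This sketch assumes the staged snapshot is about the on-disk size of the site-packages and user site-packages directories that the Architecture section says are captured, and it interprets 15 GB as 15 * 1024**3 bytes. Both are assumptions; the service's own accounting may differ.

```python
import os
import site
from typing import Iterable

# Assumption: the 15 GB limit is treated as 15 GiB here.
LIMIT_BYTES = 15 * 1024**3


def dir_size(path: str) -> int:
    """Sum the sizes of regular files under path, skipping symlinks and unreadable files."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def env_size(paths: Iterable[str]) -> int:
    """Total size of the distinct directories in paths that actually exist."""
    return sum(dir_size(p) for p in set(paths) if os.path.isdir(p))


# Some embedded interpreters lack site.getsitepackages, so fall back to an empty list.
site_dirs = list(getattr(site, "getsitepackages", lambda: [])())
candidate_dirs = site_dirs + [site.getusersitepackages()]

size = env_size(candidate_dirs)
print(f"{size / 1024**3:.2f} GB used of {LIMIT_BYTES / 1024**3:.0f} GB allowed")
if size > LIMIT_BYTES:
    print("Warning: this environment likely exceeds the pip environment limit.")
```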

Troubleshooting

See the Databricks Serverless GPU troubleshooting guide for fixes to the most common errors.