
Azure Databricks Cluster

Schema representing an Azure Databricks cluster configuration and state. A cluster is a set of computation resources and configurations on which you run notebooks, jobs, and libraries. Clusters consist of a driver node and worker nodes running Apache Spark.

Analytics · Apache Spark · Big Data · Data Engineering · Machine Learning
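As a rough sketch of what a cluster configuration might look like, the snippet below builds a request body using fields from the schema documented here. The runtime version and VM size strings are illustrative examples, not recommendations, and the payload shown is a plain dictionary rather than an actual API call.

```python
# Illustrative cluster spec built from fields in this schema.
# The spark_version and node_type_id values are example strings only.
cluster_spec = {
    "cluster_name": "etl-nightly",
    "spark_version": "13.3.x-scala2.12",  # a Databricks Runtime version string
    "node_type_id": "Standard_DS3_v2",    # Azure VM size for worker nodes
    "num_workers": 2,                     # fixed-size cluster: one driver + 2 workers
    "autotermination_minutes": 60,
}

# Per the schema, the driver node type falls back to node_type_id
# when driver_node_type_id is not set explicitly.
driver_type = cluster_spec.get("driver_node_type_id", cluster_spec["node_type_id"])
print(driver_type)  # Standard_DS3_v2
```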

Properties

| Name | Type | Description |
| --- | --- | --- |
| cluster_id | string | Canonical identifier for the cluster, assigned by Databricks upon creation. |
| cluster_name | string | Cluster name requested by the user. This name does not have to be unique. If not specified at creation, the cluster name defaults to an empty string. |
| spark_version | string | The Databricks Runtime version for the cluster. Determines the version of Apache Spark and other preinstalled libraries. Use the spark-versions API endpoint to retrieve available versions. |
| node_type_id | string | The Azure VM node type for worker nodes. Determines the amount of memory, CPU cores, and local storage available to each worker. Use the list-node-types API to retrieve available types. |
| driver_node_type_id | string | The Azure VM node type for the Spark driver node. If not specified, the driver node type defaults to the same value as node_type_id. |
| num_workers | integer | Number of worker nodes in the cluster. For a fixed-size cluster, set this to the desired number of workers. When autoscale is specified, this field is not used. |
| autoscale | object | Parameters for autoscaling the cluster. When specified, the cluster dynamically scales between the minimum and maximum number of workers based on workload. |
| spark_conf | object | Optional, user-specified Spark configuration key-value pairs. These are passed directly to the Spark driver and executors via the --conf flag. |
| azure_attributes | object | Attributes specific to Azure Databricks clusters, controlling Azure-specific behavior such as spot instance configuration. |
| ssh_public_keys | array | SSH public key contents that are added to each node in the cluster. You can specify up to 10 keys. |
| custom_tags | object | Custom tags to apply to cluster resources. These tags are propagated to Azure resources created for the cluster. Databricks adds several default tags in addition to any custom tags specified. |
| cluster_log_conf | object | Configuration for delivering Spark logs to a long-term storage destination. Only one destination type can be specified. |
| init_scripts | array | Cluster-scoped init scripts to run when the cluster starts. Init scripts run before the Spark driver or workers start. A maximum of 10 init scripts can be specified. |
| spark_env_vars | object | Environment variables to set on the Spark driver and worker processes. Key-value pairs are set as environment variables before the process starts. |
| enable_elastic_disk | boolean | When enabled, the cluster autoscales local storage. The disk space used by the cluster auto-adjusts based on the amount of data shuffled. Recommended for workloads with varying storage needs. |
| instance_pool_id | string | The optional ID of the instance pool to use for cluster nodes. When specified, the cluster uses idle instances from the pool to reduce startup time. Both driver and worker nodes use the same pool unless driver_instance_pool_id is specified. |
| driver_instance_pool_id | string | The optional ID of the instance pool to use for the driver node. If specified, the driver uses this pool while workers use instance_pool_id. |
| policy_id | string | Identifier of the cluster policy used to create the cluster. Cluster policies enforce configuration constraints and provide defaults. |
| enable_local_disk_encryption | boolean | When enabled, locally attached disks on cluster nodes are encrypted. This includes shuffle data, spilled data, and local caches. |
| runtime_engine | string | The runtime engine to use on the cluster. PHOTON provides a native vectorized query engine that accelerates SQL and DataFrame workloads. |
| data_security_mode | string | The data security mode for the cluster. Controls how data access is governed. USER_ISOLATION provides per-user isolation with Unity Catalog. SINGLE_USER restricts the cluster to a single user. |
| single_user_name | string | The optional user name of the user assigned to the cluster when data_security_mode is SINGLE_USER. This user is the only one who can execute commands on the cluster. |
| state | string | Current state of the cluster. PENDING indicates the cluster is being created; RUNNING means it is ready for use; TERMINATED means it has been stopped. |
| state_message | string | A human-readable message providing additional details about the current state of the cluster. |
| creator_user_name | string | The username of the user who created the cluster. |
| start_time | integer | Time (in epoch milliseconds) when the cluster was created or last started. |
| terminated_time | integer | Time (in epoch milliseconds) when the cluster was terminated. |
| last_state_loss_time | integer | Time (in epoch milliseconds) when the cluster driver last lost its state. This occurs when the driver node is lost. |
| last_activity_time | integer | Time (in epoch milliseconds) when the cluster last had activity. Inactivity duration is measured from this time for autotermination. |
| autotermination_minutes | integer | Automatically terminates the cluster after it has been inactive for this time in minutes. If set to 0, the cluster is not auto-terminated. Default is 120 minutes. |
| cluster_source | string | Indicates the source that created the cluster, such as UI, API, or JOB. |
| default_tags | object | Tags that are automatically applied by Databricks regardless of custom_tags settings. Includes Vendor, Creator, ClusterId, and ClusterName. |
| termination_reason | object | Information about why the cluster was terminated, available when the cluster is in TERMINATED state. |
| driver | object | Information about the Spark driver node. |
| executors | array | Information about the Spark executor (worker) nodes. |
| jdbc_port | integer | Port on which the JDBC/ODBC server is listening for connections. Available only when the cluster is running. |
| cluster_memory_mb | integer | Total amount of memory (in megabytes) available across all nodes in the cluster. |
| cluster_cores | number | Total number of CPU cores available across all nodes in the cluster. |
| disk_spec | object | Disk specification for the cluster nodes. |
| cluster_log_status | object | Status of log delivery for the cluster. |
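The interaction between num_workers and autoscale described above can be sketched with a small helper. The function name and the min_workers/max_workers field names inside the autoscale object are stated here as assumptions for illustration; the rule it encodes (autoscale, when present, supersedes num_workers) comes directly from the schema.

```python
def effective_size_bounds(spec: dict) -> tuple[int, int]:
    """Return the (min, max) worker counts implied by a cluster spec.

    Per the schema: when autoscale is present, num_workers is ignored and
    the cluster scales between the minimum and maximum worker counts.
    """
    autoscale = spec.get("autoscale")
    if autoscale is not None:
        # Assumed field names for the autoscale object.
        return autoscale["min_workers"], autoscale["max_workers"]
    n = spec.get("num_workers", 0)
    return n, n

fixed = {"num_workers": 4}
elastic = {"num_workers": 4, "autoscale": {"min_workers": 2, "max_workers": 8}}
print(effective_size_bounds(fixed))    # (4, 4)
print(effective_size_bounds(elastic))  # (2, 8) -- num_workers is ignored
```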
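The timestamp fields in the schema (start_time, terminated_time, last_activity_time, and so on) are epoch milliseconds, and autotermination_minutes is measured against last_activity_time. A minimal sketch of how a client might interpret these values, assuming it already holds the raw field values, is:

```python
from datetime import datetime, timezone

def ms_to_dt(ms: int) -> datetime:
    # Schema timestamps are epoch milliseconds; convert to an aware UTC datetime.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

def should_autoterminate(last_activity_ms: int, now_ms: int,
                         autotermination_minutes: int) -> bool:
    # Per the schema, 0 disables auto-termination entirely.
    if autotermination_minutes == 0:
        return False
    idle_minutes = (now_ms - last_activity_ms) / 60_000
    return idle_minutes >= autotermination_minutes

start = 1_700_000_000_000  # an example start_time value (falls in 2023)
print(ms_to_dt(start).year)                                    # 2023
print(should_autoterminate(start, start + 130 * 60_000, 120))  # True
print(should_autoterminate(start, start + 30 * 60_000, 120))   # False
```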