Databricks · Schema
Databricks Cluster
Schema representing a Databricks cluster, which is a managed cloud resource for running data engineering and data science workloads on Apache Spark. Clusters can be configured with fixed or autoscaling worker counts, various node types, and cloud-provider-specific attributes.
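The fixed-size versus autoscaling distinction can be illustrated with two minimal cluster specifications. This is an illustrative sketch using field names from the schema below; the values (cluster names, runtime version, node type) are placeholders, not recommendations, and the exact request payload accepted by the Clusters API may include additional required fields.

```python
# Sketch of a fixed-size cluster spec: one driver plus num_workers executors.
fixed_size_cluster = {
    "cluster_name": "etl-nightly",            # placeholder name
    "spark_version": "14.3.x-scala2.12",      # placeholder runtime version
    "node_type_id": "i3.xlarge",              # placeholder instance type
    "num_workers": 4,
    "autotermination_minutes": 60,
}

# Sketch of an autoscaling cluster spec: when autoscale is set,
# num_workers is ignored and the cluster scales between the bounds.
autoscaling_cluster = {
    "cluster_name": "adhoc-analytics",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 8},
}
```

Setting `num_workers` to 0 instead (with the appropriate Spark configuration) produces a single-node cluster where the driver also acts as the worker.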
Properties
| Name | Type | Description |
|---|---|---|
| cluster_id | string | The unique identifier assigned to the cluster by Databricks. This ID is generated during cluster creation and is used to reference the cluster in all subsequent API calls. |
| cluster_name | string | A human-readable name for the cluster. This does not need to be unique within the workspace. |
| spark_version | string | The Databricks Runtime version of the cluster, which determines the versions of Apache Spark, Scala, Java, Python, R, and installed libraries. Use the Runtime Versions API to retrieve available versions. |
| node_type_id | string | The cloud provider instance type for worker nodes. Determines the compute and memory resources available to each worker. |
| driver_node_type_id | string \| null | The cloud provider instance type for the Spark driver node. If not specified, defaults to the same value as node_type_id. |
| num_workers | integer | The number of worker nodes in a fixed-size cluster. A cluster has one Spark driver and num_workers executors. Set to 0 for a single-node cluster where the driver acts as both driver and worker. |
| autoscale | object \| null | Autoscaling configuration. When set, num_workers is ignored and the cluster dynamically scales between min_workers and max_workers based on workload. |
| state | string | The current state of the cluster in its lifecycle. |
| state_message | string | A human-readable message providing additional information about the current cluster state, such as the reason for termination. |
| start_time | integer | The time when the cluster was started, represented as epoch milliseconds (Unix timestamp in milliseconds). |
| terminated_time | integer | The time when the cluster was terminated, represented as epoch milliseconds. |
| last_state_loss_time | integer | The time when the cluster driver last lost its state, represented as epoch milliseconds. This occurs when the driver node is lost or restarted. |
| last_activity_time | integer | The time of the last user activity on the cluster, used for auto-termination calculations. Represented as epoch milliseconds. |
| last_restarted_time | integer | The time when the cluster was last restarted, represented as epoch milliseconds. |
| creator_user_name | string | The email address of the user who created the cluster. |
| cluster_source | string | The source that initiated the creation of this cluster. |
| spark_conf | object | A map of Spark configuration key-value pairs that override the default Spark configuration values for this cluster. |
| custom_tags | object | Additional tags applied to the cluster resources. Tags are propagated to the underlying cloud provider instances for cost tracking and resource management. |
| spark_env_vars | object | Environment variables set for all Spark processes running on this cluster. Use the syntax {{secrets/scope/key}} to reference Databricks secrets. |
| autotermination_minutes | integer | The number of minutes of inactivity after which the cluster is automatically terminated. A value of 0 disables auto-termination. Default is 120 minutes. |
| enable_elastic_disk | boolean | Whether autoscaling local storage is enabled. When enabled, Databricks monitors disk usage on Spark workers and automatically attaches additional disks when needed. |
| instance_pool_id | string \| null | The ID of the instance pool to use for cluster nodes. Instance pools reduce cluster start time by maintaining idle, ready-to-use instances. |
| policy_id | string \| null | The ID of the cluster policy applied to this cluster. Cluster policies constrain configuration settings and enforce organizational governance. |
| enable_local_disk_encryption | boolean | Whether data stored on local disks is encrypted. |
| data_security_mode | string | The data security mode of the cluster, which determines how data access is controlled. |
| single_user_name | string \| null | The user name (email) of the single user when data_security_mode is SINGLE_USER. |
| runtime_engine | string | The runtime engine to use. PHOTON enables the Photon vectorized query engine for significantly faster performance on SQL and DataFrame workloads. |
| aws_attributes | object \| null | AWS-specific attributes for clusters running on Amazon Web Services. |
| azure_attributes | object \| null | Azure-specific attributes for clusters running on Microsoft Azure. |
| gcp_attributes | object \| null | GCP-specific attributes for clusters running on Google Cloud Platform. |
| init_scripts | array | Initialization scripts that run on each node when the cluster starts. Scripts can be stored in workspace files, Unity Catalog volumes, or DBFS. |
| default_tags | object | Default tags automatically applied by Databricks, including Vendor, Creator, ClusterName, and ClusterId. |
| termination_reason | object \| null | The reason the cluster was terminated, including error codes and parameters. |
| driver | object \| null | Information about the Spark driver node. |
| executors | array | Information about the Spark executor (worker) nodes. |
| jdbc_port | integer | The port number on the driver node that serves JDBC/ODBC connections. |
| spark_context_id | integer | The canonical Spark context identifier for this cluster. |
| ssh_public_keys | array | SSH public keys added to each Spark node in this cluster for SSH access. |
| disk_spec | object \| null | Disk specifications for the cluster nodes. |
| cluster_log_status | object \| null | Status of cluster log delivery. |
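Several timestamp fields above (start_time, terminated_time, last_state_loss_time, last_activity_time, last_restarted_time) are epoch milliseconds, not seconds. A small sketch of converting such a value into an aware UTC datetime, assuming standard-library Python only:

```python
from datetime import datetime, timezone

def epoch_ms_to_datetime(ms: int) -> datetime:
    """Convert a Databricks epoch-milliseconds timestamp to an aware UTC datetime."""
    # Divide by 1000 because fromtimestamp expects seconds.
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Example: a start_time value of 1700000000000 ms
started = epoch_ms_to_datetime(1700000000000)
# → 2023-11-14 22:13:20 UTC
```

Passing the raw millisecond value directly to `datetime.fromtimestamp` is a common bug; it either raises an overflow error or silently produces a date far in the future, depending on the platform.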