Databricks Cluster

Schema representing a Databricks cluster, which is a managed cloud resource for running data engineering and data science workloads on Apache Spark. Clusters can be configured with fixed or autoscaling worker counts, various node types, and cloud-provider-specific attributes.
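As a sketch of how the fixed vs. autoscaling options described below look in practice, the two hypothetical specifications here use field names from this schema with example values (the runtime version and instance type strings are illustrative, not recommendations):

```python
# Hypothetical cluster specifications following this schema.
# All values are examples, not defaults.

fixed_size_cluster = {
    "cluster_name": "etl-nightly",        # need not be unique in the workspace
    "spark_version": "14.3.x-scala2.12",  # a Databricks Runtime version string
    "node_type_id": "i3.xlarge",          # example AWS worker instance type
    "num_workers": 4,                     # 1 Spark driver + 4 executors
    "autotermination_minutes": 60,
}

autoscaling_cluster = {
    "cluster_name": "adhoc-analytics",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    # When autoscale is set, num_workers is ignored and the cluster
    # scales between the two bounds based on workload.
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 120,
}
```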

Properties

Name Type Description
cluster_id string The unique identifier assigned to the cluster by Databricks. This ID is generated during cluster creation and is used to reference the cluster in all subsequent API calls.
cluster_name string A human-readable name for the cluster. This does not need to be unique within the workspace.
spark_version string The Databricks Runtime version of the cluster, which determines the versions of Apache Spark, Scala, Java, Python, R, and installed libraries. Use the Runtime Versions API to retrieve available versions.
node_type_id string The cloud provider instance type for worker nodes. Determines the compute and memory resources available to each worker.
driver_node_type_id string | null The cloud provider instance type for the Spark driver node. If not specified, defaults to the same value as node_type_id.
num_workers integer The number of worker nodes in a fixed-size cluster. A cluster has one Spark driver and num_workers executors. Set to 0 for a single-node cluster where the driver acts as both driver and worker.
autoscale object | null Autoscaling configuration. When set, num_workers is ignored and the cluster dynamically scales between min_workers and max_workers based on workload.
state string The current state of the cluster in its lifecycle.
state_message string A human-readable message providing additional information about the current cluster state, such as the reason for termination.
start_time integer The time when the cluster was started, represented as epoch milliseconds (Unix timestamp in milliseconds).
terminated_time integer The time when the cluster was terminated, represented as epoch milliseconds.
last_state_loss_time integer The time when the cluster driver last lost its state, represented as epoch milliseconds. This occurs when the driver node is lost or restarted.
last_activity_time integer The time of the last user activity on the cluster, used for auto-termination calculations. Represented as epoch milliseconds.
last_restarted_time integer The time when the cluster was last restarted, represented as epoch milliseconds.
creator_user_name string The email address of the user who created the cluster.
cluster_source string The source that initiated the creation of this cluster.
spark_conf object A map of Spark configuration key-value pairs that override the default Spark configuration values for this cluster.
custom_tags object Additional tags applied to the cluster resources. Tags are propagated to the underlying cloud provider instances for cost tracking and resource management.
spark_env_vars object Environment variables set for all Spark processes running on this cluster. Use the syntax {{secrets/scope/key}} to reference Databricks secrets.
autotermination_minutes integer The number of minutes of inactivity after which the cluster is automatically terminated. A value of 0 disables auto-termination. Default is 120 minutes.
enable_elastic_disk boolean Whether autoscaling local storage is enabled. When enabled, Databricks monitors disk usage on Spark workers and automatically attaches additional disks when needed.
instance_pool_id string | null The ID of the instance pool to use for cluster nodes. Instance pools reduce cluster start time by maintaining idle, ready-to-use instances.
policy_id string | null The ID of the cluster policy applied to this cluster. Cluster policies constrain configuration settings and enforce organizational governance.
enable_local_disk_encryption boolean Whether data stored on local disks is encrypted.
data_security_mode string The data security mode of the cluster, which determines how data access is controlled.
single_user_name string | null The user name (email) of the single user when data_security_mode is SINGLE_USER.
runtime_engine string The runtime engine to use. PHOTON enables the Photon vectorized query engine for significantly faster performance on SQL and DataFrame workloads.
aws_attributes object | null AWS-specific attributes for clusters running on Amazon Web Services.
azure_attributes object | null Azure-specific attributes for clusters running on Microsoft Azure.
gcp_attributes object | null GCP-specific attributes for clusters running on Google Cloud Platform.
init_scripts array Initialization scripts that run on each node when the cluster starts. Scripts can be stored in workspace files, Unity Catalog volumes, or DBFS.
default_tags object Default tags automatically applied by Databricks, including Vendor, Creator, ClusterName, and ClusterId.
termination_reason object | null The reason the cluster was terminated, including error codes and parameters.
driver object | null Information about the Spark driver node.
executors array Information about the Spark executor (worker) nodes.
jdbc_port integer The port number on the driver node that serves JDBC/ODBC connections.
spark_context_id integer The canonical Spark context identifier for this cluster.
ssh_public_keys array SSH public keys added to each Spark node in this cluster for SSH access.
disk_spec object | null Disk specifications for the cluster nodes.
cluster_log_status object | null Status of cluster log delivery.
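A consumer of this schema might normalize the epoch-millisecond timestamp fields and derive the effective worker range from num_workers or autoscale. The helpers below are a hypothetical sketch, not part of the schema or the Databricks API:

```python
from datetime import datetime, timezone


def to_datetime(epoch_ms: int) -> datetime:
    """Convert an epoch-milliseconds field (start_time, terminated_time,
    last_activity_time, etc.) to an aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def effective_workers(cluster: dict) -> tuple[int, int]:
    """Return the (min, max) worker count: the autoscale bounds when
    autoscale is set, otherwise the fixed num_workers for both."""
    autoscale = cluster.get("autoscale")
    if autoscale:  # autoscale overrides num_workers per the schema
        return autoscale["min_workers"], autoscale["max_workers"]
    n = cluster.get("num_workers", 0)
    return n, n


# Example: a fixed-size cluster with a hypothetical start_time.
cluster = {"num_workers": 4, "start_time": 1700000000000}
print(to_datetime(cluster["start_time"]).isoformat())  # 2023-11-14T22:13:20+00:00
print(effective_workers(cluster))                      # (4, 4)
```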