Amazon Glue

Amazon Glue (officially AWS Glue) is a serverless data integration service for discovering, preparing, moving, and integrating data from multiple sources for analytics, machine learning, and application development. It provides both visual and code-based interfaces for ETL operations and includes a Data Catalog for unified metadata management.

1 API · 1 Capability · 8 Features
Tags: Analytics, AWS, Data Catalog, Data Integration, Data Pipeline, ETL, Serverless

APIs

Amazon Glue API

The Amazon Glue API enables programmatic access to create and manage ETL jobs, crawlers, data catalogs, connections, and development endpoints. You can discover data sources, tr...
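
As a rough illustration of the programmatic access described above, the following sketch builds a CreateJob request and starts one run. It assumes the boto3 SDK for Python, which this page does not name; the job name, role ARN, script path, and worker settings are all hypothetical placeholders.

```python
from typing import Any, Dict


def build_job_request(name: str, role_arn: str, script_s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateJob operation."""
    return {
        "Name": name,
        "Role": role_arn,
        "Command": {
            "Name": "glueetl",  # Spark ETL job type
            "ScriptLocation": script_s3_path,
            "PythonVersion": "3",
        },
        # Illustrative sizing only; pick values that fit the workload.
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 2,
    }


def create_and_start_job(glue_client: Any, request: Dict[str, Any]) -> str:
    """Create the job, start a single run, and return the run ID."""
    glue_client.create_job(**request)
    run = glue_client.start_job_run(JobName=request["Name"])
    return run["JobRunId"]


# With AWS credentials configured, the client would come from boto3:
#   import boto3
#   glue = boto3.client("glue")
#   request = build_job_request(
#       "demo-etl-job",                                  # hypothetical name
#       "arn:aws:iam::123456789012:role/DemoGlueRole",   # hypothetical role
#       "s3://demo-bucket/scripts/demo_job.py",          # hypothetical path
#   )
#   run_id = create_and_start_job(glue, request)
```

Keeping the request builder separate from the call makes the parameters easy to inspect or unit test without touching an AWS account.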

Capabilities

Amazon Glue Data Integration

Workflow capability for data engineers building ETL pipelines with Amazon Glue. Covers job management, crawler configuration, data catalog operations, workflow orchestration, an...

Features

Serverless ETL

Run ETL jobs without managing infrastructure with automatic scaling and pay-per-use pricing.

Visual ETL Editor

Build ETL pipelines visually using a drag-and-drop interface without writing code.

Data Catalog

Unified metadata repository for all data assets across S3, databases, and data warehouses.

Automated Schema Discovery

Crawlers automatically discover data schemas and populate the Data Catalog.
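
A minimal sketch of how a crawler might be defined and started over an S3 prefix, again assuming boto3 as the SDK. The crawler name, role ARN, database, path, and cron expression are hypothetical.

```python
from typing import Any, Dict, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_path: str,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler operation."""
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule is not None:
        # Optional cron schedule, for example "cron(0 2 * * ? *)"
        request["Schedule"] = schedule
    return request


def create_and_run_crawler(glue_client: Any, request: Dict[str, Any]) -> None:
    """Create the crawler and trigger one immediate run."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Usage with a real client (requires AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   create_and_run_crawler(
#       glue,
#       build_crawler_request(
#           "demo-crawler",
#           "arn:aws:iam::123456789012:role/DemoGlueRole",
#           "demo_db",
#           "s3://demo-bucket/raw/",
#       ),
#   )
```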

Workflow Orchestration

Orchestrate multi-job ETL pipelines with triggers, conditional flows, and scheduling.
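
To make the trigger types concrete, here is a hedged sketch of a two-step workflow: a scheduled trigger starts an ingest job, and a conditional trigger starts a transform job once the ingest job succeeds. It assumes boto3 as the SDK; the workflow, trigger, and job names are hypothetical and the jobs are assumed to exist already.

```python
from typing import Any, Dict, List


def scheduled_trigger(workflow: str, name: str, job: str, cron: str) -> Dict[str, Any]:
    """CreateTrigger arguments for a trigger that starts a job on a cron schedule."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": cron,
        "Actions": [{"JobName": job}],
        "StartOnCreation": True,
    }


def conditional_trigger(
    workflow: str, name: str, upstream_job: str, downstream_job: str
) -> Dict[str, Any]:
    """CreateTrigger arguments that start downstream_job after upstream_job succeeds."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
        "StartOnCreation": True,
    }


def create_workflow(glue_client: Any, workflow: str, triggers: List[Dict[str, Any]]) -> None:
    """Create the workflow, then attach each trigger to it."""
    glue_client.create_workflow(Name=workflow)
    for trigger in triggers:
        glue_client.create_trigger(**trigger)


# Hypothetical pipeline: ingest nightly at 02:00 UTC, then transform.
triggers = [
    scheduled_trigger("demo-workflow", "nightly-start", "demo-ingest", "cron(0 2 * * ? *)"),
    conditional_trigger("demo-workflow", "after-ingest", "demo-ingest", "demo-transform"),
]
# create_workflow(boto3.client("glue"), "demo-workflow", triggers)  # needs AWS credentials
```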

ML Transforms

Use machine learning to automate complex data transformation tasks like entity deduplication.

Schema Registry

Centrally manage and enforce data schema evolution with versioning and compatibility checks.

Data Quality

Define and evaluate data quality rules to validate data during ETL processing.
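
Glue data quality rules are written in the Data Quality Definition Language (DQDL). The sketch below assembles a small ruleset string and registers it against a catalog table, assuming boto3 as the SDK. The ruleset name, database, table, and column names are hypothetical.

```python
from typing import Any, List


def build_ruleset(rules: List[str]) -> str:
    """Join individual DQDL rules into a single ruleset definition."""
    return "Rules = [ " + ", ".join(rules) + " ]"


def register_ruleset(
    glue_client: Any, name: str, ruleset: str, database: str, table: str
) -> None:
    """Store the ruleset and associate it with a Data Catalog table."""
    glue_client.create_data_quality_ruleset(
        Name=name,
        Ruleset=ruleset,
        TargetTable={"DatabaseName": database, "TableName": table},
    )


# Hypothetical checks for an "orders" table.
orders_ruleset = build_ruleset(
    [
        'IsComplete "order_id"',      # no missing order IDs
        'IsUnique "order_id"',        # no duplicate order IDs
        'ColumnValues "amount" > 0',  # amounts must be positive
        "RowCount > 0",               # table must not be empty
    ]
)
# register_ruleset(boto3.client("glue"), "orders-checks", orders_ruleset, "demo_db", "orders")
```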

Use Cases

Data Lake ETL

Build ETL pipelines to ingest, transform, and load data into Amazon S3 data lakes.

Data Warehouse Loading

Extract and transform data from multiple sources and load into Amazon Redshift.

Data Catalog Management

Maintain a unified data catalog for data discovery across all data assets.

Real-Time Streaming ETL

Process streaming data from Kinesis and Kafka with Glue Streaming jobs.

Machine Learning Data Prep

Prepare and transform training datasets for machine learning using Glue Studio.

Integrations

Amazon S3

Primary data lake storage for Glue ETL input and output.

Amazon Redshift

Load transformed data into a Redshift data warehouse.

Amazon Athena

Query Data Catalog tables directly with Athena serverless SQL.

Amazon Kinesis

Process streaming data from Kinesis Data Streams with Glue streaming.

Apache Kafka

Ingest and process Kafka streaming data in Glue jobs.

AWS Lake Formation

Apply fine-grained access control to Glue Data Catalog resources.

Amazon RDS

Connect to relational databases as ETL data sources.

Semantic Vocabularies

Amazon Glue Context

418 classes · 330 properties

JSON-LD

API Governance Rules

Amazon Glue API Rules

8 rules · 5 errors · 2 warnings · 1 info

SPECTRAL

Resources

Portal
Documentation
Terms of Service
Privacy Policy
Support
Blog
GitHub Organization
Console
Sign Up
Status Page
Contact
Spectral Rules
Vocabulary
Naftiko Capability