Amazon Glue
Amazon Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development. It provides both visual and code-based interfaces for ETL operations and includes a Data Catalog for unified metadata management.
APIs
Amazon Glue API
The Amazon Glue API enables programmatic access to create and manage ETL jobs, crawlers, data catalogs, connections, and development endpoints. You can discover data sources, tr...
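As an illustration of the programmatic access described above, here is a minimal sketch that starts an ETL job run and polls it until it finishes. It assumes a client object that behaves like `boto3.client("glue")`; the helper name `run_job_and_wait`, the polling interval, and the injectable `sleep` parameter are illustrative choices, not part of the Glue API.

```python
import time

# Terminal values of JobRunState as reported by the Glue GetJobRun operation.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(glue, job_name, arguments=None, poll_seconds=30, sleep=time.sleep):
    """Start a Glue job run and poll until it reaches a terminal state.

    `glue` is assumed to expose start_job_run and get_job_run with the same
    keyword arguments and response shape as boto3's Glue client.
    Returns a (run_id, final_state) tuple.
    """
    response = glue.start_job_run(JobName=job_name, Arguments=arguments or {})
    run_id = response["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        # Still STARTING, WAITING, RUNNING, or STOPPING: wait and check again.
        sleep(poll_seconds)
```

With boto3 installed and AWS credentials configured, a call might look like `run_job_and_wait(boto3.client("glue"), "my-etl-job", {"--input_path": "s3://my-bucket/raw/"})`, where the job name, argument key, and bucket are placeholders.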
Capabilities
Amazon Glue Data Integration
Workflow capability for data engineers building ETL pipelines with Amazon Glue. Covers job management, crawler configuration, data catalog operations, workflow orchestration, an...
Features
Run ETL jobs without managing infrastructure with automatic scaling and pay-per-use pricing.
Build ETL pipelines visually using a drag-and-drop interface without writing code.
Unified metadata repository for all data assets across S3, databases, and data warehouses.
Crawlers automatically discover data schemas and populate the Data Catalog.
Orchestrate multi-job ETL pipelines with triggers, conditional flows, and scheduling.
Use machine learning to automate complex data transformation tasks like entity deduplication.
Centrally manage and enforce data schema evolution with versioning and compatibility checks.
Define and evaluate data quality rules to validate data during ETL processing.
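Two of the features above, crawlers and conditional triggers, can be sketched as request payloads. The helpers below only build the keyword arguments that the Glue `CreateCrawler` and `CreateTrigger` operations accept (in boto3, `create_crawler` and `create_trigger`); the function names and every resource name passed to them are hypothetical.

```python
def crawler_request(name, role_arn, database, s3_path, schedule=None):
    """Keyword arguments for CreateCrawler: crawl one S3 path into a catalog database."""
    request = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        # Glue schedules use cron syntax, e.g. "cron(0 2 * * ? *)" for 02:00 UTC daily.
        request["Schedule"] = schedule
    return request


def conditional_trigger_request(name, workflow, upstream_job, downstream_job):
    """Keyword arguments for CreateTrigger: start downstream_job once upstream_job succeeds."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
    }
```

Assuming a boto3 Glue client named `glue`, these would be passed as `glue.create_crawler(**crawler_request(...))` and `glue.create_trigger(**conditional_trigger_request(...))`. Keeping the payloads as plain dictionaries makes them easy to review and unit test before anything is created in an AWS account.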
Use Cases
Build ETL pipelines to ingest, transform, and load data into Amazon S3 data lakes.
Extract and transform data from multiple sources and load it into Amazon Redshift.
Maintain a unified data catalog for data discovery across all data assets.
Process streaming data from Kinesis and Kafka with Glue Streaming jobs.
Prepare and transform training datasets for machine learning using Glue Studio.
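For the data discovery use case, a small sketch of reading the Data Catalog may help. It assumes a client that behaves like `boto3.client("glue")` and uses the `get_tables` paginator; the helper name `list_table_locations` and the database name in the usage note are hypothetical.

```python
def list_table_locations(glue, database):
    """Return {table_name: storage_location} for every table in a catalog database.

    `glue` is assumed to expose get_paginator("get_tables") like boto3's Glue
    client. Tables without a storage descriptor map to None.
    """
    locations = {}
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page["TableList"]:
            descriptor = table.get("StorageDescriptor", {})
            locations[table["Name"]] = descriptor.get("Location")
    return locations
```

A call such as `list_table_locations(boto3.client("glue"), "sales_db")` would return a mapping from table names to their S3 or JDBC locations, which is often enough for a quick inventory of what the crawlers have registered.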
Integrations
Primary data lake storage for Glue ETL input and output.
Load transformed data into a Redshift data warehouse.
Query Data Catalog tables directly with Athena serverless SQL.
Process streaming data from Kinesis Data Streams with Glue streaming.
Ingest and process Kafka streaming data in Glue jobs.
Fine-grained access control to Glue Data Catalog resources.
Connect to relational databases as ETL data sources.
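As one example of these integrations, the sketch below builds the request for querying a Data Catalog table through Athena. The dictionary keys match the Athena `StartQueryExecution` operation (boto3: `start_query_execution`); the helper name and the database, table, and bucket values are assumptions made for illustration.

```python
def athena_query_request(database, table, output_s3, limit=10):
    """Keyword arguments for StartQueryExecution: preview rows of a catalog table."""
    # The table name is double-quoted as an identifier; limit is coerced to an integer.
    query = f'SELECT * FROM "{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }
```

With a boto3 Athena client named `athena`, this would be submitted as `athena.start_query_execution(**athena_query_request("sales_db", "orders", "s3://my-bucket/athena-results/"))`. The query runs asynchronously, so results have to be fetched separately once the execution completes.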