AWS DataSync
AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems, AWS storage services, and other cloud storage. DataSync can transfer data up to 10 times faster than open-source tools by using a purpose-built network protocol and a parallel, multi-threaded architecture. It supports NFS, SMB, HDFS, Amazon S3, Amazon EFS, Amazon FSx, and other storage systems as transfer endpoints.
APIs
AWS DataSync REST API
RESTful API for AWS DataSync enabling management of data transfer tasks, locations, agents, and task executions for automated data movement between on-premises storage systems a...
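As a rough illustration of managing task executions through the API, the sketch below starts an execution of an existing task and polls until it finishes. It assumes the boto3 `datasync` client and its `start_task_execution` and `describe_task_execution` calls; the helper names `run_task` and `is_terminal`, the polling interval, and the choice of `SUCCESS` and `ERROR` as the terminal statuses are assumptions made for this example, not part of the service description above.

```python
import time

# Statuses treated as "finished" in this sketch (an assumption of the example).
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def is_terminal(status: str) -> bool:
    """Return True when a task execution status means polling can stop."""
    return status in TERMINAL_STATUSES


def run_task(task_arn: str, poll_seconds: float = 30.0, client=None) -> str:
    """Start one execution of an existing task and wait for its final status.

    `client` is expected to behave like boto3.client("datasync"); it is
    injectable so the helper can be exercised without AWS credentials.
    """
    if client is None:
        import boto3  # imported lazily so the module loads without boto3

        client = boto3.client("datasync")

    response = client.start_task_execution(TaskArn=task_arn)
    execution_arn = response["TaskExecutionArn"]

    while True:
        details = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = details["Status"]
        if is_terminal(status):
            return status
        time.sleep(poll_seconds)
```

Calling `run_task` with the ARN of a task that already exists returns the final status string. A production version would likely add a timeout and surface the error details reported by the service.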
Capabilities
Features
Transfer data up to 10 times faster than open-source tools using a purpose-built, multi-threaded network protocol over TLS.
Connect to NFS, SMB, HDFS, Amazon S3, Amazon EFS, FSx for Windows File Server, FSx for Lustre, and FSx for NetApp ONTAP as transfer endpoints.
Automatically verify data integrity using checksums at both source and destination to ensure byte-for-byte data consistency after transfer.
Configure recurring scheduled transfers on hourly, daily, or weekly cadences for ongoing data synchronization between systems.
Deploy the DataSync agent VM on-premises to connect local NFS and SMB storage to AWS without opening inbound firewall ports.
Control the network bandwidth consumed by DataSync transfers to minimize impact on production workloads during business hours.
Monitor transfer metrics, task execution history, and set up alarms for failed transfers using Amazon CloudWatch.
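The verification, scheduling, and bandwidth features above map onto task settings. The following is a minimal sketch, assuming the `Options` (`VerifyMode`, `BytesPerSecond`) and `Schedule` (`ScheduleExpression`) parameters accepted by the boto3 `create_task` call; the helper names `mib_per_second` and `build_task_settings` are hypothetical and exist only for this example.

```python
from typing import Optional

# Verification modes assumed to be accepted by the task Options structure.
VERIFY_MODES = {"POINT_IN_TIME_CONSISTENT", "ONLY_FILES_TRANSFERRED", "NONE"}


def mib_per_second(mib: int) -> int:
    """Convert a limit in MiB/s to the bytes-per-second value used by Options."""
    return mib * 1024 * 1024


def build_task_settings(
    max_mib_per_second: Optional[int] = None,
    schedule_expression: Optional[str] = None,
    verify_mode: str = "POINT_IN_TIME_CONSISTENT",
) -> dict:
    """Build keyword arguments for task creation (hypothetical helper).

    A bandwidth limit of None is encoded as -1, which means "no limit".
    """
    if verify_mode not in VERIFY_MODES:
        raise ValueError(f"unsupported verify mode: {verify_mode}")

    limit = -1 if max_mib_per_second is None else mib_per_second(max_mib_per_second)
    settings = {"Options": {"VerifyMode": verify_mode, "BytesPerSecond": limit}}

    if schedule_expression is not None:
        # Example values: "rate(1 hour)" or "cron(0 2 * * ? *)" for 02:00 UTC daily.
        settings["Schedule"] = {"ScheduleExpression": schedule_expression}
    return settings


# Throttle to 50 MiB/s and run hourly.
settings = build_task_settings(max_mib_per_second=50, schedule_expression="rate(1 hour)")
```

The resulting dictionary could be unpacked into `create_task` alongside the source and destination location ARNs, for example `client.create_task(SourceLocationArn=..., DestinationLocationArn=..., **settings)`.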
Use Cases
Migrate petabytes of data from on-premises NAS and SAN systems to Amazon S3 or EFS during cloud adoption and data center exit projects.
Keep on-premises and cloud storage in sync on a scheduled basis for hybrid cloud architectures and distributed workloads.
Transfer on-premises file data to Amazon S3 Glacier for cost-effective long-term archival and backup storage.
Transfer datasets between AWS Regions or across AWS accounts for data sharing, disaster recovery, or multi-region analytics.
Stage large datasets from S3 or on-premises storage to FSx for Lustre for high-performance computing workloads on AWS.
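For the archival use case, the destination storage class is chosen when the S3 location is defined. The sketch below builds the request for the boto3 `create_location_s3` call, assuming its `S3BucketArn`, `S3StorageClass`, `Subdirectory`, and `S3Config` parameters; the helper name `s3_location_request`, the bucket name, and the role ARN are placeholders invented for illustration.

```python
# Storage classes assumed to be accepted for an S3 location in this sketch.
S3_STORAGE_CLASSES = {
    "STANDARD",
    "STANDARD_IA",
    "ONEZONE_IA",
    "INTELLIGENT_TIERING",
    "GLACIER",
    "DEEP_ARCHIVE",
}


def s3_location_request(
    bucket_name: str,
    access_role_arn: str,
    storage_class: str = "STANDARD",
    subdirectory: str = "/",
) -> dict:
    """Build keyword arguments for creating an S3 location (hypothetical helper)."""
    if storage_class not in S3_STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class}")
    return {
        "S3BucketArn": f"arn:aws:s3:::{bucket_name}",
        "S3StorageClass": storage_class,
        "Subdirectory": subdirectory,
        # IAM role that the service assumes to access the bucket.
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


# Placeholder names: write transferred files directly into an archival class.
archive_request = s3_location_request(
    bucket_name="example-archive-bucket",
    access_role_arn="arn:aws:iam::111122223333:role/example-datasync-role",
    storage_class="GLACIER",
    subdirectory="/backups",
)
```

The request could then be passed as `client.create_location_s3(**archive_request)`, and the returned location ARN used as the destination of a task.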
Integrations
Use Amazon S3 as the primary cloud storage destination, with support for all S3 storage classes, including S3 Glacier for cost-effective data archival.
Transfer data to and from Amazon Elastic File System for shared file storage accessible from multiple EC2 instances.
Integrate with FSx for Windows File Server, FSx for Lustre, and FSx for NetApp ONTAP as high-performance managed file system destinations.
Receive DataSync task execution metrics, transfer rates, and error alerts in CloudWatch for monitoring and incident response.
Use Snowball for initial bulk data transfer followed by DataSync for ongoing incremental synchronization after migration.
Combine Storage Gateway for cache-based hybrid access with DataSync for bulk data movement between on-premises and cloud.