AWS DataSync¶
What Is This Service?¶
AWS DataSync is a fully managed online data transfer and synchronization service for moving data securely between:
- On-premises ↔ AWS
- AWS ↔ AWS
- Region ↔ Region
- Account ↔ Account
Supported storage systems:
- Amazon S3
- Amazon EFS
- Amazon FSx
- NFS
- SMB
- Object Storage
- Hadoop (HDFS)
Mental model:
DataSync = managed high-speed migration + synchronization + integrity verification.
Unlike generic copy tools:
DataSync = Transfer + Optimization + Verification + Automation
Why It Matters for Security¶
Data migration creates risk:
- Data leakage
- Transfer corruption
- Network exposure
- Credential sprawl
- Migration downtime
Security goals:
- Secure large transfers
- Verify integrity
- Eliminate manual tooling
- Reduce public exposure
- Preserve permissions
Security outcomes:
- Secure migrations
- Controlled synchronization
- Reduced operational risk
- Stronger compliance posture
Typical use cases:
- Data center migration
- Hybrid storage
- Cross-region movement
- DR synchronization
- Data lake ingestion
- HPC migration
Architecture Example¶
flowchart LR
OnPrem[On-Prem Storage]
Agent[DataSync Agent]
VPCE[Interface Endpoint<br/>PrivateLink]
DX[Direct Connect]
DataSync[AWS DataSync]
S3[Amazon S3]
EFS[Amazon EFS]
FSX[Amazon FSx]
KMS[KMS]
OnPrem --> Agent
Agent --> DX
DX --> VPCE
VPCE --> DataSync
DataSync --> S3
DataSync --> EFS
DataSync --> FSX
KMS --> S3
Core architecture:
Source
↓
Task
↓
Transfer
↓
Verification
↓
Destination
Core Concepts¶
DataSync Task (MOST TESTED)¶
Task defines:
What?
Where?
How?
When?
Contains:
- Source
- Destination
- Schedule
- Verification
- Filters
- Transfer options
Execution:
- Manual
- Scheduled
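As a rough sketch of how the what/where/how/when pieces of a task fit together, the helper below assembles keyword arguments in the shape that boto3's DataSync `create_task` call accepts. The helper name, task names, and ARNs are hypothetical placeholders, and the chosen options are only one plausible configuration.

```python
def build_task_request(source_arn, destination_arn, name, schedule_expression=None):
    """Assemble keyword arguments for the DataSync CreateTask call."""
    request = {
        "Name": name,
        "SourceLocationArn": source_arn,            # where data is read from
        "DestinationLocationArn": destination_arn,  # where data is written to
        "Options": {
            "TransferMode": "CHANGED",               # how: copy only what differs
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # how: verify what was copied
        },
    }
    if schedule_expression is not None:
        # when: leaving the schedule out keeps the task manual-only
        request["Schedule"] = {"ScheduleExpression": schedule_expression}
    return request


# Placeholder ARNs, for illustration only.
SOURCE_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-0123456789abcdef0"
DEST_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-0fedcba9876543210"

nightly = build_task_request(SOURCE_ARN, DEST_ARN, "nightly-sync", "cron(0 2 * * ? *)")
manual = build_task_request(SOURCE_ARN, DEST_ARN, "one-off-migration")

# With credentials configured, the request could be submitted as:
#   import boto3
#   task_arn = boto3.client("datasync").create_task(**nightly)["TaskArn"]
```

The scheduled and manual variants differ only in whether a `Schedule` entry is present.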
Locations¶
Endpoints participating in transfers.
Supported:
Source¶
- NFS
- SMB
- HDFS
- S3
- EFS
- FSx
- Object Storage
Destination¶
- S3
- EFS
- FSx
- NFS
- SMB
Agent¶
Virtual transfer appliance.
Required for:
On-premises transfers
Deployment:
- VMware
- Hyper-V
- EC2
- KVM
Exam nuance:
AWS↔AWS transfers often require no agent.
Execution¶
Single task run.
Tracks:
- Progress
- Metrics
- Verification
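Tracking a single run might look like the sketch below. It polls `describe_task_execution`, a real boto3 DataSync call, and records each phase it passes through. `StubClient`, the ARN, and the choice to treat only `SUCCESS` and `ERROR` as terminal are assumptions made so the example runs without AWS access.

```python
import time

TERMINAL_STATUSES = {"SUCCESS", "ERROR"}  # assumption: other statuses mean "still running"


def wait_for_execution(client, task_execution_arn, poll_seconds=30, max_polls=120):
    """Poll DescribeTaskExecution until the run reaches a terminal status."""
    phases = []
    for _ in range(max_polls):
        response = client.describe_task_execution(TaskExecutionArn=task_execution_arn)
        status = response["Status"]
        if not phases or phases[-1] != status:
            phases.append(status)  # record each phase once, in order
        if status in TERMINAL_STATUSES:
            return status, phases
        time.sleep(poll_seconds)
    raise TimeoutError(f"execution still running after {max_polls} polls")


class StubClient:
    """Hypothetical stand-in for boto3.client('datasync'), used only for this demo."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def describe_task_execution(self, TaskExecutionArn):
        return {"Status": next(self._statuses)}


EXECUTION_ARN = (
    "arn:aws:datasync:us-east-1:111122223333:"
    "task/task-0123456789abcdef0/execution/exec-0123456789abcdef0"
)  # placeholder
stub = StubClient(
    ["LAUNCHING", "PREPARING", "TRANSFERRING", "TRANSFERRING", "VERIFYING", "SUCCESS"]
)
final_status, phases = wait_for_execution(stub, EXECUTION_ARN, poll_seconds=0)
```

Passing a real `boto3.client("datasync")` in place of the stub should work the same way, since only the `Status` field of the response is read.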
Scheduling¶
Supports:
- Recurring sync
- Incremental movement
Include / Exclude Filters (HIGH VALUE)¶
Allows selective transfer.
Examples:
Exclude:
*.tmp
/secrets/*
Include:
/project-alpha/*
Benefits:
- Reduced transfer size
- Improved security
- Lower costs
Exam scenario:
Transfer only production data.
Answer:
Filter rules
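In the DataSync API, include and exclude filters are passed as a single pattern string with entries separated by `|`. The sketch below builds that structure from the example patterns above. The helper name is hypothetical, and the exact pattern semantics (for instance, how a trailing `/*` is interpreted) should be checked against the DataSync filter documentation before relying on them.

```python
def build_filter_rule(patterns):
    """Join patterns into the single '|'-delimited rule DataSync filters use."""
    cleaned = [pattern.strip() for pattern in patterns if pattern.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty pattern is required")
    return [{"FilterType": "SIMPLE_PATTERN", "Value": "|".join(cleaned)}]


# Patterns taken from the examples above.
excludes = build_filter_rule(["*.tmp", "/secrets/*"])
includes = build_filter_rule(["/project-alpha/*"])

# These lists could be supplied as the Excludes / Includes arguments of
# create_task or start_task_execution in boto3.
```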
Important Integrations¶
Amazon S3 (VERY HIGH VALUE)¶
Most common destination.
Supports:
- Migration
- Data ingestion
- Synchronization
Amazon EFS¶
Supports:
- Shared filesystem migration
Amazon FSx¶
Supports:
- Windows
- Lustre
- ONTAP
- OpenZFS
Hadoop Distributed File System (HDFS)¶
Supports:
HDFS → AWS migration
Common big data scenario.
AWS Snow Family¶
Exam distinction:
Snow:
Offline movement
DataSync:
Online transfer
AWS Snowcone (SPECIAL EXCEPTION)¶
Snowcone includes:
Embedded DataSync Agent
Pattern:
Edge Collection
↓
Transport Device
↓
Reconnect
↓
DataSync
↓
S3
Exam scenario:
Disconnected environments.
AWS Direct Connect¶
Typical enterprise architecture.
AWS PrivateLink (VERY HIGH VALUE)¶
Supports:
Private DataSync Agent Communication
Pattern:
Agent
↓
PrivateLink
↓
DataSync
Agent must be:
Activated through endpoint
Critical trap:
Creating a VPC endpoint (VPCE) after the agent is activated does not automatically make that agent's traffic private; the endpoint type is chosen at activation.
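Because the private path is tied to activation, the VPC endpoint details are supplied when the agent is registered. The sketch below assembles arguments in the shape of boto3's DataSync `create_agent` call. Every identifier shown (activation key, endpoint ID, subnet and security group ARNs) is a made-up placeholder.

```python
def build_private_agent_request(activation_key, agent_name, vpc_endpoint_id,
                                subnet_arn, security_group_arn):
    """Assemble CreateAgent arguments for an agent that uses a VPC endpoint."""
    return {
        "ActivationKey": activation_key,            # obtained from the agent itself
        "AgentName": agent_name,
        "VpcEndpointId": vpc_endpoint_id,           # the interface endpoint for DataSync
        "SubnetArns": [subnet_arn],                 # subnet that hosts the endpoint
        "SecurityGroupArns": [security_group_arn],  # protects the endpoint's interfaces
    }


# Placeholder values, for illustration only.
agent_request = build_private_agent_request(
    activation_key="AAAAA-BBBBB-CCCCC-DDDDD-EEEEE",
    agent_name="onprem-agent-01",
    vpc_endpoint_id="vpce-0123456789abcdef0",
    subnet_arn="arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0123456789abcdef0",
    security_group_arn="arn:aws:ec2:us-east-1:111122223333:security-group/sg-0123456789abcdef0",
)

# With credentials configured:
#   import boto3
#   agent_arn = boto3.client("datasync").create_agent(**agent_request)["AgentArn"]
```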
CloudWatch¶
Provides:
- Metrics
- Monitoring
AWS IAM¶
Controls:
- Tasks
- Execution
- Permissions
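One way to picture IAM control over tasks and executions is a least-privilege policy scoped to a single task. The sketch below generates such a policy document. The function name, `Sid`, and task ARN are illustrative assumptions, and a real policy would need to be reviewed against the operator's actual duties.

```python
import json


def task_operator_policy(task_arn):
    """Build an IAM policy that lets a principal run and inspect one task."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunOneDataSyncTask",
                "Effect": "Allow",
                "Action": [
                    "datasync:StartTaskExecution",
                    "datasync:DescribeTask",
                    "datasync:DescribeTaskExecution",
                ],
                # The task itself plus any execution created under it.
                "Resource": [task_arn, f"{task_arn}/execution/*"],
            }
        ],
    }


TASK_ARN = "arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0"  # placeholder
policy = task_operator_policy(TASK_ARN)
policy_json = json.dumps(policy, indent=2)
```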
AWS KMS¶
Destination encryption.
Security Features¶
Encryption In Transit¶
Uses:
TLS
Protects:
- Transfer sessions
Encryption At Rest¶
Destination service controls encryption.
Examples:
- S3 KMS
- EFS encryption
- FSx encryption
Integrity Verification (VERY HIGH VALUE)¶
Verifies:
- Metadata
- Checksums
- Completeness
Purpose:
Prevent silent corruption.
Major differentiator.
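The idea behind checksum verification can be illustrated locally. The sketch below compares a source file with two copies using size and SHA-256. It is a conceptual illustration only: the notes do not describe DataSync's internal checksum algorithm, so the hash choice and helper names here are assumptions.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source_path, destination_path):
    """Return True only if size and content checksum both match."""
    if os.path.getsize(source_path) != os.path.getsize(destination_path):
        return False  # completeness check: cheap, catches truncation
    return sha256_of(source_path) == sha256_of(destination_path)


with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "source.bin")
    good_copy = os.path.join(workdir, "good.bin")
    bad_copy = os.path.join(workdir, "bad.bin")
    with open(source, "wb") as handle:
        handle.write(b"payload" * 1000)
    shutil.copyfile(source, good_copy)
    with open(bad_copy, "wb") as handle:
        # Same length as the source, but one byte differs: silent corruption.
        handle.write(b"payloae" + b"payload" * 999)
    good_result = verify_copy(source, good_copy)
    bad_result = verify_copy(source, bad_copy)
```

The corrupted copy has the same size as the source, which is why a size check alone would not catch it.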
Bandwidth Throttling¶
Controls:
- Transfer speed
Prevents:
- Network exhaustion
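In the API, the throttle is the `BytesPerSecond` task option, where `-1` means no limit. A small sketch of converting a link budget in megabits per second into that value follows. The helper name and the 100 Mbps figure are assumptions for illustration.

```python
UNLIMITED = -1  # DataSync's value for "use all available bandwidth"


def bytes_per_second(megabits_per_second=None):
    """Convert a megabit-per-second cap into the BytesPerSecond option value."""
    if megabits_per_second is None:
        return UNLIMITED
    if megabits_per_second <= 0:
        raise ValueError("bandwidth cap must be positive")
    # 1 megabit = 1,000,000 bits; 8 bits per byte.
    return int(megabits_per_second * 1_000_000 / 8)


throttled_options = {"BytesPerSecond": bytes_per_second(100)}  # cap at 100 Mbps
unthrottled_options = {"BytesPerSecond": bytes_per_second()}

# These dictionaries could be merged into the Options argument of a task,
# or supplied as OverrideOptions for a single execution.
```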
Network Isolation¶
Supports:
- VPN
- Direct Connect
- PrivateLink
Access Controls¶
Uses:
- IAM
- Storage permissions
Advanced Security and Operational Concepts¶
DataSync vs Storage Gateway (MOST TESTED)¶
DataSync:
Move data
Storage Gateway:
Extend storage
Shortcut:
Migration → DataSync
Hybrid storage → Storage Gateway
Incremental Transfer¶
Transfers:
Changed objects only
Improves:
- Speed
- Cost
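The "changed only" idea can be sketched as a comparison of two listings. This is a simplified illustration and not DataSync's actual comparison logic. The manifest format (path mapped to a size and modification time) is an assumption. In the API, the equivalent choice is the `TransferMode` option, with `CHANGED` or `ALL`.

```python
def changed_paths(source_manifest, destination_manifest):
    """Return paths that are new or differ between source and destination.

    Each manifest maps a path to a (size_in_bytes, modified_timestamp) tuple.
    """
    return sorted(
        path
        for path, metadata in source_manifest.items()
        if destination_manifest.get(path) != metadata
    )


source_manifest = {
    "/data/a.csv": (1024, 1700000000),
    "/data/b.csv": (2048, 1700000500),  # modified since the last run
    "/data/c.csv": (4096, 1700000900),  # new file
}
destination_manifest = {
    "/data/a.csv": (1024, 1700000000),
    "/data/b.csv": (2048, 1700000100),
}

to_transfer = changed_paths(source_manifest, destination_manifest)
```

Only the modified and new files are selected; the unchanged file is skipped, which is where the speed and cost savings come from.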
Verification Modes¶
Options:
Verify all data in the destination
or
Verify only the data transferred
or
No verification
Tradeoff:
Assurance vs speed.
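The DataSync API exposes this choice as the `VerifyMode` task option. The sketch below maps plain-language assurance levels to the API values. The level names and helper are hypothetical; the three `VerifyMode` values are the ones the API defines.

```python
VERIFY_MODES = {
    "all": "POINT_IN_TIME_CONSISTENT",      # check the whole destination after transfer
    "transferred": "ONLY_FILES_TRANSFERRED",  # check only what this run copied
    "none": "NONE",                          # skip the post-transfer check
}


def verification_options(level):
    """Translate an assurance level into a task Options fragment."""
    try:
        return {"VerifyMode": VERIFY_MODES[level]}
    except KeyError:
        raise ValueError(f"unknown level {level!r}; expected one of {sorted(VERIFY_MODES)}")


highest_assurance = verification_options("all")
faster_run = verification_options("transferred")
```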
Private Transfer Architecture (HIGH VALUE)¶
Default:
Agent
↓
Public Endpoint
↓
DataSync
Private design:
Agent
↓
DX / VPN
↓
PrivateLink
↓
DataSync
Most secure architecture.
Agent Requirement Trap¶
Need agent:
On-prem ↔ AWS
No agent:
AWS ↔ AWS
Metadata Preservation¶
Preserves:
- Ownership
- Timestamps
- Permissions
Important for migration.
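For POSIX-style locations such as NFS and EFS, metadata preservation is controlled through task options. The sketch below shows one plausible set of values; the helper name is hypothetical. SMB and FSx for Windows transfers use a different option (`SecurityDescriptorCopyFlags`) that is not covered here.

```python
def metadata_preserving_options():
    """Return task Options that keep ownership, permissions, and timestamps."""
    return {
        "Uid": "INT_VALUE",              # keep the numeric owner ID
        "Gid": "INT_VALUE",              # keep the numeric group ID
        "PosixPermissions": "PRESERVE",  # keep read/write/execute bits
        "Mtime": "PRESERVE",             # keep last-modified timestamps
        "Atime": "BEST_EFFORT",          # keep last-access timestamps where possible
    }


options = metadata_preserving_options()

# This fragment could be merged with other settings before being passed as
# the Options argument of create_task in boto3.
```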
DataSync Is Not Replication¶
Not intended for:
- Continuous replication
- Live synchronization
Use:
- S3 Replication
- Database replication
Task Parallelization¶
DataSync automatically parallelizes transfers.
Improves throughput.
WAN Optimization¶
Optimizes:
- Network efficiency
- Compression
- Transfer scheduling
Workflow(s)¶
Secure Private Migration¶
sequenceDiagram
participant Storage
participant Agent
participant VPCE
participant DataSync
participant S3
Storage->>Agent: Read
Agent->>VPCE: Private transfer
VPCE->>DataSync: Secure connection
DataSync->>S3: Write
DataSync->>DataSync: Verify
DataSync Task with Filters¶
sequenceDiagram
participant Source
participant Task
participant DataSync
participant Destination
Task->>DataSync: Include/Exclude rules
DataSync->>Source: Select files
DataSync->>Destination: Transfer
DataSync-->>Task: Verification
Snowcone Transfer Workflow¶
sequenceDiagram
participant Edge
participant Snowcone
participant DataSync
participant S3
Edge->>Snowcone: Store data
Snowcone->>DataSync: Upload
DataSync->>S3: Transfer
Comparisons¶
| Service | Purpose | Online | Verification | Agent |
|---|---|---|---|---|
| DataSync | Migration | Yes | Yes | Sometimes |
| Storage Gateway | Hybrid Storage | Yes | No | Yes |
| Snowball | Offline Transfer | No | Yes | No |
| S3 Replication | Object Replication | Yes | No | No |
| Transfer Family | Protocol Access | Yes | No | No |
Common Exam Traps¶
- DataSync is migration, not storage.
- Storage Gateway ≠ DataSync.
- Integrity verification is built in.
- Agent required for on-prem.
- AWS↔AWS may not require agent.
- Snowcone includes DataSync.
- HDFS supported.
- PrivateLink supports private transfers.
- For private transfers, agent activation must use the VPCE.
- Filter rules optimize transfers.
- Metadata preserved.
- DataSync is not continuous replication.
5-Second Recall¶
- DataSync = managed transfer
- Tasks define movement
- Agent for on-prem
- PrivateLink secures transfer
- Verification is core
- Snowcone includes agent
- HDFS supported
Quick Revision Notes¶
- Managed migration service
- Hybrid and cloud transfers
- Data integrity verification
- Incremental sync supported
- PrivateLink available
- Datasets can be filtered
- HDFS supported
- Snowcone integrates DataSync
- Works with S3/EFS/FSx
- Excellent for secure migrations