AWS DataSync

What Is This Service?

AWS DataSync is a fully managed online data transfer and synchronization service for moving data securely between:

  • On-premises ↔ AWS
  • AWS ↔ AWS
  • Region ↔ Region
  • Account ↔ Account

Supported storage systems:

  • Amazon S3
  • Amazon EFS
  • Amazon FSx
  • NFS
  • SMB
  • Object Storage
  • Hadoop (HDFS)

Mental model:
DataSync = managed high-speed migration + synchronization + integrity verification.

Unlike generic copy tools:

DataSync = Transfer + Optimization + Verification + Automation

Why It Matters for Security

Data migration creates risk:

  • Data leakage
  • Transfer corruption
  • Network exposure
  • Credential sprawl
  • Migration downtime

Security goals:

  • Secure large transfers
  • Verify integrity
  • Eliminate manual tooling
  • Reduce public exposure
  • Preserve permissions

Security outcomes:

  • Secure migrations
  • Controlled synchronization
  • Reduced operational risk
  • Stronger compliance posture

Typical use cases:

  • Data center migration
  • Hybrid storage
  • Cross-region movement
  • DR synchronization
  • Data lake ingestion
  • HPC migration

Architecture Example

flowchart LR

OnPrem[On-Prem Storage]

Agent[DataSync Agent]

VPCE[Interface Endpoint<br/>PrivateLink]

DX[Direct Connect]

DataSync[AWS DataSync]

S3[Amazon S3]

EFS[Amazon EFS]

FSX[Amazon FSx]

KMS[KMS]

OnPrem --> Agent

Agent --> DX

DX --> VPCE

VPCE --> DataSync

DataSync --> S3

DataSync --> EFS

DataSync --> FSX

KMS --> S3

Core architecture:

Source
 ↓
Task
 ↓
Transfer
 ↓
Verification
 ↓
Destination

Core Concepts

DataSync Task (MOST TESTED)

Task defines:

What?
Where?
How?
When?

Contains:

  • Source
  • Destination
  • Schedule
  • Verification
  • Filters
  • Transfer options

Execution:

  • Manual
  • Scheduled
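
As a rough sketch of how the pieces of a task fit together, the helper below assembles keyword arguments shaped like the DataSync CreateTask request (`SourceLocationArn`, `DestinationLocationArn`, `Options`, `Schedule`, `Excludes`). The helper name, the ARNs, and the task name are placeholders invented for illustration. Nothing is sent to AWS.

```python
from typing import Any, Dict, Optional


def build_task_request(
    source_location_arn: str,
    destination_location_arn: str,
    name: str,
    schedule_expression: Optional[str] = None,
    exclude_patterns: Optional[str] = None,
    verify_mode: str = "ONLY_FILES_TRANSFERRED",
) -> Dict[str, Any]:
    """Assemble keyword arguments for a DataSync CreateTask request."""
    request: Dict[str, Any] = {
        "SourceLocationArn": source_location_arn,            # where from
        "DestinationLocationArn": destination_location_arn,  # where to
        "Name": name,
        "Options": {"VerifyMode": verify_mode},              # how
    }
    if schedule_expression:                                  # when
        request["Schedule"] = {"ScheduleExpression": schedule_expression}
    if exclude_patterns:                                     # what to skip
        request["Excludes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": exclude_patterns}
        ]
    return request


# Placeholder ARNs, for illustration only.
request = build_task_request(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
    name="nightly-sync",
    schedule_expression="cron(0 2 * * ? *)",  # daily at 02:00 UTC
    exclude_patterns="*.tmp",
)
# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("datasync").create_task(**request)
```

Omitting the schedule leaves a task that only runs when started manually.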

Locations

Storage endpoints that a task reads from (source) or writes to (destination).

Supported:

Source

  • NFS
  • SMB
  • HDFS
  • S3
  • EFS
  • FSx
  • Object Storage

Destination

  • S3
  • EFS
  • FSx
  • NFS
  • SMB

Agent

Virtual transfer appliance.

Required for:

On-premises transfers

Deployment:

  • VMware
  • Hyper-V
  • EC2
  • KVM

Exam nuance:

AWS↔AWS transfers often require no agent.


Execution

Single task run.

Tracks:

  • Progress
  • Metrics
  • Verification
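
One way to picture execution tracking: the sketch below summarizes a response shaped like DescribeTaskExecution, using its `Status`, `BytesTransferred`, and `EstimatedBytesToTransfer` fields. The helper name and the sample response are made up for illustration.

```python
from typing import Any, Dict, Optional

# Statuses after which an execution no longer changes.
TERMINAL_STATUSES = {"SUCCESS", "ERROR"}


def summarize_execution(response: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a DescribeTaskExecution-style response to status and progress."""
    status = response["Status"]
    estimated = response.get("EstimatedBytesToTransfer", 0)
    transferred = response.get("BytesTransferred", 0)
    percent: Optional[float] = None
    if estimated:
        percent = round(100 * transferred / estimated, 1)
    return {
        "status": status,
        "finished": status in TERMINAL_STATUSES,
        "verifying": status == "VERIFYING",
        "percent": percent,
    }


# Hypothetical response for a run that is a quarter of the way through.
sample = {
    "Status": "TRANSFERRING",
    "EstimatedBytesToTransfer": 200,
    "BytesTransferred": 50,
}
summary = summarize_execution(sample)
```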

Scheduling

Supports:

  • Recurring sync
  • Incremental movement

Include / Exclude Filters (HIGH VALUE)

Allows selective transfer.

Examples:

Exclude:

*.tmp
/secrets/*

Include:

/project-alpha/*

Benefits:

  • Reduced transfer size
  • Improved security
  • Lower costs

Exam scenario:

Transfer only production data.

Answer:

Filter rules
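
The example patterns above can be sketched in code. `filter_rule` builds a rule in the pipe-delimited `SIMPLE_PATTERN` form that DataSync filters use. `is_selected` is only a local approximation of the matching, built on Python's `fnmatchcase`. It is an assumption for study purposes, not DataSync's actual matching engine.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def filter_rule(patterns: List[str]) -> Dict[str, str]:
    """Build one filter rule; multiple patterns are joined with '|'."""
    return {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(patterns)}


def is_selected(path: str, includes: List[str], excludes: List[str]) -> bool:
    """Approximate selection: excludes win, then includes (if any) must match."""
    if any(fnmatchcase(path, pattern) for pattern in excludes):
        return False
    if not includes:
        return True
    return any(fnmatchcase(path, pattern) for pattern in includes)


excludes = ["*.tmp", "/secrets/*"]
includes = ["/project-alpha/*"]

exclude_rule = filter_rule(excludes)
include_rule = filter_rule(includes)

# Hypothetical source paths.
paths = [
    "/project-alpha/data.csv",
    "/project-alpha/cache.tmp",
    "/secrets/key.pem",
    "/project-beta/data.csv",
]
selected = [p for p in paths if is_selected(p, includes, excludes)]
```

The two rule dictionaries are the shape that would go into a task's `Includes` and `Excludes` lists.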

Important Integrations

Amazon S3 (VERY HIGH VALUE)

Most common destination.

Supports:

  • Migration
  • Data ingestion
  • Synchronization

Amazon EFS

Supports:

  • Shared filesystem migration

Amazon FSx

Supports:

  • Windows
  • Lustre
  • ONTAP
  • OpenZFS

Hadoop Distributed File System (HDFS)

Supports:

HDFS → AWS migration

Common big data scenario.


AWS Snow Family

Exam distinction:

Snow:

Offline movement

DataSync:

Online transfer

AWS Snowcone (SPECIAL EXCEPTION)

Snowcone includes:

Embedded DataSync Agent

Pattern:

Edge Collection
 ↓
Transport Device
 ↓
Reconnect
 ↓
DataSync
 ↓
S3

Exam scenario:

Disconnected environments.


AWS Direct Connect

Typical enterprise architecture.


AWS PrivateLink (VPC Endpoints)

Supports:

Private DataSync Agent Communication

Pattern:

Agent
 ↓
PrivateLink
 ↓
DataSync

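For private transfers, the agent must be:

Activated through the VPC endpoint

Critical trap:

Creating a VPC endpoint later does not privatize an existing agent's traffic. The endpoint type is chosen when the agent is activated.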


CloudWatch

Provides:

  • Metrics
  • Monitoring

AWS IAM

Controls:

  • Tasks
  • Execution
  • Permissions
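
A minimal sketch of least-privilege control over execution: the policy below lets an operator run and inspect one task and nothing else. The helper name and the task ARN are placeholders. The action list is an assumption about what such an operator would need, so check it against your own requirements.

```python
import json
from typing import Any, Dict


def operator_policy(task_arn: str) -> Dict[str, Any]:
    """Build an IAM policy document scoped to a single DataSync task."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RunOneDataSyncTask",
                "Effect": "Allow",
                "Action": [
                    "datasync:DescribeTask",
                    "datasync:StartTaskExecution",
                    "datasync:DescribeTaskExecution",
                    "datasync:CancelTaskExecution",
                ],
                # The task itself plus any of its executions.
                "Resource": [task_arn, f"{task_arn}/execution/*"],
            }
        ],
    }


# Placeholder ARN, for illustration only.
policy = operator_policy(
    "arn:aws:datasync:us-east-1:111122223333:task/task-example"
)
policy_json = json.dumps(policy, indent=2)
```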

AWS KMS

Destination encryption.


Security Features

Encryption In Transit

Uses:

TLS

Protects:

  • Transfer sessions

Encryption At Rest

Destination service controls encryption.

Examples:

  • S3 KMS
  • EFS encryption
  • FSx encryption
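
Since the destination controls encryption at rest, the setting lives on the destination, not on the DataSync task. As a sketch for the S3 case, the helper below builds a default-encryption configuration in the shape S3's PutBucketEncryption expects. The helper name and the key ARN are placeholders.

```python
from typing import Any, Dict


def kms_default_encryption(kms_key_arn: str) -> Dict[str, Any]:
    """Build an S3 default-encryption configuration that uses SSE-KMS."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                },
                # Bucket keys reduce the number of KMS requests.
                "BucketKeyEnabled": True,
            }
        ]
    }


# Placeholder key ARN, for illustration only.
config = kms_default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"
)
# With boto3 installed, this could be applied to a destination bucket as:
#   boto3.client("s3").put_bucket_encryption(
#       Bucket="example-bucket",
#       ServerSideEncryptionConfiguration=config,
#   )
```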

Integrity Verification (VERY HIGH VALUE)

Verifies:

  • Metadata
  • Checksums
  • Completeness

Purpose:

Prevent silent corruption.

Major differentiator.
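
To make the idea concrete, here is a conceptual sketch of checksum-based verification: hash every source file, hash its copy, and report anything missing or different. DataSync does this internally. The use of SHA-256 and the in-memory "file systems" are assumptions made only for this illustration.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as hex."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(
    source: Dict[str, bytes], destination: Dict[str, bytes]
) -> List[str]:
    """Compare each source file with its copy and list the problems found."""
    problems: List[str] = []
    for path, content in source.items():
        copied = destination.get(path)
        if copied is None:
            problems.append(f"missing: {path}")    # completeness check
        elif sha256_hex(copied) != sha256_hex(content):
            problems.append(f"corrupt: {path}")    # checksum check
    return problems


# Hypothetical contents: one file silently corrupted, one never copied.
source_files = {"/a.txt": b"alpha", "/b.txt": b"bravo", "/c.txt": b"charlie"}
destination_files = {"/a.txt": b"alpha", "/b.txt": b"brav0"}

problems = find_mismatches(source_files, destination_files)
```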


Bandwidth Throttling

Controls:

  • Transfer speed

Prevents:

  • Network exhaustion
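
Throttling is set through the task's `BytesPerSecond` option, where -1 means no limit. Network links are usually quoted in megabits per second, so a small conversion helps. The helper name is made up, and decimal megabits (1 Mbps = 1,000,000 bits per second) are assumed.

```python
from typing import Dict, Optional

UNLIMITED = -1  # value DataSync uses for "no bandwidth limit"


def throttle_options(limit_mbps: Optional[float] = None) -> Dict[str, int]:
    """Convert a limit in megabits per second into a BytesPerSecond option."""
    if limit_mbps is None:
        return {"BytesPerSecond": UNLIMITED}
    if limit_mbps <= 0:
        raise ValueError("limit_mbps must be positive")
    # 1 Mbps = 1,000,000 bits per second; 8 bits per byte.
    return {"BytesPerSecond": int(limit_mbps * 1_000_000 / 8)}


# Cap a transfer at 100 Mbps so it cannot saturate a shared link.
capped = throttle_options(100)
uncapped = throttle_options()
```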

Network Isolation

Supports:

  • VPN
  • Direct Connect
  • PrivateLink

Access Controls

Uses:

  • IAM
  • Storage permissions

Advanced Security and Operational Concepts

DataSync vs Storage Gateway (MOST TESTED)

DataSync:

Move data

Storage Gateway:

Extend storage

Shortcut:

Migration →

DataSync

Hybrid storage →

Storage Gateway


Incremental Transfer

Transfers:

Changed objects only

Improves:

  • Speed
  • Cost
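
Conceptually, an incremental run compares source and destination and moves only the differences. The sketch below models that with size and modification time as the change signal. This is a simplification for study purposes, not DataSync's actual comparison logic. On a real task the behavior is selected with the `TransferMode` option (`CHANGED` or `ALL`).

```python
from typing import Dict, List, Tuple

# (size in bytes, modification time) per path. A simplified change signal.
FileMeta = Tuple[int, float]


def changed_paths(
    source: Dict[str, FileMeta], destination: Dict[str, FileMeta]
) -> List[str]:
    """Return source paths that are new or differ from the destination copy."""
    return sorted(
        path for path, meta in source.items() if destination.get(path) != meta
    )


# Hypothetical listings: /b was modified, /c is new, /a is unchanged.
source_listing = {"/a": (10, 1.0), "/b": (20, 2.0), "/c": (5, 3.0)}
destination_listing = {"/a": (10, 1.0), "/b": (20, 1.5)}

to_transfer = changed_paths(source_listing, destination_listing)
incremental_options = {"TransferMode": "CHANGED"}
```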

Verification Modes

Options:

Verify all data in the destination

or

Verify only the data transferred

or

No additional verification after the transfer

Tradeoff:

Assurance vs speed.
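
The assurance-versus-speed choice maps onto the task's `VerifyMode` option. A small sketch follows. The short level names (`full`, `transferred`, `none`) are invented here for readability, while the values are the ones the API accepts.

```python
from typing import Dict

# Friendly level name -> DataSync VerifyMode value.
VERIFY_MODES: Dict[str, str] = {
    "full": "POINT_IN_TIME_CONSISTENT",       # check the whole destination
    "transferred": "ONLY_FILES_TRANSFERRED",  # check only what was moved
    "none": "NONE",                           # skip the end-of-run check
}


def verify_options(level: str) -> Dict[str, str]:
    """Return task options for the requested verification level."""
    try:
        return {"VerifyMode": VERIFY_MODES[level]}
    except KeyError:
        valid = ", ".join(sorted(VERIFY_MODES))
        raise ValueError(f"unknown level {level!r}; expected one of: {valid}")


highest_assurance = verify_options("full")
faster = verify_options("transferred")
```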


Private Transfer Architecture (HIGH VALUE)

Default:

Agent
 ↓
Public Endpoint
 ↓
DataSync

Private design:

Agent
 ↓
DX / VPN
 ↓
PrivateLink
 ↓
DataSync

Most secure architecture.
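
In the private design, the VPC endpoint is supplied at the moment the agent is registered. The sketch below builds arguments shaped like the CreateAgent request (`ActivationKey`, `VpcEndpointId`, `SubnetArns`, `SecurityGroupArns`). The helper name and every identifier are placeholders, and nothing is sent to AWS.

```python
from typing import Any, Dict


def private_agent_request(
    activation_key: str,
    agent_name: str,
    vpc_endpoint_id: str,
    subnet_arn: str,
    security_group_arn: str,
) -> Dict[str, Any]:
    """Assemble CreateAgent arguments for an agent that uses a VPC endpoint."""
    if not vpc_endpoint_id.startswith("vpce-"):
        raise ValueError("expected a VPC endpoint ID starting with 'vpce-'")
    return {
        "ActivationKey": activation_key,
        "AgentName": agent_name,
        "VpcEndpointId": vpc_endpoint_id,
        "SubnetArns": [subnet_arn],
        "SecurityGroupArns": [security_group_arn],
    }


# Placeholder values, for illustration only.
agent_request = private_agent_request(
    activation_key="EXAMPLE-ACTIVATION-KEY",
    agent_name="dc1-private-agent",
    vpc_endpoint_id="vpce-0example",
    subnet_arn="arn:aws:ec2:us-east-1:111122223333:subnet/subnet-0example",
    security_group_arn=(
        "arn:aws:ec2:us-east-1:111122223333:security-group/sg-0example"
    ),
)
# With boto3 installed, this could be submitted as:
#   boto3.client("datasync").create_agent(**agent_request)
```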


Agent Requirement Trap

Need agent:

On-prem ↔ AWS

No agent:

AWS ↔ AWS
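
The rule of thumb can be written as a tiny decision function. This is a deliberately simplified model built on the location types listed earlier: it treats S3, EFS, and FSx as AWS storage and everything else as self-managed. Real deployments have edge cases this does not cover.

```python
AWS_STORAGE = {"S3", "EFS", "FSX"}
SELF_MANAGED = {"NFS", "SMB", "HDFS", "OBJECT_STORAGE"}


def needs_agent(source: str, destination: str) -> bool:
    """Return True when either side of the transfer is self-managed storage."""
    kinds = {source.upper(), destination.upper()}
    unknown = kinds - AWS_STORAGE - SELF_MANAGED
    if unknown:
        raise ValueError(f"unknown location type(s): {sorted(unknown)}")
    return bool(kinds & SELF_MANAGED)


on_prem_to_cloud = needs_agent("NFS", "S3")  # on-premises to AWS
cloud_to_cloud = needs_agent("S3", "EFS")    # AWS to AWS
```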

Metadata Preservation

Preserves:

  • Ownership
  • Timestamps
  • Permissions

Important for migration.
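
For POSIX-style locations such as NFS and EFS, the three items above correspond to task options: `Uid` and `Gid` for ownership, `Mtime` and `Atime` for timestamps, and `PosixPermissions` for permissions. A minimal sketch with a made-up helper name:

```python
from typing import Dict


def metadata_options(preserve: bool = True) -> Dict[str, str]:
    """Return task options that keep or drop POSIX metadata."""
    if preserve:
        return {
            "Uid": "INT_VALUE",              # ownership: numeric user ID
            "Gid": "INT_VALUE",              # ownership: numeric group ID
            "Mtime": "PRESERVE",             # timestamps: last modified
            "Atime": "BEST_EFFORT",          # timestamps: last accessed
            "PosixPermissions": "PRESERVE",  # permissions: mode bits
        }
    return {
        "Uid": "NONE",
        "Gid": "NONE",
        "Mtime": "NONE",
        "Atime": "NONE",
        "PosixPermissions": "NONE",
    }


keep_everything = metadata_options()
drop_everything = metadata_options(preserve=False)
```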


DataSync Is Not Replication

Not intended for:

  • Continuous replication
  • Live synchronization

Use instead:

  • S3 Replication
  • Database replication

Task Parallelization

DataSync automatically parallelizes transfers.

Improves throughput.


WAN Optimization

Optimizes:

  • Network efficiency
  • Compression
  • Transfer scheduling

Workflow(s)

Secure Private Migration

sequenceDiagram

participant Storage
participant Agent
participant VPCE
participant DataSync
participant S3

Storage->>Agent: Read

Agent->>VPCE: Private transfer

VPCE->>DataSync: Secure connection

DataSync->>S3: Write

DataSync->>DataSync: Verify

DataSync Task with Filters

sequenceDiagram

participant Source
participant Task
participant DataSync
participant Destination

Task->>DataSync: Include/Exclude rules

DataSync->>Source: Select files

DataSync->>Destination: Transfer

DataSync-->>Task: Verification

Snowcone Transfer Workflow

sequenceDiagram

participant Edge
participant Snowcone
participant DataSync
participant S3

Edge->>Snowcone: Store data

Snowcone->>DataSync: Upload

DataSync->>S3: Transfer

Comparisons

Service            Purpose              Online   Verification   Agent
---------------    ------------------   ------   ------------   ---------
DataSync           Migration            Yes      Yes            Sometimes
Storage Gateway    Hybrid Storage       Yes      No             Yes
Snowball           Offline Transfer     No       Yes            No
S3 Replication     Object Replication   Yes      No             No
Transfer Family    Protocol Access      Yes      No             No

Common Exam Traps

  1. DataSync is migration, not storage.

  2. Storage Gateway ≠ DataSync.

  3. Integrity verification is built in.

  4. Agent required for on-prem.

  5. AWS↔AWS may not require agent.

  6. Snowcone includes DataSync.

  7. HDFS supported.

  8. PrivateLink supports private transfers.

  9. For private transfers, the agent must be activated through the VPC endpoint.

  10. Filter rules optimize transfers.

  11. Metadata preserved.

  12. DataSync is not continuous replication.


5-Second Recall

  • DataSync = managed transfer
  • Tasks define movement
  • Agent for on-prem
  • PrivateLink secures transfer
  • Verification is core
  • Snowcone includes agent
  • HDFS supported

Quick Revision Notes

  • Managed migration service
  • Hybrid and cloud transfers
  • Data integrity verification
  • Incremental sync supported
  • PrivateLink available
  • Datasets can be filtered
  • HDFS supported
  • Snowcone integrates DataSync
  • Works with S3/EFS/FSx
  • Excellent for secure migrations