Amazon FSx for Lustre¶
What Is This Service?¶
Amazon FSx for Lustre is a fully managed, high-performance, parallel distributed file system for workloads that need high aggregate throughput, low latency, and shared POSIX access.
Designed for:
- High Performance Computing (HPC)
- Machine Learning (ML)
- Analytics
- Media rendering
- Scientific computing
- Financial simulations
Mental model:
FSx for Lustre = managed, ultra-fast compute file system that can be linked to S3.
Unlike object storage:
S3 → Objects
FSx for Lustre → Files
Why It Matters for Security¶
Compute-heavy environments often process:
- Training datasets
- Scientific data
- Financial models
- Intermediate outputs
- Sensitive workloads
Security goals:
- Centralize file access
- Avoid uncontrolled copies
- Encrypt high-throughput storage
- Isolate compute environments
- Control filesystem access
Security outcomes:
- Secure shared compute
- Controlled data movement
- Encrypted processing pipelines
- Centralized governance
Architecture Example¶
flowchart LR
S3[Amazon S3 Data Lake]
FSX[Amazon FSx for Lustre]
EC2[EC2 HPC Cluster]
EKS[Amazon EKS]
KMS[KMS]
SG[Security Groups]
DRT[Data Repository Task]
S3 --> DRT
DRT --> FSX
KMS --> FSX
SG --> FSX
EC2 --> FSX
EKS --> FSX
FSX --> S3
Core architecture:
S3 = Source of Truth
FSx = Compute Acceleration Layer
Core Concepts¶
Lustre File System¶
Lustre characteristics:
- Distributed
- Parallel
- Shared
- POSIX-compliant
Built for:
Thousands of clients
↓
Single namespace
Deployment Types¶
Scratch¶
Temporary.
Characteristics:
- Lowest cost
- Ephemeral
- Not replicated
Best for:
- Training
- Batch analytics
- Temporary processing
Exam shortcut:
Maximum speed
Minimum durability
Persistent¶
Long-lived deployment.
Characteristics:
- Better durability: data is replicated within a single Availability Zone
- Failed file servers are replaced automatically
- Suited to production workloads
Persistent_1¶
Older persistent generation.
Use:
- Existing workloads
Persistent_2 (HIGH VALUE)¶
Newer generation.
Benefits:
- Higher throughput density
- Improved performance
Exam shortcut:
Highest performance
+
Long-running workloads
If asked:
Maximum throughput with durable storage
Look for:
Persistent_2
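As a rough illustration of how the deployment type shapes the create request, the sketch below builds keyword arguments in the shape accepted by boto3's `create_file_system` for FSx. The helper name `build_lustre_request` and the example IDs are hypothetical; the rule that scratch deployments take neither a throughput tier nor a customer KMS key reflects the API documentation as recalled and should be checked against the current reference.

```python
from typing import Any, Dict, List, Optional


def build_lustre_request(
    deployment_type: str,
    storage_gib: int,
    subnet_id: str,
    security_group_ids: List[str],
    per_unit_throughput: Optional[int] = None,
    kms_key_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for an FSx CreateFileSystem call (hypothetical helper)."""
    lustre: Dict[str, Any] = {"DeploymentType": deployment_type}
    if deployment_type.startswith("PERSISTENT"):
        # Persistent deployments provision throughput per TiB of storage.
        if per_unit_throughput is None:
            raise ValueError("persistent deployments need a per-unit throughput")
        lustre["PerUnitStorageThroughput"] = per_unit_throughput
    elif per_unit_throughput is not None or kms_key_id is not None:
        # Assumption: scratch takes no throughput tier and no customer KMS key.
        raise ValueError("scratch deployments take no throughput tier or KMS key")

    request: Dict[str, Any] = {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_gib,
        "SubnetIds": [subnet_id],
        "SecurityGroupIds": list(security_group_ids),
        "LustreConfiguration": lustre,
    }
    if kms_key_id is not None:
        request["KmsKeyId"] = kms_key_id
    return request


# Placeholder IDs; a real call would be
# boto3.client("fsx").create_file_system(**request)
request = build_lustre_request(
    "PERSISTENT_2", 1200, "subnet-0example", ["sg-0example"], per_unit_throughput=250
)
```

Keeping the request a plain dictionary means the deployment rules can be reviewed and tested without AWS credentials.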
Data Repository Association (DRA)¶
Links:
S3 ↔ FSx
Capabilities:
- Import
- Export
- Synchronization
Most tested feature.
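A DRA can be pictured as a mapping from a directory on the file system to an S3 prefix, plus policies for which changes flow in each direction. The sketch below builds arguments in the shape of boto3's `create_data_repository_association`; the helper name, file system ID, and bucket are hypothetical.

```python
from typing import Any, Dict, Sequence


def build_dra_request(
    file_system_id: str,
    file_system_path: str,
    s3_uri: str,
    events: Sequence[str] = ("NEW", "CHANGED", "DELETED"),
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateDataRepositoryAssociation call."""
    if not file_system_path.startswith("/"):
        raise ValueError("file system path must be absolute, e.g. /dataset")
    if not s3_uri.startswith("s3://"):
        raise ValueError("data repository path must look like s3://bucket/prefix/")
    return {
        "FileSystemId": file_system_id,
        "FileSystemPath": file_system_path,
        "DataRepositoryPath": s3_uri,
        # Load existing object metadata as soon as the link is created.
        "BatchImportMetaDataOnCreate": True,
        "S3": {
            "AutoImportPolicy": {"Events": list(events)},  # S3 -> FSx
            "AutoExportPolicy": {"Events": list(events)},  # FSx -> S3
        },
    }


# Placeholder values; a real call would be
# boto3.client("fsx").create_data_repository_association(**dra)
dra = build_dra_request("fs-0example", "/dataset", "s3://example-bucket/dataset/")
```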
Data Repository Tasks (DRT) (HIGH VALUE)¶
Bulk synchronization operations.
Purpose:
Avoid waiting for lazy loading.
Operations:
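Import (file and directory metadata from S3)
Export (data and metadata to S3)
Release (free file system space for files already exported)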
Pattern:
S3
↓
DRT
↓
FSx
↓
Compute
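Exam scenario:
Pre-load data before expensive training.
Answer:
Data Repository Task (an import task loads file and directory metadata; file contents are fetched on first read, or ahead of time with lfs hsm_restore)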
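To make the task types concrete, here is a minimal sketch that builds arguments in the shape of boto3's `create_data_repository_task`. The helper name and IDs are hypothetical, and only the two task types relevant to this section are accepted; the service defines others.

```python
from typing import Any, Dict, Sequence

# Task types covered by this sketch (the API defines more).
SUPPORTED_TASK_TYPES = {"IMPORT_METADATA_FROM_REPOSITORY", "EXPORT_TO_REPOSITORY"}


def build_task_request(
    file_system_id: str, task_type: str, paths: Sequence[str]
) -> Dict[str, Any]:
    """Build keyword arguments for a CreateDataRepositoryTask call."""
    if task_type not in SUPPORTED_TASK_TYPES:
        raise ValueError(f"unsupported task type: {task_type}")
    return {
        "FileSystemId": file_system_id,
        "Type": task_type,
        "Paths": list(paths),
        # The Report argument is mandatory; disabling it skips the completion report.
        "Report": {"Enabled": False},
    }


# Placeholder values; a real call would be
# boto3.client("fsx").create_data_repository_task(**task)
task = build_task_request(
    "fs-0example", "IMPORT_METADATA_FROM_REPOSITORY", ["s3://example-bucket/dataset/"]
)
```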
Storage Types¶
SSD¶
Optimized for:
- Low latency
- Random IO
- ML training
HDD¶
Optimized for:
- Sequential throughput
- Large datasets
Typical workloads:
- Genomics
- Simulations
- Analytics
SSD Read Cache (HDD)¶
Optional cache layer.
Purpose:
Capacity
+
Hot Data Acceleration
Exam trap:
FSx for Lustre is NOT SSD-only.
Throughput Capacity¶
Controls:
- Aggregate performance
- Parallel access speed
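Storage and throughput are linked: on Persistent file systems, throughput is provisioned per TiB of storage (MB/s/TiB), so aggregate throughput grows with capacity.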
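The link between storage and throughput is simple multiplication. The sketch below assumes a Persistent file system where throughput is provisioned per TiB; the function name is illustrative.

```python
def aggregate_throughput_mbps(storage_tib: float, per_unit_mbps_per_tib: float) -> float:
    """Baseline aggregate throughput = storage (TiB) x per-unit throughput (MB/s/TiB)."""
    if storage_tib <= 0 or per_unit_mbps_per_tib <= 0:
        raise ValueError("storage and per-unit throughput must be positive")
    return storage_tib * per_unit_mbps_per_tib


# A 2.4 TiB file system at 250 MB/s/TiB gives roughly 600 MB/s in aggregate.
baseline = aggregate_throughput_mbps(2.4, 250)
```

Doubling capacity at the same tier doubles the baseline, which is why capacity and throughput are sized together.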
Important Integrations¶
Amazon S3 (VERY HIGH VALUE)¶
Primary integration.
Supports:
- Lazy loading
- Export
- DRA
- DRT
Pattern:
S3
↓
FSx
↓
Compute
↓
S3
Amazon EC2¶
Primary compute layer.
Common use:
- HPC clusters
- ML training
Amazon EKS¶
Supports:
- High-performance containers
AWS Batch¶
Typical:
Batch
↓
FSx
Amazon SageMaker¶
Pattern:
Training
↓
FSx
↓
S3
AWS IAM¶
Controls:
Filesystem lifecycle
Not:
File access
Exam trap.
AWS KMS¶
Supports:
- Encryption at rest
VPC¶
Provides:
- Network isolation
- Security Groups
Security Features¶
Encryption at Rest¶
Supports:
AWS KMS
Protects:
- Files
- Metadata
Encryption in Transit¶
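Applied automatically when clients run on EC2 instance types that support in-transit encryption (availability varies by deployment type and Region).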
Protects:
- Client communication
POSIX Authorization (MOST TESTED)¶
Controls:
UID
GID
RWX
Not IAM.
Exam shortcut:
IAM:
Can create filesystem
POSIX:
Can access file
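The data-plane decision can be sketched as a classic owner/group/other permission check. This is a simplified model for illustration only: it ignores root, ACLs, and Lustre-specific features such as root squash, and the function name is hypothetical.

```python
import stat
from typing import Iterable

# Permission bits for each class of caller.
_BITS = {
    "owner": {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}


def posix_allows(
    mode: int, owner_uid: int, owner_gid: int, uid: int, gids: Iterable[int], want: str
) -> bool:
    """Return True if every permission in `want` (subset of 'rwx') is granted."""
    if uid == owner_uid:
        who = "owner"
    elif owner_gid in set(gids):
        who = "group"
    else:
        who = "other"
    return all(mode & _BITS[who][perm] for perm in want)


# File with mode 0o640 owned by UID 1000, GID 2000.
owner_can_write = posix_allows(0o640, 1000, 2000, uid=1000, gids=[1000], want="rw")
group_can_write = posix_allows(0o640, 1000, 2000, uid=1001, gids=[2000], want="w")
```

Nothing in this check consults IAM: an IAM principal that is allowed to create or delete the file system gains no file permissions from that policy.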
Security Groups¶
Control:
- Mount access
- Network exposure
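Mount access comes down to which sources may reach the Lustre ports. The sketch below builds `IpPermissions` entries in the shape accepted by boto3's EC2 `authorize_security_group_ingress`. The port ranges (TCP 988 and 1018-1023) are taken from AWS guidance as recalled and should be verified against the current documentation; the helper name and group IDs are hypothetical.

```python
from typing import Any, Dict, List

# Assumed Lustre client traffic ports (verify against current AWS guidance).
LUSTRE_TCP_PORT_RANGES = [(988, 988), (1018, 1023)]


def lustre_ingress_permissions(client_sg_id: str) -> List[Dict[str, Any]]:
    """Allow Lustre traffic only from members of the client security group."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": low,
            "ToPort": high,
            "UserIdGroupPairs": [{"GroupId": client_sg_id}],
        }
        for low, high in LUSTRE_TCP_PORT_RANGES
    ]


# Placeholder ID; a real call would be
# boto3.client("ec2").authorize_security_group_ingress(
#     GroupId="sg-0fsxexample", IpPermissions=rules)
rules = lustre_ingress_permissions("sg-0clientexample")
```

Referencing the client security group instead of a CIDR range keeps mount access tied to group membership.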
VPC Isolation¶
Access controlled through:
- Routing
- Subnets
- SGs
S3 Security Preservation¶
Bucket controls remain active.
FSx does not bypass:
- IAM
- Bucket policies
Advanced Security and Operational Concepts¶
FSx for Lustre Is Not General NAS¶
Wrong use:
Home directories
Correct use:
Parallel compute
Scratch vs Persistent (HIGH VALUE)¶
Scratch:
Speed
Persistent:
Durability
Exam shortcut:
Training → Scratch
Production → Persistent_2
S3 Remains Source of Truth¶
Recommended:
S3
↓
FSx
↓
Compute
↓
Export
↓
S3
Do not archive in FSx.
IAM vs POSIX (VERY HIGH VALUE)¶
IAM:
Control Plane
POSIX:
Data Plane
Classic exam distinction.
Compression with LZ4 (HIGH VALUE)¶
Supports:
LZ4 Compression
Benefits:
- Reduced storage usage
- Improved effective throughput
- Fewer disk writes
Exam scenario:
Improve throughput without provisioning more storage.
Answer:
Enable LZ4
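Compression is a file system setting, so turning it on is a small update request. The sketch below builds arguments in the shape of boto3's `update_file_system`; the helper name and file system ID are hypothetical. As far as the documentation is recalled, enabling compression on an existing file system affects only newly written files.

```python
from typing import Any, Dict


def build_compression_update(file_system_id: str, enabled: bool = True) -> Dict[str, Any]:
    """Build keyword arguments for an UpdateFileSystem call toggling LZ4."""
    return {
        "FileSystemId": file_system_id,
        "LustreConfiguration": {
            "DataCompressionType": "LZ4" if enabled else "NONE",
        },
    }


# Placeholder ID; a real call would be
# boto3.client("fsx").update_file_system(**update)
update = build_compression_update("fs-0example")
```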
Ephemeral ML Pattern¶
Pattern:
S3 Dataset
↓
Scratch FSx
↓
Training
↓
Export
↓
Delete
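The ordering in this pattern matters most at the end: deleting a scratch file system before the export finishes loses the results. The sketch below lists the steps using boto3 FSx method names, with `run_training` as a hypothetical placeholder, and adds a guard based on the task lifecycle value that `describe_data_repository_tasks` reports.

```python
from typing import List

EPHEMERAL_STEPS: List[str] = [
    "create_file_system",                  # scratch file system for this job
    "create_data_repository_association",  # link it to the S3 dataset
    "run_training",                        # hypothetical placeholder, not an AWS API
    "create_data_repository_task",         # EXPORT_TO_REPOSITORY for the results
    "delete_file_system",                  # only after the export has succeeded
]


def may_delete(export_lifecycle: str) -> bool:
    """Allow deletion only once the export task reports SUCCEEDED."""
    return export_lifecycle == "SUCCEEDED"


# While the export is still running, deletion should wait.
ready = may_delete("EXECUTING")
```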
Scaling Model¶
Scale via:
- Capacity
- Throughput
Not independently: aggregate throughput is tied to provisioned capacity.
Availability Nuance¶
Persistent deployments provide:
- Better resilience
An FSx for Lustre file system lives in a single Availability Zone; it is not multi-AZ or multi-Region storage.
Workflow(s)¶
Lazy Load Workflow¶
sequenceDiagram
participant Compute
participant FSx
participant S3
Compute->>FSx: Open file
FSx->>S3: Load requested blocks
S3-->>FSx: Data
FSx-->>Compute: POSIX access
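When first-read latency matters, file contents can be pulled in ahead of time with the Lustre client command `lfs hsm_restore` instead of waiting for lazy loading. The sketch below only builds the command lines for every file under a directory; the function name and paths are hypothetical, and running the commands requires a mounted file system with the Lustre client installed.

```python
import os
from typing import Iterator, List


def restore_commands(root: str) -> Iterator[List[str]]:
    """Yield one `lfs hsm_restore` command per file found under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield ["lfs", "hsm_restore", os.path.join(dirpath, name)]


# On a client with the file system mounted at /fsx (placeholder path):
# import subprocess
# for cmd in restore_commands("/fsx/dataset"):
#     subprocess.run(cmd, check=True)
```

Generating the commands separately from running them makes it easy to dry-run the preload or fan the list out across several clients.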
Data Repository Task¶
sequenceDiagram
participant Admin
participant S3
participant FSx
Admin->>FSx: Start DRT
FSx->>S3: Read object listing and metadata
S3-->>FSx: Metadata imported
FSx-->>Admin: Ready
Secure Compute Workflow¶
sequenceDiagram
participant EC2
participant SG
participant FSx
participant KMS
EC2->>SG: Network authorization
SG->>FSx: Mount
FSx->>KMS: Decrypt
FSx-->>EC2: Access granted
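The mount step in this workflow is an ordinary Lustre mount once the security group admits the client. The sketch below assembles the command in the form shown in AWS guidance as recalled (`mount -t lustre -o relatime,flock <dns>@tcp:/<mountname> <dir>`); the function name, DNS name, and mount name are placeholders.

```python
from typing import List


def lustre_mount_command(dns_name: str, mount_name: str, mount_point: str = "/fsx") -> List[str]:
    """Build the argument list for mounting an FSx for Lustre file system."""
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be an absolute path")
    return [
        "mount", "-t", "lustre",
        "-o", "relatime,flock",
        f"{dns_name}@tcp:/{mount_name}",
        mount_point,
    ]


# Placeholder values; both come from the file system's description in practice.
cmd = lustre_mount_command("fs-0example.fsx.us-east-1.amazonaws.com", "examplemnt")
```

The command must run as root on an instance with the Lustre client installed; encryption at rest with KMS is handled by the service and needs no mount option.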
Comparisons¶
| Service | Model | Performance | POSIX | Primary Use |
|---|---|---|---|---|
| FSx for Lustre | Distributed Files | Extreme | Yes | HPC / ML |
| EFS | Shared Files | Medium | Yes | Applications |
| S3 | Objects | Massive Scale | No | Data Lake |
| FSx ONTAP | Enterprise NAS | High | Yes | Enterprise |
| EBS | Block | Very High | No | Single Host |
Common Exam Traps¶
- FSx for Lustre ≠ object storage.
- S3 remains source of truth.
- IAM does not authorize file access.
- POSIX governs file permissions.
- Scratch is temporary.
- Persistent_2 delivers highest throughput.
- Use DRT to preload datasets.
- HDD supports SSD cache.
- LZ4 improves effective throughput.
- FSx runs inside VPC.
- Encryption uses KMS.
- Designed for parallel compute.
5-Second Recall¶
- Lustre = parallel filesystem
- S3 ↔ FSx is core architecture
- DRT preloads data
- POSIX ≠ IAM
- Scratch = temporary
- Persistent_2 = fastest durable
- LZ4 boosts throughput
Quick Revision Notes¶
- High-performance shared filesystem
- S3 integration is core
- DRA + DRT are key concepts
- SSD or HDD available
- HDD supports SSD cache
- KMS encryption supported
- POSIX authorization model
- Persistent_2 preferred for production
- LZ4 improves effective performance
- Ideal for HPC and ML