Amazon Security Lake¶
What Is This Service?¶
Centralized AWS security data lake built on the Open Cybersecurity Schema Framework (OCSF).
Mental model:
Security Lake = Security logs → Normalize → Store → Query → Investigate
Primary purpose:
Aggregate security and operational telemetry across AWS, multi-account environments, and third-party tools into a unified security analytics platform.
Why It Matters for Security¶
Modern security failures are rarely caused by lack of logs.
They happen because logs are:
- fragmented
- inconsistent
- account-isolated
- difficult to correlate
- expensive to operationalize
Security Lake exists to:
- centralize security telemetry
- normalize data into OCSF
- accelerate investigations
- support SIEM platforms
- simplify compliance
- reduce security data engineering
Security outcomes:
- centralized detection
- faster incident response
- unified investigations
- long-term evidence retention
MOST TESTED:
Security Lake is not a SIEM.
It is the security data foundation layer.
Architecture Example¶
flowchart LR
subgraph Source Accounts
CT[CloudTrail]
VPC[VPC Flow Logs]
Route53[Resolver Logs]
GuardDuty[GuardDuty]
WAF[WAF Logs]
SecurityHub[Security Hub]
Inspector[Inspector]
end
subgraph Security Lake
Collector[Ingestion]
OCSF[OCSF Normalization]
S3[S3 Data Lake]
Subscriber[Subscribers]
end
subgraph Analytics
Athena[Athena]
OpenSearch[OpenSearch]
SIEM[Third Party SIEM]
QuickSight[QuickSight]
end
CT --> Collector
VPC --> Collector
Route53 --> Collector
GuardDuty --> SecurityHub
WAF --> Collector
SecurityHub --> Collector
Inspector --> SecurityHub
Collector --> OCSF
OCSF --> S3
S3 --> Athena
S3 --> OpenSearch
S3 --> SIEM
S3 --> QuickSight
Subscriber --> S3
Architecture goals:
- centralize
- normalize
- decouple producers and consumers
Workflow(s)¶
Security Data Ingestion¶
sequenceDiagram
participant Source
participant SecurityLake
participant OCSF
participant S3
participant Consumer
Source->>SecurityLake: Emit security data
SecurityLake->>OCSF: Normalize records
OCSF->>S3: Store parquet objects
Consumer->>S3: Query security data
Incident Investigation¶
sequenceDiagram
participant Analyst
participant Athena
participant SecurityLake
participant SecurityHub
Analyst->>Athena: Query indicators
Athena->>SecurityLake: Retrieve records
SecurityLake->>Athena: Return normalized logs
Athena->>Analyst: Correlated findings
Analyst->>SecurityHub: Validate findings
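The "Query indicators" step can be sketched as a small query builder. This is a minimal illustration, not an official query: the database name, table name, and selected columns are assumptions modeled on OCSF attribute names, and the `eventday` partition column is assumed as well. Inputs are validated before being placed into the SQL text.

```python
import ipaddress

# Assumed identifiers for illustration only; check the Glue catalog in the
# target Region for the real database and table names.
DATABASE = "amazon_security_lake_glue_db_us_east_1"
TABLE = "amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0"


def build_indicator_query(ip: str, start_day: str, end_day: str, limit: int = 100) -> str:
    """Return an Athena SQL string that searches for one source IP.

    start_day and end_day are partition-style strings (YYYYMMDD).
    """
    # Validate every input before interpolating it into SQL.
    address = ipaddress.ip_address(ip)  # raises ValueError on bad input
    for day in (start_day, end_day):
        if not (len(day) == 8 and day.isdigit()):
            raise ValueError(f"expected YYYYMMDD, got {day!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")

    return (
        "SELECT time, api.operation, actor.user.uid, src_endpoint.ip\n"
        f'FROM "{DATABASE}"."{TABLE}"\n'
        f"WHERE src_endpoint.ip = '{address}'\n"
        f"  AND eventday BETWEEN '{start_day}' AND '{end_day}'\n"
        "ORDER BY time DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_indicator_query("203.0.113.7", "20240101", "20240107"))
```

Filtering on the partition column first keeps the scan, and therefore the Athena cost, limited to the days under investigation.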
Core Concepts¶
Security Data Lake¶
Security Lake creates:
- centralized storage
- normalized telemetry
- analytics-ready datasets
Storage engine:
- Amazon S3
Storage format:
- Apache Parquet
Benefits:
- compression
- efficient analytics
- lower cost
Open Cybersecurity Schema Framework (OCSF)¶
MOST TESTED
OCSF standardizes security event formats.
Problem solved:
Every security tool produces different schemas.
Without OCSF:
Tool A → source_ip
Tool B → src_ip
Tool C → client_address
With OCSF:
src_endpoint.ip
Benefits:
- unified analytics
- easier SIEM integration
- lower ETL complexity
Exam trap:
Security Lake normalizes.
It does not create detections.
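The field mapping above can be sketched in a few lines. This is a toy normalizer, assuming only the three tool-specific names from the example; a real OCSF mapping covers many more attributes and event classes.

```python
from typing import Any

# Tool-specific field names from the example above; all mean "source IP".
SOURCE_IP_ALIASES = ("source_ip", "src_ip", "client_address")


def normalize_source_ip(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of record with the source IP stored at src_endpoint.ip."""
    # Copy everything except the tool-specific aliases.
    normalized = {k: v for k, v in record.items() if k not in SOURCE_IP_ALIASES}
    for alias in SOURCE_IP_ALIASES:
        if alias in record:
            # Preserve any src_endpoint attributes that are already present.
            endpoint = dict(normalized.get("src_endpoint", {}))
            endpoint["ip"] = record[alias]
            normalized["src_endpoint"] = endpoint
            break
    return normalized


if __name__ == "__main__":
    for raw in ({"source_ip": "198.51.100.4"},
                {"src_ip": "198.51.100.4"},
                {"client_address": "198.51.100.4"}):
        print(normalize_source_ip(raw))
```

After normalization, one filter on `src_endpoint.ip` works across all three tools.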
Native AWS Sources¶
Supported examples:
- CloudTrail
- VPC Flow Logs
- Route 53 Resolver Logs
- AWS WAF
- Security Hub findings (GuardDuty and Inspector findings arrive through this source)
Security Lake continuously collects.
No manual exports required.
Custom Sources¶
Supports:
- custom applications
- third-party security tools
- partner integrations
Requirements:
- records mapped to an OCSF event class
- data delivered as Apache Parquet objects
Examples:
- Splunk
- CrowdStrike
- Palo Alto
- SIEM platforms
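Custom source data is written to partitioned S3 prefixes. The sketch below builds such a prefix; the `ext/<source>/region=/accountId=/eventDay=` layout is an assumption based on the commonly documented partition scheme, and the source name `my-edr` is hypothetical.

```python
from datetime import date


def custom_source_prefix(source_name: str, region: str,
                         account_id: str, event_day: date) -> str:
    """Build the assumed S3 key prefix for one custom-source partition."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("AWS account IDs are 12 digits")
    if not source_name or "/" in source_name:
        raise ValueError("source_name must be a single path segment")
    return (
        f"ext/{source_name}/"
        f"region={region}/"
        f"accountId={account_id}/"
        f"eventDay={event_day:%Y%m%d}/"
    )


if __name__ == "__main__":
    # "my-edr" is a hypothetical custom source name.
    print(custom_source_prefix("my-edr", "us-east-1", "123456789012", date(2024, 3, 5)))
```

Partitioning by Region, account, and day lets consumers prune most objects before reading any Parquet data.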
Subscribers¶
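Consumers of Security Lake data, granted either data access (direct reads of S3 objects) or query access (tables shared through AWS Lake Formation).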
Examples:
- Athena
- OpenSearch
- Security analytics
- custom pipelines
Subscribers do not own ingestion.
Security Lake remains source of truth.
Important Integrations¶
| Service | Purpose |
|---|---|
| S3 | Data storage |
| Athena | Investigation |
| Security Hub | Findings correlation |
| GuardDuty | Threat findings |
| CloudTrail | API telemetry |
| VPC Flow Logs | Network visibility |
| Route 53 Resolver Logs | DNS analysis |
| AWS WAF | Application telemetry |
| OpenSearch | Search |
| EventBridge | Automation |
| Organizations | Central governance |
| IAM | Access control |
| KMS | Encryption |
Security Features¶
Centralized Log Collection¶
Organization-wide collection.
Benefits:
- fewer blind spots
- simpler investigations
Encryption¶
Security Lake encrypts data at rest.
Supported:
- S3-managed keys (SSE-S3)
- customer managed AWS KMS keys (SSE-KMS)
MOST TESTED:
Customer-managed KMS keys provide:
- auditability
- key lifecycle control
- cross-account governance
Fine-Grained Access¶
Access controlled through:
- IAM
- S3 permissions
- AWS Lake Formation permissions
Supports:
- least privilege
- account separation
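Least privilege for a data consumer can be sketched as an IAM policy scoped to a single source prefix. This is an illustration only: the bucket name and prefix are hypothetical, and in practice Security Lake provisions subscriber access itself, so treat this as the shape of the idea, not the policy the service generates.

```python
import json


def subscriber_read_policy(bucket: str, source_prefix: str) -> dict:
    """Return a read-only IAM policy limited to one prefix in one bucket."""
    prefix = source_prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is allowed only under the chosen prefix.
                "Sid": "ListOneSourcePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are allowed only under the same prefix.
                "Sid": "ReadOneSourcePrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix names.
    policy = subscriber_read_policy("example-security-lake-bucket", "aws/VPC_FLOW")
    print(json.dumps(policy, indent=2))
```

The policy grants no write or delete actions, so a compromised consumer cannot tamper with stored evidence.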
Cross-Account Aggregation¶
Organization model:
flowchart TB
Org[Organizations]
AccountA --> SecurityLake
AccountB --> SecurityLake
AccountC --> SecurityLake
SecurityLake --> SecurityAccount
Central security account receives telemetry.
Advanced Security and Operational Concepts¶
Control Plane vs Data Plane¶
Control Plane:
- configure lake
- onboard sources
- manage subscribers
Data Plane:
- ingest logs
- normalize
- store records
Exam trap:
Query operations occur outside Security Lake.
Security Lake manages storage.
Regional Architecture¶
HIGH VALUE
Security Lake is regional.
Implications:
- data residency matters
- collection configured per Region
- retention managed regionally
Cross-Region consolidation is supported through rollup Regions: contributing Regions replicate their data to a designated rollup Region.
Security Lake itself is not globally replicated.
Multi-Account Governance¶
MOST TESTED
flowchart LR
Organizations --> DelegatedAdmin
DelegatedAdmin --> SecurityLake
SecurityLake --> Member1
SecurityLake --> Member2
SecurityLake --> Member3
Benefits:
- centralized operations
- delegated administration
- lower operational overhead
Security Lake vs SIEM¶
MASSIVE EXAM TRAP
Security Lake:
- collects
- normalizes
- stores
SIEM:
- correlates
- alerts
- detects
Security Lake complements SIEM.
Not replacement.
Security Lake vs CloudTrail Lake¶
HIGH VALUE
| Capability | Security Lake | CloudTrail Lake |
|---|---|---|
| Purpose | Security analytics | API activity |
| Sources | Multiple | CloudTrail |
| Schema | OCSF | CloudTrail |
| Storage | S3 | Managed |
| Query | Consumers | Built-in |
Rule:
Need broad security telemetry → Security Lake
Need API audit analysis → CloudTrail Lake
Security Lake vs OpenSearch¶
| Capability | Security Lake | OpenSearch |
|---|---|---|
| Storage | Long-term | Search index |
| Query | External | Native |
| Analytics | Consumer-driven | Built-in |
| Detection | No | Possible |
Security Lake often feeds OpenSearch.
Storage and Cost Model¶
Costs driven by:
- ingestion
- processing
- retention
- S3 storage
- analytics
Optimization:
- Parquet
- lifecycle policies
- archive tiers
Exam trap:
Security Lake does not eliminate S3 charges.
Retention Strategy¶
Security telemetry frequently follows:
Hot → S3 Standard
Warm → Intelligent Tiering
Cold → Glacier
Archive → Deep Archive
Compliance often determines retention.
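The tiering above can be sketched as an S3-style lifecycle rule. The day thresholds and rule ID are hypothetical, and Security Lake exposes its own lifecycle settings, so this shows the general shape of a transition schedule, not a service-specific configuration.

```python
# Hypothetical day thresholds; real values come from compliance requirements.
TIERS = [
    (30, "INTELLIGENT_TIERING"),  # warm
    (90, "GLACIER"),              # cold
    (365, "DEEP_ARCHIVE"),        # archive
]


def build_lifecycle_rule(tiers, expire_after_days):
    """Return one S3-style lifecycle rule covering the whole bucket."""
    if not tiers:
        raise ValueError("at least one tier is required")
    days = [d for d, _ in tiers]
    # Each transition must happen later than the previous one.
    if any(later <= earlier for earlier, later in zip(days, days[1:])):
        raise ValueError("transition days must be strictly increasing")
    if expire_after_days <= days[-1]:
        raise ValueError("expiration must come after the last transition")
    return {
        "ID": "security-telemetry-retention",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Transitions": [{"Days": d, "StorageClass": c} for d, c in tiers],
        "Expiration": {"Days": expire_after_days},
    }


if __name__ == "__main__":
    print(build_lifecycle_rule(TIERS, expire_after_days=2555))
```

Objects stay in S3 Standard (hot) until the first transition; the expiration value is where the compliance retention period is enforced.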
Detection Pipeline Pattern¶
flowchart LR
Sources --> SecurityLake
SecurityLake --> Athena
Athena --> SecurityHub
SecurityHub --> EventBridge
EventBridge --> Lambda
Flow:
Collect → Normalize → Investigate → Respond
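The Security Hub → EventBridge hop can be sketched as an event pattern that selects high-severity findings. The pattern uses the standard Security Hub event fields; the `matches` helper is a deliberately simplified local stand-in for EventBridge matching (exact values and nested objects only), written here for illustration.

```python
# Event pattern for imported Security Hub findings with high severity.
HIGH_SEVERITY_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # A nested pattern matches if any array element satisfies it.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(isinstance(c, dict) and matches(expected, c)
                       for c in candidates):
                return False
        else:
            # A value list matches if any actual value appears in it.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


if __name__ == "__main__":
    event = {
        "source": "aws.securityhub",
        "detail-type": "Security Hub Findings - Imported",
        "detail": {"findings": [{"Severity": {"Label": "HIGH"}}]},
    }
    print(matches(HIGH_SEVERITY_PATTERN, event))
```

Filtering at the rule keeps the Lambda responder from being invoked for low-severity noise.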
OCSF vs ASFF¶
HIGH VALUE
| Standard | Purpose |
|---|---|
| OCSF | Security event schema |
| ASFF | Security findings |
Security Lake → OCSF
Security Hub → ASFF
MASSIVE EXAM TRAP:
Do not confuse them.
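One way to keep the two apart is by their top-level keys. The heuristic below is a sketch, not a schema validator: it checks only a couple of required attributes from each format (`SchemaVersion` and `ProductArn` for ASFF, `class_uid` and `category_uid` for OCSF), and the sample records are made up.

```python
def classify_record(record: dict) -> str:
    """Guess whether a record is an ASFF finding or an OCSF event."""
    # ASFF findings use PascalCase keys such as SchemaVersion and ProductArn.
    if "SchemaVersion" in record and "ProductArn" in record:
        return "ASFF"
    # OCSF events use snake_case keys and numeric class/category identifiers.
    if "class_uid" in record and "category_uid" in record:
        return "OCSF"
    return "unknown"


if __name__ == "__main__":
    # Made-up sample records for illustration.
    asff_finding = {
        "SchemaVersion": "2018-10-08",
        "ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
        "Severity": {"Label": "HIGH"},
    }
    ocsf_event = {
        "class_uid": 4001,      # Network Activity
        "category_uid": 4,
        "src_endpoint": {"ip": "198.51.100.4"},
    }
    print(classify_record(asff_finding), classify_record(ocsf_event))
```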
Comparisons¶
| Service | Primary Role |
|---|---|
| Security Lake | Security telemetry lake |
| Security Hub | Findings aggregation |
| GuardDuty | Threat detection |
| CloudTrail Lake | API analytics |
| Athena | Query engine |
| OpenSearch | Search and analytics |
| SIEM | Detection platform |
Common Exam Traps¶
- Security Lake is not a SIEM.
- Security Lake stores data; it does not detect threats.
- OCSF ≠ ASFF.
- Security Lake is regional.
- Data resides in S3.
- Athena commonly queries Security Lake.
- Security Hub findings are not raw logs.
- Security Lake supports third-party ingestion.
- CloudTrail Lake is API-specific.
- Security Lake normalizes native AWS sources to OCSF automatically.
- Security Lake does not replace GuardDuty.
- Security Lake ingestion and storage both affect cost.
- Security Lake improves investigation, not prevention.
- Multi-account collection is a common architecture.
- Subscriber access should follow least privilege.
5-Second Recall¶
- Security Lake = centralized security data lake
- OCSF = normalized schema
- S3 = storage
- Athena = investigation
- Security Hub = findings
- GuardDuty = detection
- Not a SIEM
- Regional service
Quick Revision Notes¶
- Collect security telemetry centrally
- Normalize using OCSF
- Store in S3 using Parquet
- Query with Athena/OpenSearch
- Multi-account through Organizations
- KMS for encryption governance
- Subscribers consume data
- Security Hub correlates findings
- CloudTrail Lake ≠ Security Lake
- Security Lake enables investigations, not detections