These questions cover data ingestion and pipeline orchestration — the core of DEA-C01 and where the exam tests your ability to match AWS services to specific data engineering scenarios, not just describe what each service does.
Question 1
A data team needs to ingest data from 50 different REST APIs daily. Each API has a different response schema and unpredictable call timing. Which architecture handles this most effectively?
- A) Amazon Kinesis Data Streams for real-time ingestion from all 50 APIs
- B) AWS Lambda with Amazon EventBridge Scheduler for per-API invocations
- C) AWS Glue crawlers to discover and ingest from each API automatically
- D) Amazon S3 with a batch script that calls all APIs in sequence
Answer: B — Lambda with EventBridge Scheduler
Lambda functions are ideal for API integration: each function handles one API's authentication, pagination, and schema transformation. EventBridge Scheduler triggers each function on a per-API schedule (which can differ between APIs). Results land in S3 or a database for downstream processing.
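As a rough illustration of the per-API work described above, here is a minimal sketch of the pagination and schema-mapping logic one such function might contain. The page shape (`items`, `next`), the `field_map` event key, and the `ingest` and `fetch_page` names are all hypothetical; a real API would dictate its own cursor and authentication scheme.

```python
from typing import Callable, Iterator, Optional


def iter_records(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Follow cursors until the API reports no further page."""
    cursor = None
    while True:
        # Assumed page shape: {"items": [...], "next": "<cursor>" or None}
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next")
        if not cursor:
            break


def normalize(record: dict, field_map: dict) -> dict:
    """Rename source fields to a common schema; unmapped fields are dropped."""
    return {target: record.get(source) for source, target in field_map.items()}


def ingest(event: dict, fetch_page: Callable[[Optional[str]], dict]) -> list:
    """Core of a per-API function: page through results, then map the schema."""
    return [normalize(r, event["field_map"]) for r in iter_records(fetch_page)]
```

Passing `fetch_page` in as a parameter keeps the HTTP and authentication details of each API separate from the shared paging loop, which also makes the logic easy to test without network access.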
Why the alternatives don't fit:
- Kinesis Data Streams: Designed for high-throughput event streams, not scheduled REST API polling. You'd still need Lambda to call the APIs and push to Kinesis — adding unnecessary complexity.
- Glue Crawlers: Discover schema from storage (S3, databases); they don't call REST APIs.
- Sequential batch script: A single point of failure; one slow API blocks all subsequent ones, and there's no built-in retry or observability.
EventBridge Scheduler supports both rate expressions (rate(1 day)) and cron expressions, and each schedule can pass different parameters to the Lambda function, handling per-API configuration cleanly.
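To make the per-API configuration concrete, the sketch below builds one schedule definition per API, each carrying its own expression and input payload. The API names, URLs, ARNs, and schedule expressions are placeholders, and the code only assembles the request dictionaries; it does not call AWS.

```python
import json

# Hypothetical per-API settings; every value here is illustrative only.
API_CONFIGS = [
    {"name": "orders-api", "url": "https://example.com/orders",
     "schedule": "cron(0 6 * * ? *)"},
    {"name": "weather-api", "url": "https://example.com/weather",
     "schedule": "rate(12 hours)"},
]

# Placeholder ARNs for the shared ingestion function and the scheduler's role.
LAMBDA_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ingest"
ROLE_ARN = "arn:aws:iam::123456789012:role/scheduler-invoke"


def build_schedule_request(api: dict, lambda_arn: str, role_arn: str) -> dict:
    """Assemble keyword arguments for one EventBridge Scheduler schedule."""
    return {
        "Name": f"ingest-{api['name']}",
        "ScheduleExpression": api["schedule"],
        "FlexibleTimeWindow": {"Mode": "OFF"},
        "Target": {
            "Arn": lambda_arn,
            "RoleArn": role_arn,
            # The JSON string below arrives as the Lambda event.
            "Input": json.dumps({"api_name": api["name"], "url": api["url"]}),
        },
    }


schedule_requests = [
    build_schedule_request(api, LAMBDA_ARN, ROLE_ARN) for api in API_CONFIGS
]
```

Each dictionary could then be passed to `boto3.client("scheduler").create_schedule(**request)`, so adding a 51st API is a configuration change, not a code change.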
Question 2
A data pipeline has five sequential jobs where each job's output is the next job's input. If any job fails, the pipeline must retry that specific job up to 3 times before alerting the team. Which service is the right orchestrator?
- A) AWS Glue Workflows with trigger-based job dependencies
- B) AWS Step Functions with Retry configuration on each state
- C) Amazon EventBridge with chained rules triggering each job
- D) AWS Lambda with recursive invocations for each stage
Answer: B — AWS Step Functions with Retry configuration
Step Functions provides native support for exactly this pattern:
    "ProcessStage1": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...",
      "Retry": [{
        "ErrorEquals": ["States.ALL"],
        "IntervalSeconds": 60,
        "MaxAttempts": 3,
        "BackoffRate": 2.0
      }],
      "Catch": [{
        "ErrorEquals": ["States.ALL"],
        "Next": "AlertTeam"
      }],
      "Next": "ProcessStage2"
    }
Each state retries independently — a failure in Stage 3 retries Stage 3 without re-running Stages 1 and 2. After exhausting retries, the Catch block routes to an alert state.
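The retry timing implied by that configuration can be worked out with a few lines of arithmetic. This sketch assumes the documented behavior that the first retry waits `IntervalSeconds` and each later wait is multiplied by `BackoffRate`; the `retry_delays` helper is hypothetical and ignores jitter and maximum-delay settings.

```python
def retry_delays(interval_seconds: float, max_attempts: int,
                 backoff_rate: float) -> list:
    """Wait before each retry: interval * backoff_rate ** retry_index."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


delays = retry_delays(60, 3, 2.0)
print(delays)       # [60.0, 120.0, 240.0]
print(sum(delays))  # 420.0
```

With `MaxAttempts` set to 3, a persistently failing stage runs four times in total (the first attempt plus three retries) and spends about seven minutes waiting, excluding task run time, before the Catch block takes over.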
Glue Workflows manage job dependencies but have limited built-in retry and error routing compared to Step Functions. EventBridge chaining creates coupling between rules and doesn't provide per-step retry state. Recursive Lambda invocations hit concurrency limits and lose state on failure.
Question 3
An application produces 50,000 events per second that must be processed in real time, with ordering guaranteed within each sensor device, and the ability to replay the last 7 days of data. Which service is the right choice?
- A) Amazon SQS Standard queue
- B) Amazon SQS FIFO queue
- C) Amazon Kinesis Data Streams
- D) Amazon SNS with SQS fan-out
Answer: C — Amazon Kinesis Data Streams
Kinesis is designed for high-throughput ordered streaming with replay. Key capabilities that match the requirements:
- Ordering: Events with the same partition key (e.g., deviceId) go to the same shard in order
- Replay: Data is retained for 1–365 days (default 24 hours, extendable); consumers can reprocess from any point
- Throughput: Scales by adding shards (1 MB/s ingest, 2 MB/s read per shard)
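A quick sizing estimate shows how the scenario's 50,000 events per second translates into shards. This is a sketch under stated assumptions: provisioned capacity mode, a per-shard write limit of 1,000 records per second alongside the 1 MB/s ingest limit, and a hypothetical average event size of 500 bytes, which the question does not specify.

```python
import math

# Per-shard write limits in provisioned mode (assumed for this estimate).
RECORDS_PER_SHARD_PER_SEC = 1_000
BYTES_PER_SHARD_PER_SEC = 1024 * 1024


def min_shards(events_per_second: int, avg_event_bytes: int) -> int:
    """Smallest shard count that satisfies both the record and byte limits."""
    by_count = math.ceil(events_per_second / RECORDS_PER_SHARD_PER_SEC)
    by_bytes = math.ceil(
        events_per_second * avg_event_bytes / BYTES_PER_SHARD_PER_SEC
    )
    return max(by_count, by_bytes)


print(min_shards(50_000, 500))  # 50
```

For small events the record-count limit dominates, so the stream needs at least 50 shards. Unevenly distributed device IDs would push that higher, because a hot partition key can saturate a single shard.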
Kinesis vs SQS for this scenario:
| Capability | Kinesis | SQS Standard | SQS FIFO |
|---|---|---|---|
| Throughput | Very high | Very high | 300 msg/s (3,000 with batching) by default |
| Ordering | Per partition key | No guarantee | FIFO per message group |
| Replay | Yes (by repositioning the shard iterator) | No (messages deleted after processing) | No |
| Consumers | Multiple independent | Multiple competing | Multiple competing |
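SQS FIFO is limited by default to 3,000 messages/second with batching, far below 50,000 events/second. High-throughput FIFO mode raises that limit but still cannot satisfy the 7-day replay requirement. Standard SQS has no ordering guarantee, and neither queue type lets consumers re-read messages once they are deleted.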
Key Takeaways
- Lambda + EventBridge Scheduler = per-API scheduling with independent retry and observability; Kinesis is for streams, not REST polling
- Step Functions Retry + Catch = native per-step retry with error routing; no custom orchestration code needed
- Kinesis = high-throughput ordered streaming with replay; SQS = reliable queuing without replay (Standard queues also lack ordering guarantees)