Overview
The Google Professional Cloud Architect certification validates your ability to design, develop, and manage robust, secure, and scalable solutions on Google Cloud Platform. It is one of the most respected cloud architecture certifications and is widely recognised in enterprise engineering organisations.
The exam has 50–60 questions including multiple choice, multiple select, and case study scenarios, a 2-hour time limit, and no publicly stated passing score (Google uses a scaled passing standard).
This is a design exam. Questions present a business or technical scenario and ask which architecture, service, or approach best satisfies the stated requirements. Knowing which GCP service to use and why is more important than knowing how to configure it.
Exam Domains
| Domain | Weight |
|---|---|
| Designing Cloud Solution Architectures | 28% |
| Managing and Provisioning Solution Infrastructure | 18% |
| Ensuring Solution and Operations Reliability | 16% |
| Designing for Security and Compliance | 15% |
| Analyzing and Optimizing Technical and Business Processes | 13% |
| Managing Implementation | 10% |
Domain 1: Designing Cloud Solution Architectures (28%)
Compute Selection
The exam frequently tests which compute option is most appropriate for a given scenario. The key decision factors are: state requirements, operational overhead tolerance, traffic pattern, and team capability.
| Scenario | Service |
|---|---|
| Containerised microservices, need pod-level control | GKE (Standard mode) |
| Containerised workloads, minimal cluster management | GKE Autopilot |
| Stateless HTTP/event-driven, pay-per-request | Cloud Run |
| Event-driven, single-function, lightweight | Cloud Functions |
| Lift-and-shift VMs, full OS control | Compute Engine |
| Managed batch or stream processing | Dataflow |
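The table above can be read as an ordered set of rules. The sketch below is one way to encode it, not an official decision tree: the `Workload` fields, the rule order, and the fallback string are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical scenario attributes, named for this sketch only."""
    containerised: bool = False
    needs_pod_level_control: bool = False
    stateless_request_driven: bool = False
    single_function: bool = False
    needs_full_os_control: bool = False
    batch_or_stream_pipeline: bool = False


def pick_compute(w: Workload) -> str:
    """Return the service from the table for the first rule that matches."""
    if w.batch_or_stream_pipeline:
        return "Dataflow"
    if w.needs_full_os_control:
        return "Compute Engine"
    if w.containerised and w.needs_pod_level_control:
        return "GKE (Standard mode)"
    if w.single_function:
        return "Cloud Functions"
    if w.stateless_request_driven:
        return "Cloud Run"
    if w.containerised:
        return "GKE Autopilot"
    # Not a row in the table: a placeholder for scenarios the rules do not cover.
    return "unclear: re-read the requirements"


print(pick_compute(Workload(containerised=True, stateless_request_driven=True)))
```

Real exam scenarios mix these signals, so treat the rule order as a starting point and let the stated operational-overhead and team-capability constraints break ties.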
Storage and Database Selection
Relational (structured, ACID)
├── Cloud SQL → managed MySQL / PostgreSQL / SQL Server, regional
├── Cloud Spanner → globally distributed, strongly consistent, horizontally scalable
└── AlloyDB → PostgreSQL-compatible, high performance OLTP
Non-relational
├── Firestore → document database, real-time sync, mobile/web apps
├── Bigtable → wide-column, high-throughput, IoT/time-series/analytics
└── Memorystore → managed Redis or Memcached (caching layer)
Analytics
├── BigQuery → serverless data warehouse, SQL analytics at scale
└── Dataflow → batch and stream processing (Apache Beam)
Object / File Storage
└── Cloud Storage → blobs, backups, static assets, data lake
Exam tip: If the scenario mentions "globally distributed with strong consistency," the answer is Cloud Spanner. If it mentions "analytical queries over petabytes," the answer is BigQuery. If it mentions "time-series" or "IoT telemetry at very high throughput," the answer is Bigtable.
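As a study aid, the keyword cues in the exam tip can be turned into a small lookup. This is a rough sketch: the keyword tuples and the `suggest_store` helper are hypothetical, and simple substring matching will miss paraphrased requirements.

```python
from typing import Optional

# Each entry: (phrases that must all appear, suggested service).
# The phrases are taken from the exam tip; the structure is an assumption.
KEYWORD_HINTS = [
    (("globally distributed", "strong consistency"), "Cloud Spanner"),
    (("analytical", "petabytes"), "BigQuery"),
    (("time-series",), "Bigtable"),
    (("iot telemetry",), "Bigtable"),
]


def suggest_store(scenario: str) -> Optional[str]:
    """Return the first service whose cue phrases all occur in the scenario."""
    text = scenario.lower()
    for phrases, service in KEYWORD_HINTS:
        if all(phrase in text for phrase in phrases):
            return service
    return None  # no cue matched; fall back to reading the requirements


print(suggest_store("A globally distributed ledger that needs strong consistency"))
```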
Networking Design
- VPC architecture: VPC networks are global and subnets are regional, so a single VPC can span regions with one or more subnets in each; use Shared VPC for multi-project organisations
- Private access: Private Service Connect for accessing managed services without public IPs; VPC Service Controls for an API-level perimeter
- Connectivity: Cloud Interconnect (dedicated or partner) for private on-premises connectivity; Cloud VPN for encrypted IPsec tunnels over the public internet
- Load balancing: Global external (HTTP/S, SSL Proxy, TCP Proxy) vs regional external vs internal; Cloud CDN for caching at edge
- DNS: Cloud DNS for managed zones; Cloud DNS private zones for internal name resolution
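For the single-VPC, multi-region layout, it can help to plan non-overlapping subnet ranges up front. The sketch below uses Python's standard `ipaddress` module; the `10.0.0.0/16` planning range, the `/20` subnet size, and the `plan_subnets` helper are assumptions for illustration. The planning range is a convention for this sketch only, since in Google Cloud the ranges belong to subnets, not to the VPC itself.

```python
import ipaddress
from typing import Dict, List


def plan_subnets(planning_cidr: str, regions: List[str], new_prefix: int = 20) -> Dict[str, str]:
    """Assign one non-overlapping subnet range per region from a planning range."""
    network = ipaddress.ip_network(planning_cidr)
    candidates = list(network.subnets(new_prefix=new_prefix))
    if len(regions) > len(candidates):
        raise ValueError("planning range is too small for the number of regions")
    return {region: str(subnet) for region, subnet in zip(regions, candidates)}


plan = plan_subnets("10.0.0.0/16", ["us-central1", "europe-west1", "asia-east1"])
for region, cidr in plan.items():
    print(f"{region}: {cidr}")
```

Leaving unallocated ranges in the plan keeps room for later regions and avoids overlap with on-premises networks reached through Cloud Interconnect or Cloud VPN.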
Migration Patterns
- VM migration: Migrate to Virtual Machines (formerly Migrate for Compute Engine, originally Velostrata) for large-scale VM migration
- Database migration: Database Migration Service for homogeneous and heterogeneous migrations
- Strategy: Rehost → Replatform → Refactor (each step increases cloud-nativeness and operational benefit but also effort)
Domain 2: Managing and Provisioning Solution Infrastructure (18%)
Resource Organisation
Organisation
└── Folders (business units / environments)
    └── Projects (billing boundary, API enablement)
        └── Resources (VMs, buckets, databases)
IAM policies are inherited down the hierarchy. Apply policies at the highest level that makes sense to avoid per-resource sprawl.
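Inheritance means the effective allow policy on a resource is the union of the bindings set on it and on every ancestor. The sketch below models that with plain dictionaries; the resource names, groups, and the `effective_bindings` helper are hypothetical, and it ignores IAM conditions and deny policies.

```python
from typing import Dict, Optional, Set, Tuple

Binding = Tuple[str, str]  # (role, member)

# Hypothetical hierarchy: child -> parent (None marks the organisation root).
PARENT: Dict[str, Optional[str]] = {
    "organizations/example": None,
    "folders/prod": "organizations/example",
    "projects/shop-prod": "folders/prod",
}

# Hypothetical bindings set directly at each level.
DIRECT_BINDINGS: Dict[str, Set[Binding]] = {
    "organizations/example": {("roles/logging.viewer", "group:auditors@example.com")},
    "folders/prod": {("roles/compute.viewer", "group:sre@example.com")},
    "projects/shop-prod": {("roles/storage.objectViewer", "group:shop-devs@example.com")},
}


def effective_bindings(node: str) -> Set[Binding]:
    """Walk up the hierarchy and union the bindings found at each level."""
    result: Set[Binding] = set()
    current: Optional[str] = node
    while current is not None:
        result |= DIRECT_BINDINGS.get(current, set())
        current = PARENT[current]
    return result


for role, member in sorted(effective_bindings("projects/shop-prod")):
    print(role, member)
```

Because a grant higher up cannot be removed lower down in this model, a broad role at the organisation level reaches every project beneath it.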
IAM Design
- Principle of least privilege: Grant the minimum permissions needed for the specific task
- Service accounts: Identity for workloads; avoid using default service accounts; use workload identity federation to eliminate key files
- Predefined vs custom roles: Use predefined roles first; custom roles when predefined roles are too broad
- Avoid basic roles: Owner/Editor/Viewer (formerly called primitive roles) grant very broad permissions across a project; prefer predefined roles
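One way to audit for the broad Owner/Editor/Viewer roles is to scan an exported IAM policy. The sketch below assumes the policy has the `bindings` / `role` / `members` shape of an IAM allow policy exported as JSON; the sample policy and the `find_basic_role_grants` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Role IDs of the three broad roles (Owner, Editor, Viewer).
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}


def find_basic_role_grants(policy: Dict) -> List[Tuple[str, str]]:
    """Return (member, role) pairs where a broad basic role is granted."""
    findings = []
    for binding in policy.get("bindings", []):
        if binding.get("role") in BASIC_ROLES:
            for member in binding.get("members", []):
                findings.append((member, binding["role"]))
    return findings


# Hypothetical exported policy used only to exercise the function.
sample_policy = {
    "bindings": [
        {"role": "roles/editor", "members": ["user:dev@example.com"]},
        {"role": "roles/storage.objectViewer", "members": ["group:analysts@example.com"]},
    ]
}

for member, role in find_basic_role_grants(sample_policy):
    print(f"{member} holds {role}; consider a narrower predefined role")
```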
Infrastructure as Code
- Terraform: Industry-standard IaC; the exam may reference it for repeatable infrastructure deployments
- Cloud Deployment Manager: Google-native IaC (less common in new designs)
- Config Connector: Manage GCP resources via Kubernetes manifests (useful for GKE-centric environments)
Domain 3: Ensuring Solution and Operations Reliability (16%)
SLO/SLI/SLA Framework
- SLI (Service Level Indicator): What you measure — request success rate, latency, throughput
- SLO (Service Level Objective): Your target — "99.9% of requests complete in under 200ms over 30 days"
- Error budget: 100% - SLO = allowable unreliability; when the budget is spent, slow feature releases and invest in reliability
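- SLA (Service Level Agreement): Commercial commitment with financial penalties; always make it less strict than your internal SLO (for example, a 99.5% SLA against a 99.9% SLO) so you have a buffer before penalties apply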
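The error budget arithmetic is worth working through once. The sketch below assumes a request-based SLO like the 99.9% example above and a hypothetical volume of 10 million requests in the window; the function names are illustrative.

```python
def error_budget_fraction(slo_percent: float) -> float:
    """Error budget as a fraction: (100% - SLO) / 100."""
    return (100.0 - slo_percent) / 100.0


def allowed_bad_requests(slo_percent: float, total_requests: int) -> float:
    """Number of requests that may miss the objective within the window."""
    return total_requests * error_budget_fraction(slo_percent)


def budget_remaining(slo_percent: float, total_requests: int, bad_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    return 1.0 - bad_requests / allowed_bad_requests(slo_percent, total_requests)


def downtime_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Time-based view: minutes of full outage the same SLO would tolerate."""
    return window_days * 24 * 60 * error_budget_fraction(slo_percent)


# Hypothetical traffic: 10 million requests in 30 days, 2,500 of them bad so far.
print(round(allowed_bad_requests(99.9, 10_000_000)))
print(round(budget_remaining(99.9, 10_000_000, 2_500), 2))
print(round(downtime_budget_minutes(99.9), 1))
```

With a 99.9% SLO the budget is about 10,000 bad requests out of 10 million, or roughly 43 minutes of full outage over 30 days. Once `budget_remaining` approaches zero, the guidance above applies: slow feature releases and invest in reliability.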
Monitoring Design
- Cloud Monitoring: Metrics, dashboards, uptime checks, alerting policies
- Cloud Logging: Centralised log ingestion; log-based metrics; export to BigQuery for long-term analysis
- Cloud Trace: Distributed tracing for latency analysis across microservices
- Cloud Profiler: Continuous CPU and memory profiling for production workloads
- Error Reporting: Aggregates and surfaces application errors in real time
Disaster Recovery Patterns
| RTO / RPO target | Pattern |
|---|---|
| Hours / hours | Backup and restore (cold standby) |
| Minutes / minutes | Warm standby (reduced capacity replica ready to scale) |
| Seconds / near-zero | Active-passive with automated failover |
| Near-zero / near-zero | Active-active multi-region |
Match the pattern to the business requirement. Active-active multi-region is the most expensive and complex; don't recommend it unless the scenario specifically demands near-zero RTO and RPO.
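The "cheapest pattern that still meets the requirement" rule can be sketched as a lookup ordered by cost. The numeric floors below (an hour, a minute, a second, and 0 standing in for "near-zero") are assumptions chosen to mirror the table's wording, not published figures.

```python
from typing import List, Tuple

# (pattern, best achievable RTO in seconds, best achievable RPO in seconds),
# ordered from cheapest to most expensive. Floors are illustrative assumptions.
DR_PATTERNS: List[Tuple[str, float, float]] = [
    ("Backup and restore (cold standby)", 3600, 3600),
    ("Warm standby", 60, 60),
    ("Active-passive with automated failover", 1, 0),
    ("Active-active multi-region", 0, 0),
]


def pick_dr_pattern(required_rto_s: float, required_rpo_s: float) -> str:
    """Return the cheapest pattern whose floors satisfy both targets."""
    for name, rto_floor, rpo_floor in DR_PATTERNS:
        if rto_floor <= required_rto_s and rpo_floor <= required_rpo_s:
            return name
    # Unreachable with non-negative targets, since the last row has floors of 0.
    return DR_PATTERNS[-1][0]


print(pick_dr_pattern(8 * 3600, 24 * 3600))  # relaxed targets
print(pick_dr_pattern(30, 600))              # tight RTO, looser RPO
```

Note how a tight RTO alone is enough to rule out the cheaper patterns, even when the RPO is relaxed: both targets have to be met.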
Domain 4: Designing for Security and Compliance (15%)
Data Protection
- Cloud KMS: Customer-managed encryption keys; key rotation; audit key usage in Cloud Logging
- Cloud HSM: Hardware-backed key storage for compliance requirements
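- CMEK vs CSEK: With customer-managed encryption keys (CMEK), you create and control the keys in Cloud KMS; with customer-supplied encryption keys (CSEK), you generate the key material yourself and provide it with each request, and Google does not store it. CMEK is the standard choice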
- DLP API: Discover, classify, and redact sensitive data (PII, payment data)
Network Security
- VPC Service Controls: Create a perimeter around GCP APIs to prevent data exfiltration; restrict which projects can call which APIs
- Cloud Armor: WAF and DDoS protection for global load balancers; managed rule sets for OWASP Top 10
- Binary Authorization: Policy-based deploy-time control that allows only trusted (attested) container images to be deployed to GKE
- Shielded VMs: Secure boot, vTPM, and integrity monitoring to protect VM instances
Identity and Access
- Workforce Identity Federation: Federate on-premises AD or third-party IdP with Google Cloud; eliminate separate Google accounts
- Workload Identity Federation: Allow workloads outside GCP (GitHub Actions, AWS, on-prem) to authenticate without service account keys
- Access Transparency: Logs of Google personnel access to your data
Domain 5: Analyzing and Optimizing Technical and Business Processes (13%)
Cost Optimisation
- Committed Use Discounts (CUDs): 1 or 3-year commitments for predictable workloads; up to 57% discount vs on-demand
- Preemptible/Spot VMs: Up to 91% discount; suitable for fault-tolerant batch workloads
- Rightsizing recommendations: Recommender (part of Active Assist) uses Cloud Monitoring metrics to generate automated recommendations for over-provisioned VMs
- Storage lifecycle management: Transition objects to Nearline/Coldline/Archive as access frequency decreases
- BigQuery slots vs on-demand: Slots for predictable, high-volume analytics; on-demand for variable workloads
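To compare the compute pricing options above, a back-of-the-envelope calculation is usually enough. The sketch below uses the "up to" discount figures quoted in the list as best-case upper bounds and a hypothetical on-demand rate of $0.10 per hour; real prices and discounts vary by machine type and region.

```python
HOURS_PER_MONTH = 730  # common approximation used for monthly estimates


def monthly_cost(on_demand_hourly: float, discount_percent: float = 0.0,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of one VM after applying a percentage discount."""
    return on_demand_hourly * hours * (1.0 - discount_percent / 100.0)


rate = 0.10  # hypothetical on-demand price in USD per hour

on_demand = monthly_cost(rate)
with_cud = monthly_cost(rate, discount_percent=57)   # best-case 3-year CUD
with_spot = monthly_cost(rate, discount_percent=91)  # best-case Spot discount

print(f"on-demand: ${on_demand:.2f}")
print(f"CUD:       ${with_cud:.2f}")
print(f"Spot:      ${with_spot:.2f}")
```

The numbers only tell part of the story: a CUD is billed whether or not the VM runs, and a Spot VM can be reclaimed, so the workload's predictability and fault tolerance decide which discount is usable.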
Performance Optimisation
- Caching: Memorystore for application caching; Cloud CDN for static assets; BigQuery BI Engine for dashboard acceleration
- Async processing: Pub/Sub + Dataflow or Cloud Functions for decoupling producers from consumers
- Read replicas and connection pooling: Cloud SQL read replicas for read-heavy workloads; PgBouncer for connection pooling with Cloud SQL
Case Study Preparation
The Google Professional Cloud Architect exam includes case studies. Published examples have included Mountkirk Games, TerramEarth, EHR Healthcare, and Helicopter Racing League; Dress4Win appeared in earlier versions of the exam, and the set changes between exam versions. You read a multi-page description of the company's existing environment, technical requirements, business requirements, and executive goals, then answer 4–6 related questions.
Case study strategy:
- Read the requirements sections carefully before reading the questions
- Identify the highest-priority constraint (often regulatory, SLA, or cost)
- That constraint usually eliminates 2–3 answer options immediately
- Look for keywords: "existing licences," "minimal refactoring," "zero trust," "compliance," "least privilege," "real-time"
The case studies are published by Google and are available in the official exam guide. Review them as part of your preparation.
Common Exam Traps
- Cloud SQL vs Cloud Spanner: Cloud SQL is regional; Spanner is globally distributed. If the scenario mentions "global" and "strongly consistent," the answer is Spanner
- Cloud Run vs Cloud Functions: Cloud Run runs containers; Cloud Functions runs individual function code. Both are serverless, but Cloud Run accepts any container image, language runtime, or binary, so it is the more flexible choice
- Pub/Sub vs Cloud Tasks: Pub/Sub is fan-out messaging (one message, many consumers); Cloud Tasks is a managed queue for targeted task execution
- VPC Service Controls vs Private Service Connect: VPC Service Controls creates an API perimeter; PSC provides private connectivity to managed services
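- Preemptible vs Spot VMs: Spot VMs are the newer version of preemptible VMs and use the same pricing model. Both can be reclaimed by Google with a 30-second notice, but preemptible VMs are also limited to a 24-hour maximum runtime, while Spot VMs have no maximum runtime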
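The Pub/Sub versus Cloud Tasks distinction is easier to remember with a toy model. The sketch below is an in-memory illustration only: `FanOutTopic` and `TaskQueue` are hypothetical classes and do not use or resemble the real client libraries.

```python
from collections import deque
from typing import Callable, Deque, Dict


class FanOutTopic:
    """Pub/Sub-style: every subscription receives its own copy of each message."""

    def __init__(self) -> None:
        self.subscriptions: Dict[str, Deque[str]] = {}

    def subscribe(self, name: str) -> None:
        self.subscriptions[name] = deque()

    def publish(self, message: str) -> None:
        for queue in self.subscriptions.values():
            queue.append(message)


class TaskQueue:
    """Cloud Tasks-style: each task is dispatched once to one named handler."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self.handler = handler
        self._tasks: Deque[str] = deque()

    def enqueue(self, payload: str) -> None:
        self._tasks.append(payload)

    def run_next(self) -> str:
        return self.handler(self._tasks.popleft())


topic = FanOutTopic()
topic.subscribe("billing")
topic.subscribe("analytics")
topic.publish("order-created")
print({name: list(q) for name, q in topic.subscriptions.items()})

tasks = TaskQueue(handler=lambda payload: f"emailed receipt for {payload}")
tasks.enqueue("order-123")
print(tasks.run_next())
```

In the fan-out case the publisher does not know who consumes the message; in the task queue case the producer chooses the handler explicitly, which is the cue to look for in exam scenarios.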
Study Plan (6 Weeks)
| Week | Focus |
|---|---|
| 1 | Compute and storage service selection; networking fundamentals |
| 2 | IAM, resource hierarchy, security controls |
| 3 | Reliability: SLO/SLI/error budgets, monitoring, disaster recovery |
| 4 | Data analytics: BigQuery, Dataflow, Pub/Sub, data storage selection |
| 5 | Case study analysis; optimisation and IaC |
| 6 | Practice exams, case study practice, review weak domains |
Use the Professional Cloud Architect practice exams throughout your preparation. The exam rarely has an obviously wrong answer — practice helps you build the pattern recognition needed to choose between two plausible options.