How to Run an AWS Well-Architected Review for Client Accounts (Without Losing a Weekend)

The Cloud Lighthouse · 2026-04-16 · 11 min read



The standard approach to an AWS Well-Architected Review takes two to four days. A senior engineer reads through documentation, opens the AWS Well-Architected Tool, manually answers 57 questions across six pillars, writes up findings, builds a remediation roadmap, and prepares client-ready slides. By the end, you've burned most of a week of billable capacity on work that, with the right tooling, fits in an afternoon.


Across 22 Well-Architected Reviews, AWS partner Versent found an average of 8+ High Risk Issues per customer, and one client cut its AWS bill by 40% after acting on the findings (AWS APN Blog, 2021).


That's real output. The question is whether the delivery process has to be that slow. This covers what a WAR actually evaluates, what you find consistently across accounts, and how IaC-driven scanning compresses the engagement without cutting corners on what comes out the other end.


TL;DR: A WAR covers six pillars across 57 questions, and partner data shows an average of 8+ High Risk Issues per account. Manual reviews run 2–4 days of senior engineering time. Automated tooling can pre-populate roughly 60% of questionnaire answers, and IaC scanning generates PDF reports and fix code in hours: same-day delivery, same findings, better margins.




What is an AWS Well-Architected Review — and why clients need one


A Well-Architected Review is a structured evaluation of an AWS workload against Amazon's six-pillar framework: Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, and Sustainability. AWS publishes the framework publicly, and the AWS Well-Architected Tool is available in any AWS account at no charge. Delivering reviews under the AWS Well-Architected Partner Program requires AWS Partner Network membership. A review recorded in the tool produces a formal report with High Risk Issues (HRIs) and Medium Risk Issues (MRIs) as outputs.


Harness projects $44.5 billion in cloud infrastructure waste for 2025 (FinOps in Focus 2025, n=700 engineering leaders), and 84% of organizations say managing cloud spend is their top cloud challenge (Flexera 2025 State of the Cloud Report, n=759 executives). A WAR doesn't just surface cost waste. It surfaces security misconfigurations, reliability gaps, and operational blind spots that clients genuinely can't see from inside their own accounts.


For consultants and MSPs, it's also a repeatable service line with built-in funding. APN Partners who run WARs can apply for up to $5,000 in AWS credits per workload reviewed — credits that cover remediation work and make the engagement close to self-financing.


Why do so few MSPs run WARs regularly? Time. At agency rates, a two-to-four day engagement for a single client account doesn't work economically for smaller workloads. That math changes when automation handles the objective questions and leaves the judgment calls to people.




The six pillars: what you're actually evaluating


Knowing the pillar structure matters because different pillars surface different categories of client risk. Prioritizing findings for a client presentation requires knowing which pillar generated them — and what's at stake if the finding sits unaddressed.


Operational Excellence is about how teams actually run the workload day-to-day: runbook quality, change management, deployment practices, observability. Findings here tend to be process gaps — no structured incident response, CloudTrail disabled in some regions, production changes happening outside any change tracking system.


Security is where the most critical findings land, almost every time. 60% of AWS IAM users have access keys older than one year, and 18% of EC2 instances are overprivileged, according to Datadog's State of Cloud Security 2024 analysis. IBM's Cost of a Data Breach Report 2024 pegged the average breach cost at a record $4.88 million — up 10% year-over-year — with cloud misconfiguration contributing to 15% of incidents studied. The Security pillar checks IAM permissions, encryption, network exposure, and detection coverage.


Reliability is often where the technically heaviest findings surface: no multi-region failover plan, undefined RTO and RPO, Lambda functions with no dead-letter queues, single points of failure that look fine in the console until something breaks. Most clients have never formally defined what "acceptable downtime" means for their workload. The WAR makes that conversation happen.


Performance Efficiency asks whether resources match the actual workload — instance selection, caching, storage configuration. It overlaps heavily with cost findings. An oversized instance is simultaneously a performance misconfiguration and a waste pattern, so the same finding shows up in two pillars.


Cost Optimization is the pillar clients immediately understand. Right-sizing, savings plan coverage, idle resource detection, storage class selection. 27% of cloud spend is wasted (Flexera 2025 State of the Cloud Report). This pillar is where you trace that number to specific resources in a specific account.


Sustainability, added in 2021, evaluates resource efficiency through an environmental lens. It's often the fastest pillar to complete, but it catches the same oversizing patterns the Cost pillar flags from a financial angle — same finding, different framing.




What consultants find most often


A few finding categories show up in nearly every account. Knowing what to expect before running the review helps you ask the right questions and cuts time in the questionnaire sessions.


Security dominates. IAM users with long-lived access keys, overprivileged EC2 roles, S3 buckets with permissive policies, no CloudTrail log validation, MFA missing on the root account: these aren't sophisticated misconfigurations. They're defaults nobody went back to change. The Datadog finding that 60% of AWS IAM users have access keys older than one year is consistent with what partner reviews report in practice.
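As a rough illustration of how mechanical the long-lived access key check is, here is a minimal sketch. It assumes key metadata has already been collected into dictionaries with `UserName`, `AccessKeyId`, `Status`, and `CreateDate` fields (the shape IAM returns when listing access keys). The sample data, the function name, and the 365-day threshold are illustrative assumptions, not part of any specific tool.

```python
from datetime import datetime, timedelta, timezone


def stale_access_keys(keys: list[dict], now: datetime, max_age_days: int = 365) -> list[dict]:
    """Return active access keys created more than max_age_days before `now`."""
    cutoff = now - timedelta(days=max_age_days)
    return [k for k in keys if k["Status"] == "Active" and k["CreateDate"] < cutoff]


# Hypothetical sample data, shaped like IAM access key metadata.
sample_keys = [
    {"UserName": "ci-deploy", "AccessKeyId": "AKIAEXAMPLE1", "Status": "Active",
     "CreateDate": datetime(2023, 1, 10, tzinfo=timezone.utc)},
    {"UserName": "alice", "AccessKeyId": "AKIAEXAMPLE2", "Status": "Active",
     "CreateDate": datetime(2026, 1, 5, tzinfo=timezone.utc)},
    {"UserName": "old-batch", "AccessKeyId": "AKIAEXAMPLE3", "Status": "Inactive",
     "CreateDate": datetime(2021, 6, 1, tzinfo=timezone.utc)},
]

as_of = datetime(2026, 4, 16, tzinfo=timezone.utc)
for key in stale_access_keys(sample_keys, as_of):
    age_days = (as_of - key["CreateDate"]).days
    print(f"{key['UserName']}: key {key['AccessKeyId']} is {age_days} days old")
```

Inactive keys are skipped here on the assumption that they are a cleanup item rather than an exposure. A reviewer may want to report them separately.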


Cost follows a predictable pattern: On-Demand pricing on workloads that have run continuously for over a year, dev environments with Multi-AZ RDS enabled, idle EC2 instances left running after a project finished, io1 storage volumes nobody migrated to gp3. Organizations take an average of 31 days to identify and eliminate cloud waste after it starts, according to the Harness FinOps in Focus 2025 report. A WAR compresses that discovery cycle down to an afternoon.



Reliability gaps cluster in serverless workloads consistently: missing retry logic, no DLQs on async Lambda invocations, production databases in a single AZ, no documented load testing history. Teams that built fast and moved on rarely go back to formalize failure handling.


AWS partner Versent found an average of 8+ High Risk Issues per customer across 22 Well-Architected Reviews, with one client achieving a 40% AWS bill reduction and another saving $1 million annually after acting on the recommendations (AWS APN Blog, 2021). Those are post-remediation numbers, not projections.


For more on the cost patterns that surface consistently in WAR reviews, see where AWS costs actually hide across services.




Why a traditional WAR costs you two days — and what that does to your margins


A manual WAR runs like this: discovery call, account access setup, documentation review, questionnaire session with the client's engineering team, findings writeup, remediation prioritization, final presentation. Add scheduling back-and-forth and prep time, and you're looking at twelve to twenty hours of senior engineer time.


At $200–$300/hour agency rates, that's $2,400 to $6,000 in labor for a single workload. For a client with three workloads, multiply it. For an MSP managing twenty accounts, offering regular WARs at that cost structure isn't viable. It becomes a one-time onboarding deliverable — maybe annual if someone asks — not a steady-state part of the service.
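The arithmetic behind those figures is simple enough to sketch. The function name is illustrative, and the portfolio line assumes one workload per account, which the article does not state.

```python
def war_labor_cost(hours: float, hourly_rate: float, workloads: int = 1) -> float:
    """Labor cost of manual reviews: hours per workload x hourly rate x workloads."""
    return hours * hourly_rate * workloads


# Single workload: 12 hours at $200/hour up to 20 hours at $300/hour.
print(war_labor_cost(12, 200), war_labor_cost(20, 300))

# Three workloads for one client.
print(war_labor_cost(12, 200, workloads=3), war_labor_cost(20, 300, workloads=3))

# Twenty accounts, assuming one workload each (an assumption for illustration).
print(war_labor_cost(12, 200, workloads=20), war_labor_cost(20, 300, workloads=20))
```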


The questionnaire is a big part of the problem. The Well-Architected Tool has 57 questions. A lot of them can be answered by scanning the account's infrastructure. They're asking things like "do you use AWS Config?", "are your S3 buckets encrypted?", "do you have CloudTrail enabled?" These aren't judgment calls requiring a consultant's expertise. They're facts about the account that a script retrieves faster than a human can navigate the console.
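To make the distinction concrete, the objective questions can be modeled as simple predicates over facts about the account. This is a sketch under assumptions: the `account_facts` snapshot is hard-coded and its field names are hypothetical, where a real scan would populate it through read-only API calls.

```python
from typing import Callable

# Hypothetical snapshot of account facts; a read-only scan would populate this.
account_facts = {
    "config_recorder_enabled": True,
    "cloudtrail_trails": [{"name": "org-trail", "is_multi_region": True}],
    "s3_buckets": [
        {"name": "app-assets", "default_encryption": True},
        {"name": "legacy-exports", "default_encryption": False},
    ],
}


def uses_aws_config(facts: dict) -> bool:
    return bool(facts.get("config_recorder_enabled"))


def cloudtrail_enabled(facts: dict) -> bool:
    return len(facts.get("cloudtrail_trails", [])) > 0


def all_buckets_encrypted(facts: dict) -> bool:
    # Note: an account with no buckets passes this check trivially.
    return all(b["default_encryption"] for b in facts.get("s3_buckets", []))


CHECKS: dict[str, Callable[[dict], bool]] = {
    "Do you use AWS Config?": uses_aws_config,
    "Do you have CloudTrail enabled?": cloudtrail_enabled,
    "Are your S3 buckets encrypted?": all_buckets_encrypted,
}


def answer_objective_questions(facts: dict) -> dict[str, bool]:
    """Answer each fact-based question from the account snapshot."""
    return {question: check(facts) for question, check in CHECKS.items()}


answers = answer_objective_questions(account_facts)
for question, passed in answers.items():
    print(f"{question} {'yes' if passed else 'no'}")
```

Each check is a plain function, so adding a question means adding one predicate and one entry in `CHECKS`.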


Judgment is required for something else entirely: remediation prioritization, risk context for the client's business, sequencing the roadmap. That's where senior time belongs — not on checkbox questions the Terraform state file already answers.




How automation changes the math


The time shift comes from parsing infrastructure-as-code instead of clicking through a questionnaire. When a client shares Terraform or CloudFormation, or grants a read-only cross-account role, a scanning tool evaluates most of the objective pillar questions automatically.
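A minimal sketch of that idea, assuming the client's Terraform has been rendered to JSON with `terraform show -json`, which nests resources under `planned_values.root_module` and its `child_modules`. The three rules, their IDs, and the `env` tag convention are illustrative assumptions, not the rule set of any particular scanner.

```python
import json
from typing import Iterator

# Trimmed, hypothetical plan document for illustration.
SAMPLE_PLAN = json.loads("""
{
  "planned_values": {
    "root_module": {
      "resources": [
        {"address": "aws_ebs_volume.data", "type": "aws_ebs_volume",
         "values": {"type": "io1", "size": 500}},
        {"address": "aws_db_instance.orders", "type": "aws_db_instance",
         "values": {"multi_az": false, "tags": {"env": "prod"}}}
      ],
      "child_modules": [
        {"resources": [
          {"address": "module.api.aws_lambda_function.worker",
           "type": "aws_lambda_function",
           "values": {"dead_letter_config": []}}
        ]}
      ]
    }
  }
}
""")


def iter_resources(module: dict) -> Iterator[dict]:
    """Yield every resource in a module, descending into child modules."""
    yield from module.get("resources", [])
    for child in module.get("child_modules", []):
        yield from iter_resources(child)


# (rule id, pillar, resource type, violation predicate, message)
RULES = [
    ("COST-IO1", "Cost Optimization", "aws_ebs_volume",
     lambda v: v.get("type") == "io1",
     "io1 volume; evaluate migrating to gp3"),
    ("REL-SINGLE-AZ", "Reliability", "aws_db_instance",
     lambda v: (v.get("tags") or {}).get("env") == "prod" and not v.get("multi_az"),
     "production database runs in a single AZ"),
    ("REL-NO-DLQ", "Reliability", "aws_lambda_function",
     lambda v: not v.get("dead_letter_config"),
     "Lambda function has no dead-letter queue configured"),
]


def scan_plan(plan: dict) -> list[dict]:
    """Apply each rule to every matching resource in the plan."""
    root = plan.get("planned_values", {}).get("root_module", {})
    findings = []
    for resource in iter_resources(root):
        values = resource.get("values") or {}
        for rule_id, pillar, rtype, is_violation, message in RULES:
            if resource.get("type") == rtype and is_violation(values):
                findings.append({"rule_id": rule_id, "pillar": pillar,
                                 "resource": resource.get("address"),
                                 "message": message})
    return findings


for finding in scan_plan(SAMPLE_PLAN):
    print(f"[{finding['pillar']}] {finding['resource']}: {finding['message']}")
```

Because the rules only read declared configuration, the scan needs no credentials when the client shares IaC. The cross-account role only matters when scanning the live account.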


AWS research found that GenAI-assisted reviews can pre-populate approximately 60% of Well-Architected questionnaire answers automatically, reducing review time from hours to minutes for the questionnaire phase (AWS Community, 2024). Tools that parse IaC directly go further: evaluating specific resource configurations against best-practice rules without touching the questionnaire at all.



The engagement model shifts in a way that actually improves the output. Instead of spending two days gathering answers, a consultant spends two hours reviewing generated findings, adding client context, and shaping the remediation roadmap. The questionnaire fills itself from what the infrastructure actually says. The human work focuses on interpretation — "what does this mean for this client's risk posture, their team, their current quarter?" — which is the part worth paying for.


Rego Consulting documented a 90% reduction in time and effort for WAR completion after switching to automated tooling — from several days with multiple team members to a few hours with two engineers (Rego Consulting case study). The output quality doesn't drop. The effort just goes toward analysis instead of information gathering.


For MSPs running dozens of accounts, this is the difference between a WAR being a quarterly touchpoint versus an annual burden. Four hours instead of four days means you can offer it regularly without blowing your service margins. You can include it in a standard managed service tier. You can run it proactively when a client makes a major infrastructure change, not only when someone formally asks for a review.




What a same-day WAR deliverable actually looks like


A complete WAR deliverable has findings, a prioritized roadmap, and fix code. Miss any of the three and the engagement doesn't land well.


Findings go by pillar, with each High Risk Issue including a plain-language description of the problem, why it matters, and the specific resource or configuration responsible. The AWS Well-Architected Tool exports this in a structured format; IaC-scanning tools generate it directly from Terraform and CloudFormation analysis, with finding metadata included.


The remediation roadmap is where consultants earn their rate. Clients don't need an unranked list of 30 findings. They need to know which five to fix this sprint, which ten to schedule this quarter, and which they can accept as documented risks given their business constraints. That sorting requires understanding workload criticality, team bandwidth, and what the client actually cares about. No tool produces it correctly without a human making calls.
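A tool can still produce a first draft for the consultant to correct. The sketch below ranks open findings by a simple score and fills the sprint and quarter buckets. The scoring weights, field names, and bucket sizes are assumptions for illustration. The `workload_criticality` and `accepted_risk` fields are exactly the human inputs the paragraph describes.

```python
from dataclasses import dataclass

# Illustrative weights: a High Risk Issue counts three times a Medium Risk Issue.
SEVERITY_WEIGHT = {"HRI": 3, "MRI": 1}


@dataclass
class Finding:
    finding_id: str
    severity: str                # "HRI" or "MRI"
    workload_criticality: int    # 1 (low) to 5 (business critical), set by the consultant
    accepted_risk: bool = False  # set once the client agrees to document and accept it


def draft_roadmap(findings: list[Finding], sprint_slots: int = 5,
                  quarter_slots: int = 10) -> dict[str, list[Finding]]:
    """Rank open findings and split them into sprint, quarter, and backlog buckets."""
    open_findings = [f for f in findings if not f.accepted_risk]
    ranked = sorted(
        open_findings,
        key=lambda f: SEVERITY_WEIGHT[f.severity] * f.workload_criticality,
        reverse=True,
    )
    return {
        "this_sprint": ranked[:sprint_slots],
        "this_quarter": ranked[sprint_slots:sprint_slots + quarter_slots],
        "backlog": ranked[sprint_slots + quarter_slots:],
        "accepted_risk": [f for f in findings if f.accepted_risk],
    }


example = [
    Finding("root-mfa-missing", "HRI", 5),
    Finding("dev-rds-multi-az", "MRI", 1),
    Finding("single-az-reporting-db", "HRI", 2, accepted_risk=True),
    Finding("lambda-no-dlq", "MRI", 4),
]
roadmap = draft_roadmap(example, sprint_slots=1, quarter_slots=1)
for bucket, items in roadmap.items():
    print(bucket, [f.finding_id for f in items])
```

The ordering is only a starting point. The consultant is expected to move items between buckets based on team bandwidth and what the client cares about.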


Fix code is what keeps the engagement from sitting in a shared drive. About 80% of clients proceed with recommended remediations after a WAR (nClouds, AWS APN Blog, 2019) — but execution stalls when the fix is described in prose and the client's engineers have to figure out the implementation from scratch. Generating the Terraform or CloudFormation patch alongside each finding removes that stall. The client's team gets a PR, not a recommendation to interpret.
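As a sketch of what "a PR, not a recommendation" can look like, the snippet below turns an io1 volume definition into a unified diff that proposes gp3. The Terraform snippet and file path are hypothetical, and the regex substitution is deliberately naive. A production tool would more plausibly edit the parsed configuration and confirm IOPS and throughput requirements before proposing the change.

```python
import difflib
import re

# Hypothetical Terraform source for the flagged resource.
ORIGINAL_TF = '''resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 500
  type              = "io1"
  iops              = 3000
}
'''


def propose_gp3_patch(source: str, path: str) -> str:
    """Return a unified diff switching io1 volumes to gp3, or '' if nothing changes."""
    patched = re.sub(r'(\btype\s*=\s*)"io1"', r'\1"gp3"', source)
    diff = difflib.unified_diff(
        source.splitlines(keepends=True),
        patched.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


patch = propose_gp3_patch(ORIGINAL_TF, "storage.tf")
print(patch)
```

The diff can be attached to the finding or applied on a branch, so the client's engineers review a concrete change instead of interpreting prose.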


Put the PDF summary, structured SARIF findings, and remediation IaC together and you have a deliverable that feels like a real product, not a consulting document. It also makes follow-through trackable: at the next review, you run the same scan and see exactly which findings closed.
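One way to get that trackability is to key each finding on its rule and resource, then compare two scans. The sketch below writes findings as a minimal SARIF 2.1.0 log and computes which ones closed. The tool name, the finding fields, and the HRI-to-`error` mapping are assumptions for illustration.

```python
import json


def to_sarif(findings: list[dict], tool_name: str = "war-scan-sketch") -> dict:
    """Wrap findings in a minimal SARIF 2.1.0 log (one run, one result per finding)."""
    return {
        "version": "2.1.0",
        "runs": [{
            "tool": {"driver": {"name": tool_name}},
            "results": [
                {
                    "ruleId": f["rule_id"],
                    # Assumed mapping: High Risk Issues are errors, the rest warnings.
                    "level": "error" if f["severity"] == "HRI" else "warning",
                    "message": {"text": f["message"]},
                    "locations": [{
                        "logicalLocations": [{"fullyQualifiedName": f["resource"]}]
                    }],
                }
                for f in findings
            ],
        }],
    }


def finding_keys(sarif_log: dict) -> set[tuple[str, str]]:
    """Identify each result by (rule id, resource address)."""
    return {
        (r["ruleId"], r["locations"][0]["logicalLocations"][0]["fullyQualifiedName"])
        for run in sarif_log["runs"]
        for r in run["results"]
    }


def closed_findings(previous: dict, current: dict) -> set[tuple[str, str]]:
    """Findings present in the previous scan and absent from the current one."""
    return finding_keys(previous) - finding_keys(current)


io1 = {"rule_id": "COST-IO1", "severity": "MRI", "resource": "aws_ebs_volume.data",
       "message": "io1 volume; evaluate migrating to gp3"}
no_dlq = {"rule_id": "REL-NO-DLQ", "severity": "HRI",
          "resource": "aws_lambda_function.worker",
          "message": "Lambda function has no dead-letter queue configured"}

last_quarter = to_sarif([io1, no_dlq])
this_quarter = to_sarif([no_dlq])
print(json.dumps(this_quarter, indent=2))
print("closed:", closed_findings(last_quarter, this_quarter))
```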




Running WARs at MSP scale


At one client, automation is convenient. At twenty, it's what makes the service viable at all.


The cloud managed services market is projected to grow from $134.44 billion in 2024 to $305.16 billion by 2030, a 14.7% CAGR (Grand View Research, February 2025). MSPs that can deliver structured, repeatable health reviews with documented remediation will stand out as that market expands. WARs are one of the few AWS engagement types where the output is standardized enough to scale without customizing the workflow per client.


You only need one setup: a standard read-only cross-account IAM role deployed to each client account at onboarding, centralized findings storage, and a reporting layer that tracks HRI trends across the portfolio. One workflow for all accounts, not a bespoke process rebuilt each time.
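The cross-account role is mostly a trust policy plus read-only permissions. The sketch below builds a trust policy that lets the MSP's account assume the role only when it presents an agreed external ID. The account ID and external ID are placeholders, and pairing the role with the AWS managed `SecurityAudit` and `ViewOnlyAccess` policies is one possible choice, not a requirement stated in this article.

```python
import json


def build_trust_policy(msp_account_id: str, external_id: str) -> dict:
    """Trust policy allowing the MSP account to assume the role with an external ID."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{msp_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


# One possible set of AWS managed policies for a metadata-only, read-only reviewer.
READ_ONLY_MANAGED_POLICIES = [
    "arn:aws:iam::aws:policy/SecurityAudit",
    "arn:aws:iam::aws:policy/job-function/ViewOnlyAccess",
]

# Placeholder values; use a distinct external ID per client.
trust_policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(trust_policy, indent=2))
```

Deploying the same role definition to every client account at onboarding is what keeps the rest of the workflow identical across the portfolio.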


In practice it looks like this: deploy the role during onboarding, then run the scan when a review is due — quarterly, annually, or triggered by a major deployment. Review generated findings, apply your prioritization layer, deliver the report. The client gets professional output. You spent a morning, not a week.


For the RDS-specific findings that surface in nearly every Cost Optimization pillar, see the RDS cost optimization guide.




Where to start


If you haven't run a WAR on a client account recently, start with the accounts where you already have cost monitoring. Cost Optimization findings are immediately legible against real billing data, which makes the first client conversation concrete rather than abstract.


For accounts where cost scanning is already running, Security and Reliability are the natural next step. They cover risk categories that cost tools skip — IAM misconfigurations, encryption gaps, recovery posture — and they're where the highest-stakes findings consistently land.


Running WARs manually across a portfolio doesn't scale. ArchReview automates the pillar analysis by scanning Terraform and CloudFormation directly, with 36 rules across all six pillars, and generates the PDF report and remediation IaC in a single pass. One cross-account role. Same-day delivery. Fix code included. Connect an account and see what it finds.
