Most migration overruns aren't caused by anything exotic. They're caused by questions nobody asked in week one, that surface as blockers in week nine. Below is a compressed version of the checklist we actually run during a paid assessment — organized into the four areas that cause the most damage when skipped.
Data profile (get the real numbers, not the assumed ones)
- Actual data volume per schema, not the number from a three-year-old architecture diagram.
- Row count and growth rate on your ten largest tables — migrations get scoped on averages and blown up by outliers.
- NUPI vs. non-unique PI distribution, since skew here directly predicts BigQuery clustering and partitioning strategy.
- Column-level charset usage — Latin1 vs. Unicode, and where BYTES-declared columns are silently holding text.
- Which tables are genuinely active vs. dormant. We've found live migrations budgeting for tables nobody had queried in four years.
Logic & dependencies (the part estimates usually miss)
- Full inventory of BTEQ scripts, with a rough complexity pass on each — simple SELECT/INSERT vs. multi-statement transactions.
- Every VOLATILE TABLE and its lifecycle — these have no direct BigQuery equivalent and each one is a design decision, not a find-and-replace.
- Macros that contain business logic (masking, derivations, conditional routing) rather than pure data access.
- Cross-schema dependencies and job scheduling chains — what breaks if Table A moves before Table B.
- Undocumented downstream consumers: BI tools, scheduled exports, and other systems reading from these tables that nobody remembers configuring.
Access & governance (the part that isn't in the DDL)
- Every role and profile currently granted, mapped to the person or team that actually needs it today — not who needed it in 2019.
- Row-level and column-level masking currently enforced by views or macros, and how that becomes BigQuery IAM and security policies.
- Data classification status — do you know which tables hold PII, financial data, or regulated categories right now, without opening each one manually.
- Retention requirements per table, especially anything under regulatory or contractual hold.
- Who signs off on the access control matrix before cutover, and whether that person is actually available during your planned timeline.
Cutover & validation (the part that determines if go-live sticks)
- A defined validation method — row counts are a start, not a finish. Aggregate checksums and sampled column-level comparisons catch what row counts miss.
- A rollback plan that's actually been discussed, not assumed to be "just switch back."
- Realistic parallel-run window — enough time for both systems to be compared under real production load, not a token 48 hours.
- BigQuery cost guardrails configured before go-live, not after the first surprising bill.
- A named owner for the first 30 days post-cutover, when most of the "we forgot about this" issues actually surface.
None of these are hard individually. What makes them expensive is finding them mid-migration instead of in week one — at that point, every one of them is a change order instead of a line item.
Get the full checklist as a PDF
Same 20 points, formatted to print and hand to your team — no email required.
Download the Checklist