← All posts
Technical

BTEQ to BigQuery: The Conversion Patterns That Actually Break

Automated BTEQ-to-BigQuery converters handle the obvious 80% fine — SELECT statements, basic joins, standard data types. The failures cluster around a small set of patterns that show up in almost every legacy Teradata environment. Here's what to check for before you trust a converted script.

Sentinel dates

Teradata environments frequently use sentinel values — 9999-12-31 for "still active," 1900-01-01 for "unknown" — baked into date logic across years of BTEQ scripts. BigQuery's date functions don't inherently understand these as special cases; they're just dates. If your conversion doesn't explicitly preserve and handle sentinel logic in the converted SQL (or in a pre-processing step), you'll get technically-correct queries that silently return wrong business results — rows that should be "currently active" dropping out of date-range filters, or vice versa.

Implicit casting, especially BYTES to numeric

Teradata is more permissive about implicit type conversion than BigQuery. The one that causes the most damage: BYTES fields that get implicitly cast where BigQuery requires an explicit cast to BIGNUMERIC. Left unhandled, this doesn't always throw an error — sometimes it truncates precision silently, which is far worse than a failed query, because it passes validation that only checks for successful execution rather than correct values.

Multi-statement transaction blocks

BTEQ supports transaction patterns — multiple DML statements wrapped in explicit BT/ET blocks with specific rollback behavior — that don't map one-to-one onto BigQuery's transaction model. Scripts that lean on Teradata's transaction semantics for correctness (rather than just performance) need to be redesigned, not translated line-by-line.

The 4,000-partition DML limit

BigQuery enforces a limit on the number of partitions a single DML statement can affect. Large historical backfills that would run as one script in Teradata will hit this ceiling in BigQuery and need to be chunked — by date range or batch — with checkpoint and resume logic so a failure partway through doesn't force a full restart. This is a scripting and orchestration problem, not just a query problem, and it's easy to miss until you're migrating a table with years of history.

Character encoding in shell-wrapped scripts

BTEQ jobs are often triggered from shell scripts that have quietly accumulated encoding quirks over years — an em dash or curly quote pasted into a comment or literal years ago, sitting fine in the old environment's encoding, breaking the moment it runs through a UTF-8-strict pipeline. These failures are frustrating specifically because they're invisible in the source and only surface at runtime, usually with an unhelpful error.

Macros calling macros

Nested BTEQ macros with parameters passed down two or three levels are common in older environments and rarely documented anywhere except the code itself. Automated converters handle single-level macros reasonably well; nested macro chains usually need a human to trace the actual logic before conversion, not just a syntax translation.

The pattern underneath all of these

Every one of these failure modes shares the same root cause: BTEQ and BigQuery SQL look similar enough that a syntax-level converter produces something that runs. Whether it runs correctly is a separate question — and the gap between those two is exactly where migration budgets and timelines go wrong. Automated conversion is a legitimate first pass. It is not a substitute for someone who has personally chased down a sentinel-date bug in production and knows to look for the next one.

Inheriting a BTEQ-heavy environment?

A Migration Assessment flags exactly which of these patterns exist in your scripts before conversion starts — not after something breaks in production.

See the Migration Assessment