Business Intelligence Analysts juggle a diverse range of datasets. Some reside at the granular unit level, while others are pre-aggregated at the item or order level. Aggregating data upwards (unit -> item -> order) is a breeze. But when it comes to disaggregating data downwards (order -> item -> unit), things can get tricky.
This article dives into BigQuery's powerful tool for tackling this challenge: Recursive Common Table Expressions (CTEs). We'll explore how to leverage them to disaggregate data downwards, specifically from the item level to the unit level.
Imagine you have an order_items
dataset containing details like order_id
, sku
, items_ordered
, and line_total
. Here's a sample of how this data might look for a single order:
order_id | sku | items_ordered | line_total |
C852F724 | 12345-white-xs | 1 | 14.5 |
C852F724 | 45678-black-l | 3 | 48.0 |
C852F724 | 90123-yellow-m | 1 | 22.5 |
C852F724 | 13764-blue-s | 4 | 44.0 |
As you can see, items_ordered
represents the total quantity of each item ordered (aggregated data), and line_total
reflects the sum of the entire order line(also aggregated). Our goal is to disaggregate this data and reach the level of individual units ordered (assuming each item represents a single unit).
Disaggregation Approach:
The disaggregation process involves two key techniques:
Recursive CTE: This CTE iterates until a counter (
id
) reachesitems_ordered
value, effectively creating one row for each unit within an order item.Window Function: We use the
SUM(1) OVER (PARTITION BY order_id, sku)
window function to calculate the unit price by dividing theline_total
by the total number ofitems_ordered
for a specificsku
within the order.
WITH
RECURSIVE
loops AS (
SELECT order_id
,sku
,items_ordered
,line_total
,1 AS id
FROM sample
UNION ALL
SELECT order_id
,sku
,items_ordered
,line_total
,id + 1
FROM loops
WHERE id < items_ordered),
sample AS (
SELECT 'C852F724' AS order_id, '12345-white-xs' AS sku, 1 AS items_ordered, 14.5 AS line_total
UNION ALL SELECT 'C852F724', '45678-black-l', 3, 48
UNION ALL SELECT 'C852F724', '90123-yellow-m', 1, 22.5
UNION ALL SELECT 'C852F724', '13764-blue-s', 4, 44),
final AS (
SELECT * EXCEPT(id, items_ordered, line_total)
,line_total / SUM(1) OVER (PARTITION BY order_id, sku) AS unit_total
FROM loops
ORDER BY 1,2)
SELECT *
FROM final;
The code demonstrates the disaggregation process through three CTEs:
sample
CTE: This CTE defines the sample order items data.loops
CTE (Recursive): This CTE iterates, creating a row for each unit within an order item. It stops iterating whenid
reachesitems_ordered
.final
CTE: This CTE selects desired columns fromloops
and calculates theunit_total
using the window function. It then sorts the data byorder_id
andsku
.
Output and Conclusion:
The final output shows the disaggregated data at the unit level. This allows for a more granular analysis of individual items within an order.
order_id | sku | unit_total |
C852F724 | 12345-white-xs | 14.5 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 45678-black-l | 16.0 |
C852F724 | 45678-black-l | 16.0 |
C852F724 | 45678-black-l | 16.0 |
C852F724 | 90123-yellow-m | 22.5 |
Subscribe to our newsletter
Read articles from Decoding Data for Business directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Article Series
BigQuery
BigQuery Recursive CTEs for Data Transformation
Business Intelligence Analysts juggle a diverse range of datasets. Some reside at the granular unit …
Unveiling Volatility with Bollinger Bands
Ever felt like your sales data is a wild sea monster – all over the place and difficult to grasp? Fe…