Business Intelligence Analysts juggle a diverse range of datasets. Some reside at the granular unit level, while others are pre-aggregated at the item or order level. Aggregating data upwards (unit -> item -> order) is a breeze. But when it comes to disaggregating data downwards (order -> item -> unit), things can get tricky.
This article dives into BigQuery's powerful tool for tackling this challenge: Recursive Common Table Expressions (CTEs). We'll explore how to leverage them to disaggregate data downwards, specifically from the item level to the unit level.
Imagine you have an order_items
dataset containing details like order_id
, sku
, items_ordered
, and line_total
. Here's a sample of how this data might look for a single order:
order_id | sku | items_ordered | line_total |
C852F724 | 12345-white-xs | 1 | 14.5 |
C852F724 | 45678-black-l | 3 | 48.0 |
C852F724 | 90123-yellow-m | 1 | 22.5 |
C852F724 | 13764-blue-s | 4 | 44.0 |
As you can see, items_ordered
represents the total quantity of each item ordered (aggregated data), and line_total
reflects the sum of the entire order line(also aggregated). Our goal is to disaggregate this data and reach the level of individual units ordered (assuming each item represents a single unit).
Disaggregation Approach:
The disaggregation process involves two key techniques:
Recursive CTE: This CTE iterates until a counter (
id
) reachesitems_ordered
value, effectively creating one row for each unit within an order item.Window Function: We use the
SUM(1) OVER (PARTITION BY order_id, sku)
window function to calculate the unit price by dividing theline_total
by the total number ofitems_ordered
for a specificsku
within the order.
WITH
RECURSIVE
loops AS (
SELECT order_id
,sku
,items_ordered
,line_total
,1 AS id
FROM sample
UNION ALL
SELECT order_id
,sku
,items_ordered
,line_total
,id + 1
FROM loops
WHERE id < items_ordered),
sample AS (
SELECT 'C852F724' AS order_id, '12345-white-xs' AS sku, 1 AS items_ordered, 14.5 AS line_total
UNION ALL SELECT 'C852F724', '45678-black-l', 3, 48
UNION ALL SELECT 'C852F724', '90123-yellow-m', 1, 22.5
UNION ALL SELECT 'C852F724', '13764-blue-s', 4, 44),
final AS (
SELECT * EXCEPT(id, items_ordered, line_total)
,line_total / SUM(1) OVER (PARTITION BY order_id, sku) AS unit_total
FROM loops
ORDER BY 1,2)
SELECT *
FROM final;
The code demonstrates the disaggregation process through three CTEs:
sample
CTE: This CTE defines the sample order items data.loops
CTE (Recursive): This CTE iterates, creating a row for each unit within an order item. It stops iterating whenid
reachesitems_ordered
.final
CTE: This CTE selects desired columns fromloops
and calculates theunit_total
using the window function. It then sorts the data byorder_id
andsku
.
Output and Conclusion:
The final output shows the disaggregated data at the unit level. This allows for a more granular analysis of individual items within an order.
order_id | sku | unit_total |
C852F724 | 12345-white-xs | 14.5 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 13764-blue-s | 11.0 |
C852F724 | 45678-black-l | 16.0 |
C852F724 | 45678-black-l | 16.0 |
C852F724 | 45678-black-l | 16.0 |
C852F724 | 90123-yellow-m | 22.5 |
Subscribe to our newsletter
Read articles from Decoding Data for Business directly inside your inbox. Subscribe to the newsletter, and don't miss out.