Insert everything at once: A short story about CTE
Common Table Expressions is a powerful feature that came with SQL:1999. It allows users to specify auxiliary statements to be used in a larger query. I'll share a few thoughts about its data-modifying version since it allows you to do many things within a single query.
ANNOUNCE: I build a service for API fuzzing. Sign up to check your API now!
A typical use case is INSERT into multiple tables with or without interdependencies. For example, you want to store parsed User Agent in a normalized way:
CREATE TABLE browser (
browser_id SERIAL PRIMARY KEY,
family TEXT NOT NULL,
version TEXT NOT NULL,
UNIQUE (family, version)
);
CREATE TABLE operating_system (
os_id SERIAL PRIMARY KEY,
family TEXT NOT NULL,
version TEXT NOT NULL,
UNIQUE (family, version)
);
CREATE TABLE device (
device_id SERIAL PRIMARY KEY,
family TEXT NOT NULL,
brand TEXT NOT NULL,
model TEXT NOT NULL,
UNIQUE (family, brand, model)
);
CREATE TABLE user_agent (
id SERIAL PRIMARY KEY,
value TEXT NOT NULL UNIQUE,
browser_id BIGINT REFERENCES browser (browser_id) DEFERRABLE INITIALLY DEFERRED,
os_id BIGINT REFERENCES operating_system (os_id) DEFERRABLE INITIALLY DEFERRED,
device_id BIGINT REFERENCES device (device_id) DEFERRABLE INITIALLY DEFERRED,
);
The intuitive approach would be to insert data into browser
, operating_system
, and device
tables and then insert new ids into the user_agent
table. It could be done on the application level, or with some DB-level procedure.
With CTE approach you can just combine multiple INSERT
’s:
WITH data(
browser_family,
browser_version,
os_family,
os_version,
device_family,
device_brand,
device_model,
ua_string
) AS (
VALUES
($1, $2, $3, $4, $5, $6, $7, $8, $9)
)
, browser_insert AS (
INSERT INTO browser (family, version)
SELECT
browser_family,
browser_version
FROM data
ON CONFLICT DO NOTHING
RETURNING browser_id
), os_insert AS (
INSERT INTO operating_system (family, version)
SELECT
os_family,
os_version
FROM data
ON CONFLICT DO NOTHING
RETURNING os_id
), device_insert AS (
INSERT INTO device (family, brand, model)
SELECT
device_family,
device_brand,
device_model
FROM data
ON CONFLICT DO NOTHING
RETURNING device_id
)
INSERT INTO user_agent (value, browser_id, os_id, device_id)
SELECT
data.ua_string,
B.browser_id,
O.os_id,
D.device_id
FROM data,
(
SELECT browser_id FROM browser_insert
UNION ALL
SELECT B.browser_id FROM data
JOIN browser B ON (
data.browser_family = B.family AND
data.browser_version = B.version
)
LIMIT 1
) B,
(
SELECT os_id FROM os_insert
UNION ALL
SELECT O.os_id FROM data
JOIN operating_system O ON (
data.os_family = O.family AND
data.os_version = O.version
)
LIMIT 1
) O,
(
SELECT device_id FROM device_insert
UNION ALL
SELECT D.device_id FROM data
JOIN device D ON (
data.device_family = D.family AND
data.device_brand = D.brand AND
data.device_model = D.model
)
LIMIT 1
) D
ON CONFLICT ("value") DO NOTHING
RETURNING id;
The query breakdown:
- “data” CTE, which contains all input values in a single place;
- 3 CTEs, that insert data into corresponding tables. Every CTE has
ON CONFLICT DO NOTHING
to handleUNIQUE
constraint violations. Also, there is RETURNING part, that allows queries results to be used further; - 3 actual usage of data-modifying CTEs. Each one has
UNION ALL
part, that acts as a fallback for cases when there was a conflict insideINSERT
(which is handled withDO NOTHING
part) - it looks for id in a corresponding table.
The query above is logically very similar to alternative implementation (procedure / application-level), but it has the following advantages:
- It is faster than alternatives;
- It is a single statement, there is no need for transaction;
- You can extend “data” CTE with as many rows as you want and it still will be a single SQL statement.
The query is a bit verbose and more complex than logically similar procedure, but it provides a useful approach for solving “multiple inserts” problem.
Cheers!
❤ ❤ ❤