Insert everything at once: A short story about CTE
Common Table Expressions is a powerful feature that came with SQL:1999. It allows users to specify auxiliary statements to be used in a larger query. I'll share a few thoughts about its data-modifying version since it allows you to do many things within a single query.
ANNOUNCE: I build a service for API fuzzing. Sign up to check your API now!
A typical use case is INSERT into multiple tables with or without interdependencies. For example, you want to store parsed User Agent in a normalized way:
CREATE TABLE browser ( browser_id SERIAL PRIMARY KEY, family TEXT NOT NULL, version TEXT NOT NULL, UNIQUE (family, version) ); CREATE TABLE operating_system ( os_id SERIAL PRIMARY KEY, family TEXT NOT NULL, version TEXT NOT NULL, UNIQUE (family, version) ); CREATE TABLE device ( device_id SERIAL PRIMARY KEY, family TEXT NOT NULL, brand TEXT NOT NULL, model TEXT NOT NULL, UNIQUE (family, brand, model) ); CREATE TABLE user_agent ( id SERIAL PRIMARY KEY, value TEXT NOT NULL UNIQUE, browser_id BIGINT REFERENCES browser (browser_id) DEFERRABLE INITIALLY DEFERRED, os_id BIGINT REFERENCES operating_system (os_id) DEFERRABLE INITIALLY DEFERRED, device_id BIGINT REFERENCES device (device_id) DEFERRABLE INITIALLY DEFERRED, );
The intuitive approach would be to insert data into
device tables and then insert new ids into the
user_agent table. It could be done on the application level, or with some DB-level procedure.
With CTE approach you can just combine multiple
WITH data( browser_family, browser_version, os_family, os_version, device_family, device_brand, device_model, ua_string ) AS ( VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9) ) , browser_insert AS ( INSERT INTO browser (family, version) SELECT browser_family, browser_version FROM data ON CONFLICT DO NOTHING RETURNING browser_id ), os_insert AS ( INSERT INTO operating_system (family, version) SELECT os_family, os_version FROM data ON CONFLICT DO NOTHING RETURNING os_id ), device_insert AS ( INSERT INTO device (family, brand, model) SELECT device_family, device_brand, device_model FROM data ON CONFLICT DO NOTHING RETURNING device_id ) INSERT INTO user_agent (value, browser_id, os_id, device_id) SELECT data.ua_string, B.browser_id, O.os_id, D.device_id FROM data, ( SELECT browser_id FROM browser_insert UNION ALL SELECT B.browser_id FROM data JOIN browser B ON ( data.browser_family = B.family AND data.browser_version = B.version ) LIMIT 1 ) B, ( SELECT os_id FROM os_insert UNION ALL SELECT O.os_id FROM data JOIN operating_system O ON ( data.os_family = O.family AND data.os_version = O.version ) LIMIT 1 ) O, ( SELECT device_id FROM device_insert UNION ALL SELECT D.device_id FROM data JOIN device D ON ( data.device_family = D.family AND data.device_brand = D.brand AND data.device_model = D.model ) LIMIT 1 ) D ON CONFLICT ("value") DO NOTHING RETURNING id;
The query breakdown:
- “data” CTE, which contains all input values in a single place;
- 3 CTEs, that insert data into corresponding tables. Every CTE has
ON CONFLICT DO NOTHINGto handle
UNIQUEconstraint violations. Also, there is RETURNING part, that allows queries results to be used further;
- 3 actual usage of data-modifying CTEs. Each one has
UNION ALLpart, that acts as a fallback for cases when there was a conflict inside
INSERT(which is handled with
DO NOTHINGpart) - it looks for id in a corresponding table.
The query above is logically very similar to alternative implementation (procedure / application-level), but it has the following advantages:
- It is faster than alternatives;
- It is a single statement, there is no need for transaction;
- You can extend “data” CTE with as many rows as you want and it still will be a single SQL statement.
The query is a bit verbose and more complex than logically similar procedure, but it provides a useful approach for solving “multiple inserts” problem.
❤ ❤ ❤