Common Table Expressions is a powerful feature that came with SQL:1999. It allows users to specify auxiliary statements to be used in a larger query. I'll share a few thoughts about its data-modifying version since it allows you to do many things within a single query.

ANNOUNCE: I build a service for API fuzzing. Sign up to check your API now!

A typical use case is INSERT into multiple tables with or without interdependencies. For example, you want to store parsed User Agent in a normalized way:

CREATE TABLE browser (
  browser_id SERIAL PRIMARY KEY,
  family     TEXT NOT NULL,
  version    TEXT NOT NULL,
  UNIQUE (family, version)
);

CREATE TABLE operating_system (
  os_id   SERIAL PRIMARY KEY,
  family  TEXT NOT NULL,
  version TEXT NOT NULL,
  UNIQUE (family, version)
);

CREATE TABLE device (
  device_id SERIAL PRIMARY KEY,
  family    TEXT NOT NULL,
  brand     TEXT NOT NULL,
  model     TEXT NOT NULL,
  UNIQUE (family, brand, model)
);

CREATE TABLE user_agent (
  id               SERIAL PRIMARY KEY,
  value            TEXT NOT NULL UNIQUE,
  browser_id       BIGINT REFERENCES browser (browser_id) DEFERRABLE INITIALLY DEFERRED,
  os_id            BIGINT REFERENCES operating_system (os_id) DEFERRABLE INITIALLY DEFERRED,
  device_id        BIGINT REFERENCES device (device_id) DEFERRABLE INITIALLY DEFERRED,
);

The intuitive approach would be to insert data into browser, operating_system, and device tables and then insert new ids into the user_agent table. It could be done on the application level, or with some DB-level procedure.

With CTE approach you can just combine multiple INSERT’s:

WITH data(
  browser_family,
  browser_version,
  os_family,
  os_version,
  device_family,
  device_brand,
  device_model,
  ua_string
) AS (
  VALUES
    ($1, $2, $3, $4, $5, $6, $7, $8, $9)
)
  , browser_insert AS (
  INSERT INTO browser (family, version)
    SELECT
      browser_family,
      browser_version
    FROM data
  ON CONFLICT DO NOTHING
  RETURNING browser_id
), os_insert AS (
  INSERT INTO operating_system (family, version)
    SELECT
      os_family,
      os_version
    FROM data
  ON CONFLICT DO NOTHING
  RETURNING os_id
), device_insert AS (
  INSERT INTO device (family, brand, model)
    SELECT
      device_family,
      device_brand,
      device_model
    FROM data
  ON CONFLICT DO NOTHING
  RETURNING device_id
)
INSERT INTO user_agent (value, browser_id, os_id, device_id)
  SELECT
    data.ua_string,
    B.browser_id,
    O.os_id,
    D.device_id
  FROM data,
    (
      SELECT browser_id FROM browser_insert
      UNION ALL
      SELECT B.browser_id FROM data
      JOIN browser B ON (
        data.browser_family = B.family AND
        data.browser_version = B.version
      )
      LIMIT 1
    ) B,
    (
      SELECT os_id FROM os_insert
      UNION ALL
      SELECT O.os_id FROM data
      JOIN operating_system O ON (
        data.os_family = O.family AND
        data.os_version = O.version
      )
      LIMIT 1
    ) O,
    (
      SELECT device_id FROM device_insert
      UNION ALL
      SELECT D.device_id FROM data
      JOIN device D ON (
        data.device_family = D.family AND
        data.device_brand = D.brand AND
        data.device_model = D.model
      )
      LIMIT 1
    ) D
ON CONFLICT ("value") DO NOTHING
RETURNING id;

The query breakdown:

  • “data” CTE, which contains all input values in a single place;
  • 3 CTEs, that insert data into corresponding tables. Every CTE has ON CONFLICT DO NOTHING to handle UNIQUE constraint violations. Also, there is RETURNING part, that allows queries results to be used further;
  • 3 actual usage of data-modifying CTEs. Each one has UNION ALL part, that acts as a fallback for cases when there was a conflict inside INSERT (which is handled with DO NOTHING part) - it looks for id in a corresponding table.

The query above is logically very similar to alternative implementation (procedure / application-level), but it has the following advantages:

  • It is faster than alternatives;
  • It is a single statement, there is no need for transaction;
  • You can extend “data” CTE with as many rows as you want and it still will be a single SQL statement.

The query is a bit verbose and more complex than logically similar procedure, but it provides a useful approach for solving “multiple inserts” problem.

Cheers!


❤ ❤ ❤