recursive SQL queries with postgresql

src: Heinz | Personal Website & Blog](https://martinheinz.dev/blog/18) - 2022-03-12

Abstract

Recursion is very powerful tool, that can solve some problems in very elegant way. Using it with SQL is not so common though, so hopefully this article showed you some ways for how to leverage it, in case you run into data that can only be processed recursively.

Recursion however, can make your code less readable, so use it sparingly, especially with SQL, as it is often not that easy parse and understand even without recursion.

When working with databases, most of the time, all you need is SELECT, UPDATE (CRUD operations), few JOINs and WHERE clauses and that’s about it. But, sometimes you will run into datasets and tasks, that will require you to use much more advanced features of SQL. One of them is CTE or common table expression and more specifically, Recursive CTE, which we can use to make recursive SQL queries. Let’s look at how it works and what we can do with it!

Why Recursive Queries?

First of all, why is this actually even a thing? What are recursive queries good for? - In general recursive queries come in handy when working with self-referential data or graph/tree-like data structures. Just a few examples of these use cases are:

  • Self-referential data:
  • Manager Subordinate (employee) relationship
  • Category Subcategory Product relationship
  • Graphs - Flight (Plane Travel) map
  • Trees:
  • Any taxonomy system - books, animals, genetics…
  • Links between articles - for example on Wikipedia
  • Comment section - for example threads on Reddit

For any of these examples, it would require you to build temporary tables or use cursors to actually get useful data from such datasets. Now that I hopefully persuaded you that recursive queries are pretty useful, let’s see what they are and how we can use them.

What Will We Need?

Recursive queries leverage something called CTE - Common Table expression, which is SQL WITH clause, more commonly used for simplifying very complex queries (subqueries). Let’s look at example of that:

WITH avg_salary AS (
    SELECT avg(salary)
    FROM employees
)
SELECT avg_salary;

This is very simple example, but you can surely imagine how this can be used to make very complicated queries containing multiple subqueries much more readable. Apart from the syntax above, we will also need some data that we can use for our recursive queries. For that we will use Manager - Subordinate hierarchy using one self-referencing table, defined like so:

employees

idserialNULLPK
namevarchar(256)
salaryintegerNULL
jobvarchar(256)NULL
manager_idPK

Here’s SQL to create such table including some data to play with:

CREATE TABLE employees (
   id serial PRIMARY KEY,
   name varchar(255) NOT NULL,
   salary integer,
   job varchar(255),
   manager_id int,
   FOREIGN KEY (manager_id) REFERENCES employees (id) ON DELETE CASCADE
);
 
INSERT INTO employees (
    id,
    name,
    manager_id,
    salary,
    job
)
VALUES
    (1,  'John', NULL, 10000, 'CEO'),
    (2,  'Ben', 5, 1400, 'Junior Developer'),
    (3,  'Barry', 5, 500, 'Intern'),
    (4,  'George', 5, 1800, 'Developer'),
    (5,  'James', 7, 3000, 'Manager'),
    (6,  'Steven', 7, 2400, 'DevOps Engineer'),
    (7,  'Alice', 1, 4200, 'VP'),
    (8,  'Jerry', 1, 3500, 'Manager'),
    (9,  'Adam', 8, 2000, 'Data Analyst'),
    (10, 'Grace', 8, 2500, 'Developer');

So, with that out of the way, let’s build some recursive queries!

Recursion

Let’s start with formal structure of recursive SQL query:

WITH RECURSIVE name [(col_1,..., col_n)] AS (
    non_recursive_part
    UNION [ALL]
    recursive_part
)

It looks pretty simple, but doesn’t tell us much, so let’s see human readable example:

WITH RECURSIVE nums (n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM nums WHERE n+1 <= 10
)
SELECT n FROM nums;

And output of that query:

n
----
  1
  2
...
  9
 10
(10 rows)

In this example, our non-recursive base case is just SELECT 1 and recursive part is SELECT n+1 FROM nums WHERE n+1 <= 10. This part creates recursion by referring to name of CTE (nums) and unioning all the results. At the end we have WHERE clause to terminate the recursion, so that our query doesn’t run infinitely.

Real World Examples

Previous section showed very basic example that explains how it works, but doesn’t really show how the recursive CTE can help us. So, let’s go through some examples using Manager - Subordinate hierarchy data from above:

Getting “manager tree” for some employee:

WITH RECURSIVE managers AS (
        SELECT id, name, manager_id, job, 1 AS level
        FROM employees
        WHERE id = 7 -- Alice, the VP
    UNION ALL
        SELECT e.id, e.name, e.manager_id, e.job, managers.level + 1 AS level
        FROM employees e
        JOIN managers ON e.manager_id = managers.id
)
 
SELECT * FROM managers;

This query can be used to generate list of all subordinates of some manager, which essentially is subtree starting from some specific employee. As a base case here, we select one employee (manager) by ID and we also include level to indicate how many levels down we went in the tree. In the recursive part we JOIN employees table to CTE (managers) using manager IDs. Again, we include level by incrementing level of results from previous step of recursion. This is the output:

id |  name  | manager_id |       job        | level
----+--------+------------+------------------+-------
  7 | Alice  |          1 | VP               |     1
  5 | James  |          7 | Manager          |     2
  6 | Steven |          7 | DevOps Engineer  |     2
  2 | Ben    |          5 | Junior Developer |     3
  3 | Barry  |          5 | Intern           |     3
  4 | George |          5 | Developer        |     3

To better visualize what happens there, look at the following tree. The highlighted part is what is returned by query:

Another practical example using same data is computing degrees of separation between 2 employees:

WITH RECURSIVE rec AS (
        SELECT id, 0 AS degree
        FROM employees
        WHERE id = 1 -- John, the CEO
    UNION ALL
        SELECT e.id, rec.degree + 1 AS degree
        FROM employees e
        JOIN rec ON e.manager_id = rec.id
)
SELECT
    'John and ' || employees.name || ' are separated by ' ||
    TO_CHAR(rec.degree, '9') ||
    ' degrees of separation'
FROM employees
JOIN rec
    ON (employees.id = rec.id)
    WHERE employees.name = 'George';

The recursive CTE is quite similar to the one in previous example. To simplify things, we only select ID and degree (instead of level). Next, we need query that looks up and tells us degrees of separation for 2 employees - to do so, we join employees table to our previously built rec table containing IDs and degrees using IDs in respective tables. In the final SELECT part we do some formatting to get nicer output and in WHERE clause we optionally specify the other (second) employee for whom we want the degrees of separation. Output:

-----------------------------------------------------------
 John and George are separated by  3 degrees of separation
(1 row)

Again, playing with the same data, let’s find out what might be the job progression in the hypothetical company:

WITH RECURSIVE rec AS (
        SELECT id, manager_id, job, 0 AS level
        FROM employees
        WHERE id = 4  -- George, the Developer
    UNION ALL
        SELECT e.id, e.manager_id, e.job, rec.level + 1 AS level
        FROM employees e
        JOIN rec ON e.id = rec.manager_id
)
 
SELECT STRING_AGG(job,' > ' ORDER BY level ASC) AS path FROM rec;

This time we run the recursion from the bottom up. This is achieved by flipping the ON clause, compared to previous examples. To create readable output, we use STRING_AGG function, which takes rows from above SELECT and concatenates them using > as delimiter. This is what the query gives us:

path
--------------------------------
 Developer > Manager > VP > CEO
(1 row)

Recursion on Graphs

Enough of that employee data set, let’s explore what we can to with graphs - graphs as a data structure are ideal candidate for traversing using recursion. So, let’s first define simple table and insert some data, so we have something to play with:

CREATE TABLE edges (
   src integer,
   dest integer
);
 
INSERT INTO edges (
    src,
    dest
)
VALUES
    (1, 2),
    (2, 3),
    (2, 4),
    (3, 4),
    (4, 1),
    (3, 5);

As an example, we will do the obvious thing - find paths is graph. For that we will need a PostgreSQL special - ARRAY data type. Arrays are not a common relational database feature, but are very handy here for storing paths. If you are not familiar with this data type, then these are the things you need for understanding SQL below: - Create array - ARRAY[value_1, value_2] - Concatenate array - ARRAY[value_1, value_2] || ARRAY[value_3] - ALL function - "x" != ALL(ARRAY["a", "b", "c"]) - compares "x" to each value in array, results is true if all comparisons result in true Alright, now for the query:

WITH RECURSIVE paths (src, dest, path) AS (
        SELECT e.src, e.dest, ARRAY[e.src, e.dest]
        FROM edges e
    UNION
        SELECT p.src, e.dest, p.path || ARRAY[e.dest]
        FROM paths p
        JOIN edges e
        ON p.dest = e.src AND e.dest != ALL(p.path)
)
SELECT * FROM paths;

The query above first selects all the direct neighbours in the non-recursive case. Then in recursive part, we build on the base case (and during subsequent iterations on previous recursive cases) and append available edge to existing paths and replace destination with same edge. We do this for every row where end of the existing path (dest) is same as start of the selected edge and where new destination is not yet in path (to prevent cycles). And this is the full output:

src | dest |    path
-----+------+-------------
   1 |    2 | {1,2}
   2 |    3 | {2,3}
   2 |    4 | {2,4}
   3 |    4 | {3,4}
   4 |    1 | {4,1}
   3 |    5 | {3,5}
   4 |    2 | {4,1,2}
   1 |    3 | {1,2,3}
   1 |    4 | {1,2,4}
   2 |    4 | {2,3,4}
   2 |    5 | {2,3,5}
   2 |    1 | {2,4,1}
   3 |    1 | {3,4,1}
   3 |    2 | {3,4,1,2}
   4 |    3 | {4,1,2,3}
   1 |    4 | {1,2,3,4}
   1 |    5 | {1,2,3,5}
   2 |    1 | {2,3,4,1}
   4 |    5 | {4,1,2,3,5}

Infinite Tables

Aside from the useful things above, you can also use recursive CTE with infinite streams of data:

WITH RECURSIVE nums (n) AS (
    SELECT 1
  UNION ALL
    SELECT n+1 FROM nums
)
SELECT n FROM nums;
-- ... Runs forever
 
SELECT n FROM nums LIMIT 15;
-- Works just fine, returns 15 rows

Recursive table is built iteratively with each step and each step produces finite number of rows, therefore it’s possible to use LIMIT to retrieve first n rows from infinite query!

Bonus: Recursive VIEW

As a quick little bonus, I want to also show, that it is possible (as of PostgreSQL 9.3) to even create recursive views:

CREATE RECURSIVE VIEW paths (src, dest, path) AS (
        SELECT e.src, e.dest, ARRAY[e.src, e.dest]
        FROM edges e
    UNION
        SELECT p.src, e.dest, p.path || ARRAY[e.dest]
        FROM paths p
        JOIN edges e
        ON p.dest = e.src AND e.dest != ALL(p.path)
);
 
SELECT * FROM paths WHERE src = 4;
 
 src | dest |    path
-----+------+-------------
   4 |    1 | {4,1}
   4 |    2 | {4,1,2}
   4 |    3 | {4,1,2,3}
   4 |    5 | {4,1,2,3,5}
(4 rows)

Even though, this is just a syntax sugar, it can make your life easier when working with some of recursive queries and data.

Conclusion

Recursion is very powerful tool, that can solve some problems in very elegant way. Using it with SQL is not so common though, so hopefully this article showed you some ways for how to leverage it, in case you run into data that can only be processed recursively.

Recursion however, can make your code less readable, so use it sparingly, especially with SQL, as it is often not that easy parse and understand even without recursion.