May 17, 2023

Combining DISTINCT and group_concat() with custom delimiters in SQLite3

In this blog post, I'll introduce two useful tools in SQLite3: group_concat() and DISTINCT. I show how they can be used individually, and that you need to watch out when combining them with custom delimiters.

I assume you have a basic understanding of SQL, but mostly you will need to know about the concepts of a table with data and simple SELECT queries.

But first, we need a table to work with. For that, we'll create a table of hobbit names, and fill it with a few hobbits from Tolkien's fictional universe:

CREATE TABLE hobbits (
  first_name VARCHAR(64),
  last_name VARCHAR(64)
);

INSERT INTO hobbits VALUES
  ("Frodo", "Baggins"),
  ("Bilbo", "Baggins"),
  ("Sam", "Gamgee"),
  ("Pippin", "Took"),
  ("Merry", "Brandybuck")
;

To simply select the first and last names of each of the hobbits, the following query can be used:

SELECT first_name, last_name FROM hobbits;
first_namelast_name
FrodoBaggins
BilboBaggins
SamGamgee
PippinTook
MerryBrandybuck

Great! But we can do more interesting things, as demonstrated next.

Concatenating

One of the tools in SQL is the group_concat() function, which returns a string that is the concatenation of all non-null values of the given result set.

For example, this query concatenates all of the hobbits' first names:

SELECT group_concat(first_name) FROM hobbits;
Frodo,Bilbo,Sam,Pippin,Merry

You can see that, by default, the values are delimited by commas (,). To customise this, it is possible to provide an additional argument that sets the delimiter:

SELECT
  group_concat(first_name, " and ")
FROM
  hobbits
;
Frodo and Bilbo and Sam and Pippin and Merry

That's a little more readable.

Distinct values

Another tool in SQL is the DISTINCT keyword, which can remove duplicate entries from result sets.

The keen-eyed (or well-read) among you may have observed (or remembered) that Frodo and Bilbo share their last names! This means that when selecting the last names, we would see a duplicate entry. And indeed:

SELECT last_name FROM hobbits;
last_name
Baggins
Baggins
Gamgee
Took
Brandybuck

To remove the duplicate Bagginses from the result set, you can specify that the result set should be made distinct with the DISTINCT keyword:

SELECT DISTINCT last_name FROM hobbits;
last_name
Baggins
Gamgee
Took
Brandybuck

As expected, only a single Baggins remains in the result.

Concatenating distinct values

So, what if we want to combine group_concat() and DISTINCT? We can! This will concatenate all the distinct last names:

SELECT
  group_concat(DISTINCT last_name)
FROM
  hobbits
;
Baggins,Gamgee,Took,Brandybuck

Fabulous! We're wielding these tools like a pro! Feeling overly confident, we may even try to customise the delimiter for extra points:

SELECT
    group_concat(DISTINCT last_name, " and ")
FROM
    hobbits
;
Parse error near line 2: DISTINCT aggregates must
  have exactly one argument

Uh oh. It looks like the way we use the DISTINCT keyword confuses the query parser. It seems like the last_name, " and " part is interpreted as two arguments for the DISTINCT keyword?

Let's try to fix that with parentheses so that the DISTINCT keyword only applies to last_name:

SELECT
    group_concat((DISTINCT last_name), " and ")
FROM
    hobbits
;
Parse error near line 39: near "DISTINCT": syntax
  error
  SELECT     group_concat((DISTINCT last_name),
    " and ") FROM     hobbits ;
             error here ---^

Nope! It seems like this is not allowed. Consulting the SELECT grammar, it is indeed the case that it is required for the DISTINCT keyword to directly follow the opening parenthesis of the group_concat() (or other) function.

It looks like the DISTINCT keyword is only allowed in certain contexts, and a special case has been made for adding it in function invocations. This special case prohibits extra arguments in the function invocation.

The Solution

So, how do we solve this? Well, an answer on the SQLite Forum provided a solution! It uses a subquery as a workaround:

SELECT
  group_concat(distinct_last_name, " and ")
FROM (
  SELECT DISTINCT
    last_name AS distinct_last_name
  FROM hobbits
);
Baggins and Gamgee and Took and Brandybuck

The inner query creates an intermediate result set that we can use in the outer query. This workaround definitely feels like... well, a workaround. If you find a more elegant solution to this, please let me know!

Conclusion

Apparently, the group_concat() and DISTINCT tools do not work together as I expected when using a custom delimiter.

As Bilbo Baggins would put it: I don't know half of SQL as well as I should like; and I like less than half of it half as well as it deserves.