MySQL: group_concat allows you to easily concatenate the grouped values of a row

Last week I stumbled over a really useful function in MySQL: group_concat allows you to concatenate the data of one column of multiple entries by grouping them by one field field. You can choose the separator to use for the concatenation. The full syntax is as follows:

GROUP_CONCAT([DISTINCT] expr [,expr ...]
[ORDER BY {unsigned_integer | col_name | expr}
[ASC | DESC] [,col_name ...]]
[SEPARATOR str_val])

According to the MySQL documentation, the function returns a string result with the concatenated non-NULL values from a group. It returns NULL if there are no non-NULLvalues. To eliminate duplicate values, use the DISTINCT clause. To sort values in the result, use the ORDER BY clause. To sort in reverse order, add the DESC (descending) keyword to the name of the column you are sorting by in the ORDER BY clause.

To make things clear, lets use a simple example to demonstrate the function:

So for example you have the following table representing the currently logged in users on a server:

CREATE TABLE `logged_in` (
`server` varchar(255) DEFAULT NULL,
`user` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

With the following entries:

There is one row for each user which is logged in on a specific server (e.g. marc is logged in on Server A). If you want the resultset to have a single entry for every server with a concatenated list of all the users logged in on that particular server, you can use group_concat and group the results by the name of the server using the following query:

SELECT server,
group_concat(user SEPARATOR ',') AS logged_in_users
FROM logged_in
GROUP BY server;

Which leads to the desired result:

I hope this little demo was helpful to you. If you have any questions or comments, feel free to use the comment function below.

Links:

http://dev.mysql.com/doc/refman/5.7/en/group-by-functions.html#function_group-concat

Using SQL WITH clause to create temporary static tables at query time

A few days ago, I came across the following problem: I currently work on a project where I am the responsible of an application which generates entries to a log table every time a job is executed. This table contains a lot of information on statuses of jobs, possible problems, exceptions, duration, aso. I was working on some analytics on this data and needed to enrich the data by the version of the software which generated the log entry (since we were not capturing this in the log table). From our configuration management tool, I was able to extract the dates when which versions of the software was deployed in production

Problem

My intention was to create a temporary table to join onto the  logged entries, but I didn´t want to create the tables on the Oracle server (mainly because they would have been just temporary tables and because the schema-user I was using didn´t have the rights to create tables).

Solution: Using the WITH statement to create temporary static data

Since I was just interested in getting my analytics done, I used an SQL WITH statement to create a temporary static table (for the query), which I then linked onto the job table. Creating static data with the WITH query is rather uncommon since it is usually used to create a temporary view by querying some data.

About the WITH clause

The WITH query_name clause lets you assign a name to a subquery block. You can then reference the subquery block multiple places in the query by specifying the query name. Oracle optimizes the query by treating the query name as either an inline view or as a temporary table.

You can specify this clause in any top-level SELECT statement and in most types of subqueries. The query name is visible to the main query and to all subsequent subqueries except the subquery that defines the query name itself. [1]

The initial static data

I extracted the following data from our configuration management tool. It provides information about when a certain version of the software we use was deployed in production. (The data has been anonymised for privacy reasons though)

I then semi-manually transformed the column data from Excel into the following SQL statement.

The WITH statement to create the temporary static table

WITH software_versions as ((SELECT 'V 8.1' as version, TO_DATE('20.12.2013 05:04:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, SYSDATE as to_date FROM dual ) UNION
(SELECT 'V 8.0' as version, TO_DATE('07.10.2013 10:06:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, TO_DATE('20.12.2013 05:04:00', 'dd.mm.yyyy hh24:mi:ss')  as to_date FROM dual ) UNION
(SELECT 'V 7.2' as version, TO_DATE('29.07.2013 07:54:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, TO_DATE('07.10.2013 10:06:00', 'dd.mm.yyyy hh24:mi:ss')  as to_date FROM dual ) UNION
(SELECT 'V 7.1' as version, TO_DATE('04.06.2013 10:02:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, TO_DATE('29.07.2013 07:54:00', 'dd.mm.yyyy hh24:mi:ss')  as to_date FROM dual ) UNION
(SELECT 'V 7.0' as version, TO_DATE('13.05.2013 08:32:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, TO_DATE('04.06.2013 10:02:00', 'dd.mm.yyyy hh24:mi:ss')  as to_date FROM dual ) UNION
(SELECT 'V 6.0' as version, TO_DATE('10.05.2013 08:34:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, TO_DATE('13.05.2013 08:32:00', 'dd.mm.yyyy hh24:mi:ss')  as to_date FROM dual ) UNION
(SELECT 'V 5.0' as version, TO_DATE('16.03.2013 08:09:00', 'dd.mm.yyyy hh24:mi:ss')  as from_date, TO_DATE('10.05.2013 08:34:00', 'dd.mm.yyyy hh24:mi:ss')  as to_date FROM dual ))

Every static SELECT generates data and formats the from_date and to_date from a TEXT to a  DATE datatype. This is important because range queries will be done on this result. Subsequently the UNION of all this single rows is done.

The resultset of the query on the temporary static table created using the WITH statement

--WITH statement was omitted here for better readability
SELECT * from software_Versions;

Now, we will combine this table with the logged events in the job table.

The job table

SELECT * from job;

This is just an extract, the job table contains many fields on the status of jobs

The job table enriched by the software version information

Here the idea is to join the software_versions table with the job table and to only select these elements, where the job_start_date falls in the range where a certain version of the software was installed:

--WITH statement was omitted here for better readability
SELECT s.version, j.job_id, j.job_start_date, j.job_end_date
FROM   job j,
software_versions s
WHERE  j.job_start_date BETWEEN s.from_date AND s.to_date
ORDER BY job_id desc;

Counting the number of jobs executed by a certain software version

It might also be of interest so see, how many jobs were executed by each and every version of the software.

--WITH statement was omitted here for better readability
SELECT COUNT(job_id) as jobs_executed, s.version
FROM   job j,
software_versions s
WHERE  j.job_start_date BETWEEN s.from_date AND s.to_date
GROUP BY s.version;

Result:

Obviously, now it is possible to manipulate the query to only count jobs having a certain status (succeeded, failed, …). Please go ahead and try out different filters and groupings. This is not part of this tutorial anymore,  but you can see what I mean! 😀

Summary

In this micro tutorial, we have seen a possibility to create temporary static tables using the WITH statement. (I used an Oracle database for this tutorial, but the WITH statement is also available in other database products.) This temporary table was linked onto a jobs table and the entries in this table were enriched by the version of the software which generated the entry. Please feel free to provide any comments, and let me know if I should make thinks more clear/explicit and  if  this  tutorial was useful for you.

Links

[1] http://psoug.org/reference/with.html