Wednesday, March 22, 2023
HomeJavaUse MULTISET Predicates to Examine Knowledge Units – Java, SQL and jOOQ.

Use MULTISET Predicates to Examine Knowledge Units – Java, SQL and jOOQ.


Questions that may be a bit harder to unravel utilizing extraordinary SQL are questions of the sort:

What movies have the identical actors as a given movie X?

As all the time, we’re utilizing the sakila database for this instance. What could be a doable strategy to clear up this with SQL (for instance, PostgreSQL, to be particular)? The next question provides an outline of actors per movie:

SELECT
  film_id,
  array_agg(actor_id ORDER BY actor_id) actors
FROM film_actor
GROUP BY film_id

It produces one thing like this:

|film_id|actors                            |
|-------|----------------------------------|
|1      |{1,10,20,30,40,53,108,162,188,198}|
|2      |{19,85,90,160}                    |
|3      |{2,19,24,64,123}                  |
|4      |{41,81,88,147,162}                |
|5      |{51,59,103,181,200}               |
|6      |{21,23,62,108,137,169,197}        |
|...    |...                               |

Be aware that in SQL, arrays behave like lists, i.e. they keep their ordering, so ordering the array explicitly is necessary to have the ability to evaluate the actors with one another. Now, we wish to discover all movies that share the identical actor set, from the above:

WITH t AS (
  -- Earlier question
  SELECT
    film_id,
    array_agg(actor_id ORDER BY actor_id) actors
  FROM film_actor
  GROUP BY film_id
)
SELECT 
  array_agg(film_id ORDER BY film_id) AS movies,
  actors
FROM t
GROUP BY actors
ORDER BY depend(*) DESC, movies

The result’s now:

|movies   |actors                            |
|--------|----------------------------------|
|{97,556}|{65}                              |
|{1}     |{1,10,20,30,40,53,108,162,188,198}|
|{2}     |{19,85,90,160}                    |
|{3}     |{2,19,24,64,123}                  |
|{4}     |{41,81,88,147,162}                |
|{5}     |{51,59,103,181,200}               |
|{6}     |{21,23,62,108,137,169,197}        |
|...     |...                               |

So, as we are able to see, there are solely 2 movies which share the identical set of actors, and people movies are FILM_ID IN (97, 556). (The Sakila database is a bit boring as the information units are generated).

Utilizing MULTISET comparisons

Whereas the above is already fairly cool, on this article, I’d wish to showcase a lesser identified characteristic of the jOOQ 3.15 MULTISET help, specifically the truth that they are often in contrast with each other.

And as is the character of SQL customary MULTISET, ordering is irrelevant, so we don’t have so as to add any specific ORDER BY clause for such a comparability. The truth is, it’s not 100% irrelevant. You can order a MULTISET for projection functions, so the ordering can be maintained by jOOQ. However if you use them in predicates, jOOQ will override your ORDER BY clause.

Utilizing jOOQ, we are able to write:

ctx.choose(FILM.FILM_ID, FILM.TITLE)
   .from(FILM)
   .the place(
       multiset(
           choose(FILM_ACTOR.ACTOR_ID)
           .from(FILM_ACTOR)
           .the place(FILM_ACTOR.FILM_ID.eq(FILM.FILM_ID))
       ).eq(multiset(
           choose(FILM_ACTOR.ACTOR_ID)
           .from(FILM_ACTOR)
           .the place(FILM_ACTOR.FILM_ID.eq(97L))
       ))
   )
   .orderBy(FILM_ID)
   .fetch();

It’s a bit much less environment friendly than a question of the earlier type because it accesses the FILM_ACTOR desk from two subqueries, although solely one among them is correlated. Utilizing the default JSONB emulation, the next question is generated:

SELECT movie.film_id, movie.title
FROM movie
WHERE (
  SELECT coalesce(
    jsonb_agg(jsonb_build_array(v0) ORDER BY t.v0),
    jsonb_build_array()
  )
  FROM (
    SELECT film_actor.actor_id AS v0
    FROM film_actor
    WHERE film_actor.film_id = movie.film_id
  ) AS t
) = (
  SELECT coalesce(
    jsonb_agg(jsonb_build_array(v0) ORDER BY t.v0),
    jsonb_build_array()
  )
  FROM (
    SELECT film_actor.actor_id AS v0
    FROM film_actor
    WHERE film_actor.film_id = 97
  ) AS t
)
ORDER BY movie.film_id

I promised that no ORDER BY clause was wanted for MULTISET, and that is nonetheless true for the jOOQ code. Nevertheless, behind the scenes, jOOQ has to order the JSON arrays by their contents to make it possible for two MULTISET values are the identical, no matter their order.

The consequence is similar two IDs because the earlier consequence confirmed:

+-------+--------------+
|film_id|title         |
+-------+--------------+
|     97|BRIDE INTRIGUE|
|    556|MALTESE HOPE  |
+-------+--------------+

Evaluating MULTISET_AGG, as an alternative

Should you desire utilizing joins and GROUP BY to generate the movie’s actor MULTISET, you are able to do that as effectively, with jOOQ. This time, we’re utilizing:

  • Implicit joins to simplify entry to the FILM.TITLE from FILM_ACTOR
  • A MULTISET predicate within the HAVING clause, utilizing MULTISET_AGG

Right here’s the jOOQ model:

ctx.choose(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
   .from(FILM_ACTOR)
   .groupBy(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
   .having(multisetAgg(FILM_ACTOR.ACTOR_ID).eq(multiset(
        choose(FILM_ACTOR.ACTOR_ID)
        .from(FILM_ACTOR)
        .the place(FILM_ACTOR.FILM_ID.eq(97L))
    )))
   .orderBy(FILM_ACTOR.FILM_ID)
   .fetch();

The backing, generated SQL appears to be like like this:

SELECT film_actor.film_id, alias_75379701.title
FROM film_actor
  JOIN movie AS alias_75379701
    ON film_actor.film_id = alias_75379701.film_id
GROUP BY film_actor.film_id, alias_75379701.title
HAVING jsonb_agg(
  jsonb_build_array(film_actor.actor_id) ORDER BY film_actor.actor_id
) = (
  SELECT coalesce(
    jsonb_agg(jsonb_build_array(v0) ORDER BY t.v0),
    jsonb_build_array()
  )
  FROM (
    SELECT film_actor.actor_id AS v0
    FROM film_actor
    WHERE film_actor.film_id = 97
  ) AS t
)
ORDER BY film_actor.film_id

Discover how the implicit be a part of is expanded routinely, whereas the HAVING predicate once more makes use of the same old JSONB emulation for MULTISET and MULTISET_AGG.

Options

Within the above examples, we’ve in contrast MULTISET expressions that venture single columns, in different phrases, End result<Record1<Lengthy>> nested assortment varieties. Nothing retains you from including extra columns to the equation. jOOQ will all the time be certain that your question kind checks and that the generated SQL is appropriate.

An alternative choice to utilizing MULTISET could be utilizing ARRAY_AGG and ARRAY (now it’s important to ORDER BY explicitly, once more). With jOOQ:

ctx.choose(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
   .from(FILM_ACTOR)
   .groupBy(FILM_ACTOR.FILM_ID, FILM_ACTOR.movie().TITLE)
   .having(arrayAgg(FILM_ACTOR.ACTOR_ID)
        .orderBy(FILM_ACTOR.ACTOR_ID).eq(array(
            choose(FILM_ACTOR.ACTOR_ID)
            .from(FILM_ACTOR)
            .the place(FILM_ACTOR.FILM_ID.eq(97L))
            .orderBy(FILM_ACTOR.ACTOR_ID)
    )))
    .orderBy(FILM_ACTOR.FILM_ID)
    .fetch();

With SQL:

SELECT film_actor.film_id, movie.title
FROM film_actor
  JOIN movie
    ON film_actor.film_id = movie.film_id
GROUP BY film_actor.film_id, movie.title
HAVING array_agg(film_actor.actor_id ORDER BY film_actor.actor_id) = 
  ARRAY (
    SELECT film_actor.actor_id
    FROM film_actor
    WHERE film_actor.film_id = 97
    ORDER BY film_actor.actor_id
  )
ORDER BY film_actor.film_id

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments