Explicit Join Semantics In Firebird
by Claudio Valderrama
Let me show you some examples.
SQL> select count(*) from rdb$relations r join rdb$relation_fields rf on
r.rdb$relation_name = rf.rdb$relation_name and
r.rdb$relation_name = 'RDB$GENERATORS';
COUNT
============
4
SQL> select count(*) from rdb$relations r left join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name
and r.rdb$relation_name = 'RDB$GENERATORS';
COUNT
============
39
SQL> select count(*) from rdb$relations r left join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name
where r.rdb$relation_name = 'RDB$GENERATORS';
COUNT
============
4
I've have tried to explain to people new to SQL using the following: In an inner join, putting everything in the ON clause or anything except the join condition in the WHERE clause becomes a matter of style. Being a purist myself, I prefer for readability to write the condition that makes the match in the join after the JOIN clause and the filter that applies to the whole result set in the WHERE clause. Something like:
A join B on A.f = B.f
where A.f > x;
See the first case. You can put
WHERE r.rdb$relation_name = 'RDB$GENERATORS'
instead of
and r.rdb$relation_name = 'RDB$GENERATORS'
and it won't make a difference in practice. The table rdb$generators has four fields in Firebird 2.
In a left or right [outer] join, the ON clause can filter out only records from the optional table (I mean non-mandatory table). The ON clause doesn't have the "power" to leave out records from the mandatory table. However, whatever you put here, if it doesn't apply to the current record (not only to the optional table) it will cause the optional table's fields to be returned as null. Take the second query as an example. I think I need to post the result to explain better:
RDB$RELATION_NAME RDB$RELATION_NAME =============================== ================= RDB$PAGES <null> RDB$DATABASE <null> RDB$FIELDS <null> RDB$INDEX_SEGMENTS <null> RDB$INDICES <null> RDB$RELATION_FIELDS <null> RDB$RELATIONS <null> RDB$VIEW_RELATIONS <null> RDB$FORMATS <null> RDB$SECURITY_CLASSES <null> RDB$FILES <null> RDB$TYPES <null> RDB$TRIGGERS <null> RDB$DEPENDENCIES <null> RDB$FUNCTIONS <null> RDB$FUNCTION_ARGUMENTS <null> RDB$FILTERS <null> RDB$TRIGGER_MESSAGES <null> RDB$USER_PRIVILEGES <null> RDB$TRANSACTIONS <null> RDB$RELATION_NAME RDB$RELATION_NAME =============================== ================= RDB$GENERATORS RDB$GENERATORS RDB$GENERATORS RDB$GENERATORS RDB$GENERATORS RDB$GENERATORS RDB$GENERATORS RDB$GENERATORS RDB$FIELD_DIMENSIONS <null> RDB$RELATION_CONSTRAINTS <null> RDB$REF_CONSTRAINTS <null> RDB$CHECK_CONSTRAINTS <null> RDB$LOG_FILES <null> RDB$PROCEDURES <null> RDB$PROCEDURE_PARAMETERS <null> RDB$CHARACTER_SETS <null> RDB$COLLATIONS <null> RDB$EXCEPTIONS <null> RDB$ROLES <null> RDB$BACKUP_HISTORY <null> P <null> A <null> M <null>
For example, you find <null> in front of RDB$PAGES not because this table doesn't have fields, but because
on r.rdb$relation_name = rf.rdb$relation_name
and r.rdb$relation_name = 'RDB$GENERATORS';
matches the first part (there will be fields for that table) but not the second condition. Notice it references the mandatory table, but since it can't drop records from the mandatory table, it's interpreted as a failure of the left join and the same as if the first condition has failed: a single record with null fields if returned for the optional table. This is clearly shown with the repetition of RDB$GENERATORS, that's the only record that matches the complete ON clause. I could also have returned the rf.rdb$field_name probably, but you know you will get the names of the four fields in the table rdb$generators.
In a left or right [outer] join, it then becomes not only readability but a need to get the correct result to separate the ON clause from the WHERE clause. Then for consistency, I take the same care with inner joins, separating logically the ON and the WHERE clause. The third example shows what was probably intended from the left join, that again gave us four records. First, a lot of records were produced, where a single value in the mandatory table was repeated as many times as matching fields exist in the optional table. In this case, we'll observe rdb$pages paired with its fields, then rdb$database paired with its fields, etc. The intermediate result will have the same count as simply
SQL> select count(*) from rdb$relations r left join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name;
COUNT
============
248
That's indeed the same as simply counting how many entries we have in rdb$relation_fields, since we know (and hope) there aren't orphan fields (fields that don't belong to any relation).
SQL> select count(*) from rdb$relation_fields;
COUNT
============
248
Then the WHERE filter is applied and leaves four records. This explains the third result.
Expect more weirdness if you go for a full [outer] join. Example:
SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name;
COUNT
============
248
SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name and r.rdb$relation_name =
'RDB$GENERATORS';
COUNT
============
283
Notice the second case. We tried (but couldn't) reduce one side by applying a filter in the ON clause, but got MORE records! The full join means the filter in the ON clause can't leave records out of the result on any side!
SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name where r.rdb$relation_name =
'RDB$GENERATORS';
COUNT
============
4
SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name where rf.rdb$relation_name =
'RDB$GENERATORS';
COUNT
============
4
Notice the WHERE clause was first applied to the left table and then to the right table. No changes in the result. Both tables are mandatory, so the results are mostly symmetric. (Our data has a very well defined parent-child relationship between the two tables.)
I don't want to upset anyone, but maybe some was asleep and doesn't know why we went from 248 to 283. The difference is 35. Hint:
SQL> select count(*) from rdb$relations;
COUNT
============
36
Now why is the difference 35 and not 36 if the number of tables is 36? Because all non-matching field pairs generate a new pair with null in the side that can't match the extra condition in the ON clause, that was to find the 'RDB$GENERATORS' name. (But they will exclude all full matches of common fields not passing the ON clause! Only in that case, the ON clause behaves like facing an inner join.) There's only one case (for rdb$generators) that doesn't generate such pair, because it matches. Convince yourself:
SQL> select count(*) from rdb$relations r full join rdb$relation_fields rf
on r.rdb$relation_name = rf.rdb$relation_name and r.rdb$relation_name =
'RDB$GENERATORS' where rf.rdb$relation_name is null;
and
SQL> select count(*)from rdb$relations r full join rdb$relation_fields rf on
r.rdb$relation_name = rf.rdb$relation_name
and rf.rdb$relation_name = 'RDB$GENERATORS' where rf.rdb$relation_name is
null;
give the same result:
COUNT ============ 35
So, for each case that doesn't match "left table having 'RDB$GENERATORS' in its field", we have some records removed and some records with null in one side added! We have 36 tables but 35 mismatches due to the ON clause. Since I know not everyone will be convinced, here you have a boring output:
SQL> select r.rdb$relation_name, rf.rdb$relation_name, rf.rdb$field_name
from rdb$relations r full join rdb$relation_fields rf on r.rdb$relation_name
= rf.rdb$relation_name and r.rdb$relation_name = 'RDB$GENERATORS' where
rf.rdb$relation_name is null;
I'm not showing the rdb$field_name, but you can trust me that it's completely null:
RDB$RELATION_NAME RDB$RELATION_NAME =============================== ================= RDB$PAGES <null> RDB$DATABASE <null> RDB$FIELDS <null> RDB$INDEX_SEGMENTS <null> RDB$INDICES <null> RDB$RELATION_FIELDS <null> RDB$RELATIONS <null> RDB$VIEW_RELATIONS <null> RDB$FORMATS <null> RDB$SECURITY_CLASSES <null> RDB$FILES <null> RDB$TYPES <null> RDB$TRIGGERS <null> RDB$DEPENDENCIES <null> RDB$FUNCTIONS <null> RDB$FUNCTION_ARGUMENTS <null> RDB$FILTERS <null> RDB$TRIGGER_MESSAGES <null> RDB$USER_PRIVILEGES <null> RDB$TRANSACTIONS <null> RDB$RELATION_NAME RDB$RELATION_NAME =============================== ================= RDB$FIELD_DIMENSIONS <null> RDB$RELATION_CONSTRAINTS <null> RDB$REF_CONSTRAINTS <null> RDB$CHECK_CONSTRAINTS <null> RDB$LOG_FILES <null> RDB$PROCEDURES <null> RDB$PROCEDURE_PARAMETERS <null> RDB$CHARACTER_SETS <null> RDB$COLLATIONS <null> RDB$EXCEPTIONS <null> RDB$ROLES <null> RDB$BACKUP_HISTORY <null> P <null> A <null> M <null>
These are the phantom 35 records that appeared. As expected, we are missing an entry for RDB$GENERATORS.
In our example, we don't have dangling values or the database would be corrupt:
SQL> select count(*)from rdb$relations r full join rdb$relation_fields rf on
r.rdb$relation_name = rf.rdb$relation_name
and rf.rdb$relation_name = 'RDB$GENERATORS' where rf.rdb$relation_name is
null and r.rdb$relation_name is null;
COUNT
============
0
Our example is biased. Let's consider using an example where both tables have records that can't be matched in the other as well as some common values:
SQL> create database 'fuss';
SQL> create table a(a int);
SQL> insert into a values(1);
SQL> insert into a values(2);
SQL> create table b(b int);
SQL> insert into b values(2);
SQL> insert into b values(3);
SQL> select a.a, b.b from a full join b on a.a = b.b;
A B
============ ============
2 2
<null> 3
1 <null>
Normal full join. All matches plus all mismatches.
SQL> select a.a, b.b from a full join b on a.a = b.b and a.a = 1;
A B
============ ============
<null> 2
<null> 3
1 <null>
2 <null>
- The ON clase excludes the full match (like it had the power it has in an inner join) but two phantoms appeared.
SQL> select a.a, b.b from a full join b on a.a = b.b and a.a = 1 and b.b = 3;
A B
============ ============
<null> 2
<null> 3
1 <null>
2 <null>
Nothing changes.
SQL> select a.a, b.b from a full join b on a.a = b.b and b.b = 3;
A B
============ ============
<null> 2
<null> 3
1 <null>
2 <null>
Same example as (1), but reversed.
SQL> select a.a, b.b from a full join b on a.a = b.b and b.b = 2;
A B
============ ============
2 2
<null> 3
1 <null>
Finally, our matching field appears and makes no different with respect to our first full join. No, I'm no SQL guru, so I'm not trying to be pedant. I'm trying to show that Firebirds behaviour is entirely consistent.