Fetching Data from Multiple Tables using Joins

7 Apr

Normalization is almost universally applied to relational databases such as MySQL in order to optimize tables for general-purpose querying and to rid them of certain undesirable characteristics that could lead to a loss of data integrity. Doing so tends to promote better accuracy of queries, but it also leads to queries that take a little more work to develop, as the data may be spread amongst several tables. In today’s article, we’ll learn how to fetch data from multiple tables by using joins.

Two Syntaxes

There are two accepted syntax styles for writing joins: ANSI-style joins and theta-style joins. ANSI syntax uses the JOIN and ON keywords, as in the following example:

SELECT     field1,
field2,
FROM     my_table     t1
JOIN    my_other_table    t2
ON    t1.primary_id_field = t2.foreign_key_id_field
WHERE    t1.lastname      = ‘Smith’;

The JOIN keyword is used to separate the names of the tables being joined, and the ON clause contains the relation showing which column is being used as the join key.
In theta-style syntax, the table joins are simply added to the WHERE clause:

SELECT     field1,
field2,
FROM     my_table     t1,
My_other_table    t2
WHERE    t1.primary_id_field = t2.foreign_key_id_field
AND    t1.lastname      = ‘Smith’;

MySQL also supports a nonstandard extension of the ANSI syntax that can be used as a sort of shorthand for when the join column has the same name in both joined tables:

SELECT     field1,
field2,
FROM     my_table     t1
JOIN    my_other_table    t2
USING    (t1.id_field)
WHERE    t1.lastname      = ‘Smith’;

So Which Syntax is Best?

The ANSI syntax is generally preferable to theta style because it’s usually easier to read and understand, particularly when writing joins involving numerous tables. There are also some types of joins that can’t be written using theta-style notation in MySQL.

Join Types

In order to join tables together, there has to be some common data that allow those tables to be connected in some meaningful way. Although it’s possible to have more than one common column between two tables, most often, the join key will be the primary key of one table and a foreign key in the other.

To illustrate, we’ll perform queries against the following sample tables:

manufacturer
id description
1 ACURA
2 ALFA ROMEO
3 ASUNA
4 AUDI
5 BMW
6 BUICK
7 CADILLAC
8 CHEVROLET
9 CHRYSLER
10 DAEWOO
11 DODGE
12 EAGLE
13 FORD
14 GEO
15 GMC
16 HONDA
17 HYUNDAI
18 INFINITI
19 ISUZU
20 JAGUAR
21 LADA
22 LEXUS
23 LINCOLN
24 MAZDA
25 MERCEDES-BENZ
26 MERCURY
27 NISSAN
28 OLDSMOBILE
29 PLYMOUTH
30 PONTIAC
31 PORSCHE
32 SAAB
33 SATURN
34 SUBARU
35 SUZUKI
36 TOYOTA
37 VOLKSWAGEN
38 VOLVO
39 JEEP
40 LAND ROVER
41 (NULL)
42 New Car Co
model
id *manufacturer_id description
1 ACURA INTEGRA
2 ACURA CL
3 ACURA LEGEND
4 ACURA RL
5 ACURA NSX
6 ACURA TL
7 ACURA VIGOR
8 ACURA EL
9 ASUNA SEDAN
10 ASUNA HATCHBACK
11 ACURA NSX-T
12 ALFA ROMEO SEDAN LS
13 ALFA ROMEO SPIDER
14 AUDI A4
15 AUDI A6
16 AUDI A8
17 AUDI S4
18 AUDI S6
19 AUDI 90 SERIES
20 AUDI CABRIOLET
21 AUDI 100 SERIES
22 AUDI V8 QUATTRO
23 BMW 318
24 BMW 323
25 BMW 328
26 BMW M3
27 BMW 320
28 BMW 325
29 BMW Z3 ROADSTER
30 BMW ROADSTER M 3.2
31 BMW 528
32 BMW 540
33 BMW 525
34 BMW 530
35 BMW M5
36 BMW 535
37 BMW 740
38 BMW 750
39 BMW
40 BMW 840
41 BMW 850
42 (NULL) Custom

*Note: In a normalized database, the Manufacturer ID would be stored in the Models table. Here, I included the full description to better illustrate which manufacturers are associated with which models.

The first table contains automobile manufacturers; the second, some models that are built by the first several auto manufacturers. The common data between the two tables is the manufacturer, which is linked by manufacturer ID.

Now we’ll extract some data from the tables, using different join types in ANSI syntax.

Inner Join

An inner join is defined as a join in which unmatched rows from either table are not to be returned. In other words, the rows must match in both tables in order to be included in the result set.

SELECT         t1.description AS ‘Manufacturer’,
t2.description AS ‘Model’
FROM         manufacturer     t1
INNER JOIN    model        t2
ON        t1.id = t2.manufacturer_id
WHERE        t1.description = ‘ACURA’;

The “INNER” keyword is not required, but it is considered good practice to include it.

Typing the query above in the MySQL Command Line Client produces the following:

mysql> SELECT           t1.description AS ‘Manufacturer’,
->                  t2.description AS ‘Model’
-> FROM             manufacturer    t1
-> INNER JOIN       model           t2
-> ON               t1.id = t2.manufacturer_id
-> WHERE            t1.description = ‘ACURA’;

+————–+———+
| Manufacturer | Model   |
+————–+———+
| ACURA        | INTEGRA |
| ACURA        | CL                |
| ACURA        | LEGEND    |
| ACURA        | RL                |
| ACURA        | NSX       |
| ACURA        | TL           |
| ACURA        | VIGOR   |
| ACURA        | EL           |
| ACURA        | NSX-T   |
+————–+———+
9 rows in set (0.00 sec)

Outer Join

Outer joins will return records in one table that aren’t matched in another. Outer joins can be further divided into the two types of left and right. In a left outer join, all records from the first (left-hand) table in a join that meet any conditions set in the WHERE clause are returned, whether or not there’s a match in the second (right-hand) table:

mysql> SELECT           t1.description AS ‘Manufacturer’,
->                  t2.description AS ‘Model’
-> FROM             manufacturer   t1
-> LEFT JOIN        model          t2
-> ON                t1.id = t2.manufacturer_id
-> WHERE        t1.description = ‘New Car Co’;

+————–+—————-+
| Manufacturer | Model          |
+————–+—————-+
| New Car Co   | (NULL)         |
+————–+—————-+
19 rows in set (0.00 sec)
The ‘New Car Co’ is returned even though there are no associated models in the model table.

Similar to the left outer join, a right outer join returns all records from the second (right-hand) table in a join that meet any conditions set in the WHERE clause, whether or not there’s a match in the first (left-hand) table:

mysql> SELECT           t1.description AS ‘Manufacturer’,
->                  t2.description AS ‘Model’
-> FROM             manufacturer    t1
-> RIGHT JOIN       model           t2
-> ON               t1.id = t2.manufacturer_id
-> WHERE        t2.description = ‘Custom’;
+————–+——–+
| Manufacturer | Model  |
+————–+——–+
| (NULL)       | Custom |
+————–+——–+
1 row in set (0.00 sec)
The ‘Custom’ model is returned even though there is no associated manufacturer.

Cross-join

The cross-join, also referred to as a Cartesian product, returns all the rows in all the tables listed in the query.  Each row in the first table is paired with all the rows in the second table.  This happens when there is no relationship defined between the two tables.

Note that, most of the time, we do not want a Cartesian join, and we end up with one because we failed to provide a filter on the join.  Result sets can get large quickly because the amount of data in the select is the number of rows in Table A multiplied by the number of rows in Table B. If you have more than two tables this multiplies at an exponential rate.

If we actually want a Cartesian join, then we should use the ANSI cross join to tell others reading the script that we actually wanted a Cartesian join. So why would we want one? One reason might be to produce all the combinations of 1, 2 and 3, which could be used as part of a password or ID generation process:

numbers
num
1
2
3

mysql> SELECT CONCAT(   CAST(t1.num AS CHAR),
    ->                  CAST(t2.num AS CHAR)) AS combinations
    -> FROM numbers t1, numbers t2;
+--------------+
| combinations |
+--------------+
| 11           |
| 21           |
| 31           |
| 12           |
| 22           |
| 32           |
| 13           |
| 23           |
| 33           |
+--------------+
9 rows in set (0.00 sec)

Now you’ve got every permutation of number combinations for two digits!

Knowing how to link tables is of great assistance in extracting data from normalized databases, but it may not always be enough. There will be times that no combination of joins will suffice to properly filter the data. In those cases, it may be necessary to use temporary tables. We’ll be looking at those in the next article.

manufacturer
id description
1 ACURA
2 ALFA ROMEO
3 ASUNA
4 AUDI
5 BMW
6 BUICK
7 CADILLAC
8 CHEVROLET
9 CHRYSLER
10 DAEWOO
11 DODGE
12 EAGLE
13 FORD
14 GEO
15 GMC
16 HONDA
17 HYUNDAI
18 INFINITI
19 ISUZU
20 JAGUAR
21 LADA
22 LEXUS
23 LINCOLN
24 MAZDA
25 MERCEDES-BENZ
26 MERCURY
27 NISSAN
28 OLDSMOBILE
29 PLYMOUTH
30 PONTIAC
31 PORSCHE
32 SAAB
33 SATURN
34 SUBARU
35 SUZUKI
36 TOYOTA
37 VOLKSWAGEN
38 VOLVO
39 JEEP
40 LAND ROVER
41 (NULL)
42 New Car Co
model
id *manufacturer_id description
1 ACURA INTEGRA
2 ACURA CL
3 ACURA LEGEND
4 ACURA RL
5 ACURA NSX
6 ACURA TL
7 ACURA VIGOR
8 ACURA EL
9 ASUNA SEDAN
10 ASUNA HATCHBACK
11 ACURA NSX-T
12 ALFA ROMEO SEDAN LS
13 ALFA ROMEO SPIDER
14 AUDI A4
15 AUDI A6
16 AUDI A8
17 AUDI S4
18 AUDI S6
19 AUDI 90 SERIES
20 AUDI CABRIOLET
21 AUDI 100 SERIES
22 AUDI V8 QUATTRO
23 BMW 318
24 BMW 323
25 BMW 328
26 BMW M3
27 BMW 320
28 BMW 325
29 BMW Z3 ROADSTER
30 BMW ROADSTER M 3.2
31 BMW 528
32 BMW 540
33 BMW 525
34 BMW 530
35 BMW M5
36 BMW 535
37 BMW 740
38 BMW 750
39 BMW
40 BMW 840
41 BMW 850
42 (NULL) Custom

Or for further references, check out the following resources:

No comments yet

Leave a Reply