I’m studying SQL and I’ve got a question that hopefully someone can answer. I normally do multi-table SQL queries like this:
select table1.stuff,table2.stuff from table1,table2 where table1.field=table2.field
and it works fine for retrieving, say, one row of info from related tables. Note that I don’t use “join” in that query.
So now I’m studying “join” and I understand what joins, inner joins, left joins, and right joins are and what they do. Can anyone explain to me when you would possibly use them, or why? It seems to me that a query which returns more then you actually want, ie. rows that are returned even though they may not contain relevant data, is wasteful of resources. Why not build a more intelligent where clause in a query without “join” clauses?
Basically I’m looking for examples of when you’d use this as opposed to not using them at all (as my example above).
The other joins exists because sometimes an equijoin can’t do the job. So sometimes it’s better to have one query that returns more rows than to require two queries to get all the required results.
Here’s an example . Let’s assume two tables, ORDERS and ORDER_DETAILS. The primary key of ORDERS is ORDERID. This is the foreign key of the ORDER_DETAILS table. Rows in ORDERS may have zero, one, or many child records in ORDER_DETAILS.
Now an order fulfillment application needs to query all orders and associated details for bulk processing. Trying it with an equijoin, say something like SELECT * FROM ORDERS A, ORDER_DETAILS B where B.ORDERID=A.ORDERID wouldn’t work because records from ORDERS with zero child records would be excluded. It could be done with the equijoin statement and then another statement to grab everything childless. The most suitable way however would be a left outer join.
Joins are a way of telling the query engine what you are trying to do. In your above example (no explicit join), you are saying: OK, first (conceptually) form the Cartesian product of table1 and table2 (that is, form the list of every possible pair of rows from table1 and table2), then go through that list and remove those that don’t have field matching field.
Whereas with a join:
select stuff from table1 inner join table2 on table1.fieldA=table2.fieldB
the query engine immediately knows what you are trying to do and can so form a better plan for finding the matching rows.
(Also as noted, the outer joins have a more concise syntax than the equivalent where clauses)
Since SQL is loosely speaking a mangled form of relation calculus, there are inevitably many ways to express the same idea - a lot of real world SQL work is working out how to phrase it right for the particular query engine you are dealing with. As optimization technology gets better this matters less, but it’s still a good idea to be as clear as possible when talking to your engine.
So does this mean that using a join is faster than using a from/= statement?
It seems that the joins can be wasteful. For instance in Nick’s example, you’re retrieving a result row for every item (and NULLs) in the order_details table. This means that the info from the main orders table is repeated in each and every related order_details result, yes?
I can see times when that is useful and times when it would suck. Thanks for the examples, dudes.
The where clause thing is just an alternate syntax for joins in modern databases. It’s not really recommended anymore as the query compiler have a harder time creating an optimized query out of it in complex cases than they do for bog standard join syntax.
left and right joins are also very useful when trying to find records that should have dependent data but don’t.
to expand nick’s example, say that you have two tables, ORDER and ORDERDETAIL. for the data to make sense, you need to have data in both tables. to find examples where you have data in ORDER but not in ORDERDETAIL, do this:
select * from ORDER join ORDERDETAIL
on ORDER.OrderID = ORDERDETAIL.OrderID
where ORDERDETAIL.SomeColumnThatShouldNeverBeNull is NULL
alternately, if your foreign keys get fucked up, you could end up with records in ORDERDETAIL that don’t have corresponding records in ORDER. a right join would find those.
left joins are also good for verifying cross-server data manipulation.
For instance in Nick’s example, you’re retrieving a result row for every item (and NULLs) in the order_details table. This means that the info from the main orders table is repeated in each and every related order_details result, yes?
your syntax is also going to do this. the equivalent of a left join is using *= in the where clause.