SQL question - join?

I’m studying SQL and I’ve got a question that hopefully someone can answer. I normally do multi-table SQL queries like this:

select table1.stuff,table2.stuff from table1,table2 where table1.field=table2.field

and it works fine for retrieving, say, one row of info from related tables. Note that I don’t use “join” in that query.

So now I’m studying “join” and I understand what joins, inner joins, left joins, and right joins are and what they do. Can anyone explain to me when you would possibly use them, or why? It seems to me that a query which returns more then you actually want, ie. rows that are returned even though they may not contain relevant data, is wasteful of resources. Why not build a more intelligent where clause in a query without “join” clauses?

Basically I’m looking for examples of when you’d use this as opposed to not using them at all (as my example above).

what happens when there isn’t matching data in table2 to match table1?

where table1.field=table2.field

now is:

where table1.field=null

So it will fail and you won’t get any data.

This is just one example where you would want to use a join.

Pull in all the data from table1, and the data from table2, where table1.field = table2.field

select table1.stuff,table2.stuff from table1 left join table2 on table1.field=table2.field

Why?? Say you have tables: Employee, and dependants.

You want to get a list of all potential attendants to an event; Employees and their dependants. With out the join, employees that have no dependants wouldn’t be in the result set.

The other joins exists because sometimes an equijoin can’t do the job. So sometimes it’s better to have one query that returns more rows than to require two queries to get all the required results.

Here’s an example . Let’s assume two tables, ORDERS and ORDER_DETAILS. The primary key of ORDERS is ORDERID. This is the foreign key of the ORDER_DETAILS table. Rows in ORDERS may have zero, one, or many child records in ORDER_DETAILS.

Now an order fulfillment application needs to query all orders and associated details for bulk processing. Trying it with an equijoin, say something like SELECT * FROM ORDERS A, ORDER_DETAILS B where B.ORDERID=A.ORDERID wouldn’t work because records from ORDERS with zero child records would be excluded. It could be done with the equijoin statement and then another statement to grab everything childless. The most suitable way however would be a left outer join.

Joins are a way of telling the query engine what you are trying to do. In your above example (no explicit join), you are saying: OK, first (conceptually) form the Cartesian product of table1 and table2 (that is, form the list of every possible pair of rows from table1 and table2), then go through that list and remove those that don’t have field matching field.

Whereas with a join:

select stuff from table1 inner join table2 on table1.fieldA=table2.fieldB

the query engine immediately knows what you are trying to do and can so form a better plan for finding the matching rows.

(Also as noted, the outer joins have a more concise syntax than the equivalent where clauses)

Since SQL is loosely speaking a mangled form of relation calculus, there are inevitably many ways to express the same idea - a lot of real world SQL work is working out how to phrase it right for the particular query engine you are dealing with. As optimization technology gets better this matters less, but it’s still a good idea to be as clear as possible when talking to your engine.

So does this mean that using a join is faster than using a from/= statement?

It seems that the joins can be wasteful. For instance in Nick’s example, you’re retrieving a result row for every item (and NULLs) in the order_details table. This means that the info from the main orders table is repeated in each and every related order_details result, yes?

I can see times when that is useful and times when it would suck. Thanks for the examples, dudes.

The where clause thing is just an alternate syntax for joins in modern databases. It’s not really recommended anymore as the query compiler have a harder time creating an optimized query out of it in complex cases than they do for bog standard join syntax.

Yep, that’s what it means. It’s not wasteful if that’s the data the application needs. Which it sometimes is. If the application doesn’t need that data, then other queries become more appropriate.

I kinda knew the answer but thanks for the additional explanations. :)

left and right joins are also very useful when trying to find records that should have dependent data but don’t.

to expand nick’s example, say that you have two tables, ORDER and ORDERDETAIL. for the data to make sense, you need to have data in both tables. to find examples where you have data in ORDER but not in ORDERDETAIL, do this:

select * from ORDER join ORDERDETAIL
on ORDER.OrderID = ORDERDETAIL.OrderID
where ORDERDETAIL.SomeColumnThatShouldNeverBeNull is NULL

alternately, if your foreign keys get fucked up, you could end up with records in ORDERDETAIL that don’t have corresponding records in ORDER. a right join would find those.

left joins are also good for verifying cross-server data manipulation.

For instance in Nick’s example, you’re retrieving a result row for every item (and NULLs) in the order_details table. This means that the info from the main orders table is repeated in each and every related order_details result, yes?

your syntax is also going to do this. the equivalent of a left join is using *= in the where clause.