Lab 4

From bd_en
Jump to: navigation, search

Objectives: the JOIN operator - showing information from several tables


Cartesian product of two tables

To show information from several tables we have two possibilities:

  1. the use of set operators (union, intersection, subtraction) - build tables vertically
  2. the use of the Cartesian product or of the derived join operators - build data horizontally.

All the queries in previous labs used only one table within the FROM clause. Try, subsequently, the following queries:

SELECT * FROM students; --how many rows returned?

SELECT * FROM grades; --how many rows returned?

SELECT * FROM students, grades; --how many rows returned? why?

SELECT * FROM students, grades WHERE fname='Andrei'; --how many rows returned? why?

SELECT * FROM students, grades, courses; --how many rows returned? why?

If we use several table names within the FROM cluase, the result corresponds to the Cartesian product of the tables. Consequently, the number of rows returned by the query is given by the product of the number of rows of the tables involved in the operation.

The above queries, which allow several table names within the FROM clause, is specific to Oracle. However, to obtain the Cartesian product in other RDBMSs, you must use the following syntax which is defined by the SQL standard (also applicable to Oracle, please test it):

SELECT * FROM students CROSS JOIN grades;

SELECT * FROM students CROSS JOIN grades WHERE fname='Andrei';

SELECT * FROM students CROSS JOIN grades CROSS JOIN courses;

Natural Join of two tables

The records in one table T1 may have corresponding/related rows in other tables (T2, T3,...); in this situation an attribute in T1 must play as a foreign key referencing a candidate key in T2,T3,... For example, attribute id_stud in table grades is used to identify the student in table students, while attribute id_course in table grades identifies the course in table courses.

To obtain the grades of a given student, it is not sufficient to show the information in both tables - students and grades (see the example above where asking for the grades of student Andrei we obtained a wrong result containing all the grades - result of the cartesian product), but we also have to specify the join condition: the attribute playing as foreign key in one table must be equal to the attribute playing as an identifier in the second table:

SELECT * FROM students, grades WHERE students.id_stud = grades.id_stud ORDER BY lname; --how many rows returned? why?

Again, the above statement is particular to Oracle. The operation executed was the internal join of two tables, corresponding actually to those rows in the Cartesian product which satisfy the condition stated in the WHERE clause. It might be useful to think about join as a cartesian product followed by a selection - in terms of relational algebra; however, the DBMS solves joins with dedicated algorithms - presented in future course lectures.

The SQL standard, which should be implemented by all RDMSs provides explicit syntax/keywords (JOIN...ON) as follows:

SELECT * FROM students JOIN grades ON students.id_stud = grades.id_stud ORDER BY lname;

SELECT * FROM students JOIN grades ON students.id_stud = grades.id_stud WHERE fname='Andrei' --compare this result with the one in the section for Cartesian product

SELECT * FROM students JOIN grades ON students.id_stud = grades.id_stud WHERE fname='Ioana'; --how many rows compared to the previous query and why?

The names of the attributes in the WHERE clause were prefixed with names of the tables. This is mandatory only when the two attributes playing as foreign and candidate key have identical names; otherwise, table names may be skipped.

As an alternative, when the names of the attributes in the join condition are identical, we can use the following syntax which does not require stating the join condition and which corresponds to the natural join operator in relational algebra:

SELECT fname, lname, value FROM students NATURAL JOIN grades WHERE fname='Ioana';

Alias for tables

When in a query there are involved several tables which have attributes with identical names, any call of an attribute must be prefixed with the table name in order to eliminate ambiguity.

Try the following:

SELECT fname, lname, value, id_stud FROM students JOIN grades ON students.id_stud=grades.id_stud WHERE fname = 'Ioana'; --why not working ?

SELECT fname, lname, value, students.id_stud FROM students JOIN grades ON students.id_stud=grades.id_stud WHERE fname = 'Ioana';

In the case of natural join, the DBMS knows that on attributes with identical names, it must apply the equality operator as a join condition. In this case there is no need to prefix any attribute:

SELECT fname, lname, value, id_stud FROM students NATURAL JOIN grades WHERE fname='Ioana';

Any attribute name which belongs to more than one table in a query which is not a natural join, must be prefixed. However, we may find tedious repeating (long) table names; in such cases, we may declare simple aliases for these in the FROM clause - this operation corresponds to the renaming operator in relational algebra:

SELECT fname, lname, value, s.id_stud FROM students s JOIN grades g ON s.id_stud=g.id_stud;

ATTENTION: introducing aliases for table names forces us to use them whenever we need to use table names within the query.

Try:

SELECT fname, lname, value, students.id_stud FROM students s JOIN grades g ON s.id_stud=g.id_stud;

Outer/External join

Let's show, uniquely, the last names of the students which result from the join of the tables students and grades:

SELECT DISTINCT lname FROM students NATURAL JOIN grades;

How many rows were returned? Run a query to see how many distinct names are in the students table. Where comes the difference from?

There are cases in which not all the records in one table have correspondents in the other table. In our example, not all the students have grades; the students not appearing in table grades did not appear in the result of the above natural join. We say about table grades that it is deficient in information because it does not contain all the student ids stored in table students. If we want all the information in one of the tables to be shown, although lacking correspondents in the other table, we may use the Outer Join.

The outer join is of several types:

  • LEFT OUTER JOIN is used to take all the data in the left table; the rows in the left table that cannot be associated with rows form the right table, receive as values for the attributes in the right table NULL:

SELECT DISTINCT lname FROM students s LEFT OUTER JOIN grades g ON s.id_stud = g.id_stud; SELECT * FROM students s LEFT OUTER JOIN grades g ON s.id_stud = g.id_stud;

  • RIGHT OUTER JOIN is similar to LEFT OUTER JOIN but the table deficient in information is at the right: SELECT DISTINCT lname FROM grades g RIGHT OUTER JOIN students s ON s.id_stud = g.id_stud;

The two versions of external join can also be obtained in Oracle as follows:

SELECT DISTINCT lname FROM students s, grades g WHERE s.id_stud = g.id_stud(+);

Symbol (+) is added on the deficient side of the join condition.

  • FULL OUTER JOIN is the union of the left and right outer join: all the rows in the two tables will be shown.

More at: http://www.techonthenet.com/oracle/joins.php

Nice view of join types: http://i.stack.imgur.com/udQpD.jpg

Joining more than two tables

All of the examples above express joins over two tables - students and grades. How can we find out the course names corresponding to the shown grades? We add a second join in the query, with table courses:

SELECT fname, lname, course_name, value 
   FROM students s 
              JOIN grades g ON s.id_stud=g.id_stud 
              JOIN courses c ON c.id=g.id_course;

Self-join

Sometimes the information is structured in such a way that we need to look for information related to one record, in the very same table. In our faculty database, we need to do this when we want to identify the colleagues of a given student (students that are assigned to the same groupno). In the E/R model, this would correspond to a recursive relationship. Every time we need to call twice a table name in a query, we perform a self-join. In such cases, the two instances of the same table name must be renamed by declaring distinct aliases. It might be helpful to think about self-join as joining two duplicates of the same table.

Example:

SELECT s.fname || ' ' || s.lname || ' is a colleague of ' || coll.fname || ' ' || 
  coll.lname AS "Andrei's colleagues of study" 
    FROM students s JOIN students coll ON s.groupno=coll.groupno AND s.year=coll.year
    WHERE s.fname='Andrei';

How can we eliminate the row where the student is a colleague of himself?

Exercises

  1. Run a query to retrieve all students' grades in the database. Show the student's last and first names, his/her grade and the last name for the instructor who graded him/her.
  2. Show the first and last names for the students who received a 10 at 'DB'.
  3. Show the instructors (last and first names) and all the information about the courses they teach.
  4. Modify the query at 3, in order to add the instructors that do not teach any courses.
  5. Modify the query at 3, in order to add the courses that are not assigned to any instructor.
  6. Modify the query at 3 in order to show both the instructors that do not teach any course and the courses that are not assigned to any instructor yet.
  7. In students table there exist students which were born in the same week day. For example, Cobzaru George and Pintescu Andrei were both born in a Tuesday. Create a list containing pairs of students that were born in the same week day; do not allow duplicates as {Cobzaru-Pintescu, Pintescu-Cobzaru}. Only the last names will be shown. The list will be sorted accordingly to the week day and if more then 2 students were born in the same week day the rows will be sorted based on the last name of the first student in the pair. (the query should return 11 rows - take care not to have the same row two times).
  8. Show, for each student, the names of his/her colleagues who received higher grades. Format the result as follows: "Popescu Gigel received a higher grade compared to Vasilescu Ionel at DB." Give an appropriate column alias [the query should return 118 rows].
  9. Show all pairs of students and the difference in their age in days. Order ascending based on age differences. Do not compare a student with himself.
  10. Show possible connections between students and instructors. A student and an instructor may be connected if their family names (last_name) have the same length (number of characters). Print student's and instructor's last names.
  11. Show the (difficult) courses for which all students' grades are less than or equal to 8.
  12. Show the students who have all the grades greater than or equal to 7.
  13. Show the students who received a 7 or a 10 at OOP in a Tuesday.
  14. An exam session is identified by the month and the year it was organized. Write the first and last names for the students who passed the exams, the grades they received and the exam session as a string "MONTH, YEAR" (eg."JUNE, 2015"). Create one extra column where you will show the "+" sign if the month has less than thirty days (null entry otherwise).