Lab 10

From bd_en
Jump to: navigation, search

Correlated Subqueries

As we saw in the previous lab, subqueries are mostly used for filtering purposes. We have used so far only uncorrelated subqueries, which are executed once per query, as in the following example:

 SELECT * 
   FROM students NATURAL JOIN grades
   WHERE value >= 
         (SELECT ROUND(AVG(value)) 
            FROM grades
         );

In the case of correlated subqueries, the subquery is evaluated once for each row returned by the external query. A subquery contains references to at least one field generated by the external query.

Consider the following:

SELECT lname, fname, scholarship, year
   FROM students s1 
   WHERE scholarship >= 
         (SELECT AVG(scholarship) 
             FROM students s2 
                GROUP BY year HAVING s1.year = s2.year
         );

This query returns all the students having a scholarship greater than or equal to the average scholarship earned by their colleagues in the same year of study. For each row processed by the external query, the subquery is executed.

Oracle considers a subquery to be correlated if it references at least one field from the main query.

EXISTS

Sometimes we are not interested in the values retrieved by the correlated subquery; we are only interested if the subquery returns at least one row. For example we are interested to show all the students who have in their study group colleagues earning scholarships. We could solve it as follows:

 SELECT lname, fname, scholarship, groupno, year
    FROM students s1 
    WHERE (
        SELECT COUNT(*) 
           FROM students s2 
           WHERE scholarship IS NOT NULL AND 
                 s1.year = s2.year AND 
           s1.study_group=s2.study_group
        ) > 0;


The count operator requires all the table to be read so that all the students earning scholarships will be identified. But we are not interested in the exact number students earning scholarships; it is sufficient to know if one exists.

A very important operator comes into place to efficiently solve such situations:

 SELECT lname, fname, scholarship, groupno, year
     FROM students s1 
     WHERE EXISTS (
         SELECT * 
            FROM students s2 
            WHERE scholarship IS NOT NULL AND 
                  s1.year= s2.year AND 
                  s1.groupno=s2.groupno
         );

This time, the subquery scans the table only to the first student earning a scholarship is retrieved. To optimize even more, we can use in the SELECT list of the subquery constants:

 SELECT lname, fname, scholarship, groupno, year
     FROM students s1 
     WHERE EXISTS (
         SELECT 1 
             FROM students s2 
             WHERE scholarship IS NOT NULL AND 
                   s1.year = s2.year AND 
                   s1.groupno=s2.groupno
         );

What do you think will happen in case of using the NOT EXISTS operator, with regard to table scan?

Exercises

  1. Show all the students who have in their year of study at least one colleague younger (look at the date_of_birth).
  2. Show all the students who have their average grade higher than the average grade of all the students in the same year of study. Show their last and first names and their average grade.
  3. For each study group, show the last name, the first name (and the study_group) for the best students (with the best average grade).
  4. Show all the students who have at least one colleague in his/her year who has the same grade (value) for at least one course.