Wednesday, June 2, 2010

Rows to Colums - 10g and below

Once again, I needed to produce a "pivot query", one in which rows appeared as columns.  And once again,I had to start nearly from the beginning to rediscover the trick.  So ... I decided to write a post explaining it (to me at least) once and for all.
Of course, in 11g this is somewhat simpler (especially in 11gR2).  However, the priciples involved with the ANSI standard method are still at play with the 11g PIVOT, UNPIVOT, and LISTAGG clauses.  So, this post is still worth posting

So, suppose we need to produce this output from the EMP table:
 POSITION DEPT10     DEPT20     DEPT30
--------- ---------- ---------- ----------
        1 KING       SCOTT      BLAKE
        2 CLARK      FORD       ALLEN
        3 MILLER     JONES      TURNER

In other words, we want to list the top three employees in terms of salary in departments 10 - 30, and we want to list them accross the screen.  The first point is that we are asking for 4 columns, and that means we will need 4 columns in our select list.  So, we are going to have to start with something like this in our query:
SELECT POSITION, DEPT10, DEPT20, DEPT30 ....

Not that this has to be the final form of our query.  Indeed, all of these column names will probably turn out to be aliases.  However, the SELECT clause must contain these four columns.

Now, we have some limitations if we are going to fetch 4 columns from a single row from a single table.  Either we have 4 suitable columns, or we are able to transform 4 columns into suitable columns.  This is required from SQL.  And here is the problem:  There is no way to get that output from a single row in a single table.

One solution is to build a longer table with some self-joins.  This would be something like
SELECT POSITION, A.ENAME DEPT10, B.ENAME DEPT20, C.ENAME DEPT30
  FROM EMP A, EMP B, EMP C
  WHERE A.DEPTNO = 10
    AND B.DEPTNO = 20
    AND C.DEPTNO = 30
    AND

and the rest is left as an exercise.
Sometimes this is relatively easy.  Sometime - it is not.  This is one of the latter cases.
If building a longer single row from a single table doesn't work, there is another approach (which is the trick):  It is to select from a group of rows and use aggregate functions (typically min, max, sum, or count) to produce a single row of output.  For example, if we could somehow produce this result set from a subquery:
    DEPTNO      SEQ_NO  ENAME
    ------        ---   -----
      20           1    SCOTT
      10           1    KING
      30           1    BLAKE

Where all the 1st place employees are in the same group, it would become easy to produce the desired output.  The Dept. 10 column is just
     max(decode(deptno,10,ename))

and the enire SELECT statement becomes
select seq_no placement,
       max(decode(deptno,10,ename)) dept10,
       max(decode(deptno,20,ename)) dept20,
       max(decode(deptno,30,ename)) dept30
   from ...

The subquery is a little more complicated.  We need to rank all of the employees by salary in their department, and we don't want any duplicate rankings or gaps.  This query does the trick:
          select deptno,
              row_number() over (partition by deptno order by sal desc) seq_no,
             ename,
             sal
          from emp

So, the complete statement is
select seq_no placement,
       max(decode(deptno,10,ename)) dept10,
       max(decode(deptno,20,ename)) dept20,
       max(decode(deptno,30,ename)) dept30
   from (
          select deptno,
              row_number() over (partition by deptno order by sal desc) seq_no,
             ename,
             sal
          from emp
   )
where seq_no < 4
group by seq_no
order by seq_no
/

With output:
 PLACEMENT DEPT10     DEPT20     DEPT30
---------- ---------- ---------- ----------
         1 KING       SCOTT      BLAKE
         2 CLARK      FORD       ALLEN
         3 MILLER     JONES      TURNER

Note that there is a GROUP BY clause in the main query.  This is the essential part of the trick: we need a group by clause to create a grouping so that we can then place multiple items in the select list. If we leave out the groupings (and also the max functions) we get
 PLACEMENT DEPT10     DEPT20     DEPT30
---------- ---------- ---------- ----------
         1            SCOTT
         1 KING
         1                       BLAKE
         2 CLARK
         2            FORD
         2                       ALLEN
         3 MILLER
         3            JONES
         3                       TURNER

Summary:  Group on whatever remains constant, and use MAX and DECODE to filter out the proper columns.