Thursday, December 27, 2007

On Uncommitted Reads

I've decided to give up trying to post detailed articles. The reason is simple: I find that I cannot post unless and until I feel that the article is perfect. The main thesis is precisely crafted. All objections are accounted for and answered. The article is long enough to be worthwhile. And by the time I am close to such an article, it has become irrelevant.

So, I am going to change focus and post rants. Rants can be long or short. Rants do not have to be perfectly balanced. Rants are doable in a short period of time.

What got me interested in ranting was a recent posting to the Oracle-L mailing list. Someone responded to a rant against uncommitted reads with the statement that "values end up committed over 99% of the time, and sometimes getting a current value is more important than getting a consistent value".

My immediate reaction was: "Well, what about getting a correct value?" There isn't enough space to post this to the list, but here I can give a couple of examples where using the first part of an update without waiting for the rest can give very incorrect results. I'm going to use the standard EMP table, and assume that it is used by a real company, with a real system.

Let us assume that this is the key query in the system. There is a stored procedure which creates a ref cursor for this query. Almost all parts of the application which access the EMP table do so through this query:

SELECT substr(lpad(' ', 2*level-1)||ename,1,15) name,
empno,mgr, job, sal + nvl(comm, 0) tc
FROM emp
START WITH mgr IS NULL
CONNECT BY mgr = PRIOR empno

with these results:

NAME             EMPNO   MGR    JOB        TC
 KING            7839           PRESIDENT  5000
   JONES         7566    7839   MANAGER    2975
     SCOTT       7788    7566   ANALYST    3000
       ADAMS     7876    7788   CLERK      1100
     FORD        7902    7566   ANALYST    3000
       SMITH     7369    7902   CLERK      800
   BLAKE         7698    7839   MANAGER    2850
     ALLEN       7499    7698   SALESMAN   1900
     WARD        7521    7698   SALESMAN   1750
     MARTIN      7654    7698   SALESMAN   2650
     TURNER      7844    7698   SALESMAN   1500
     JAMES       7900    7698   CLERK      950
   CLARK         7782    7839   MANAGER    2450
     MILLER      7934    7782   CLERK      1300
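
As an aside, the stored procedure mentioned above might look something like this minimal sketch (the name get_emp_hierarchy is my invention, and error handling is omitted):

CREATE OR REPLACE PROCEDURE get_emp_hierarchy (p_rc OUT SYS_REFCURSOR)
AS
BEGIN
  -- The same hierarchical query as above, handed back as a ref cursor
  OPEN p_rc FOR
    SELECT substr(lpad(' ', 2*level-1)||ename,1,15) name,
           empno, mgr, job, sal + nvl(comm, 0) tc
    FROM emp
    START WITH mgr IS NULL
    CONNECT BY mgr = PRIOR empno;
END get_emp_hierarchy;
/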



So, the edict comes down that everyone in Accounting (dept. 10) is to be given a 5% raise. Naturally, this is to be accompanied by a corresponding decrease for everyone in Research (dept. 20). (Somewhere, there must be a company where the Research budget is not regularly raided for the Finance budget, and I welcome any hints as to where that might be.) Now, while it is possible to write this as a single UPDATE statement, it is more likely that whoever in Accounting is doing the update (Research refused to get involved) will do it in several steps; a sketch of those steps follows the results below. When everything is complete, the results should be:

NAME             EMPNO   MGR    JOB        TC
 KING            7839           PRESIDENT  5250
   JONES         7566    7839   MANAGER    2887.5
     SCOTT       7788    7566   ANALYST    2912.5
       ADAMS     7876    7788   CLERK      1012.5
     FORD        7902    7566   ANALYST    2912.5
       SMITH     7369    7902   CLERK      712.5
   BLAKE         7698    7839   MANAGER    2850
     ALLEN       7499    7698   SALESMAN   1900
     WARD        7521    7698   SALESMAN   1750
     MARTIN      7654    7698   SALESMAN   2650
     TURNER      7844    7698   SALESMAN   1500
     JAMES       7900    7698   CLERK      950
   CLARK         7782    7839   MANAGER    2572.5
     MILLER      7934    7782   CLERK      1365
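
For concreteness, here is one plausible version of those several steps, consistent with the numbers above (the dept. 10 raises total 437.50, spread here as a flat 87.50 cut over the five Research employees; the statement order and commit points are my assumption):

-- Step 1: the Accounting raises, committed on their own
UPDATE emp SET sal = sal * 1.05 WHERE deptno = 10;
COMMIT;

-- Step 2: the offsetting Research cuts, committed separately
UPDATE emp SET sal = sal - 87.5 WHERE deptno = 20;
COMMIT;

Anyone reading uncommitted data in between sees the raises without the cuts.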

But, if the database allows for uncommitted reads, it is possible for someone to catch the pay increases but not the pay cuts, and find this:

NAME             EMPNO   MGR    JOB        TC
 KING            7839           PRESIDENT  5250
   JONES         7566    7839   MANAGER    2975
     SCOTT       7788    7566   ANALYST    3000
       ADAMS     7876    7788   CLERK      1100
     FORD        7902    7566   ANALYST    3000
       SMITH     7369    7902   CLERK      800
   BLAKE         7698    7839   MANAGER    2850
     ALLEN       7499    7698   SALESMAN   1900
     WARD        7521    7698   SALESMAN   1750
     MARTIN      7654    7698   SALESMAN   2650
     TURNER      7844    7698   SALESMAN   1500
     JAMES       7900    7698   CLERK      950
   CLARK         7782    7839   MANAGER    2572.5
     MILLER      7934    7782   CLERK      1365

Now, this is similar to the standard argument against uncommitted reads (a bank teller transfers money from one account to another), with which I must assume the original poster is familiar, and which is unlikely to change his mind. To try for something different, let us pick up from the fully updated data above, and consider this business mandate: Dept. 30 (SALES) is getting two new employees, who will be sales supervisors. They will become the only ones reporting to BLAKE; the rest of the salesforce will report to one of these new supervisors. The new supervisors are MUTT and JEFF, and they will have employee numbers 7042 and 7420 respectively. Naturally, the Accounting department decides to reassign the salesforce before putting MUTT and JEFF into the system. So, the following statements get executed:

UPDATE emp
SET mgr = 7042
WHERE empno in ( 7499, 7521);

UPDATE emp
SET mgr = 7420
WHERE empno in (7654, 7844, 7900);

Followed by these inserts:

INSERT INTO emp VALUES (7042,'MUTT','SUPER',7698,SYSDATE,2800,NULL,30);

INSERT INTO emp VALUES (7420,'JEFF','SUPER',7698,SYSDATE,2800,NULL,30);

With the following result:

NAME             EMPNO   MGR    JOB        TC
 KING            7839           PRESIDENT  5250
   JONES         7566    7839   MANAGER    2887.5
     SCOTT       7788    7566   ANALYST    2912.5
       ADAMS     7876    7788   CLERK      1012.5
     FORD        7902    7566   ANALYST    2912.5
       SMITH     7369    7902   CLERK      712.5
   BLAKE         7698    7839   MANAGER    2850
     MUTT        7042    7698   SUPER      2800
       ALLEN     7499    7042   SALESMAN   1900
       WARD      7521    7042   SALESMAN   1750
     JEFF        7420    7698   SUPER      2800
       MARTIN    7654    7420   SALESMAN   2650
       TURNER    7844    7420   SALESMAN   1500
       JAMES     7900    7420   CLERK      950
   CLARK         7782    7839   MANAGER    2572.5
     MILLER      7934    7782   CLERK      1365

Except, if uncommitted reads are allowed, and someone ran the standard query before the inserts were performed, they would get this result:

NAME             EMPNO   MGR    JOB        TC
 KING            7839           PRESIDENT  5250
   JONES         7566    7839   MANAGER    2887.5
     SCOTT       7788    7566   ANALYST    2912.5
       ADAMS     7876    7788   CLERK      1012.5
     FORD        7902    7566   ANALYST    2912.5
       SMITH     7369    7902   CLERK      712.5
   BLAKE         7698    7839   MANAGER    2850
   CLARK         7782    7839   MANAGER    2572.5
     MILLER      7934    7782   CLERK      1365

This is not even close to being right, and anything which relied on this result set will be very wrong indeed. The first example can (just barely) be excused as incorrect in a merely not-yet-up-to-date sense (everything is there, but some rows are not yet current). This last example is just plain wrong. Instead of being 99.9% correct, it is maybe 60% correct. In other words, allowing uncommitted reads can produce results that are not just stale, but wrong.

Monday, April 9, 2007

FAQ On Transaction Locks

Where I work, we are migrating an application from an old, proprietary hierarchical database to Oracle. Naturally, the question of how locking differs between the two has come up. So, I would like to post here a little discussion of the minimum people need to know about Oracle locking. And, to keep matters simple, I am primarily going to discuss what Oracle calls a DML TX lock, or simply a TX lock. It is the famous Oracle row lock. Oracle has many other locks, but these are the locks which control the modification of data, and they are what most people think of when they think of "locks".

The first thing to understand about Oracle and locks is that Oracle does not have locks. This is wrong, of course; if nothing else, my opening paragraph contradicts it. However, for someone used to what other database systems call "locks", it is closer to the truth than saying that Oracle does things differently.

Such a bold statement demands some evidence. Here, then, is how Oracle handles data locking. At the top of every data block there is an area known as the "block overhead". Part of this area is known as the Interested Transaction List (ITL). Every transaction which is modifying data in a block enters its transaction id in this list. Another part of the block overhead is known as the row directory. Every row which is modified has a pointer set to indicate which transaction has modified it. When another transaction comes along, it inspects the ITL and the row directory. If the row it wishes to modify is held by an active transaction, it waits. Otherwise, it enters its transaction id in the ITL and modifies the row. Simple, really.
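
To make this concrete, here is a minimal two-session sketch against the same EMP table:

-- Session 1: take a TX lock on KING's row, and do not commit yet
UPDATE emp SET sal = sal + 100 WHERE empno = 7839;

-- Session 2: this statement now simply waits on session 1's transaction
UPDATE emp SET sal = sal + 200 WHERE empno = 7839;

-- As soon as session 1 issues COMMIT or ROLLBACK, session 2's update proceeds.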

Actually, this paragraph has probably raised more questions than it has answered. Rather than deal with them in an organized fashion, I will deal with them in an FAQ type of format.

1) So why do you say these are not "locks"?
It's because of what is missing, compared to other database systems. There is no lock manager. It is not even possible to get a list of all locks currently held in the instance. Locks do not affect anyone who just wants to read the data. Waiters do not time out. Locks are never escalated. Very different.

2) This must be horribly inefficient.
Well, no. Consider the first modifier. Besides modifying data (which must be done in any event; this transaction is committed to modifying data), the transaction must inspect and modify two items in the data block. This data block is already in memory, or else the data could never get modified in the first place. So, there is very little additional overhead here.

Now consider the second modifier. Most of the time, it will not encounter a locked row (unless the application was written to have one small table which everyone constantly updates; in that case, every database system will have a problem). If it does encounter a locked row, it will have done very little extra work to find out that it is locked. It will then wait for the lock to be released, which means that it is not finding any other locked rows in the meantime.

3) And if it does?
It waits again. But the chances are that the commit or rollback which released the blocking row also released any future rows the second transaction intends to modify.

4) But it must take a lot of work to clean up all those block headers after a commit?
Remember what happens during a commit? A new SCN is generated, and a commit record is written to disk, along with the contents of the redo log buffer. There is no mention there of cleaning up block headers.

5) So when do they get cleaned up?
The next request for the block (read or write) examines the block header. If there is an inactive transaction in the ITL, the block header gets cleaned up at this point.

6) So a block header could stay "dirty" for hours?
If not longer. And who cares? The only way anyone would know would be to read the block, but then it would get cleaned out and the problem would go away.

7) Must be hard, having to read all those data blocks just to find out all the locked rows.
Which is why there is no table listing all locked rows.

8) I see there is a view named V$LOCK. What good is it?
It has information about transactions which hold locks. If you update a row, you will find an entry for your transaction in V$LOCK. If you update many rows, even across many blocks, you will still have just that one entry. If someone else comes along and tries to update one of your rows, that will create another entry in V$LOCK.
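
For instance, this time-honored self-join on V$LOCK (a sketch; it needs the usual privileges on the V$ views) lists each waiter alongside its blocker:

SELECT w.sid AS waiting_sid,
       h.sid AS holding_sid,
       w.type, w.id1, w.id2
FROM   v$lock w,
       v$lock h
WHERE  w.request > 0      -- the waiter is asking for a lock
AND    h.block   = 1      -- the holder is marked as blocking someone
AND    w.id1     = h.id1  -- ...on the same resource
AND    w.id2     = h.id2;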

9) I have a hanging update statement. If V$LOCK won't tell me if someone else has locked my row and is blocking me, what will?
The best way is to go to the V$SESSION view, and look at these columns for your session:

ROW_WAIT_OBJ#: If the session is not involved in a locking event, this column will have a value of -1. If it is, then it will contain the data object id of the table containing the locked row, and the name (and owner) of the table can be found by querying DBA_OBJECTS (or ALL_OBJECTS, if you do not have permissions on DBA_OBJECTS). Unfortunately, the converse is not true: ROW_WAIT_OBJ# can differ from -1 without the session being blocked by a row lock.
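
Putting that together, a sketch (:blocked_sid stands for the hanging session's SID; since ROW_WAIT_OBJ# holds the data object id, it is matched against DATA_OBJECT_ID):

SELECT o.owner, o.object_name
FROM   dba_objects o
WHERE  o.data_object_id = (SELECT s.row_wait_obj#
                           FROM   v$session s
                           WHERE  s.sid = :blocked_sid);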

ROW_WAIT_FILE#, ROW_WAIT_BLOCK#, ROW_WAIT_ROW#: Assuming that you have located the name of the table containing the locked row, you can then view the row with this query:

select *
from name-of-table-containing-the-locked-row
where rowid = (select dbms_rowid.rowid_create(1, row_wait_obj#,
                          row_wait_file#, row_wait_block#, row_wait_row#)
               from v$session
               where sid = :blocked_sid)  -- the hanging session's SID

This query may surprise those used to other databases. The row is locked; how can we select it? Remember, Oracle readers are never blocked by updaters.

If you are running Oracle 10g, you have an additional clue, as the V$SESSION view there includes the BLOCKING_SESSION column. If your session is blocked by a row lock, this column will contain the SID of the session which is blocking you.
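
For example (again with :blocked_sid standing in for the hanging session's SID):

SELECT sid, serial#, blocking_session
FROM   v$session
WHERE  sid = :blocked_sid;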

10) So what good is V$LOCK?
V$LOCK is very good for identifying whether or not there are blocked sessions present. If you are a DBA with an interest in an instance, and little interest in a specific application, it can be a useful view. If you are interested in resolving a specific question about a specific session, V$SESSION is of more use.

Saturday, February 3, 2007

Do You Know Where Your Cost Leaders Are?

I mentioned in an earlier post that allowing Nulls into a database can result in queries returning the "wrong" answer. Date has written about one example [1], one which he describes as "somewhat contrived." Here I present a different example, one which is not at all contrived (although, perhaps, of less weight).


Consider the following table (yet another variation on the famous EMP table):



EMPNO DEPTNO SALARY
1054 100 70000
1284 200 -null-
1523 100 80000
2472 300 40000
3232 200 60000
4421 300 -null-
5231 200 -null-
5236 300 65000
6211 200 65000


The nulls are present for a variation of the usual reason: we need to get the employees into the table so that they show as hired, to satisfy one personnel function, but the paperwork for their salaries has not yet made it through the system, and so their salaries are not yet known.


Here is the question: Which department has the largest total salary?


The simplest approach is to run a query such as


SELECT DEPTNO, SUM(SALARY) FROM EMP
GROUP BY DEPTNO


get


100 150000
200 125000
300 105000

in response, and conclude that the answer is department 100.


Do you agree? Would it help if I presented you with a more complete description of the table?


EMPNO  DEPTNO  SALARY
CHECK ((SALARY IS NULL)
    OR (DEPTNO = 100 AND SALARY BETWEEN 50000 AND 90000)
    OR (DEPTNO = 200 AND SALARY BETWEEN 60000 AND 100000)
    OR (DEPTNO = 300 AND SALARY BETWEEN 65000 AND 80000))
1054 100 70000
1284 200 -null-
1523 100 80000
2472 300 40000
3232 200 60000
4421 300 -null-
5231 200 -null-
5236 300 65000
6211 200 65000


Each department has a salary range assigned. While we don't know exactly (yet) what employees 1284, 4421, and 5231 are going to be making, we do know the range each salary must fall in. Therefore, a better list of the department salary totals would be


100 exactly 150000
200 between 245000 and 325000
300 between 170000 and 185000


In other words, when all the evidence is in, department 200 will be found to have the greatest total salary and department 100 the least. All of the information needed to arrive at the correct result is present in the database, yet there is no simple way to ask the database for this result.
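
The closest one can come is to encode the constraint's ranges by hand, which is exactly the sort of thing the database ought to be doing for us. A sketch:

SELECT deptno,
       SUM(NVL(salary, min_sal)) AS lowest_possible_total,
       SUM(NVL(salary, max_sal)) AS highest_possible_total
FROM  (SELECT deptno, salary,
              -- the ranges below are copied by hand from the CHECK constraint
              CASE deptno WHEN 100 THEN 50000
                          WHEN 200 THEN 60000
                          WHEN 300 THEN 65000 END AS min_sal,
              CASE deptno WHEN 100 THEN 90000
                          WHEN 200 THEN 100000
                          WHEN 300 THEN 80000 END AS max_sal
       FROM emp)
GROUP BY deptno;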


But there is a complication here also. More next time.



----------------------------------
[1] C. J. Date, Much Ado About Nothing, at http://www.dbdebunk.com/page/page/1706814.htm

Wednesday, January 31, 2007

Say What You Mean, and Mean What You Say (Part II)

Last time I rambled on, trying to set some context for how to interpret a 3rd (or 4th) truth value of "not applicable". What I was hoping to do was to make plausible the following truth table for its conjunction:


P^Q | T | F | NA
----------------
T | T | F | T
----------------
F | F | F | F
----------------
NA | T | F | NA


This is the plausibility argument: if a predicate does not apply, then it can be ignored. Thus, T^NA becomes simply T, since the second predicate can be ignored. Obviously, one could continue in this vein, and construct a truth table for this logic's disjunction:



PvQ | T | F | NA
----------------
T | T | T | T
----------------
F | T | F | F
----------------
NA | T | F | NA


But, if we do this, we find that the distributive law does not hold. In particular, if one considers:



P^(QvR) = (P^Q)v(P^R)


for the truth values P=True, Q=False, and R=NA (a NULL in SQL terms), one finds that the left and right sides both evaluate to NULL in standard SQL, but the left side evaluates to False and the right to True in my hypothetical three-valued logic. So, this is not going to become part of SQL anytime soon.
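
The SQL half of that claim is easy to check from SQL*Plus. This sketch encodes P as 1=1, Q as 1=0, and R as a comparison with NULL; both columns come back UNKNOWN:

SELECT CASE WHEN (1=1) AND ((1=0) OR (1 = TO_NUMBER(NULL)))
            THEN 'TRUE'
            WHEN NOT ((1=1) AND ((1=0) OR (1 = TO_NUMBER(NULL))))
            THEN 'FALSE'
            ELSE 'UNKNOWN'
       END AS lhs,   -- P^(QvR)
       CASE WHEN ((1=1) AND (1=0)) OR ((1=1) AND (1 = TO_NUMBER(NULL)))
            THEN 'TRUE'
            WHEN NOT (((1=1) AND (1=0)) OR ((1=1) AND (1 = TO_NUMBER(NULL))))
            THEN 'FALSE'
            ELSE 'UNKNOWN'
       END AS rhs    -- (P^Q)v(P^R)
FROM dual;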


To sum up, what is the point of all this? Here is the short answer:


  • Values can be missing for at least two reasons: the proper value is currently unknown, and/or the value does not apply.

  • The SQL Null is tailored for the "unknown" reason.

  • Using a Null to mean "does not apply" will produce confusing results at best, and downright wrong results at worst.

Wednesday, January 24, 2007

Say What You Mean, and Mean What You Say (Part I)

The SQL NULL is defined to mean one very specific thing: whatever this datum may be, it is currently unknown. The problem is that "unknown" does not encompass all reasons for a value being missing (and missing values are what NULLs were created for in the first place). Codd recognized this, and proposed two kinds of missing values in his relational model: missing but applicable, and missing and inapplicable. [The article Understanding NULLS, a Contradiction in Terms, by Fabian Pascal, describes this in more detail, along with some arguments against having NULLs as part of SQL.] Here I want to examine how a Not Applicable situation differs from an Unknown situation.

Consider the following SQL table, which is a variation of a table I once had to work with. (Before anyone screams: the table has been stripped down to the bare essentials of the problem, and it was actually one of several IMS segments translated verbatim into Oracle. No attempt at data normalization was made at this stage; that was reserved for the Data Mart.)

ID C1 C2 VAL
1 10 20 1000
2 40 40 1111
3 15 NULL 1234
4 33 NULL 2222

Again, this was part of a data warehouse of data from a legacy system. My job was to extract rows of interest for the Data Mart and then summarize them. The problem was to identify these rows.

C1 and C2 are "Code" columns. In the beginning, only C2 was in the system. Rows would be of interest if C2 was less than some supplied value, which for the purposes of this example will be taken to be 30. At a later time, C1 was added. The intention was for C1 to replace C2, and it eventually did, but not right away. At the time I came into the picture, some rows would have values for both C1 and C2, and some would only have values for C1. So the rules for what was of interest were as follows:

  1. If C1 and C2 both exist, extract the row if both are less than the supplied value.
  2. If only C1 exists, extract the row if it is less than the supplied value.
In our example, the rows of interest have the ID column equal to 1 and 3.

While giving the specifications in this manner might tempt a SQL developer to turn to PL/SQL, it is actually quite easy to express this as a single SQL predicate (using X as the supplied value):


((C1 < X) AND ((C2 < X) OR (C2 IS NULL)))


or, as I actually wrote it


((C1 < X) AND (NVL(C2, C1) < X))
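
Either form can be checked directly against the sample table (the table name legacy_rows is my invention, with the supplied value taken as 30):

SELECT id
FROM   legacy_rows
WHERE  c1 < 30
AND    (c2 < 30 OR c2 IS NULL);

-- returns IDs 1 and 3, as required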


What this illustrates is that a very natural truth table for the AND operator in this three-valued logic with Not Applicable is as follows (using NA for Not Applicable):







AND | T | F | NA
----------------
T | T | F | T
----------------
F | F | F | F
----------------
NA | T | F | NA


This differs from the NULL truth table in the cases of (T) AND (NA) and (NA) AND (T).


But there are more complications. More to follow.

Saturday, January 20, 2007

More on Empty Strings

There is one feature in the last post (Outer Joins and Nulls) which deserves a closer look. But first,

Consider the string 'AB'. This is far from an unknown string. We know exactly how many characters long it is. We know exactly what characters they are. About the only thing we do not know about it is why anyone would bring it up in the first place. But, to continue ...

Consider the string 'A'. Exactly the same considerations apply to it as to 'AB'. Moreover, if we were to create two variables, V1 and V2, and assign the string 'AB' to each, we would regard any computer application in which

(V1 equals V2)

was false as being very wrongheaded. [I write the above as (V1 equals V2) to get around the issue of whether the operator for string equality should be '=', '==', or 'eq'.]

Proceeding to the next step, if 'AB' is well known, and 'A' is well known, then '' should be regarded as well known. We know that it is a string of zero length, we know that it contains no characters, and we should expect '' to equal ''. (Indeed, at the end of this post, I have included a listing from a SUSE Linux session where this is shown to be the case.)

What does any of this have to do with the last post?

In that post, the critical statement was the line " and b.dd = '' ". What was left unsaid, to be understood by the reader, is that Oracle interprets this statement as

and b.dd = NULL

which can never evaluate to true. How this happens is worth a deeper look.

First of all, the beginning of this post demonstrated that the empty string and the SQL NULL are two entirely different things. A NULL can take any value (and thus its exact value is unknown). An empty string has a well-known, well-defined value. A NULL can have any length (and thus its precise length is unknown). An empty string is exactly zero characters long. A NULL can be equal to anything, and thus is equal to nothing. An empty string is only equal to another empty string.

Unfortunately, Oracle long ago adopted the kludge of using the empty string to represent the SQL NULL internally. Oracle knows this, and has always included a statement in its documentation that this is non-standard behavior, that it may change without warning, and that no one should rely on it.

And now for a personal comment. After a few months with Oracle 5, I had nothing to do with Oracle for a few years. When I came back, it was to release 7.1.14 (the number has stuck in my mind). Then I took a job with a company which was in the process of converting (it happened the third week I was there) from release 6 to release 7.0.x. Everything went fine, and then a few months later we converted to 7.1. Suddenly, my boss wanted to know what was wrong with the new release. All his PL/SQL, which was full of constructions such as

Where column_A = ''

Instead of

Where column_A IS NULL

was failing. I was of no help, as I had no idea how such constructions could possibly have worked in the first place.

What happened in release 7.1 was that '' now became equivalent to NULL. And in the process, it became impossible to store or test for an empty string. And this is more than just an academic exercise. Search most Oracle forums, and you will find someone trying to port an application where empty strings are significant from some other database, and finding out that their existing, SQL-92 compliant code does not work.
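
If you want to see the kludge for yourself, here is a quick sketch (the table name t_demo is made up):

CREATE TABLE t_demo (col_a VARCHAR2(10));

INSERT INTO t_demo VALUES ('');   -- what is stored is NULL, not a zero-length string

SELECT COUNT(*) FROM t_demo WHERE col_a = '';      -- 0 rows: '' is NULL, and "= NULL" is never true
SELECT COUNT(*) FROM t_demo WHERE col_a IS NULL;   -- 1 row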

Notice that I have always been careful to refer to '' as the empty string and not the null string. The reason should be obvious: I want to emphasize that the empty string is not a SQL NULL. Not everyone is as careful as I am, and one can find places where someone will talk as if it is obvious that a null string is a NULL value. But no reader of this blog should ever make that mistake.

--------------------------
Here is that Linux Shell Script Listing and Output I promised:
$ cat bigtest.sh

###Set Up Some Variables

V1='AA'
V1A='AB'
V2='AB'
V3='A'
V4='A'
V5=''
V6=''

###See if 'AA' is identical to 'AB'
if [ "$V1" == "$V2" ]
then echo "String \$V1 equals string \$V2"
else echo "String \$V1 is not equal to string \$V2"
fi

###See if 'AB' is identical to 'AB'
if [ "$V1A" == "$V2" ]
then echo "String \$V1A equals string \$V2"
else echo "String \$V1A is not equal to string \$V2"
fi

###See if 'A' is identical to 'A'
if [ "$V3" == "$V4" ]
then echo "String \$V3 equals string \$V4"
else echo "String \$V3 is not equal to string \$V4"
fi

###Finally, See if '' is identical to ''
if [ "$V5" == "$V6" ]   ###the quotes matter: unquoted, the empty strings would drop out of the test
then echo "String \$V5 equals string \$V6"
else echo "String \$V5 is not equal to string \$V6"
fi

l0=${#V5}
echo "String \$V5 is $l0 bytes long"

l1=${#V3}
echo "String \$V3 is $l1 bytes long"

l2=${#V1}
echo "String \$V1 is $l2 bytes long"
$



$ ./bigtest.sh
String $V1 is not equal to string $V2
String $V1A equals string $V2
String $V3 equals string $V4
String $V5 equals string $V6
String $V5 is 0 bytes long
String $V3 is 1 bytes long
String $V1 is 2 bytes long
$

Thursday, January 18, 2007

Outer Joins and Nulls

I had planned to post something else, then I found this in AskTom:

create table test1(aa varchar(10),bb varchar(10));
create table test2(cc varchar(10),aa varchar(10),dd varchar(10));
insert into test1 values('11','qaz');
insert into test1 values('22','wsx');
insert into test1 values('33','edc');
commit;
select a.bb from test1 a,test2 b
where a.aa=b.aa(+)
and a.bb='qaz'
and b.dd(+)='';

Run in 9i, the poster obtained the result 'qaz'.

However, when this was run in a 10g instance, no rows were returned. The poster (this was a comment to a prior question, so there was no reason to expect Tom to have posted a response) asked two obvious questions:

1) Why is this a bug?, and

2) How can this be?

The answer to the first is obvious: if the same query on the same tables gives two different results in two different database versions, there is probably a bug (that weasel word will be exposed soon). The second is more involved. I am posting the details here; I am posting a summary to AskTom.

Let us take this query, and try to rewrite it using ANSI syntax. We can get this far:

select a.bb
from test1 a
left outer join test2 b
on a.aa = b.aa

And at this point things become ambiguous. That b.dd = '': is it supposed to be applied before or after the outer join? In other words, should this be translated into the ANSI syntax as

select a.bb
from test1 a
left outer join test2 b
on a.aa = b.aa
and b.dd = ''
where a.bb = 'qaz'

or as

select a.bb
from test1 a
left outer join test2 b
on a.aa = b.aa
where a.bb = 'qaz'
and b.dd = ''

In the first case, you proceed as follows:

First, take this table:

AA | BB
--------------------
11 | qaz
22 | wsx
33 | edc


And left outer join it to this table:

CC | AA | DD
-------------------------

(You applied a predicate which can return no rows to an empty table; you end up with an empty table).

The result is this intermediate table:

BB
------
qaz
wsx
edc

Finally, you select the row where BB = 'qaz'.


In the second case, you first do the join, and end up with the same intermediate table. You then apply both predicates, one of which (b.dd = '', in effect b.dd = NULL) can never be true. Thus you end up with no rows.

I have not been able to find any statement by Oracle as to which assumption is made with the old syntax (and the optimizer may be capable of either). My theory is that 9i does things one way, and 10g can sometimes do things the other way.

Sunday, January 14, 2007

The Adventure Begins

Here is my first post, and I have been spending the last week trying to come up
with something sufficiently profound. Since I have not been able to do that, I
have decided to post something simple instead. Future weeks will expand on this.

The first newsgroups I read on a regular basis were comp.databases.oracle.server
and comp.databases.theory. In the latter, I noticed some comments regarding
nulls seemed to occur on a regular basis, and I wondered what was behind them. A
little over a year ago, I decided to research this and find out what the nature
of this controversy was. I now think that I understand where the "No Nulls"
group is coming from.

So, I am going to start with a (very) brief summary of the opposing arguments.
Subsequent posts will elaborate on them (if some other Oracle related notion
doesn't distract me).

To begin with, I need some exposition of SQL nulls. Rather than attempt to
improve upon what many have done before, I will just cite what I regard as the
best article on the subject:

http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html

In the late 1960s, Codd and others developed their relational algebra, which
became the basis for the theory behind modern relational data modeling. Whenever
you see someone speak of tuples rather than rows, relations rather than
tables, or domains rather than types, it is in the context of this algebra
that they are speaking. At the time, there was some hope that should relational
databases become practical, this algebra would be their language. Instead, SQL
became the standard interface. And SQL, while close to relational algebra,
differs in a number of essential aspects.

One of these aspects is the existence of nulls or, more precisely, how they are
implemented. Nulls turn the logic of SQL into a three-valued logic. Relational
algebra is based on the traditional two-valued logic. And so nulls become one of
the things wrong with SQL.

But there is a bigger problem. Nulls, as defined by SQL, result in some
unexpected and surprising traps. For example, a column being merely nullable
(whether or not it actually contains any nulls) will prevent the optimizer from
using an anti-join to optimize a NOT IN query. And the existence of nulls can
even result in a logically wrong result set.
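
A sketch of both traps, using the same EMP table as the other posts (this reflects the optimizer behavior of the releases current as I write):

-- MGR is nullable, and KING's MGR actually is NULL, so this returns no rows
-- at all: NOT IN against a list containing a NULL is never true.
SELECT empno FROM emp
WHERE  empno NOT IN (SELECT mgr FROM emp);

-- Ruling the nulls out restores both the expected rows and the optimizer's
-- ability to use an anti-join.
SELECT empno FROM emp
WHERE  empno NOT IN (SELECT mgr FROM emp WHERE mgr IS NOT NULL);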

But on the other hand, it was none other than Codd himself who introduced nulls
into the relational database realm. His 3rd feature of relational databases was
a "systematic way of handling missing information." And while what he proposed
was not exactly what SQL developed, he did remain a proponent of nulls all his
life.

Well, this has gotten me started. Tomorrow, I should have more focus.