Thursday, December 27, 2007

On Uncommitted Reads

I've decided to give up trying to post detailed articles. The reason for this is simple: I find that I cannot post unless and until I feel that the article is perfect. The main thesis is precisely crafted. All objections are accounted for and answered. The length of the article is long enough to make it worthwhile. And by the time I am close to such an article, it seems to be irrelevant.

So, I am going to change focus and post rants. Rants can be long or short. Rants do not have to be perfectly balanced. Rants are doable in a short period of time.

What got me interested in ranting was a recent posting to the Oracle-L mailing list. Someone responded to a rant against uncommitted reads with the statement that "values end up committed over 99% of the time, and sometimes getting a current value is more important than getting a consistent value".

My immediate reaction was: "well, what about getting a correct value"? There isn't enough space to post this to the list, but here I can give a couple of examples where using the first part of an update without waiting for the rest can give very incorrect results. I'm going to the the standard EMP table, and assume that it is used by a real company, with a real system.

Let us assume that this is the key query in the system. There is a stored procedure which create a ref cursor for this query. Almost all parts of the application which access the EMP table do so through this query:

SELECT substr(lpad(' ', 2*level-1)||ename,1,15) name,
empno,mgr, job, sal + nvl(comm, 0) tc
FROM emp
START WITH mgr IS NULL
CONNECT BY mgr = PRIOR empno

with these results:

KING

7839


PRESIDENT

5000

JONES

7566

7839

MANAGER

2975

SCOTT

7788

7566

ANALYST

3000

ADAMS

7876

7788

CLERK

1100

FORD

7902

7566

ANALYST

3000


7369

7902

CLERK

800

BLAKE

7698

7839

MANAGER

2850

ALLEN

7499

7698

SALESMAN

1900

WARD

7521

7698

SALESMAN

1750

MARTIN

7654

7698

SALESMAN

2650

TURNER

7844

7698

SALESMAN

1500

JAMES

7900

7698

CLERK

950

CLARK

7782

7839

MANAGER

2450

MILLER

7934

7782

CLERK

1300



So, the edict comes down that every one in Accounting (dept. 10) is to be given a 5% raise. Naturally, this is to be accompanied by a corresponding decrease to everyone in Research (dept. 20). (Somewhere, there must be a company where the Research budget is not regularly raided for the Finance budget, and I welcome any hints as to where that might be). Now, while it is possible to write this as a single UPDATE statement, it is more likely that whoever in Accounting is doing the update (Research refused to get involved) will do it in several steps. When everything is complete, the results should be:

KING

7839


PESIDENT

5250

JONES

7566

7839

MANAGER

2887.5

SCOTT

7788

7566

ANALYST

2912.5

ADAMS

7876

7788

CLERK

1012.5

FORD

7902

7566

ANALYST

2912.5

SMITH

7369

7902

CLERK

712.5

BLAKE

7698

7839

MANAGER

2850

ALLEN

7499

7698

SALESMAN

1900

WARD

7521

7698

SALESMAN

1750

MARTIN

7654

7698

SALESMAN

2650

TURNER

7844

7698

SALESMAN

1500

JAMES

7900

7698

CLERK

950

CLARK

7782

7839

MANAGER

2572.5

MILLER

7934

7782

CLERK

1365

But, if the database allows for uncommitted reads, it is possible for someone to catch the pay increases but not the pay cuts, and find this:

KING

7839


PRESIDENT

5250

JONES

7566

7839

MANAGER

2975

SCOTT

7788

7566

ANALYST

3000

ADAMS

7876

7788

CLERK

1100

FORD

7902

7566

ANALYST

3000

SMITH

7369

7902

CLERK

800

BLAKE

7698

7839

MANAGER

2850

ALLEN

7499

7698

SALESMAN

1900

WARD

7521

7698

SALESMAN

1750

MARTIN

7654

7698

SALESMAN

2650

TURNER

7844

7698

SALESMAN

1500

JAMES

7900

7698

CLERK

950

CLARK

7782

7839

MANAGER

2572.5

MILLER

7934

7782

CLERK

1365

Now, this is similar to the standard argument against uncommitted reads (a bank teller transfers money from one account to another), which I must assume the original poster is familiar with, and is unlikely to change his mind. To try for something different, let us return to the original set of data, and consider this business mandate: Dept. 30 (SALES) is getting two new employees, who will be sales supervisors. They will become the only ones reporting to BLAKE; the rest of the salesforce will report to one of these new supervisors. The new supervisors are MUTT and JEFF, and they will have numbers 7042 and 7420 respectively. Naturally, the Accounting department decides to reassign the salesforce before putting MUTT and JEFF into the system. So, the following statements gets executed:

UPDATE emp
SET mgr = 7042
WHERE empno in ( 7499, 7521);

UPDATE emp
SET mgr = 7420
WHERE empno in (7654, 7844, 7900);

Followed by these inserts:

INSERT INTO emp VALUES (7042,'MUTT','SUPER',7698,SYSDATE,2800,NULL,30);

INSERT INTO emp VALUES (7420,'JEFF','SUPER',7698,SYSDATE,2800,NULL,30);

With the following result:

KING

7839


PRESIDENT

5250

JONES

7566

7839

MANAGER

2887.5

SCOTT

7788

7566

ANALYST

2912.5

ADAMS

7876

7788

CLERK

1012.5

FORD

7902

7566

ANALYST

2912.5

SMITH

7369

7902

CLERK

712.5

BLAKE

7698

7839

MANAGER

2850

MUTT

7042

7698

SUPER

2800

ALLEN

7499

7042

SALESMAN

1900

WARD

7521

7042

SALESMAN

1750

JEFF

7420

7698

SUPER

2800

MARTIN

7654

7420

SALESMAN

2650

TURNER

7844

7420

SALESMAN

1500

JAMES

7900

7420

CLERK

950

CLARK

7782

7839

MANAGER

2572.5

Except, if uncommitted reads are allowed, and someone ran the standard query before the inserts were performed, they would get this result:

KING

7839


PRESIDENT

5250

JONES

7566

7839

MANAGER

2887.5

SCOTT

7788

7566

ANALYST

2912.5

ADAMS

7876

7788

CLERK

1012.5

FORD

7902

7566

ANALYST

2912.5

SMITH

7369

7902

CLERK

712.5

BLAKE

7698

7839

MANAGER

2850

CLARK

7782

7839

MANAGER

2572.5

MILLER

7934

7782

CLERK

1365

Which is not even close to being right, and anything which relied on this result set will be very wrong indeed. The first example can (just barely) be excused as being incorrect, but in a not yet up-to-date sense (everything is there, but some rows are not yet up to date). This last example is just plain wrong. Instead of being 99.9% correct, it is maybe 60% correct. In other words, allowing uncommitted reads can result in very wrong results.