This is not the original statement of the problem, but it does describe the essence:
One is given the following table:
Income Limit Tax Rate 18150 10% 73800 15% 148850 25% 226850 28% 405100 33% 457600 35% over 457600 39.6%
The tax due is computed as follows: For taxable incomes up to 18150, the tax is 10% of the taxable income. For incomes between 18150 and 73800 the tax is 10% of 18150 plus 15% of the difference between the income and 18150, etc.
The question then, is, is to design the table and state the query to calculate the tax due.
The comments had a CASE statement being used. This struck me as unnecessary, so I set out to see if one could solve this problem without one.
In the tax forms, the method is typically stated with a table such as this:
If Your Taxable But Your Of the Taxable Income is Less tax Plus Amount Greater than Than is Over 0 18150 0 10% 0 18150 73800 1815 15% 18150 73800 148850 10163 25% 73800 148850 226850 28925 28% 148850 226850 405100 50765 33% 226850 405100 457600 109587 35% 405100 457600 127962 39.6% 457600
It is easy to create and load such a table:
Create table tax_brackets ( lowrange number(9), highrange number(9), taxbase number(9), taxpct number(4,4)) / Insert into tax_brackets values (148850, 226850, 28925, .28) / Insert into tax_brackets values (405100, 457600, 109587, .35) / Insert into tax_brackets values (0, 18150, 0, .10) / Insert into tax_brackets values (73800, 148850, 10163, .25) / Insert into tax_brackets values (18150, 73800, 1815, .15) / Insert into tax_brackets values (457600, 999999999, 127962, .396) / Insert into tax_brackets values (226850, 405100, 50765, .33) / commit /
Where I have just used the first four columns of my sample table, and given the columns the names LowRange, HighRange, TaxBase and TaxPct. Having done this, we can calculate the tax due for a given taxable Income with
Select TaxBase + TaxPct * (Income - LowRange) TaxDue from tax_Brackets where Income > LowRange and Income <= HighRange /
No CASEs to be found anywhere.
A few words of clarification here: The reader who is carefully reading will have noticed that I have appeared to have loaded the data in random order. This was deliberate. I wanted to make sure that there was no accidental ordering affecting my results. And I included a very large HighRange for my 39.6% row. This was because I didn't want to complicate matters with an almost value.
Of course, this table has some seemingly redundant columns. All the required information can be found in this table:
BRACKET TAX_RATE ---------- ---------- 0 .1 18150 .15 73800 .25 148850 .28 226850 .33 405100 .35 457600 .396
So, let's create it and see if we can use it instead:
Create table tax_bracket2 ( bracket number(9), tax_rate number(4,4) ) / insert into tax_bracket2 values( 0, .10) / insert into tax_bracket2 values(405100, .35) / insert into tax_bracket2 values(73800, .25) / insert into tax_bracket2 values(148850, .28) / insert into tax_bracket2 values(18150, .15) / insert into tax_bracket2 values(226850, .33) / insert into tax_bracket2 values(457600, .396) / commit /
We know how to solve this problem with the tax_bracket table. If we could write a query to drive the two missing columns, we could use it as a subquery in our original solution. The rest of this post describes how to do precisely that.
To clarify things, consider this query:
select Bracket LowRange, '?' HighRange, '?' TaxBase, Tax_Rate TaxPct from tax_bracket2 order by 1; / LOWRANGE H T TAXPCT ---------- - - ---------- 0 ? ? .1 18150 ? ? .15 73800 ? ? .25 148850 ? ? .28 226850 ? ? .33 405100 ? ? .35 457600 ? ? .396 7 rows selected.
Clearly (?) this is going to require Analytic Functions. Unfortunately, everytime I need to use an analytic function, I have to start at the beginning. So let's start with the HighRange Column. Reading the definition of the LEAD Analytic Function we see
LAG/LEAD
Finding a value in a row a specified number of rows from a current row.
This looks promising. Here is the syntax:
{LAG | LEAD} ( value_expr [, offset] [, default] )
[RESPECT NULLS|IGNORE NULLS]
OVER ( [query_partition_clause] order_by_clause )
Let's try and plug in some values. For a given row, the value of its HighRange function is the value of the bracket column in the 'next' row. So, we are going to use the LEAD function, and the expression is going to simply be 'bracket'. And since we want the next row, we are going to try an offset of '1'.
At this stage, our function is
LEAD(bracket,1) OVER (PARTITION BY??? ORDER BY ???)
I used to think of the 'PARTITION BY' clause as being a Group By clause for analytic functions. But, after doing my research on this, I've come to think of it more as a Break command in a SQL*Plus report. Using both approaches, it is clear that this is going to be one large grouping, so there should be no PARTITION BY clause. And we can order either by 'bracket' or 'TaxPct'. So, let's try:
select bracket LowRange, LEAD(bracket,1) OVER (ORDER BY bracket) HighRange, '?' TaxBase, tax_rate TaxPct from tax_bracket2 order by tax_rate / LOWRANGE HIGHRANGE T TAXPCT ---------- ---------- - ---------- 0 18150 ? .1 18150 73800 ? .15 73800 148850 ? .25 148850 226850 ? .28 226850 405100 ? .33 405100 457600 ? .35 457600 0 ? .396 7 rows selected.I see what went wrong. We are selecting all seven rows from tax_bracket2. The LEAD function is operating on each row. It has sorted all seven rows, and is returning the next 'bracket' value (which I have aliased to LowRange) as the HighRange value in the result set. But, when it operates on the row with 'bracket'=457600, there is no next row. So, it returns the default value. And since we haven't supplied a default value, it defaults to 0.
So, let's try this instead:
select bracket LowRange, LEAD(bracket,1,999999999) OVER (ORDER BY bracket) HighRange, '?' TaxBase, tax_rate TaxPct from tax_bracket2 order by tax_rate / LOWRANGE HIGHRANGE T TAXPCT ---------- ---------- - ---------- 0 18150 ? .1 18150 73800 ? .15 73800 148850 ? .25 148850 226850 ? .28 226850 405100 ? .33 405100 457600 ? .35 457600 999999999 ? .396 7 rows selected.Just for fun, let's see what happens if we make the second parameter = 2,
LOWRANGE HIGHRANGE T TAXPCT ---------- ---------- - ---------- 0 73800 ? .1 18150 148850 ? .15 73800 226850 ? .25 148850 405100 ? .28 226850 457600 ? .33 405100 999999999 ? .35 457600 999999999 ? .396 7 rows selected.
And, just to spare you from having to try it, using '-1' results in an error.
Now, consider this query and result set from our original table:
1 select LowRange, 2 HighRange, 3 (HighRange - LowRange) * TaxPct, 4 TaxPct 5 from tax_brackets 6* order by TaxPct / LOWRANGE HIGHRANGE (HIGHRANGE-LOWRANGE)*TAXPCT TAXPCT ---------- ---------- --------------------------- ---------- 0 18150 1815 .1 18150 73800 8347.5 .15 73800 148850 18762.5 .25 148850 226850 21840 .28 226850 405100 58822.5 .33 405100 457600 18375 .35 457600 999999999 395818790 .396 7 rows selected.If we compute a running total of our third column, we get the column I am calling 'TaxBase'. And the SUM analytic function can compute this:
1 select LowRange, 2 HighRange, 3 nvl(sum((HighRange - LowRange)*TaxPct) 4 OVER (order by TaxPct 5 rows between unbounded preceding and 1 preceding),0) TaxBase, 6 TaxPct 7 from tax_brackets 8* order by TaxPct / LOWRANGE HIGHRANGE TAXBASE TAXPCT ---------- ---------- ---------- ---------- 0 18150 0 .1 18150 73800 1815 .15 73800 148850 10162.5 .25 148850 226850 28925 .28 226850 405100 50765 .33 405100 457600 109587.5 .35 457600 999999999 127962.5 .396 7 rows selected.And, since we know how to compute our 'HighRange' column from our simple tax_bracket2 table, we can derive the entire tax_brackets table thusly:
1 select LowRange, 2 HighRange, 3 nvl(sum((HighRange - LowRange)*TaxPct) 4 OVER (order by TaxPct 5 rows between unbounded preceding and 1 preceding),0) TaxBase, 6 TaxPct 7 from ( 8 select bracket LowRange, 9 LEAD(bracket,1,9999999990) OVER (ORDER BY bracket) HighRange, 10 tax_rate TaxPct 11 from tax_bracket2 12 ) 13* order by TaxPct LOWRANGE HIGHRANGE TAXBASE TAXPCT ---------- ---------- ---------- ---------- 0 18150 0 .1 18150 73800 1815 .15 73800 148850 10162.5 .25 148850 226850 28925 .28 226850 405100 50765 .33 405100 457600 109587.5 .35 457600 9999999990 127962.5 .396 7 rows selected.
And we can write our final tax calculation query thusly:
Select TaxBase + TaxPct * (Income - LowRange) TaxDue from ( select LowRange, HighRange, nvl(sum((HighRange - LowRange)*TaxPct) OVER (order by TaxPct rows between unbounded preceding and 1 preceding),0) TaxBase, TaxPct from ( select bracket LowRange, LEAD(bracket,1,9999999990) OVER (ORDER BY bracket) HighRange, tax_rate TaxPct from tax_bracket2 ) order by TaxPct ) where Income > LowRange and Income <= HighRange /
Packaging this as a function is an excerise left to the reader.