10 Oracle SQL functions and function parameters you might not know

Functions mentioned here are in alphabetical order

BITAND function but no BITOR function

There is a BITAND function in SQL but no BITOR function.

Bitand allows to bit-compare values. It returns a value where all bits have been compared using AND. That means, each bit in the first operand must be matched with the bits in the second operand. This is useful to compare a set of flags (yes/no values) that are stored inside the same integer field.

BIN_TO_NUM is a helper function that “translates” bits into (decimal) numbers.

BITAND example

SELECT BITAND(
    BIN_TO_NUM(1,0,0),  -- 4 in decimal
    BIN_TO_NUM(0,0,1)   -- 1 in decimal
    ) check_flag
  from dual;

0

So 4 and 1 have no bits in common.

SELECT BITAND(
    BIN_TO_NUM(1,0,1),  -- 5 in decimal
    BIN_TO_NUM(1,0,0)   -- 4 in decimal
    ) check_flag
  from dual;

4
4 = 0×2⁰ + 0×2 + 1×2² = BIN_TO_NUM(1,0,0)

So 5 and 4 have bit 3 in common. (Bitcounting starts 1 but powers of 2 start with 0)

BITOR would combine the bits from each operand.

Fortunately BITOR can be calculated using BITAND. We need to sum the operands and then subtract the BITAND result from that.

Here is an example

select 1 + 4  
       - BITAND(1,4) bitor_compare
  from dual;

5

select 5 + 4  
       - BITAND(5,4) bitor_compare
from dual;

5

CAST with default on conversion error

The CAST function allows to convert values from one datatype into another.

Parameter enhancements have been introduced in 12.2 along with similar enhancements to many datatype conversion functions.

https://docs.oracle.com/en/database/oracle/oracle-database/20/sqlrf/CAST.html#GUID-5A70235E-1209-4281-8521-B94497AAEF75

Let’s experiment a little with it.

select cast('01-01-20' as date default null on conversion error
           ,'dd-mm-rr') 
from dual;
01-Jan-20

select cast('INVALID' as date default null on conversion error
           ,'dd-mm-rr') 
from dual;
(null)

Works like a charm. The string was converted into a date (and converted back into a string to print it on the screen using local nls settings). An invalid string could not be converted, instead the default NULL value was chosen.

Sometimes we want to return today in case there is a conversion error…

select cast('01-01-20' as date default sysdate on conversion error
           ,'dd-mm-rr') 
from dual;
01-Jan-20

select cast('INVALID' as date default sysdate on conversion error
           ,'dd-mm-rr') 
from dual;
17-Mar-20

seems to work!

..not so fast

alter session set nls_date_format = 'day-Mon-rr';
select cast('01-01-20' as date default sysdate on conversion error
           ,'dd-mm-rr') 
from dual;

ORA-01858: a non-numeric character was found where a numeric was expected

Although we used cast with a default on conversion error, we still got an conversion error!
The problem here is the default parameter. It needs to be a string. Since it is not, an implicit type conversion happened. The result differs from the function parameter fmt. Which then leads to a conversion error.

Unfortunately the default parameter can not be an expression.

select cast('01-01-20' as date 
            default to_char(sysdate,'dd-mm-rr') 
            on conversion error
           ,'dd-mm-rr')
from dual;

ORA-43907: This argument must be a literal or bind variable.

Also nls_date_format is not allowed as nlsparam.

select cast('01-01-20' as date default sysdate on conversion error
           ,'dd-mm-rr'
           ,'nls_date_format=dd-mm-rr')
from dual;

ORA-12702: invalid NLS parameter string used in SQL function

The same could happen for other conversions, like casting a string into a number. This example is from the docs.

SELECT CAST(200
       AS NUMBER
       DEFAULT 0 ON CONVERSION ERROR)
  FROM DUAL;

It is a very very bad example, since 200 is already a number. So let’s assume this is ‘200.00’ but the default is still 0. And we add the other parameters to specify the format of this string.

select cast('200.00' as number default 0 on conversion error
           ,'FM999d00'
           ,'nls_numeric_characters=.,')
from dual;
200

select cast('200x00' as number default 0 on conversion error
           ,'FM999d00'
           ,'nls_numeric_characters=.,')
from dual;
0

Seems to work. But here again we can manipulate the session settings into a wrong conversion. Notice that the alter session settings and the nlsparam settings differ.

alter session set nls_numeric_characters = ',.';
Session altered.

select cast('200x00' as number default 0 on conversion error
           ,'FM999d00'
           ,'nls_numeric_characters=.,')
from dual;
0

Good. Fortunately the number conversion is slightly more robust than the date conversion. We can not set a session default number format (it is always “TM9” I believe) in the same way as we can set the default date format.

However we get into trouble once the default value uses non-integer numbers.

select cast('200x00' as number default 0.5 on conversion error
           ,'FM999d00'
           ,'nls_numeric_characters=.,')
from dual;

5

The default was 0.5 but we got 5!

The correct way to do it of cause would be to use a string with a format that reflects the fmt and nlsparam parameters.

select cast('200x00' as number default '0.5' on conversion error
           ,'FM999d00'
           ,'nls_numeric_characters=.,')
from dual;

0,5

Now we got the correct default value returned!

CHECKSUM

Checksum can be used to quickly calculate a checksum over a set of rows. There is an analytic version of it too.

Description of checksum.eps follows
https://docs.oracle.com/en/database/oracle/oracle-database/20/sqlrf/checksum.html#GUID-3F55C5DF-F23A-4B2F-BC6F-E03B34B78BA8

If you want to quickly see if data was changed between two databases, you can run a checksum over each column and compare the result on both environments.

select checksum(phone_number)
from hr.employees;

Never heard of this function before? Probably because it just got introduced in 20c.

Dump

The dump function is often used to inspect binary values. Useful for example when solving character set issues.

Not many know that it has some additional parameters.

Description of dump.eps follows
https://docs.oracle.com/en/database/oracle/oracle-database/20/sqlrf/DUMP.html#GUID-A05793C9-B35D-4BA7-B68C-E3693BCF47A5

Interesting is especially the return_fmt parameter. It allows to return the dump in octal (8), hex (16), decimal (10=default) or a special format (17). The return format “17” will return the single byte values in readable ASCII format, but multi byte values that can not be converted are shown in hex.

For example the “€” symbol in UTF-8 uses 3 Bytes: E2 82 AC

Source: https://unicode-table.com/en/20AC/

Dump-17 will return this

select dump('abc€',17) from dual;
Typ=96 Len=6: a,b,c,e2,82,ac

The string is made of 6 bytes. The first 3 are single byte characters. They are converted into ascii. The 4th character is the Euro-Sign, witch is 3 bytes in UTF-8. So the format 17 is interesting, because it helps us to focus and find the problematic characters.

Another interesting option is to add 1000 to the format. This will add the character-set to the output.

select dump('abc€',1010) from dual;
Typ=96 Len=6 CharacterSet=AL32UTF8: 97,98,99,226,130,172

Of cause that works with NCHAR too.

select dump(n'abc€',1010) from dual;
Typ=96 Len=8 CharacterSet=AL16UTF16: 0,97,0,98,0,99,32,172

LNNVL

This is a very special function. Apart from the unspeakable name it is used for Oracle internal SQL transformations by the optimizer. Here is a document that describes such transformations. LNNVL returns a Boolean and can be used directly inside a where clause. This is not yet possible for user defined functions that return Boolean, but other such functions do exist (e.g. regexp_like).

I sometimes use it to find rows where two values differ. If the comparison value is NULL I still want to consider it to be different.

Instead of using the non-equality operator != we have to use the opposite the equality operator =. This is just how LNNVL works.

select dummy from dual
where lnnvl('A'='B');
X;

select dummy from dual
where lnnvl('A'='A');
No data found.

select dummy from dual
where lnnvl('A'=null);
X

This assumes that ‘A’ always is a not null value.

To get the same result typically a comparison condition needs also consider the NULL case.

select dummy from dual
where (('A'!='B' and 'B' is not null) OR 'B' is null);

Since more conditions like this might follow, the statement very soon becomes cluttered with OR checks, parenthesis and IS NULL comparisons.

Unfortunately since not many developers are familiar with this function, we should always add some comments to explain the purpose and the behaviour.

NANVL

NANVL is similar to NVL.

NVL returns a value, when NULL is encountered.
NANVL returns a value when NaN (not a number) is encountered.

NaN is part of the binary_float and binary_double datatypes.

select to_binary_float(1/17) from dual;
0,05882353

select to_binary_float('NaN') from dual;
NaN

Lets apply NANVL to it

select nanvl(to_binary_float('NaN'),'0') from dual;
0.0

Or we could set it to negative infinity…

select nanvl(to_binary_float('NaN'),'-INF') from dual;
-Infinity

Somehow interesting, but rarely useful I think.

NEXT_DAY function parameter

The NEXT_DAY function needs a second parameter that defines what target weekday it points to.

Description of next_day.eps follows

The argument char must be a day of the week in the date language of your session, either the full name or the abbreviation. The minimum number of letters required is the number of letters in the abbreviated version. Any characters immediately following the valid abbreviation are ignored. The return value has the same hours, minutes, and seconds component as the argument date.

https://docs.oracle.com/en/database/oracle/oracle-database/20/sqlrf/NEXT_DAY.html#GUID-01B2CC7A-1A64-4A74-918E-26158C9096F6

This second parameter char works differently than what you might expect. Here is an example.

alter session set nls_date_format = 'Dy, DD-MON-YYYY';
Session altered.

alter session set nls_language='AMERICAN';
select next_day(sysdate,'FRIDAY') from dual;
Fri, 20-MAR-2020

alter session set nls_language='GERMAN';
Session altered.

select next_day(sysdate,'FREITAG') from dual;
Fr, 20-MRZ-2020

select next_day(sysdate,'FRIDAY') from dual;
Fr, 20-MRZ-2020

Isn’t that strange? It looks as if although my session language is GERMAN, but that the American spelling still works!

Let’s test this with a different language. In Spanish “friday” would be “viernes”.

select next_day(sysdate,'VIERNES') from dual;
ORA-01846: Kein gültiger Wochentag
"not a valid day of the week"

alter session set nls_language='SPANISH';
Session altered.

select next_day(sysdate,'VIERNES') from dual;
Vie, 20-MAR-2020

select next_day(sysdate,'FRIDAY') from dual;
ORA-01846: día de la semana no válido
"not a valid day of the week"

Ok. So the Spanish language works as long the NLS settings are correct, but it doesn’t accept the American spelling.

Is German special? In a way yes. In German several weekdays have an abbreviation that matches the American spelling. And the abbreviation is only two letters short. Especially those days that have a good chance to be used in the NEXT_DAY function (monday, friday, saturday, but not sunday!).

For “FREITAG” the abbreviation is “FR”. The NEXT_DAY function accepts anything as long as the char string starts with the abbreviation of a weekday in the correct language. So FREITAG, FRIDAY and even FRITZ or FROG all return the next friday.

alter session set nls_language='GERMAN';
Session altered.

select next_day(sysdate,'FR') from dual;
Fr, 20-MRZ-2020

select next_day(sysdate,'FRITZ') from dual;
Fr, 20-MRZ-2020

select next_day(sysdate,'FROG') from dual;
Fr, 20-MRZ-2020

Is this how you expected the char parameter to work?

Recommendation? Don’t use NEXT_DAY at all. I prefer TRUNC over NEXT_DAY any day.

POWER (sql) = ** (plsql)

In SQL to calculate an exponentiation we have to use the POWER function.

Example 2³

select power(2,3) from dual;
8

In plsql we can use the ** operator instead.

set serveroutput on
begin
  dbms_output.put_line(2**3); 
end;
/
8

Unfortunately the ** operator will not work in SQL.

ROUND_TIES_TO_EVEN function

Everybody knows the round function. Round rounds a value up which is exactly half.

1.5 ≈ 2
2.5 ≈ 3
...
7.5 ≈ 8
8.5 ≈ 9
9.5 ≈ 10

This is what statisticians call a biased function. The computed values do not reflect the distribution of the original values – especially if there are many 0.5 decisions.

An unbiased version of rounding is to round one value up, the next down, then up again.

1.5 ≈ 2
2.5 ≈ 2
...
7.5 ≈ 8
8.5 = 8
9.5 ≈ 10

In German that is called “kaufmännisches Runden”. Loosely translated to Bankers Rounding.

The round_ties_to_even function has been introduced in 18c.

Description of round_ties_to_even.eps follows
https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/ROUND_TIES_TO_EVEN-number.html#GUID-49919B6B-4337-4812-A248-B5D98F102DBD
select level+0.5 as val, 
       round(level+0.5), 
       round_ties_to_even(level+0.5)
from dual connect by level <= 10;
VALROUND(VAL)ROUND_TIES_TO_EVEN(VAL)
1.522
2.532
3.544
4.554
5.566
6.576
7.588
8.598
9.51010
10.51110

This new rounding function implements “Bankers Rounding”. Especially when adding up the rounded values, the rounded sum differs less from the correct (unrounded) sum compared to what normal rounding delivers.

STANDARD_HASH function

If we quickly want to create some hash values we can use the standard_hash function.

select standard_hash(dummy) from dual;
C032ADC1FF629C9B66F22749AD667E6BEADF144B

The default uses an SHA1 hash algorithm, but we can specify SHA1, SHA256, SHA384, SHA512 and MD5.

select standard_hash(dummy,'SHA512') from dual;
3173F0564AB9462B0978A765C1283F96F05AC9E9F8361EE1006DC905C153D85BF0E4C45622E5E990ABCF48FB5192AD34722E8D6A723278B39FEF9E4F9FC62378

MD5 is not recommended at all (https://www.md5online.org/blog/why-md5-is-not-safe/). Also SHA1 is not considered secure anymore, but you can still use it safely to create hash values by it for non security purposes.

Conclusion

That’s it. 10 functions or parameters of functions that you probably didn’t know before. At least not all of them.

Exotic SQL: Hints that can change results

In general hints are suggestions for the Oracle cost based optimizer (CBO) that can influence performance but will never change the result. The CBO will comply with the hint at all costs – if it is possible.

True? Well not always!

There is a very small set of hints which can change the result of a select or DML statement.

The CBO is an extremly complicated piece of software, so it can happen that there are bugs in it, which in turn lead to different results. This is not what I am considering here.

So here is my collection of hints that are able to change the output of a statement.

ignore_row_on_dupkey_index

This hint allows to avoid UK errors during an insert. So only those rows are inserted that do not violate the unique index. The other rows are silently ignored.

See this example by Richard Foote about the hint:
https://richardfoote.wordpress.com/2010/12/20/oracle11g-ignore_row_on_dupkey_index-hint-micro-cuts/

Similar hints are
(CHANGE_DUPKEY_ERROR_INDEX and RETRY_ON_ROW_CHANGE)

They are all called semantic hints. Which has the side effect that they are not disabled by setting _optimizer_ignore_hints=TRUE

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Comments.html#GUID-4664D3D8-6312-4C15-8E8F-4872DD7A44F8

For more examples see this older post from Rob van Wijk
http://rwijk.blogspot.de/2009/10/three-new-hints.html

driving_site

I consider the driving_site hint to be one of the “good” hints. There are many cases where we need it. It is good to set it, because it enables behaviour that otherwise the optimizer can not choose.

However there is one strange special issue that I found.
I described this behaviour in an older blog post: https://wordpress.com/post/svenweller.wordpress.com/108

If we have a distributed query then SYSTIMESTAMP and also DBTIMEZONE is affected by the driving site hint. It will return the timezone from the local (no hint) or the remote (with hint) database. SYSDATE however was not affected.

This effect happend on an older 10g database, I wasn’t able to test if the behaviour still exists in 12c.

A similar effect can be reached using the materialize hint.

opt_param(‘OPTIMIZER_SECURE_VIEW_MERGING’,’true|false’)

The opt_param hint will allow to set instance parameters only for the time while the select is running. Some of those parameters can potentially influence the outcome of queries.

One of them is OPTIMIZER_SECURE_VIEW_MERGING
https://docs.oracle.com/en/database/oracle/oracle-database/12.2/refrn/OPTIMIZER_SECURE_VIEW_MERGING.html

An example from Tom Kyte:

the goal of this parameter [OPTIMIZER_SECURE_VIEW_MERGING] is to prevent a function owned by some user B from seeing data of some other user A that is should not see. For example, …

and after some setup…

b%ORA11GR2> set autotrace on explain
b%ORA11GR2> alter system set optimizer_secure_view_merging = true;

System altered.

b%ORA11GR2> select * from a.v v1 where f(y) = ‘ok to see’;

X Y
———- ——————————
1 ok to see

I see: ok to see

Execution Plan
———————————————————-
Plan hash value: 1931062764

—————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————
| 0 | SELECT STATEMENT | | 10000 | 292K| 27206 (1)| 00:05:27 |
|* 1 | VIEW | V | 10000 | 292K| 27206 (1)| 00:05:27 |
|* 2 | TABLE ACCESS FULL| T | 10000 | 292K| 27206 (1)| 00:05:27 |
—————————————————————————

Predicate Information (identified by operation id):
—————————————————

1 – filter(“F”(“Y”)=’ok to see’)
2 – filter(“Y_F”(“X”)=1)

there it is clear that the Y_F(X)=1 is evaluated and then f(y) is

b%ORA11GR2> alter system set optimizer_secure_view_merging = false;

System altered.

b%ORA11GR2> select * from a.v v1 where f(y) = ‘ok to see’;

X Y
———- ——————————
1 ok to see

I see: ok to see
I see: NOT ok to see

whoops, my function saw data that it should not

Execution Plan
———————————————————-
Plan hash value: 1601196873

————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————–
| 0 | SELECT STATEMENT | | 100 | 3000 | 27207 (1)| 00:05:27 |
|* 1 | TABLE ACCESS FULL| T | 100 | 3000 | 27207 (1)| 00:05:27 |
————————————————————————–

Predicate Information (identified by operation id):
—————————————————

1 – filter(“F”(“Y”)=’ok to see’ AND “Y_F”(“X”)=1)

and now we can see the converse is true

b%ORA11GR2> alter system set optimizer_secure_view_merging = true;

System altered.

b%ORA11GR2> set autotrace off

There are also some side effects with VPD. See https://antognini.ch/2011/09/optimizer_secure_view_merging-and-vpd/

Also in some cases this parameter can result in errors.
Here is a report of such a case: https://support.esri.com/en/technical-article/000010620

Another example for a parameter that can influence result sets would be RESULT_CACHE_REMOTE_EXPIRATION
https://docs.oracle.com/cd/B28359_01/server.111/b28320/initparams208.htm#REFRN10294

The default is 0. Changeing it to a positiv integer, will allow us to get “stale” data from the cache instead of fetching a fresh value from the remote source.

fresh_mv

This 12.2 hint can make a real time materialised view to refresh itself. So if the data is stale then the hint will use the MV and somehow add the missing data to it. Without the hint we would still see old data, but with it we will see new data.

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Comments.html#GUID-5EF4198B-50B3-40D8-B12A-3D3115C69D9B

The FRESH_MV hint applies when querying a real-time materialized view. This hint instructs the optimizer to use on-query computation to fetch up-to-date data from the materialized view, even if the materialized view is stale.

A similar connected hint would be
no_rewrite/rewrite_or_error
All of those hints have the ability to change the behaviour, especially the usage of stale materialized views. Which then can give a different result.

Disclaimer

This list has no pretence to be complete. Also some of the mentioned behaviours might change in different database versions. Especially if it is not documented and not intended behaviour.

There are more such hints. I have recorded some more, but couldn’t consistently reproduce the issue. It might have been related to some bug or some other obscure circumstance, that I can’t remember.

Here is the current list for those “watch out” candidates:

  • first_rows
  • (no_)query_transformation
  • (no_)result_cache

Result cache in combination with deterministic functions, that are not truely deterministic can easily produce wrong/different results. However the main culprit then is the wrong usage of the deterministic pragma.

And some more strange things

The SELECT clause can also influence the number of rows. And not only via distinct.

    with tbl as (select 1 val from dual union all  
                select 2 val from dual union all  
                select 3 val from dual )  
        SELECT  CASE  0  
                    WHEN  0  
                        THEN  'Aardvark'  
                    WHEN  SUM (val)  
                        THEN  'Baracuda'  
                END  AS c  
        FROM    tbl;  

Aardvark
Aardvark
Aardvark
3 rows selected.

    with tbl as (select 1 val from dual union all  
                select 2 val from dual union all  
                select 3 val from dual )  
        SELECT  CASE  6  
                    WHEN  0  
                        THEN  'Aardvark'  
                    WHEN  SUM (val)  
                        THEN  'Baracuda'  
                END  AS c  
        FROM    tbl;  

Baracuda
1 row selected.

Discovered by Frank Kulash and discussed and tested in this OTN thread
https://community.oracle.com/thread/4097998?start=15&tstart=0

Exotic SQL: Bubble Sort with Model Clause

Introduction

This is a “just for fun” experiment. But I demonstrate some model clause effects on the way.

Task: Sort a pipe separated list using SQL.

For example this string ’29|1|3004|3|2|24′ has 6 elements separated by |. The elements should be sorted in numerical order so that the resulting string looks like this ‘1|2|3|24|29|3004’.

Standard way

The “normal” way would be to tokenize the input string, put the tokens into rows, sort the rows and aggregate the sorted result to reassemble the string.

Here is one possible solution.


with inputdata(src)
     as (select '29|1|3004|3|2|24' from dual)
    ,num_rows (part, remains)
     as (select to_number(substr(i.src,1,coalesce(nullif(instr(i.src,'|'),0)-1,length(i.src)))) as part
               ,substr(src,instr(i.src,'|')+1) as remains
         from inputdata i
         UNION ALL
         select to_number(substr(r.remains,1,coalesce(nullif(instr(r.remains,'|'),0)-1,length(r.remains)))) as part
               ,substr(remains, nullif(instr(r.remains,'|'),0)+1) as remains
         from num_rows r
         where r.remains is not null
         )
    , combine (resultdata)
as (select listagg(to_char(part,'FM9999999999'),'|') within group (order by part) from num_rows)
select * from combine;

The num_rows subquery in this example is a recursive with clause that splits the input string into rows, one row for each token. The logic to split the string into tokens is based around substr . Essentially it just cuts of an element from the beginning of the string and keeps the remainder until there are no elements left.
The combine subquery then uses listagg to sort and convert the rows into a string again.

Standard SQL technologies that each decent developer should know about.

There are multiple ways how to tokenize,  how to create the rows and how to aggregate the rows again. But the general principle is still the same.

Here is a second example solution. This time using connect by to create rows and some regular expressions to count and split.

with inputdata(src) as (select '29|1|3004|3|2|24' from dual )
select listagg(token,'|') within group (order by to_number(token)) as res
from (
  select src, regexp_substr(src,'[^|]+', 1, level) token
  from inputdata
  connect by level <= regexp_count(src,'[|*]') + 1
  )
group by src;

Exotic way

I was wondering if we can use a totally different approach.

Instead of letting the database do the hard work and implement a sort mechanism, we can do it ourselves. Here comes bubbles sort. It is one of the most inefficient sort mechanisms you can think of (unless the list is already sorted), but it is fairly easy to implement.

Since bubble sort is an iterative approach with a simple set of rules, the model clause immediately jumped to my mind.

So here is a solution based around a slightly optimised bubble sort mechanism (gnome sort). We will take it apart afterwards.
The solution here creates several rows, but this is just to see and check each step of the iteration. It is possible to do this using one row (dim=0) only, although it is much harder to develop and to understand.

with inputdata(src) as (select '29|1|3004|3|2|24' from dual)
select *
from inputdata
model dimension by (0 as dim)
measures (  src
          , 1 ele_pos1
          , 2 ele_pos2
          , cast (regexp_substr(src,'[^\|]+',1,1) as varchar2(100)) token1
          , cast (regexp_substr(src,'[^\|]+',1,2) as varchar2(100)) token2
          , regexp_count(src,'\|')+1 max_element
          , 2 hwm
          , cast (src as varchar2(500)) as res
          )
rules iterate (500)
      until (-- we can stop if the high water mark is at the last position and no switch needed
             hwm[iteration_number] >= max_element[0]
             and (ele_pos1[iteration_number] = 1 or to_number(token1[iteration_number])  right token
  res[iteration_number+1]
    = case when to_number(token1[iteration_number]) > to_number(token2[iteration_number]) then
      -- do the switch
      regexp_replace(
             regexp_replace(res[iteration_number],'[^\|]+',token2[iteration_number],1,ele_pos1[iteration_number])
                            ,'[^\|]+',token1[iteration_number],1,ele_pos2[iteration_number])
      else res[iteration_number] -- no switch needed
      end ,
  -- calculate next position for token1
  ele_pos1[iteration_number+1]
  = case
      when ele_pos1[iteration_number] = 1 then
        -- we reached first position, so go back to hwm
        hwm[iteration_number]
      when to_number(token1[iteration_number]) > to_number(token2[iteration_number]) then
        -- after a switch, move one position the the left and check there
        ele_pos1[iteration_number] - 1
      when ele_pos1[iteration_number] + 1  to_number(token2[iteration_number]) then
        ele_pos2[iteration_number] - 1
      when ele_pos2[iteration_number] + 1 <= max_element[0] then
        hwm[iteration_number] + 1
      else
        ele_pos2[iteration_number]
      end,
  -- calculate high water mark
  hwm[iteration_number+1]
  = greatest(hwm[iteration_number],ele_pos2[iteration_number+1]),
  -- get token1 for new calculated position1
  token1[iteration_number+1]
  = regexp_substr(res[iteration_number+1],'[^\|]+',1,ele_pos1[iteration_number+1]),
  -- get token2 for new calculated position2
  token2[iteration_number+1]
  = regexp_substr(res[iteration_number+1],'[^\|]+',1,ele_pos2[iteration_number+1])
 );

Here is the result, but I show only the important column. Feel free to run the statement yourself (works on all supported database versions) and see the other helper columns.

Result (only Res column)
29|1|3004|3|2|24
1|29|3004|3|2|24
1|29|3004|3|2|24
1|29|3|3004|2|24
1|3|29|3004|2|24
1|3|29|3004|2|24
1|3|29|2|3004|24
1|3|2|29|3004|24
1|2|3|29|3004|24
1|2|3|29|3004|24
1|2|3|29|24|3004
1|2|3|24|29|3004
1|2|3|24|29|3004

Fairly easy SQL isn’t it?

65 rows instead of 8. So if your aim is to write obfuscated SQL then this certainly is the way to go.

If you want to understand it, skip to the “algorithm explained” chapter.

Rant

Now what is wrong with the model clause!? A simple algorithm like bubble sort looks this complex?

In the recent months I have written a number of model clause solutions for different problems. However none of them is easy. Apart from very specific cases all model solutions are terrible to look at and terrible complex to build. In comparison it is much easier to do the same in excel.

I attribute this to several things.

  1. The way to reference a cell is verbose. And when we look at the code then the brackets break the eye scanning mechanism of a trained developer eye. If more than one dimension is involved it is even worse.
    Here are some examples:
    measure1[cv(),3] or  resultdata[iteration_number+1]
    It is hard to see this as one cell reference.
    In Excel this would simply be: B$3 or C2.
  2. The way to reference a cell from the previous row is complex. It requires a working calculation for the cell address. We can do this by applying the row_number() analytic function or using iteration_number (for iterative models). But this requires some extra logic.
  3. There is no simple way to set several different measures at once. A rule can only be applied for one measure. This means that sometimes we need to repeat the same logic for a different measure.
    In the bubble sort example this happened for the calculation for the position of token 1 and for position of token 2. In this case it can be simplified a lot because the position of token 2 is always exactly +1 from position of token 1. In other cases it is not that simple. Additional measures can serve as a kind of variables to capture rules that then need to applied to several other columns.

Excel Comparison

The model clause resembles Excel very much.

  • Dimensions are rows.
  • Measures are columns.
  • Rules are formulas

But why does the model clause feels so more complex that some simple excel cells.

Excel has a similar way to address cells. It is called the R1C1-style reference. It uses number coordinates to find a cell. But most of times we use the A1-Style reference. This is a little more intuitive and much shorter.

But more importantly the process to build formulas (=rules) is different. While building formulas we click on a source cell and its coordinates are used automatically. Excel then calculates the difference from the current cell to the referenced cell.

The main difference to Excel is that is has a separate formula for each cell, and we use just a simple way to fill out the formulas to the other cells. We don’t need that copy mechanism with the model clause, because the rules automatically are applied for all relevant cells. But it is hard with the model clause to give the direct reference for a different row.

Also the different styles and the different ways to use absolute and relative cell references in Excel (e.g. B$2) are more convenient than doing the same in the model clause.

Pro-Tipp: A good way to develop model rules is to build the formulas in excel first, then copy the logic to SQL.

The model clause also has several advantages over excel. Among others are

  • Excel only has 2 dimensions and no decent concept of partitions.
  • Excel columns can not be named (although single cells and cell ranges can).
  • Model allows to return only updated rows.
  • Model has a easy way to work with invalid cell references (IGNORE NAV)

Algorithm explained

Before we look at the statement, let’s look at the bubble sort logic first.

bubble sort logic

Start: 29|1|3004|3|2|24

I start with the first two tokens. 29 and 1. Token1 is always the left and token2 is always the right token.

Rule 1: If token1 is bigger than token2 we need to switch them.
Rule 2: If we are at position 1, we can go one step to the right.
Rule 3: If we don’t need to switch, then we can also go one step to the right.
Rule 4: If we did switch then go one step to the left.

Now lets see how the rules can be applied to the string.

Step 1: 1|29|3004|3|2|24 – Rule 1
Step 2: 1|29|3004|3|2|24 – Rule 2
Step 3: 1|29|3004|3|2|24 – Rule 3
Step 4: 1|29|3|3004|2|24 – Rule 1
Step 5: 1|3|29|3004|2|24 – Rules 4+1
Step 6: 1|3|29|3004|2|24 – Rule 3 (here I used an additional optimisation, by storing a high water mark and jumping as far right as the HWM allows)
Step 7: 1|3|29|2|3004|24 – Rule 1

Stop: We can finish the iteration if the HWM is to the far right and if no switch is needed anymore.

model SQL explained in detail

So how is this logic implemented in the model clause?

A token is found using a regular expression with the position of the token.

regexp_substr(src,'[^\|]+',1,4)

This finds the fourth token.

We start with a way to number our rows. The initial row is defined as 0.

dimension by (0 as dim)

0 is only used because later we address our rows using an iteration_number. If we would start with 1, then we can potentially overwrite our data. Often 0 is a good starting row.

We then define several columns. And the initial value for those columns.

measures (  src
   , 1 ele_pos1
   , 2 ele_pos2
   , cast (regexp_substr(src,'[^\|]+',1,1) as varchar2(100)) token1
   , cast (regexp_substr(src,'[^\|]+',1,2) as varchar2(100)) token2
   , regexp_count(src,'\|')+1 max_element
   , 2 hwm
   , cast (src as varchar2(500)) as res
   )

The datatype and size of the columns is automatically deducted from the initial value. This is why cast is sometimes needed, if the column size should be bigger than what the initial value indicates.

Now we define that an iterative model is to be used.

rules iterate (500) until (stop criteria)

Iterative models are good for implementing procedural logic. Or for tasks when it is unknown beforehand what the area for the calculation should be.

It gives us a variable called iteration_number which can be used as a cell address.

Now lets look at a few rules.

-- switch tokens if left token > right token
res[iteration_number+1] 
    = case when to_number(token1[iteration_number]) > to_number(token2[iteration_number]) then
      -- do the switch
      regexp_replace(
             regexp_replace(res[iteration_number],'[^\|]+',token2[iteration_number],1,ele_pos1[iteration_number])
             ,'[^\|]+',token1[iteration_number],1,ele_pos2[iteration_number])
      else res[iteration_number] -- no switch needed
      end ,

The left hand side of the rule “res[iteration_number+1]” creates a new row for the res column. The right hand side of the rule references data from the previous row “token1[iteration_number]” to do the switch logic.

If data from the same row is needed, it is possible to use “token1[cv()]”. cv() stands for cell value of the current dimension. It is possible to calculate with those values. So instead of token1[iteration_number] we could also write “token1[cv()-1]” to fetch the value from the previous row. For iterative models I find it more convenient to stay consistent and use iteration_number instead.

This rule implements rule 1 (switch tokens) from the bubble sort logic. However it is not a 1:1 matching of rules. We can see this when looking at the next rule.

Instead the logic needs to be implemented for each column. To calculate the next position a similar case construct is needed.

-- calculate next position (for token1)
ele_pos1[iteration_number+1] 
       = case
         when ele_pos1[iteration_number] = 1 then
           -- we reached first position, so go back to hwm
           hwm[iteration_number]
         when to_number(token1[iteration_number]) > to_number(token2[iteration_number]) then
           -- after a switch, move one position the the left and check there
           ele_pos1[iteration_number] - 1
         when ele_pos1[iteration_number] + 1 <= max_element[0]-1 then           -- no more switch, so go to hwm and look for next element            hwm[iteration_number]  
         else 
           -- just in case, don't do anything
           ele_pos1[iteration_number]
         end

Bubble sort rules 2-4 are implemented by a case construct.

As we can see in this example, the rules from the model clause do not match our business rules. Sometimes they do, but more often they do not.

conclusion

The Model clause can open up new ways to do solve SQL puzzles. Bubble sort is a good candidate for an iterative model.

For real world cases the Model clause is hardly maintainable. If we find a solid way to match business rules to model clause rules, then we have a good way to react to future changes of those business rules.

Sometimes model clause can provide a performance advantage, because of they way how the data is handled.

Model clause is a tool in our toolbox, although an extremely sophisticated and complex one. We need to train using this tool on a regular basis.

btw: Some of the constructs from the model clause can be rediscovered in other statements. For example the new analytic views in 12.2 also have a “dimension by” and a “measures” part. But the rules are missing (one can argue that attribute dimensions and hierarchies resemble the rules).