plsql function: who_called – simplify your exception handler!

Does your error logging logic still look like this?

example 1 – old school handler

procedure doSomething is
begin
   ...
exception
  when others then
     pk_log.error(p_module    => $$PLSQL_UNIT, 
                  p_submodule => 'doSomething', 
                  p_message   => 'Something not done!'||sqlerrm);
     raise;
end;

Here is what I prefer!

example 2 – compact handler

procedure doSomething is
begin
   ...
exception
  when others then
     pk_log.error('Something not done!');
     raise;
end;

Such code is way more convenient! Of course it still stores the same information as the previous example 1.

To get there some parts have to be moved into the logging method (pk_log.error).

  1. sqlerrm
    is the error message
    When moving, it should also be improved (at least use dbms_utility.format_error_stack)
  2. $$PLSQL_UNIT
    is the name of the package (or of the standalone procedure or function)
  3. ‘doSomething’
    is the hardcoded name of the procedure
    It needs to be gathered dynamically at runtime inside the logging package.

The biggest issue is 3.) – to find the name of the module at runtime. The function who_called can solve that. It also gathers the package name on the way, so 2.) is also covered.

Side note: Is hardcoding the procedure name a problem?

Not really. However if the name changes one needs to change it in several lines. And in case one forgets to change the error handler, the compiler will not issue an error. Then the problem might go unnoticed for some time and could be misleading in case of exceptions.

I have two versions of the who_called function. The first, standalone version is not recommended for real use. Instead use the packaged version. However the first version demonstrates the implemented logic in a clear way.

Both functions will find the full name of the module that called this code block. The default behaviour is that modules from the same package are ignored. This allows easy usage inside packages meant for logging, auditing or testing. Everything that is instrumentation code can usually profit from this little helper.

Usecase 1 – demo purposes – standalone function

This logic assumes that we use the function to find out who called the procedure that is currently running. So not the name of the procedure itself, but the name of the caller.

Installation code

-- Cleanup
-- drop function who_called;

-- Usecase 1: standalone function - simple demo version

-- check the call stack to find the calling module.
create or replace function who_called return varchar2
  is
    v_caller utl_call_stack.unit_qualified_name;
    v_name      varchar2(128);
begin
    -- print the call stack just for demo purposes
    for i in 1..utl_call_stack.dynamic_depth  loop
      sys.dbms_output.put_line(
         rpad('+',utl_call_stack.dynamic_depth-i,'+')||
         utl_call_stack.concatenate_subprogram( utl_call_stack.subprogram(i) )
         );
    end loop;

    
    -- step backwards through the call stack 
    --      1 = current module = who_called function
    --      2 = the module that calls who_called 
    --      3 = the caller!
    v_caller := utl_call_stack.subprogram(3);

    v_name := utl_call_stack.concatenate_subprogram(v_caller);

    return v_name;

end who_called;
/

example 3 usage

set serveroutput on

-- Testcase 1.1 anonymous block
declare
  procedure DoSomething as
  begin
    dbms_output.put_line('I want to doSomething!');
    dbms_output.put_line('Who called? '|| who_called);
  end;

  procedure DoMore as
  begin
    doSomething;
    dbms_output.put_line('I want to doMore!');
    dbms_output.put_line('Who called? '|| who_called);
  end;
begin
  doMore;
end;
/

output

I want to doSomething!
+++WHO_CALLED
++__anonymous_block.DOSOMETHING
+__anonymous_block.DOMORE
__anonymous_block
Who called? __anonymous_block.DOMORE
I want to doMore!
++WHO_CALLED
+__anonymous_block.DOMORE
__anonymous_block
Who called? __anonymous_block


PL/SQL procedure successfully completed.

Each time the function is called, it prints the full call stack to the screen. This is just for demonstration purposes.
The first time, the function who_called is executed from inside the submodule doSomething.
The call stack at this point looks like this

Position in Stack   Module
1                   who_called
2                   doSomething
3                   doMore
4                   __anonymous_block

Line 5 in the anonymous block is the line with the who_called function (call stack position 1). It is part of the doSomething procedure (position 2). On position 3 of the call stack we always find the caller – here the procedure doMore that executed doSomething.

The second time the function is called is in the doMore procedure. And the call stack looks like this:

Position in Stack   Module
1                   who_called
2                   doMore
3                   __anonymous_block

In this case the caller is the anonymous block.

The example clearly shows that we can fetch the name of any module in the call stack (including anonymous blocks). We just have to fetch the name from the proper level in the call stack.

Usecase 2 – error and trace logging – packaged version

This works a bit differently when used inside a larger package that instruments the code base. Typically we add error logging logic and debug (trace) logging calls. In that scenario, we want to find out the name of the module where the trace or error logging call is made.

The logic assumes that we have a package used for this instrumentation purpose only. Which also means, that all calls inside this package can be ignored. So essentially we want to find out what is the full name of the procedure (or function) where the log error or the log trace call is done. The full name can be something like package.module.submodule.subsubmodule.

Installation code

When testing this, make sure you do not already have/own a package pk_log. The script would overwrite it.

If you like it, copy the code for the who_called function into your own instrumentation package.

-- Cleanup
-- drop function who_called;

-- who_called packaged version 
-- check the call stack to find the calling module.
-- calls from this package are not considered
create or replace package pk_log as
/* this is a very simplified example for error logging 
   only for demonstration purposes of the WHO_CALLED function 
   your error logging logic must be better!
*/
  procedure error(message in varchar2);
  procedure trace(message in varchar2); 
end pk_log;
/

create or replace package body pk_log as
    function who_called return varchar2
      is
        -- author: Sven Weller, 2021, syntegris information solutions GmbH
        v_pkg    varchar2(128) := $$plsql_unit; -- name of the package that holds the who_called function
        v_caller utl_call_stack.UNIT_QUALIFIED_NAME;
        v_name varchar2(128);
    begin
        -- step backwards through the call stack 
        --      1 = current module = who_called function
        --      2 = the module that calls who_called 
        --      3 = the caller!
        -- we could start later (level 3 or even 4), if we knew exactly how many other submodules are always in the call stack 
        -- This might be risky if function inlining is possible, but that can be prevented by a pragma. Example in procedure error
        for i in 2..utl_call_stack.dynamic_depth  loop

          v_caller := utl_call_stack.subprogram(i);

          v_name := utl_call_stack.concatenate_subprogram(v_caller);

          -- at lexical depth 1 we find the name of the package
          if v_caller(1) != v_pkg then
            -- In some scenarios, we might want to go even one step upward the call stack. Typically not for error logging purposes.
            -- In such cases we could simply fetch one more name from the call stack
            -- v_caller := utl_call_stack.subprogram(i+1);
            
            -- build the full name, but only if we found the caller
            v_name := utl_call_stack.concatenate_subprogram(v_caller);
            -- no need to go further through the call stack
            exit;
          end if;
        end loop;

        return v_name;

    end who_called;

    procedure writeMessage(message in varchar2, message_type in varchar2) as
    begin
      case message_type 
      when 'E' then
        sys.dbms_output.put_line('ERROR at ' || who_called);
      when 'T' then
        sys.dbms_output.put_line('Info from ' || who_called);
      else  
        sys.dbms_output.put_line('ERROR! Unknown Message Type ' || message_type || ' in '||who_called);
      end case;        
      sys.dbms_output.put_line(message);

      if message_type = 'E' then 
        -- Add full error stack
        sys.dbms_output.put_line(dbms_utility.format_error_stack);
        -- Add line where the error happened
        -- Only the last line from the backtrace is relevant, all other lines are already in the error stack
        -- compare output to sys.dbms_output.put_line(dbms_utility.format_error_backtrace);
        sys.dbms_output.put_line(' in '||nvl(UTL_CALL_STACK.backtrace_unit(utl_call_stack.backtrace_depth),'anonymous block')
                               ||' at line '||sys.utl_call_stack.backtrace_line(utl_call_stack.backtrace_depth)
                               );
      end if;
    end writeMessage; 

    procedure error(message in varchar2) as
      pragma autonomous_transaction;
    begin
      pragma inline(writeMessage,'NO'); -- we don't want to inline in case of optimization_level = 3.
      writeMessage(message,'E' );
    end error;
    procedure trace(message in varchar2) as
      pragma autonomous_transaction;
    begin
      writeMessage(message,'T' );
    end trace;   
end pk_log;
/

example 4 call package, log trace and log error

set serveroutput on

-- Testcase 2.1 anonymous block
declare
  procedure DoSomething as
  begin
    pk_log.trace('I want to doSomething!');
  end;

  procedure DoMore as
  begin
    doSomething;
    raise no_data_found;
  end;
begin
  doMore;
exception 
  when others then
    pk_log.error('I wanted to do more, but this happened!');
    raise;
end;
/

output

This is the dbms_output (the simplified trace and error logs).

Info from __anonymous_block.DOSOMETHING
I want to doSomething!
ERROR at __anonymous_block
I wanted to do more, but this happened!
ORA-01403: no data found
ORA-06512: at line 10

 in anonymous block at line 13

The output is followed by the exception from the final RAISE; of the block (from SQL developer).

Error starting at line : 4 in command -
declare
  procedure DoSomething as
  begin
    pk_log.trace('I want to doSomething!');
  end;

  procedure DoMore as
  begin
    doSomething;
    raise no_data_found;
  end;
begin
  doMore;
exception 
  when others then
    pk_log.error('I wanted to do more, but this happened!');
    raise;
end;
Error report -
ORA-01403: no data found
ORA-06512: at line 17
ORA-06512: at line 10
ORA-06512: at line 13
01403. 00000 -  "no data found"
*Cause:    No data was found from the objects.
*Action:   There was no data from the objects which may be due to end of fetch.

In SQL Plus we get one more line, because of the additional raise in line 17.

There is something special in the code. The last line from the error backtrace is captured and returned: "in anonymous block at line 13".

This last line is missing in the error stack, which means it is usually not returned when you only store dbms_utility.format_error_stack.

To add this last line utl_call_stack.backtrace_line(utl_call_stack.backtrace_depth) is executed. All other lines from the error backtrace are already in the error stack (depends on DB version) – so those can be ignored.

Conclusion

For my purposes it works fine and the resulting code is very clean.

I currently use it in an audit package based upon Connor McDonald's Audit Generator, which I massively reworked for some custom needs. Maybe I will publish it one day. But that is another story.

who_called advantages

  • Way less to type – fewer mistakes
  • much easier to see what the error handler does
  • concentrate the coding effort on the important bits – mostly the additional information that should be added to the error

who_called disadvantages

  • A tiny bit slower than hardcoding the package and submodule name
    => For exception handling this is not relevant.
    => In huge debug or trace scenarios this time could add up if extreme performance is needed. Conditional compilation can be applied to remove instrumentation calls if you compile for extreme plsql performance (see the sketch after this list).
  • since part of the logic is hidden, other developers might be tempted to add the module name to the message anyway
  • module name is fetched at run time, not at compile time
    => this can be an issue, especially if optimization level 3 is used. Function inlining (check the compiler warnings) is a method that the plsql optimizer can choose to speed up logic. However it also means that the function will not exist anymore at runtime. And utl_call_stack will not be able to report the name of the module correctly. Note that there is a compiler warning when inlining happened (which usually is a good thing).
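
Here is a minimal sketch of how conditional compilation could strip such trace calls. The compilation flag do_trace and the procedure demo_proc are made up for this sketch; the flag is set via the session parameter plsql_ccflags.

-- compile with trace calls removed (hypothetical flag name do_trace)
alter session set plsql_ccflags = 'do_trace:false';

create or replace procedure demo_proc as
begin
  $IF $$do_trace $THEN
    pk_log.trace('demo_proc started');  -- not part of the compiled code while do_trace is false
  $END
  null; -- real work goes here
end demo_proc;
/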

Feel free to adapt this code snippet to your special needs. I’m convinced this will help you to write simpler and more efficient error handlers.

10 little Oracle SQL features – that you are probably not using enough

This is a collection of some small features, patterns or snippets that I seem to use frequently and like to share. The features are written in no specific order. However we can structure them into performance-related [P], convenience [C] or enabler [E] features.

Enablers [E] would be features that allow us to do something that is otherwise impossible or at least very hard to achieve. Sometimes even to avoid an error.


Convenience [C] features are things that offer an alternative construct. Often they can be used instead of another option. Sometimes making the code easier to change (maintainability).


Performance [P] features improve execution speed. Often they come with a drawback or one should keep exceptions or rare race conditions in mind.

1. find multiple words using regexp_like [C]

Instead of writing multiple like conditions (or repeating the same select using like multiple times), we can write a single regexp_like expression.


Cumbersome:

where (str like '%word1%' OR str like '%word2%' OR ...)

Much easier:


where regexp_like(str,'(word1|word2|...)')

The pipe “|” inside the regular expression acts as the OR operator and the parentheses () form the subexpression that is needed for that OR.

A very typical case is to query the v$parameter table for a set of parameter names.
example

select name, value 
from v$parameter 
where regexp_like(name,'(listen|dispatch|service)');
NAME                     VALUE
service_names            feno1pod
enable_dnfs_dispatcher   FALSE
dispatchers              (PROTOCOL=TCP) (SERVICE=feno1podXDB)
max_dispatchers
query result

What I especially like about it, is that the need for wildcards is gone and it is so easy to add more “words” to search for.

2. fetch first row only [C]

Do you still use rownum frequently? Why not use the row limiting clause instead?

I use it a lot, mostly for ad hoc queries. One advantage is that the need to create an inner query that does the correct ordering disappears.


example

-- show the top 3 tables with the most rows
select table_name, num_rows
from user_tables
order by num_rows desc
fetch first 3 rows only;
TABLE_NAME             NUM_ROWS
IMP_SD_2744-12_45_18   19696
IMP_SD_2822-14_28_53   19692
IMP_SD_194-09_40_50    19545
Query result
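
For comparison, here is a sketch of the classic pre-12c rownum pattern that the row limiting clause replaces. The ordering has to happen in an inner query:

-- pre-12c pattern: order in an inner query, then filter with rownum
select *
from (select table_name, num_rows
      from user_tables
      order by num_rows desc)
where rownum <= 3;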

3. use expression lists (a,b) [E]

I use this frequently in update statements but also sometimes in joins. Instead of setting each column separately we can set multiple columns at once. Simply by using parenthesis and a comma between the columns.

example

update persons u
set (u.first_name, u.last_name) = (select a.firstname, a.name 
                                   from applicants a 
                                   where a.person_id = u.id)
where u.status = 'NEEDS REFRESH';

Or we can filter on a combination of columns (a,b) in ((c,d),(d,c),(e,f)). The need for this should be rare, but it happens (a small sketch follows below). A strange edge case is when we do an equality comparison of two expression lists. For some reason the right side needs an additional set of ().

(a,b) = ((c,d))
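
As an illustration of the IN variant, here is a minimal sketch against a hypothetical flight_routes table with origin and destination columns:

select *
from flight_routes
where (origin, destination) in (('FRA','JFK'), ('JFK','FRA'), ('FRA','LHR'));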

4. De Morgan's law [C,E]

This might be the single most important math/logic rule, that SQL developers should know. It is usually applied inside the where clause and knowing it by heart will prevent a lot of easy mistakes.

(A ∪ B)′ = A′ ∩ B′
(A ∩ B)′ = A′ ∪ B′
De Morgan's laws

written as a SQL expression

not(A or B) = not(A) and not(B)

not(A and B) = not(A) or not(B)

It is easy to remember. OR changes to AND when the parentheses are resolved (and the other way round).
A and B here are full SQL expressions, for example x=y.


Why is that so important? SQL is a very logic based language. In daily speech we often use logical operators (AND/OR/NOT) differently than what is needed when they are applied in SQL. This can become confusing very easily. Knowing De Morgan's law helps to quickly check if the logic is used correctly.

An example

Task: “Find all employees that are not working in sales and marketing.”

Converting this sentence 1:1 into SQL would result in this:

select * from emp
where not (department = 'SALES' and department = 'MARKETING'); 

Obviously what is meant is that we want to find those employees that are working in some other department.

Applying De Morgan's law, we can reformulate our select statement. Also we replace NOT (x=y) with x!=y .

select * from emp
where  department != 'SALES' 
    OR department != 'MARKETING'; 

A developer should understand now, that this condition will always be true. For each row the department is either sales or not sales. And if it is sales, then it is not marketing. So the combination is always true (excluding NULL value considerations). Which is probably not what is wanted.

Here the row logic also plays a part. The where clause is applied to a single row, but normal speech often uses the boolean operation to combine data sets.

The correct query would be

select * from emp
where  not (department = 'SALES' 
             OR department = 'MARKETING'); 


or even better, use IN instead of OR

select * from emp
where  department not in ( 'SALES' , 'MARKETING'); 

Historic anecdote: De Morgan was not the first one to discover this logic. Centuries before De Morgan, a guy called William of Ockham already wrote about it. He probably had it from Aristotle. Occam (the spelling changed over the centuries) nowadays is more famous for his razor.

5. rollup and grouping sets [C]

To get a total row for a query that uses sum or count, simply add rollup.


example rollup

select tablespace_name, count(*)
from dba_tables 
group by rollup (tablespace_name);
TABLESPACE_NAME	COUNT(*)
DATA	        362
DBFS_DATA	2
SAMPLESCHEMA	14
SYSAUX	        1357
SYSTEM	        1056
	        322
	        3113

The last line is the total number of tables. One problem here is tables without a tablespace (null). The grouping() function helps to separate a normal row with the data value NULL from a superaggregate row (19c grouping).

If there are multiple columns then instead of rollup I use grouping sets. The full set (=total) in a grouping set expression can be expressed by ().

example grouping sets + grouping

select case when grouping(tablespace_name) = 1 then '-all-' else tablespace_name end as tablespace_name
     , case when grouping(cluster_name) = 1 then '-all-' else cluster_name end as cluster_name
     , count(*) 
from dba_tables 
group by grouping sets ((tablespace_name, cluster_name),());
TABLESPACE_NAME	CLUSTER_NAME	COUNT(*)
		                322
DATA		                362
SYSAUX		                1356
SYSAUX	SMON_SCN_TO_TIME_AUX	1
SYSTEM		                1020
SYSTEM	C_RG#	                2
SYSTEM	C_TS#	                2
SYSTEM	C_OBJ#	                17
SYSTEM	C_COBJ#	                2
SYSTEM	C_MLOG#	                2
SYSTEM	C_USER#	                2
SYSTEM	C_FILE#_BLOCK#	        2
SYSTEM	C_OBJ#_INTCOL#	        1
SYSTEM	C_TOID_VERSION#	        6
DBFS_DATA		        2
SAMPLESCHEMA		        14
-all-	-all-	                3113

Again the last line shows the total number of tables. Since I didn't want to see subtotals for tablespace or cluster, grouping sets is the perfect solution to add this total row.

Also notice that the first line has all null values for the names, same as the total line would have. Using the grouping function allows to find out which is the total line and give it a meaningful text.

6. enhanced listagg [E]

In the more recent database versions, the very useful LISTAGG function got even better. For production code I nowadays always try to remember to add some safety protection in case the result of the aggregation becomes too big. Otherwise we could run into ORA-01489: result of string concatenation is too long.

Since 12.2 we can avoid the error by using the OVERFLOW clause

listagg (... on overflow truncate without count) ...

So when the maximum string size is reached (4000 bytes or 32k bytes depending on the max_string_size parameter), we get usable text instead of the statement raising an error.

example

select count(*)
, listagg(table_name,', ' on overflow truncate) within group (order by tablespace_name desc, table_name) all_tables
from dba_tables;
COUNT(*)   ALL_TABLES
3113       AQ$_ALERT_QT_G, AQ$_ALERT_QT_H, AQ$_ALERT_QT, … many many more tables …, SDO_CRS_GEOGRAPHIC_PLUS_HEIGHT, SDO_CS_SRS, SDO_DATUMS, …(1304)

The three dots “…” are called an ellipsis and can be configured. WITHOUT COUNT avoids appending the number of omitted entries to the end of the list. WITH COUNT is the default if TRUNCATE is specified.

Although the overflow clause is very useful, the ultimate goal would be to give the developer more influence over it. Recently there was an interesting Twitter discussion around that topic.

Other useful enhancements (19c) were LISTAGG distinct. example on LiveSQL
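
A minimal sketch of the 19c DISTINCT option, reusing the dba_tables example from above:

select listagg(distinct tablespace_name, ', ') within group (order by tablespace_name) as tablespaces
from dba_tables;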

7. Using sys_context [C,P]

A sys_context is something like a global variable in other languages. The normal context is for the session, but it is also possible to use application wide contexts.

Oracle provides several “preconfigured” contexts for us. The most common are ‘USERENV’ and ‘APEX$SESSION’ (for apex developers). Contexts are also used for security policies with VPD.

Here are the contexts that I frequently like to use

  • sys_context('userenv','client_identifier')
    value set by dbms_session.set_identifier (dbms_application_info.set_client_info fills the separate client_info attribute)
  • sys_context('userenv','current_edition_name')
    when using edition based redefinition (ebr), this shows which edition I’m in. Always good to double check!
  • sys_context('userenv','current_user')
    similar to pseudocolumn user. The schema name that we connected with.
    In some Oracle versions (12.1) this was much faster than user; my recent tests show that this performance difference is now gone (19c)
  • sys_context('userenv','proxy_user')
    When doing a proxy connect then this is the name of the authenticated user, not the target schema
  • sys_context('userenv','os_user')
    useful when client_identifier is empty, for example the name of the Windows login account when using SQL Developer under Windows.
  • sys_context('apex$session','app_user')
    apex authenticated user
  • sys_context('apex$session','app_id')
    id of the current apex app
  • sys_context('trigger_ctl','audit_on')
    Is auditing enabled/disabled? Part of my custom adaptation/enhancement of Connor McDonald's Audit-Utility package
  • sys_context('userenv','sid')
    the session ID of the current database session

Side note: UPPER/lower case does not make a difference for contexts. Only for the values of the context.

Warning: when you start using your own custom contexts, be careful with the grant create any context privilege. It can be a security risk! Revoke it after it was used.
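
For illustration, here is a minimal sketch of a custom context. The namespace trigger_ctl matches the context mentioned in the list above, but the package pk_trigger_ctl and its set_audit procedure are made up for this sketch. Only the package named in CREATE CONTEXT is allowed to write values into that namespace.

-- needs the create any context privilege (see warning above)
create context trigger_ctl using pk_trigger_ctl;

create or replace package pk_trigger_ctl as
  procedure set_audit(p_value in varchar2);
end pk_trigger_ctl;
/

create or replace package body pk_trigger_ctl as
  procedure set_audit(p_value in varchar2) as
  begin
    -- write the value into the trigger_ctl namespace
    dbms_session.set_context('trigger_ctl', 'audit_on', p_value);
  end set_audit;
end pk_trigger_ctl;
/

-- usage
exec pk_trigger_ctl.set_audit('ON')
select sys_context('trigger_ctl','audit_on') from dual;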

8. hint /*+ driving_site */ [P]

I feel like this is a classic for me. In one of my longest running projects we had a lot of databases connected by database links. One of the most important tuning activities was to understand how to do remote joins in a performant way. The driving_site hint was extremely important for that.

General rule: When doing remote queries, avoid mixing local and remote tables in one select statement. If only remote tables are in the statement always use the driving_site hint.
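
Here is a sketch of the hint, using hypothetical tables behind a db link called remote_db. The hint names the alias of a remote table, so the remote database becomes the driving site for the join:

select /*+ driving_site(o) */ o.order_id, sum(i.amount) as total
from   orders@remote_db o
join   order_items@remote_db i on i.order_id = o.order_id
where  o.status = 'OPEN'
group by o.order_id;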

Nowadays the need for this hint diminishes, mostly because databases are less often connected by DB links and more by webservices (which does not really improve performance, but that is a different story).

9. join via using [C]

This only works reliably when the naming conventions of the data model fit it.

example

We have a table PERSON and a table ADDRESS. The primary key in PERSON is PERS_ID. The relationship column (FK) in ADDRESS is also called PERS_ID. It wouldn’t work as well with ADR_PERS_ID for example.

Then we can do a quick join like this:

select * 
from person 
join address using (pers_id) 
where address.type='MAIN';

There are some slight differences compared to the normal way using ON. Mostly, one can then no longer differentiate from which table the pers_id originates. For example you can not refer to person.pers_id in the where clause anymore.

Currently I use it exclusively for ad hoc queries, not in plsql code.

10. interval literals [C]

If I want to add some time to a date or a timestamp, I always use interval literals (unless it is a full day or a full month).

example
Let’s say we want to check tickets that were entered during the last hour. Using the date datatype we could easily calculate an hour by dividing a full day / 24. This is how many developers calculate time. And it is perfectly ok to do so.

select * from tickets 
where createddt >= sysdate - 1/24 ;

The term sysdate-1/24 represents one hour before “now”.

This is how I prefer to write it:

select * from tickets 
where createddt >= sysdate - interval '1' hour ;

I feel the code documents itself in a much better way.

Additionally requirements like this easily change.
Let’s change both expressions to 15 minutes.

sysdate - 15/24/60

sysdate - interval '15' minute

Which one is better to read and was quicker to change?

Warning: Sayan Malakshinov mentions in the comments that we should not use interval literals (year-month intervals) to add full months. I agree with that. Calendar arithmetic is complex – there are specific functions like add_months for that. So do not use interval '1' month.
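
A quick sketch of why year-month interval literals are dangerous for calendar arithmetic:

-- adding "one month" to January 31st
select add_months(date '2021-01-31', 1) from dual;        -- 28-FEB-21 (last day of February)
select date '2021-01-31' + interval '1' month from dual;  -- ORA-01839: date not valid for month specified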

honorable mentions

Some more snippets or functions that didn’t quite make it into the list.

  • nvl2 [C]
  • the third parameter of to_char (nlsparams) [E]
  • query block hints [P]
  • hierarchical with clause [C,E]
  • lateral join [C]
  • alter sequence restart [C,E]
  • cross join [C]
  • lnnvl [C,E]
  • hex format “XXXX” [E]
  • sys.odcinumberlist [C]
  • analytic sum + correct window clause: rows between instead of range between [E]

Summary

I hope I mentioned something that you didn’t know about before and liked reading about.

Outlook

Here are some features/patterns that I’m currently not using myself, but where I feel that they might become quite important in the future.

  • SQL Macros
  • boolean in SQL (not implemented yet, but Oracle seems to be working on it – I expect some news about this very soon)
  • median and other statistical functions like cume_dist, percentile_xxx, percent_rank
  • match_recognize
  • with clause udf functions (still think they are overrated, but I might be wrong)
  • approx_count_distinct
  • analytical views

Audit column performance retested in 19c: “user” is fast now!

Motivation

In an older and quite successful blog post of mine I tried to find the “perfect” solution for audit columns. See: sequence-and-audit-columns-with-apex-5-and-12c.

One finding then was that sys_context('userenv','current_user') is considerably faster than the user pseudocolumn.

I recently noticed that this seems to have changed and decided to retest the behavior.

The old test was done against a 12.1.0.1 standard edition database.

The new test is done against a 19.5 enterprise edition database (on ATP, which runs on Exadata).

Test setup

Mass inserting 100000 records into empty tables. A very fast select was used to generate 100k rows of null values.

3 columns were filled with data:

  • ID by a sequence using a cache of 10000
  • created_by filled with the schema name => sys_context or user
  • created_on filled with sysdate

Test 1 (T1) was using sys_context as default value.

Test 2 (T2) was using user as default value.

Test 3 (T3) was using a trigger with sys_context.

Test 4 (T4) was using a trigger with user.

Each test was run once to warm up the database (and the tablespace). This warm-up run didn't count. Then each test was run 3 times. Only the execution speed of the inserts was measured.

Results

Test 1 – sys_context as default value

Elapsed:1.23 seconds
Elapsed:1.21 seconds
Elapsed:1.26 seconds

Average: 1.23 s

Test 2 – user as default value

Elapsed:1.32 seconds (This looks suspicious. Maybe the warm-up run didn't warm up enough)
Elapsed:1.16 seconds
Elapsed:1.19 seconds

Average: 1.22 s

Test 3 – sys_context in trigger

Elapsed:17.33 seconds
Elapsed:17.08 seconds
Elapsed:17.05 seconds

Average: 17.15 s

Test 4 – user in trigger

Elapsed:17.03 seconds
Elapsed:16.71 seconds
Elapsed:16.97 seconds

Average: 16.90 s

Comparison

My test shows that user was even a tiny bit faster than sys_context. Which means the previous 12.1 recommendation not to use user anymore is now outdated. The difference is so small that you can choose whatever you want.

User is fast now!

The main message still is, if you can get rid of the trigger, then do it. The difference between a default value logic and the trigger is huge! But even in the trigger logic there seems to be a slight performance advantage for user now.

If you are still on 12.1 or 12.2 you should run your own tests and compare the results. If you are on 19c or higher I would now use user again.

Test scripts

Create statements

-- cleanup
drop table test_insert_perf_t1_sys_context purge;
drop table test_insert_perf_t2_user purge;
drop table test_insert_perf_t3_trigger purge;
drop sequence test_insert_perf_t3_trigger_seq;
drop table test_insert_perf_t4_trigger purge;
drop sequence test_insert_perf_t4_trigger_seq;

-- create objects
create table test_insert_perf_t1_sys_context
             (id number generated by default on null as identity (cache 10000) primary key
             ,created_by varchar2(128) default on null sys_context('userenv','current_user') not null 
             ,created_on date default on null sysdate not null);

create table test_insert_perf_t2_user
             (id number generated by default on null as identity (cache 10000) primary key
             ,created_by varchar2(128) default on null user not null 
             ,created_on date default on null sysdate not null);

create table test_insert_perf_t3_trigger
             (id number not null primary key
             ,created_by varchar2(128) not null 
             ,created_on date not null);

create sequence test_insert_perf_t3_trigger_seq cache 10000;          

create or replace trigger test_insert_perf_t3_trigger_bri
before insert on test_insert_perf_t3_trigger 
for each row
begin
  :new.id := test_insert_perf_t3_trigger_seq.nextval;
  :new.created_by := sys_context('userenv','current_user');
  :new.created_on := sysdate;
end;
/

create table test_insert_perf_t4_trigger
             (id number not null primary key
             ,created_by varchar2(128) not null 
             ,created_on date not null);

create sequence test_insert_perf_t4_trigger_seq cache 10000;          

create or replace trigger test_insert_perf_t4_trigger_bri
before insert on test_insert_perf_t4_trigger 
for each row
begin
  :new.id := test_insert_perf_t4_trigger_seq.nextval;
  :new.created_by := user;
  :new.created_on := sysdate;
end;
/

Run statements

-------------------------------------------------------------------
-- run tests (insert 100000 rows)
-------------------------------------------------------------------

-- Test 1 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t1_sys_context(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/

-- Test 2 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t2_user(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/


-- Test 3 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t3_trigger(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/


-- Test 4 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t4_trigger(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/

Warning! Hidden traps – comments in SQL*Plus

I am a big fan of comments (see also CDD Manifesto a bit further down). Comments are simple ways to add information to objects/code without changing code.

Comments that work in SQL or PLSQL might behave differently in SQL*Plus. This post is to create awareness about such traps.

The basics

comments in SQL and PLSQL

Comments in SQL and PLSQL come in two flavors. The one line comment and the multi line comment.

-- this is a one line comment. 

The double hyphen “--” starts the one line comment. It ends at the end of the line.

/* this
is a
mutli-line
comment */

A slash “/” followed by an asterisk “*” starts a multi line comment. And it is ended in the opposite way.

Comments do not stack.

-- this /* will
not work */

This also will not work.

/* line 1
   /* line 2
      /* line 3
      line 4 */
   line 5 */
line 6 */

The multi-line comment ends after the first time “*/” is encountered. So lines 5 and 6 are not commented; instead line 5 will raise a syntax error.

In SQL a very special form of a comment is a hint. A hint tries to give the SQL optimizer additional instructions. A + indicates that this comment is a hint.

select /*+ warp_speed(5) */ * from big_table;

A hint could be compared to an annotation in other languages. The comment syntax is used to add some special meaning to the surrounding code. This post is not about such annotations.

comments in SQL*Plus

In general we run SQL statements, PLSQL blocks and SQL*Plus commands in a SQL*Plus script. Typical examples are SELECT statements, CREATE OR REPLACE PACKAGE commands and SET commands.

The normal SQL and PLSQL comments do function in SQL*plus as well. Additionally we have a REM(ARK) command to add comments.

SQL> REM this is a SQL*plus comment

We can also use PROMPT to have a kind of echo inside a SQL*Plus script. Although it is not a real comment, it is used in a similar way.

SQL> PROMPT *** heading ***
*** heading ***

Surprising differences

SQL*Plus interprets comments slightly differently than SQL or PL/SQL. Here are some examples. Try to guess what happens before reading the result. The examples do work in SQL Developer, but behave differently in SQL*Plus.

Example 1

generate some DDL commands…

select 'drop '||object_type||' '||object_name||';' 
from user_objects 
--where object_type = 'VIEW';
where object_type = 'TRIGGER';

Result 1 (real object names changed)
... many similar rows ...
'DROP'||OBJECT_TYPE||''||OBJECT_NAME||';'
drop SYNONYM PLSQL_PROFILER_DATA;
drop SYNONYM PLSQL_PROFILER_RUNNUMBER;
drop SYNONYM PLSQL_PROFILER_RUNS;
drop SYNONYM PLSQL_PROFILER_UNITS;
drop INDEX ABC_UK01;
drop PACKAGE MYPK;
drop PACKAGE BODY MYPK;
drop TABLE MYTAB;
drop DATABASE LINK DISTANT.WORLD.LOCAL;
drop DATABASE LINK REMOTE.WORK.LOCAL;
2243 rows selected.
SQL> SP2-0734: unknown command beginning "where obje…" - rest of line ignored.
SQL>

2243 rows selected? Yes there were some triggers, but not that many. When SQL*Plus finds a semicolon at the end of a line, it interprets this as the end of the command, even if that semicolon is commented out. In this example the statement produced a drop command for each object in the schema. But the intention was to only drop all triggers.

Example 2

Add plsql doc information…

create or replace function convertNum(p1 in varchar2) return number
is
/***************************
@usage: converts a string into a number. 
        If conversion is not possible, 0 is returned.
@author: Sven
@param : p1 : input string
@return : converted number
****************************/
begin
  return to_number(p1 default '0' on conversion error);
end convertNum;
/

Result 2
SQL>
create or replace function convertNum(p1 in varchar2) return number
is
/***************************
@usage: converts a string into a number.
SP2-0310: unable to open file "usage:.sql"
        If conversion is not possible, 0 is returned.
@author: Sven
SP2-0310: unable to open file "author:.sql"
@param : p1 : input string
SP2-0310: unable to open file "param.sql"
@return : converted number
SP2-0310: unable to open file "return.sql"
****************************/
begin
  return to_number(p1 default '0' on conversion error);
end convertNum;
  9  /

Function created.

SQL>

An @ at the start of a line is always interpreted as a call to a script even if it is inside a multi line comment. This lowers the usefulness of the “generate DB doc” functionality in SQL Developer a lot. Putting additional blanks before the @ will not change this behaviour.

Example 3

comment out optional params…

SQL> set serveroutput on -- size unlimited
Result 3
SP2-0158: unknown SET option "--"

This means the SET option is ignored. There is an error message, but it is a SQL*Plus (SP2) error message. It would not stop a script that has WHENEVER SQLERROR EXIT set at the beginning.

Example 4

comment a statement…

select sysdate from dual; -- print current day

Result 4
SQL> select sysdate from dual; -- print current day
2
Do you notice the 2 at the beginning? The statement is not terminated. The buffer waits for the next line and the next command probably finishes the buffer and runs into a syntax error.

Warning

Personally I think that those examples are bugs. But it is documented behaviour, so Oracle probably disagrees. There is an extra section in the SQL*plus documentation, dedicated to comments in scripts where these and more problems are described.

Remember

If you intend to run some plsql code or a SQL statement from a SQL plus script, then

  • do not have semicolons in a comment, especially not at the end of a line
  • do not use @ in a comment, especially not at the beginning of a line
  • do not use -- inside SELECTs (hints seem to work)
  • avoid using -- or /* */ to write SQL*Plus comments, instead use REMARK or PROMPT

More side effects

Comments near the beginning of a command can confuse sql*plus.

SQL> create /* testproc */ procedure test
2 is
3 begin null;
Warning: Procedure created with compilation errors.
SQL> end;
SP2-0042: unknown command "end" - rest of line ignored.
SQL> /

Blank lines in comments can break the code. This can be suppressed by using SET SQLBLANKLINES ON.

SQL> SET SQLBLANKLINES ON;
SQL> select sysdate from dual
2 /* this is a multi
3 line
4
5 comment */
6 ;
SYSDATE
25-AUG-20

SQL> SET SQLBLANKLINES OFF;
SQL> select sysdate from dual
2 /* this is a multi
3 line
4
SQL> comment */
2 ;
comment */
*
ERROR at line 1:
ORA-00969: missing ON keyword
SQL>

SQL*Plus tries to substitute an ampersand “&” in a comment with the value of a variable. This can be suppressed using SET DEFINE OFF. Not a big deal, unless we do not want to suppress it.

SQL> select sysdate from dual
2 -- current day at marks&spencers
3 ;
Enter value for spencers:
old 2: -- current day at marks&spencers
new 2: -- current day at marks
SYSDATE
25-AUG-20

A comment inside a PROMPT is not a comment, but will be printed to the screen.

SQL> prompt /* comment or not comment that is the question */
/* comment or not comment that is the question */

If you want a “;” to be printed to a spool file using PROMPT, use two “;;”.

SQL> prompt test;;
test;

Additional thoughts

Good developers can get a solid overview of unknown code just by reading the comments.

Set your syntax highlighting in such a way that makes it easy to read comments. Unfortunately Oracle SQL Developer some time ago switched from green comments to grey comments. My recommendation is to change this setting.

Comment Driven Development (CDD) Manifesto

Comments are good!

Every code needs comments!

Comment first, code later!

Comments state code intention.

Comments describe why a technical decision was made

or not made.

Comments can give examples.

The world needs more comments!

SQL Quickie: How to reset an identity column

Introduction

Since Oracle 12 we can declare columns as an identity column. Other databases call this feature “autoincrement” column. A term that can easily lead to some misunderstandings, but is already well established. In some of those databases such a column allows resetting the current value. There is no (documented) way to do that with an Oracle identity column. A recent discussion in the ODC forums led me to think about whether it is possible to set an identity column to a custom value. And yes it is.

TL;DR;

  • Change the increment of the sequence that supports the identity column by modifying the column.
  • Increase the value by an insert+rollback.
  • Reset the sequence by modifying the column again.

or use the undocumented RESTART option (12.2 onwards)

alter table demo_identity_reset
modify id generated always as identity restart start with 60;

Showcase

demo setup

First we create a table with an identity column. We use default settings for the sequence, which for example means increment +1 and cache 20.

/* prepare demo table */
create table demo_identity_reset
(id number generated always as identity primary key
,name varchar2(500)
);

Then insert some demo values.

/* add some data */
insert into demo_identity_reset (name) values ('test1');
insert into demo_identity_reset (name) values ('test2');
commit;
select * from demo_identity_reset;
ID NAME
1 test1
2 test2

We now have a table with an identity column where some values are used and more values are currently cached in the SGA.

Set a custom value (12.1)

Gather info

First we need to find the name of the sequence. We could look at all sequences, but it is more correct to look at the column definition. The name of the sequence can be seen in the default value of the identity column.

/* check data dictionary and find the sequence name*/
select * from user_sequences;

or

/* check data dictionary and find the sequence name*/
select data_default
from user_tab_columns
where table_name = 'DEMO_IDENTITY_RESET'
and identity_column = 'YES';
DATA_DEFAULT
"MYSCHEMA"."ISEQ$$_172206".nextval

In my case the sequence name is ISEQ$$_172206. The generated name will always start with ISEQ$$_.

Other information we need to find is the current value, the cache size and the increment. The only decent way to see the current value is to set the sequence to nocache and after that check the data dictionary. We need to remember the original cache size setting to be able to restore it later to that value.

Start the change

From this step onward no other session should insert at the same time. It might be a good idea to lock the table in exclusive mode during those steps.

The cache also needs to be set to nocache to prevent caching any values with the wrong increment during the next steps.

/* reset the cache so that we can see the next value */
select cache_size 
from user_sequences 
where sequence_name = 'ISEQ$$_172206';

alter table demo_identity_reset 
modify id generated always as identity nocache;

Cache size was 20. The sequence now is not caching anymore. This is shown in the user_sequences dictionary view.

Now we can read the next value from the data dictionary and use that information to set an increment that jumps to our target value.

Here I assume a target value of 60.

/* find the next value */
select last_number from user_sequences 
where sequence_name = 'ISEQ$$_172206';

/* calculate the increment: 57 = 60(target value) - 3(last number) */

/* change the increment so that it jumps over all the other values. */
alter table demo_identity_reset 
modify id generated always as identity increment by 57 nocache; 

The last number was 3. Last number will always hold the next value after the cached values.

-- this does a sequence.nextval which we can not call otherwise
insert into demo_identity_reset (name) 
values ('dummy. rollback immediatly.'); 
rollback;

It doesn’t matter much which value was inserted (59). In case you want to avoid any gaps, this insert needs to be one of the real inserts that has to be done anyway. And the increment needs to be one value more (58 instead of 57), so that instead of a rollback you can do a commit.

/* reset the changes */
alter table demo_identity_reset 
modify id generated always as identity increment by 1 cache 20;

Done. The next insert will now start with our target value 60.

Set a custom value (12.2)

I tested this in 19c on LiveSql but it should work in all versions that belong to the 12.2. database family (12.2.0.1, 18c, 19c).

Instead of the complicated series of steps in 12.1 we can do it all in a single command. Unfortunately this command is undocumented, so use it at your own risk.

alter table demo_identity_reset 
modify id generated always as identity restart start with 60;

The RESTART option was mentioned in one of the new feature guides and Roger Troller made me aware of it in one of his blog posts. However currently (19c) it is still missing in the SQL reference documentation.

Test the change

The following steps are not needed anymore. But they are proof that the concepts works.

/* check the result in the dictionary*/
select last_number, cache_size 
from user_sequences 
where sequence_name = 'ISEQ$$_172206';
LAST_NUMBER    CACHE_SIZE
 60    20
/* test the result using the the table*/
insert into demo_identity_reset (name) values ('test3');
commit;
select * from demo_identity_reset;
ID   NAME
1    test1
2    test2
60   test3

Yes it works!

Cleanup code

Drop the table and the sequence. A simple drop table is not enough, because the sequence will continue to exist as long as the table is still in the recycle bin.

/* undo testcase */
drop table demo_identity_reset purge;

Further information

In general I would not recommend doing that in a production environment. But there are situations where you want to do it, for example after copying a production schema to a test database and cleaning out many values, you might like to reset the sequences too.

Such situations are rare, probably once-in-a-while activities. That is also why I didn’t include the lock table command. And the lock would only hold until the next alter table command, which makes it not really safe to use.

It is not possible to use an alter sequence command for sequences that are connected with an identity column. It results in an error message.

alter sequence "ISEQ$$_172206" restart start with 60;

ORA-32793: cannot alter a system-generated sequence

Here is an example of the definition of an autoincrement column in another database (MySQL).

https://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html

The AUTO_INCREMENT attribute can be used to generate a unique identity for new rows

No value was specified for the AUTO_INCREMENT column, so MySQL assigned sequence numbers automatically. You can also explicitly assign 0 to the column to generate sequence numbers, unless the NO_AUTO_VALUE_ON_ZERO SQL mode is enabled.

If the column is declared NOT NULL, it is also possible to assign NULL to the column to generate sequence numbers.

When you insert any other value into an AUTO_INCREMENT column, the column is set to that value and the sequence is reset so that the next automatically generated value follows sequentially from the largest column value.

Updating an existing AUTO_INCREMENT column value in an InnoDB table does not reset the AUTO_INCREMENT sequence as it does for MyISAM and NDB tables.

MySQL 5.7 Reference Manual – 3.6.9 Using AUTO_INCREMENT

Some quick facts about sequence caches and gaps in IDs

  1. Oracle sequences generate key values without gaps (minor exception from the rule: scalable sequences).
  2. Surrogate keys can and will have gaps.
  3. Gaps exist because it is not worth the huge effort to avoid them.

The biggest misunderstanding is probably the difference between the value coming from a sequence and the primary key value inside a database column (ID column). We can use a sequence to set a surrogate key. Typically we do this by a database trigger or an identity column or even as a value directly during the insert.

“The history of sequence caches is a history of misunderstandings.”

somebody on the internet, 1963

Essentially it is a two step process. The new ID value is created by sequence.nextval and then used during the insert. Oracle provided the logic to get a new sequence value in a fast, performant and multi-session friendly way. The developer is responsible to make sure this value is not wasted.

The sequence cache is a performance instrument. A lost cache value means we will see gaps in future ID values. There is only a very limited number of activities that makes cached sequence values “disappear”. The majority of gaps originates from other activities, like failed inserts or deletes. Many developers seem to think that such a gap is caused by the sequence cache. I find it important to separate what reason ultimately led to a gap in an ID value.

A gap analysis can be the first step. The following select finds gaps and orders them by size. Put your own table and PK column into the “ids” subquery and find out how the gaps are distributed in that table.

-- gap analysis
with ids as (select region_id as id, lead(region_id) over (order by region_id) as nextid from regions)
   , gaps as (select id, nextid - id as gap_size from ids where nextid > id + 1)
select gap_size , count(*) gaps_found  
from gaps
group by gap_size
order by count(*) desc;

Example result

gap_size   gaps_found
20         5
3          2
2          2
19         1

Here the data shows a few small gaps. Either some insert didn’t make it or rows had been deleted. But there are also several gaps of size 19 and 20, which means the sequence cache was probably lost at least 6 times.

To make it clear, IDs are allowed to have gaps. However when an end user complains why he constantly sees gaps in some ID, then we should investigate what is going on. Frequent gaps should not be the expected normal behaviour – it is worth finding the root cause for such an issue.

So here is a (non complete) list of activities that do or do not influence the caching of sequences and the existence of gaps in an ID column.

action                                                      result
multiple sessions                                           cache used
multiple sessions with one session slow to commit           IDs not in order of commit => not a gap eventually, but looks like a gap until final commit
alter system flush shared_pool;                             cache values lost => gap in ID
shutdown immediate;                                         cache values not lost
shutdown abort;                                             cache values lost => gap in ID
insert; commit;                                             no gap
insert; rollback;                                           gap in ID
failed insert                                               gap in ID
delete; commit;                                             gap in ID
switch sequence from cache to nocache                       cache values not lost
kill session                                                possible gap in ID (values not used or committed are lost)
different RAC sessions                                      cache values not lost (but separate caches on each RAC node)
SGA size too small                                          cache values lost => gap in ID (sequence cache ages out of shared pool per LRU list)
session sequences                                           not useful for unique ID values => duplicate ID
scalable sequences                                          cache values not lost => planned large gap in ID (because of prefix)
alter system cancel SQL ‘SID, serial’; (insert killed)      gap in ID
alter system disconnect session post_transaction;           no gap (unless sequence was fetched before the transaction started)
alter system disconnect session immediate;                  gap in ID
sequence restart                                            possible duplicate IDs
sequence interval > 1                                       planned gap in ID

That a sequence ages out of the shared pool is in my experience the most common cause of lost sequence values that are noticed. A strong indication is when frequent gaps of 20 values exist.

Possible workarounds are

a) set the sequence to nocache if the sequence is used rarely
alter sequence myseq nocache;
This might cost performance.

b) pin the sequence in the shared pool
dbms_shared_pool.keep('MYSCHEMA.MYSEQ','Q')
Pinning the sequence is useful when you have times with lots of concurrent activities followed by larger idle periods.
This costs a little memory.

c) increase SGA size – sometimes it helps to increase the restricted shared pool area.
This costs memory (and money) – not available for other non-database processes.

Conclusions

  • Small gaps in surrogate keys (ID columns) are natural, mostly because of delete statements or failed inserts.
  • The sequence object typically does not produce gaps.
  • Very frequent gaps of size 20 (default cache size) are not natural and might need investigation.

Basic SQL: All about sequences

Intro

There are still a lot of misunderstandings about Oracle sequences. Sometimes even experts tell you things about sequences that are easy to misunderstand, especially if we look into the details. The following post wants to give a detailed overview about what are sequences, why they work as they do, and how we should use them.

There are also a lot of parameters that the sequence object has and that you can use to tweak the behaviour. I will cover the most common things here.

Wording

Many of the misunderstandings come from how we use the word “sequence”. It can mean several slightly different things.

meaning a) The sequence object in the database aka the number generator
meaning b) the number value that is retrieved via mySeq.nextval
meaning c) an attribute for a list of numbers, stored typically in an ID column
“This list is in sequence” often means that we have an ordered list of numbers without gaps (math: monotonically increasing integer values).

For the remainder of the document I will try to make always clear which meaning I am referring to. The relevant words will be written in italics to hint about the specific interpretation in that sentence. In cases where I say “sequence” without additional specification details, I will mean the sequence object.

Purpose

The most common sequence usage is as technical values for ID columns. A typical ID column is a surrogate key. Opposed to a natural key, a surrogate key has no intrinsic meaning. Its only use is to identify (=ID) a database record in a table. No intrinsic meaning also implies that we can not use this ID value to make business decisions dependent on it.

For example the following sentence should be considered a wrong deduction.
“Employee ID=17 was hired before Employee ID=26 because he/she has a lower ID”.

If we want to make qualified statements, then we must add the needed information to the data. For example add a column “hire_date”. Then we can use it to deduct when an employee was hired and what the order among different employees is.

The main advantage of a surrogate (meaningless technical) key is that the database can use it to ensure referential integrity. And this integrity rule is ensured even if something changes with regards to the business key. Typically business keys do not change. But if it happens, then the relationship is ensured by the foreign key still pointing to the surrogate key. For example we might have an INVOICE table. The business key might be the invoice number. In general this number is immutable, however it could be that there was some typo or scanner fault while the invoice was registered into the system. Using a surrogate key it is possible to change this invoice number without having to change all dependent records (like invoice positions) as well.

One of the best ways to supply values for such a surrogate key column (ID) is to use a sequence object and call the NEXTVAL function (pseudocolumn) on it. We can do that with a database trigger, as an identity column or directly in an insert statement.

 

Usage

standard usage

The standard usage of a sequence simply is to provide values for an ID column in the most performant way.

If you are new to the concept of Oracle sequences, then I suggest going to livesql.com and trying out the next few examples there yourself. Experienced developers might want to skip those basic examples.

A) sequence + nextval on insert

First create a sequence using all default settings. We then use this sequence to provide ID values for our super-employees.


create table super_emp
(id number primary key,
   first_name varchar2(100),
   last_name varchar2(100),
   hire_date date);

create sequence emp_seq;

Then call nextval directly in an insert statement


insert into super_emp (id, first_name, last_name, hire_date)
values (emp_seq.nextval, 'Peter', 'Parker', trunc(sysdate));

insert into super_emp (id, first_name, last_name, hire_date)
values (emp_seq.nextval, 'Clark', 'Kent', trunc(sysdate));

1 row inserted.

1 row inserted.

The NEXTVAL pseudocolumn was used directly in the values section of the insert statement.

B) before row insert table trigger

Create a table trigger that fires during insert (pre 12c solution)

create or replace trigger trg_emp_bri
  before insert on super_emp
  for each row
begin
  if inserting then
    if :NEW.ID is null then
      :NEW.ID := emp_seq.nextval;
    end if;
  end if;
end;
/

The Oracle SQL Developer has a very nice wizard that helps to quickly create such a trigger. The table context menu (right-click) has an entry to create a PK trigger based on a sequence. It creates a trigger very similar to the one above (I removed a select from dual in favour of a direct assignment).

Then insert into the table using either a NULL value or without the ID column.


insert into super_emp (id, first_name, last_name, hire_date)
values (null, 'Tony', 'Stark', trunc(sysdate));

insert into super_emp (first_name, last_name, hire_date)
values ('Bruce', 'Wayne', trunc(sysdate));

1 row inserted.

1 row inserted.

This is very nice. The application code that does the insert does not need to bother with the name of the sequence.

The trigger fires once FOR EACH ROW that is inserted. The code executes slightly BEFORE the row data is inserted. Before row triggers are typically used to set default values for columns or do some more complicated checks. After row triggers also exist. They are usually used for monitoring purposes, like writing data into an audit trail.

C) Use the sequence in the column definition (since 12c)

Since 12c we have two new options. Create a column AS an IDENTITY column or set the default value for the column to sequence.NEXTVAL. Both options can be configured to work only ON NULL. In case of an identity column, Oracle will automatically create a sequence. More about this in the chapter “identity columns”. Here is an example using the default setting.

The table trigger from B) is not needed anymore, so we can drop it.

alter table super_emp modify id default on null emp_seq.nextval;

drop trigger trg_emp_bri;

Then run the inserts.


insert into super_emp (id, first_name, last_name, hire_date)
values (null, 'Diana', 'Prince', trunc(sysdate));

insert into super_emp (first_name, last_name, hire_date)
values ('Steve', 'Rogers', trunc(sysdate));

1 row inserted.

1 row inserted.

Before 12c it was not possible to use pseudocolumns or non-deterministic functions like sysdate as a default value for a column. With 12c this is possible now. The result is the same as with a before row trigger, but usually it is noticeably faster when we insert multiple rows.

 

Check the results

select id, first_name, last_name from super_emp;

ID  FIRST_NAME  LAST_NAME
 1  Peter       Parker
 2  Clark       Kent
 3  Tony        Stark
 4  Bruce       Wayne
 5  Diana       Prince
 6  Steve       Rogers

All inserts were done successfully. All three methods work.

 

18c create sequence parameters

[Syntax diagram: CREATE SEQUENCE (18c)]

Some basic stuff first

Here we go through the different parameters. Behind some of them are very complex concepts. If so, those concepts are explained in a later section. This basic section tackles how to set each parameter and the immediate effects of setting or not setting it.

INCREMENT BY vs. START WITH

START WITH says what the very first value will be. It can be negative.

INCREMENT BY says how the next value will be calculated. It can also be negative but not 0.

The syntax diagram is slightly misleading. It gives the impression that we can specify only one of them during creation: either INCREMENT BY or START WITH, but not both. This is not true; we can create a sequence and specify both. The default for both is 1.


create sequence testseq increment by 10 start with 2;
select testseq.nextval from dual connect by level <= 3;

NEXTVAL
2
12
22

Other parameters like CYCLE and NOCYCLE can not be specified both at the same time. The syntax diagram is correct for those.

For the remainder of this document, we assume the increment is always 1 (unless clearly mentioned otherwise).

Note that we can not alter the START WITH value, but we can alter the INCREMENT BY.

Hint: The undocumented RESTART clause allows to set a new START WITH value. See section about “How to reset a sequence”.

MAXVALUE and MINVALUE

Typically we don’t have the need to set those two parameters, the defaults are good.

Facts

  • MAXVALUE and MINVALUE specify the highest and the lowest possible value a sequence can have. MAXVALUE must be greater than MINVALUE.
  • NOMAXVALUE (10^28 - 1) is the highest possible value.
    NOMINVALUE (-(10^27 - 1)) is the lowest possible value.
  • The default for MAXVALUE is NOMAXVALUE.
    The default for MINVALUE is 1.
  • The MINVALUE can not exceed the value defined by START WITH. Or otherwise we get an error.
    ORA-04006: START WITH cannot be less than MINVALUE

CYCLE vs. NOCYCLE (default)

CYCLE specifies that the sequence, after it reaches the MAXVALUE, will start again with the MINVALUE (not with the START WITH value). The theoretical MAXVALUE of a sequence is 28 digits. It is a bit less with scalable sequences.

Nowadays there is no compelling reason to use CYCLE.

I believe in the old days (1990 – Oracle 7) disc space was still a premium commodity. Therefore number columns were often limited to a low number of digits (5 or 6). Under certain specific circumstances a cycling sequence then might have been useful to prevent numeric or value errors. Those days are gone.

CACHE (default) vs. NOCACHE

Caching a sequence is a huge performance feature. The default setting is CACHE 20, which is good for most scenarios. It means 20 sequence values are read from shared memory (SGA) instead of from disk. The dictionary is then only updated once for every 20 values.

See the section about caching considerations for more information about this very important parameter.

Demo:

create sequence mySeq cache 1000;
select sequence_name, cache_size, last_number
from user_sequences
where sequence_name ='MYSEQ';
SEQUENCE_NAME CACHE_SIZE LAST_NUMBER
MYSEQ         1000       1

select myseq.nextval from dual connect by level <= 3;
NEXTVAL
1
2
3

select sequence_name, cache_size, last_number
from user_sequences
where sequence_name ='MYSEQ';
SEQUENCE_NAME CACHE_SIZE LAST_NUMBER
MYSEQ         1000       1001

After this, we still have 997 cached sequence values.

The default value of cache 20 is a kind of sweet spot for OLTP purposes. Only when you need to create a very large number of sequence values in a short time should you consider increasing the cache. This typically happens during data load situations. Don’t forget to lower the cache value again after the data load is over.
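Here is a minimal sketch of that workflow, reusing the emp_seq sequence from the earlier examples (the cache sizes are just placeholders):

-- before the bulk load: temporarily raise the cache
alter sequence emp_seq cache 10000;

-- ... run the data load ...

-- after the load: back to the default
alter sequence emp_seq cache 20;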

 

ORDER vs. NOORDER (default)

It is a common misconception that we need to specify ORDER to get ordered values from a sequence object. The sequence object will always produce ordered values! Oracle did not implement some kind of random mechanism. Sequence.nextval will always give you the last value + the increment. Any kind of “randomness” comes from other things, like the fact that you have no control over who fetched the last value (multi user), when that value was inserted (seq.nextval call < insert time < commit time), and lost sequence caches.

The ORDER setting is only relevant in a RAC (Real Application Cluster) environment. And even there it should always be NOORDER (the default). Read the chapter about the performance considerations for an explanation.

Remember

ORDER on RAC = slow

ORDER on non-RAC = no effect

easy.

 

KEEP vs. NOKEEP (default)

This is a switch that most database developers will never need. It might be more relevant for Java developers.

In 12.2 a new feature called Application Continuity was introduced. It allows capturing and replaying a certain workload on the database. It comes with the license options for RAC or Active Data Guard.

The problem is that a call to sequence.nextval would deliver a new value. This is not wanted for REPLAY purposes. Altering a sequence to KEEP provides the same sequence value during the replay.

From the appendix of Oracle's white paper about Application Continuity:

Mutable Functions
Mutable functions are functions that can change their results each time that they are called. Mutable functions can cause replay to be rejected because the results visible to the client can change at replay.
Consider sequence.NEXTVAL that is often used in key values. If a primary key is built with a sequence value and this is later used in foreign keys or other binds, the same function result must be returned at replay.
Application Continuity provides mutable value replacement at replay for Oracle function calls if GRANT KEEP or ALTER.. KEEP has been configured.
If the call uses database functions that support retaining original mutable
values, including sequence.NEXTVAL, SYSDATE, SYSTIMESTAMP, and SYS_GUID, then, the original values returned from the function execution can be saved and reapplied at replay. If an application decides not to grant mutable support and different results are returned to the client at replay, replay for these requests is rejected.

Important to remember is that the KEEP parameter during creation has nothing to do with keeping the sequence pinned in the SGA. An example of how to do that is in the “discussion about gapless IDs” section.
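For completeness, here is a small sketch of how that switch is set on an existing sequence (the sequence name is just the demo name used above; whether KEEP makes sense depends entirely on your Application Continuity setup):

-- allow replay to reuse the originally returned values
alter sequence mySeq keep;

-- back to the default behaviour
alter sequence mySeq nokeep;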

SCALE vs. NOSCALE (default)

SCALE is a very interesting new setting. It allows to improve the clustering factor of the index on the ID column. More details about that in the performance section.

Using SCALE adds a prefix built from the instance number and the session ID (SID) to the beginning of the sequence value.

SCALE has two options, EXTEND and NOEXTEND (default). The following demo shows how they work and differ.

 

create sequence myseq;

Sequence MYSEQ created.

                               NEXTVAL
--------------------------------------
                                     1
alter sequence mySeq scale;
                               NEXTVAL
--------------------------------------
          1017670000000000000000000002

alter sequence mySeq scale extend;
                               NEXTVAL
--------------------------------------
    1017670000000000000000000000000003

For the sake of brevity I removed the “sequence altered.” results and the “select mySeq.nextval from dual;” calls.

My session ID in this demo was 767. The 101 is the instance ID (1) + 100. So in a RAC environment, this will ensure that the values provided by different nodes will not clash. On non-RAC systems this leading part should always be 101.

NOSCALE gives us a normal sequence value of 1.
SCALE NOEXTEND gives us a sequence value of 28 digits (MAXVALUE) with a 2 at the end (last value+increment by) and a 101767 at the beginning.
SCALE EXTEND gives us a sequence value of  28+6 digits with a 3 at the end (last value+increment by) and a 101767 at the beginning.

So EXTEND adds the additional digits on top of the MAXVALUE setting, whereas NOEXTEND adds it inside the range defined by MAXVALUE.

In most circumstances – if we consider scalable sequences – we should use SCALE NOEXTEND, just to be sure that the generated value still fits into the table column. For very large tables, if there are already some extremely high values, we might need to use EXTEND, but I expect this situation to be very rare.

When is this useful? Only for cases when extreme performance matters: large or very large tables with a lot of inserts from multiple sessions (parallel inserts).

SESSION vs. GLOBAL (default)

Sequence values do not depend on a user session. Every call to sequence.nextval will give the next incremented value regardless of which session executed this. This feature ensures that nobody gets a duplicate key.

User/Session A calls mySeq.nextval 3 times and gets 1,2,3.
User/Session B calls mySeq.nextval 3 times and gets 4,5,6.
If both sessions fetch the values almost simultaneously then A might get 1,3,4 and B might get 2,5,6. Notice that there might be a gap from the perspective of a single session, but the values are still ordered for each session.

With SESSION sequences this behaviour changes. Session A calls mySeq.nextval three times and gets 1,2,3. Session B calls mySeq.nextval three times and also gets 1,2,3. The values are not shared between sessions.

Where do we need this? Only for global temporary tables (GTT). The data in a GTT  persists for the duration of a session (alternatively until commit) and then is gone. Same behaviour for the SESSION sequence – the generated sequence values only persist for the duration of the session.
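A minimal sketch of how the two features go together; the table and sequence names are made up for this demo:

-- a GTT whose rows only live for the duration of the session
create global temporary table gtt_load_stage
  (id      number
  ,payload varchar2(100))
  on commit preserve rows;

-- a session sequence: every session starts counting at 1 again
create sequence gtt_seq session;

insert into gtt_load_stage
values (gtt_seq.nextval, 'only visible in this session');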

 

Practical considerations

For most cases the default settings are perfect.

Only if you encounter issues (performance or an unusual number of gaps) or if your data has some special scenarios (batch ETL jobs, very large numbers of rows) should you start thinking about tinkering with the default settings.

The following sub chapters discuss common questions and show cases how to work with sequences to solve some typical tasks.

How to avoid reusing the same ID in different dbs

Sometimes we have a distributed database. Especially for global companies each region might have its own database. The data for the different regions still needs to be comparable. And sometimes the data will be consolidated or exchanged. In such cases it helps, if the ID values do not overlap.

One way to do that is to use the INCREMENT parameter. On database 1 we use a sequence object starting with 1 and an increment of 10. So this will give IDs like 1, 11, 21, 31, 41, ….

create sequence testseq increment by 10 start with 1;

On Database 2 we use a sequence starting with 2 and increment 10. This will work up to 10 regions. So this will give IDs like 2, 12, 22, 32, 42, ….

create sequence testseq increment by 10 start with 2;

Result is that those values do not overlap. There are other (and possibly better) ways to solve the situation, like sys_guid(). But this is a fairly easy and stable concept.

Caching considerations

If the sequence is used very infrequently, then you can set it to NOCACHE. For example if you have a staff table; I don’t expect that new personnel is hired every second. Typically it will be a few people per month (depends on the size of your company of course). For such low frequency inserts performance doesn’t matter. You can set the sequence object to NOCACHE or to a very low cache value. However if you do a large data import, consider increasing the cache size before running that data load.

Does setting a larger cache size need more SGA memory?

No.

Or to explain it in Tom Kyte's words:

All we need to keep in the cache is:

the sequence on disk was N
the cache size is M
the current value is X

As long as X is less than N+M – we just increment X when someone calls NEXTVAL.

we do not need to keep in the cache “N, N+1, N+2, … N+M-1”, we just keep N, M and X and increment X when someone asks for a new sequence value. When X=M, we update SEQ$ and reset N in the cache.

So, cache 1000 and cache 20 take the same amount of space in the cache.

How to reset a sequence?

There are three general ways to set a sequence to a different value.

  1. Call sequence.NEXTVAL so often until you reach the target value
  2. Manipulate the increment parameter using a negative increment. Call nextval once. Reset.
  3. Restart the sequence (new undocumented feature)

The first way usually is not practical. A noticeable exception might be if you manually added some data without using the sequence and you want to jump over those few values.

If you want way 1, then the CONNECT BY LEVEL clause helps to do it quickly.


select myseq.nextval from dual connect by level <= 996;

 

And here is a demo for way 2:
Preparation setup


drop sequence mySeq;
create sequence mySeq cache 1000;
set autotrace traceonly statistics
select myseq.nextval from dual connect by level <= 996;
set autotrace off
select myseq.nextval from dual;
   NEXTVAL
----------
       997

The “set autotrace traceonly” command works in SQL*Plus. I used it here to avoid printing 996 values onto the screen. It is not relevant for the demo itself.

The current value now is 997, but we want the next call to nextval to give us 1.

Now reset the sequence.

alter sequence mySeq increment by -996 nocache;
select myseq.nextval from dual;
alter sequence mySeq increment by 1 cache 1000;

After this code, the very first session that calls myseq.nextval, will see 2 as the value returned.

If we really need to see 1 we also must lower the MINVALUE. Because INCREMENT BY can not result in anything lower than the MINVALUE (ORA-08004: sequence MYSEQ.NEXTVAL goes below MINVALUE and cannot be instantiated).

alter sequence mySeq increment by -997 nocache minvalue 0;
select myseq.nextval from dual;
alter sequence mySeq increment by 1 cache 1000;
select myseq.nextval from dual; 

Notice that we now incremented by -997 instead of -996 and that we are calling nextval twice. We can not reset the MINVALUE to 1 during the second ALTER SEQUENCE command, because that also would violate the rules (ORA-04007: MINVALUE cannot be made to exceed the current value). The easiest solution is to let it stay at 0.

Using NOCACHE is important to avoid issues with the stored last_number (the highwater mark). Also check the INCREMENT BY and the CACHE setting before you alter the sequence. If the INCREMENT BY is different, then you need to change the above code and probably need to call nextval a second time.

In 18c we got a third option to reset a sequence – the RESTART option.


ALTER SEQUENCE mySeq RESTART;

This is currently undocumented.

Thanks to Roger Troller for making me aware of it (Blog).

I tested it a little bit further and found out two more things.

  • We can already use it in 12.2.0.1. Which makes sense, since 18c is really just 12.2.0.2.
  • And we can combine it with the START WITH clause.

So the following works!

alter sequence testseq_20 restart;
select testseq_20.nextval val1 from dual;
VAL1
1
alter sequence testseq_20 restart start with 15;
Sequence TESTSEQ_20 altered.
select testseq_20.nextval val1 from dual;
VAL1
15

Very convenient. This should be the preferred way to reset a sequence whenever you need to do that.

Not recommended is to drop and recreate the sequence. While this will also allow us to set a new START WITH value, it has a major drawback. All references to the sequence are then broken. Especially all privileges are lost, like GRANT SELECT on #sequence to #schema.

 

Can we use an ID from a sequence to order by insert time?

Short answer no. The order of inserts and the order of sequenced values often match but are not guaranteed to match.

Detailed answer: Usually it works.

I very often use an ID column filled by a sequence as a second order criterion. For example I typically sort a logging table – where trace information is written – by the insert date and the LOG_ID (sequence-based PK).

order by insert_date desc, log_id desc

The insert date (if it is a date) is only accurate to the second. Even if it is a timestamp there might have been multiple inserts at the same fraction of a second. The log_id is a perfect second order criterion.

We can safely assume that the inserts that were done from the same session have ordered sequence values. There might be gaps, but the sequence values will be produced in the same order as we did the inserts. There can be ID values in between that are from different sessions. But for trace log information it usually does not matter if a different session is ordered before or after our session. However the data from one session should be correctly ordered. And this is guaranteed.

Is cycling useful?

I never had the need for cycling sequences. I firmly believe that if you think you need them, you have a much deeper problem somewhere else. It would probably be better to solve that problem instead of using a cycling sequence.

With 18c we get SESSION sequences. For some cases where CYCLE was considered in the past, a SESSION sequence might be the better choice.

Also ROWNUM and the analytic function ROW_NUMBER can be used to create consecutive values at time of select, instead of a sequence providing those values at time of insert.

Discussion of gapless IDs

This is a problem/question I often encountered: How to make a sequence gapless?

TL;DR: You don’t need to. The effort and the restrictions needed to make an ID column gapless are too high in (almost) all use cases.

The sequence object can and does provide gapless numbers. In a multi user environment we just can not reliably use the provided values to store them in a gapless way. Even in a single user environment, the stored IDs could be deleted. So one consequence of the gaplessness requirement, would be to forbid delete operations.

The main point is that almost all the things that will create “holes” in an ID column are under our control. It is not the Oracle database that can not provide gapless sequences. It is the complexity of the business rules combined with performance requirements in a multi user environment, that make it almost impossible to have an ID column without potential gaps.

Performance + Multi User + Gapless IDs build a triangle of goals that exclude each other. We can not reach all three goals at the same time, one needs to suffer. However those goals do not react in the same way, when we sacrifice one a tiny bit. So let’s investigate what happens then:

Even settling for good performance (instead of very high performance) will not work if we need multi user capability and gaplessness at the same time. To enable this we need to serialize access to the whole table, which in turn means only one session can write into the table and all other sessions will need to wait until that session finishes the whole transaction.

We can have very high performance and gaplessness if we only have a single user. But as soon as a second user wants to write at the same time, we need to introduce severe serialization of the whole transaction, just to ensure gaplessness. And this means performance drops immensely. Btw. this is how MS Access works. Only one user can write into the so-called database.

But we can get almost gapless IDs and still have very high performance for multiple users. Almost gapless means, we sometimes might have gaps in our sequence, but this situation is rare. This is the default behaviour of Oracle sequences.

How do we get gaps in our IDs?

a) a record in our table was deleted.
b) the insert ran into an error
(remember sequence.nextval is called a tiny moment before the insert is executed).
c) Somebody called sequence.nextval but didn’t use the value.
d) The sequence cache was lost. One way how this happens is if the database decides that other objects need to be in SGA memory and the sequence wasn’t called for a longer time.

By pinning the sequence we can avoid situations where the sequence cache ages out of the shared pool. A better alternative is to size the shared pool appropriately, so that in general sequence caches will not age out of it.

execute sys.dbms_shared_pool.keep('OWNER.MYSEQUENCE','Q');

This still doesn’t guarantee gapless IDs, but for most use cases it would be good enough.

The Oracle docs about skipping cached numbers:

18.1 Database Admin Guide – Managing Sequences
The database might skip sequence numbers if you choose to cache a set of sequence numbers. For example, when an instance abnormally shuts down (for example, when an instance failure occurs or a SHUTDOWN ABORT statement is issued), sequence numbers that have been cached but not used are lost. Also, sequence numbers that have been used but not saved are lost as well. The database might also skip cached sequence numbers after an export and import. See Oracle Database Utilities for details.

A normal or immediate shutdown of the database will not lose sequence numbers. Instead the database will update the data dictionary (table sys.seq$) with the last used value. Unfortunately most DBAs prefer to shut down a database using shutdown abort, since they don’t bother enough about user sessions.

Sequence Performance

Why is a sequence fast and how can we use it in the most performant way?

Oracle invented sequences with performance in mind. They provide a way to create surrogate keys in a multi user environment while minimizing serialization.

Basic working of an Oracle sequence

A sequence is just an entry in the dictionary table sys.seq$.

desc sys.seq$;

Name       Null?    Type 
---------- -------- ------------ 
OBJ#       NOT NULL NUMBER 
INCREMENT$ NOT NULL NUMBER 
MINVALUE            NUMBER 
MAXVALUE            NUMBER 
CYCLE#     NOT NULL NUMBER 
ORDER$     NOT NULL NUMBER 
CACHE      NOT NULL NUMBER 
HIGHWATER  NOT NULL NUMBER 
AUDIT$     NOT NULL VARCHAR2(38) 
FLAGS               NUMBER 
PARTCOUNT           NUMBER

The highwater column is the same as the last_number column in the view user_sequences.

When a sequence fetches a new sequence value (using .nextval) then the dictionary table needs to be read and the row needs to be updated with the new value. Now if multiple sessions do that, then one would have to wait for the other. This is called serialization. To avoid that issue, Oracle uses two clever mechanisms.

  1.  The dictionary table is updated using an autonomous transaction. So the value is stored and other sessions can see it, even if the main transaction (from the user session) is not finished.
  2. The new highwater value that is stored is not the next value, but the value + the cache size. Any call to sequence.nextval will first read from the sequence cache, and only once the numbers there are exhausted will it read from the table and update it.

It is of course possible to write a similar mechanism ourselves with our own table. And I have seen projects where they did exactly that. But it is very hard to do properly and even then it will not beat the performance of the original sequence. So you would need a very special business case to justify writing your own sequence mechanism.

Speed it up

If you aim for maximum performance there are some considerations to do.

  1. You must use a sequence cache. The cache size also plays an important role. For most OLTP tables the default setting of cache=20 is a very good choice. However when you do large data loads, then a much larger cache size is advisable. There is a diminishing returns effect: doubling the cache does not double the performance.
  2. On a RAC you really should use NOORDER. The ORDER keyword is only relevant for Real Application Clusters. Using ORDER would try to synchronize the sequence caches over all cluster nodes, which is extremely bad for performance. Using NOORDER gives each RAC node a separate sequence cache. Which also means that an insert on node1 might have sequence value 1 and the next insert on node2 might have sequence value 1001. The third insert on node1 again would use value 2.
  3. Sequences should be used as late as possible. There is usually no need to fetch a sequence value first and then do the insert later. Use the sequence while doing the insert. Either by adding it to the insert statement, or via a database trigger or since 12c as an identity column or a default on NULL column setting. Using the 12c mechanics allows to avoid the database trigger. This results in much better performance, as I have shown in a previous blog post.
  4. Consider scalable sequences for large tables if you are on 18c already. The effect can not be seen immediately, but scalable sequences should give a better and more stable performance in the long run.
    For small and medium sized tables I expect scalable sequences to be slower than non-scalable sequences (because a bigger number needs to be stored). I didn’t test the effect, but a normal (small) sequence value only needs 2-6 bytes, whereas a scalable sequence value always needs 15 (NOEXTEND) or 18 (EXTEND) bytes. These bytes are used by the table column, the unique index that supports the primary key (PK), all foreign key (FK) columns pointing to the PK and the indexes supporting those FKs.

 

If you need the value of the sequence later in your code again you can either use .currval (not recommended) or use the returning clause to give you the generated ID.

best practice: returning clause

Several SQL and PL/SQL DML commands have a returning clause. It allows us to get back data that is created or manipulated while the DML (insert or update) is running.

The most common usage is to return the ID value that is filled by a database trigger (or an identity column), so that this ID can be used later in the same session or transaction, or returned to the client. For example to insert any child records or to show the freshly generated record in a GUI report.


insert into super_emp (first_name, last_name, hire_date)
values ('Bruce', 'Banor', trunc(sysdate))
returning id into :ID;

print :ID;

ID
--------------------------------------------------------------------------------
9

The returning clause is more typical in pl/sql. Here is an example using a record of %rowtype. We can even return the generated ID value directly into the record.

declare
  r_super_emp super_emp%rowtype;
begin
  r_super_emp.first_name := 'Hal';
  r_super_emp.last_name  := 'Jordan';
  r_super_emp.hire_date  := trunc(sysdate);

  insert into super_emp
  values r_super_emp
  returning id into r_super_emp.id;

  sys.dbms_output.put_line('New ID = '||r_super_emp.id);
end;
/
New ID = 10

 

Identity columns

Identity columns and Default on null are great enhancements in db version 12.1.

They allow us to use a sequence as late as possible (while inserting), but without the need for a before row insert table trigger. This improves insert performance dramatically. A trigger is PL/SQL based and runs during the execution of a SQL DML statement (insert). Because of that, a context switch from the SQL to the PL/SQL engine (and back) is needed. If we can avoid the trigger completely, we can avoid the context switch and this will improve performance considerably.

I made some tests and under very favourable circumstances (nothing else inserted but the ID) the insert performance was 900% faster using IDENTITY or DEFAULT columns instead of a trigger.

With DEFAULT ON NULL we would still create the sequence ourselves, which also means we know its name. With IDENTITY the sequence is automatically created and maintained by Oracle.

The name of the generated sequence will always begin with “ISEQ$$_”.

demo


create table test
( id number generated by default on null as identity (start with 20) primary key
);

select table_name, column_name, identity_column, default_on_null, data_default
from user_tab_columns;

TABLE_NAME  COLUMN_NAME  IDENTITY_COLUMN  DEFAULT_ON_NULL  DATA_DEFAULT
TEST        ID           YES              YES              "SCHEMANAME"."ISEQ$$_10707661".nextval

 

Drawbacks

It can be problematic to use identity columns over a database link, especially if the ID value is needed. The main issue is that the returning clause does not work over a db link and there are no good alternatives for identity columns. This works slightly better with “default on null”. We know the sequence object and can use it to fetch the ID value over a DB link first and then use it later for the insert. Not performant at all, but it works.
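A hedged sketch of that workaround, reusing the super_emp table and emp_seq sequence from above; the database link name remote_db is made up:

declare
  v_id number;
begin
  -- fetch the ID from the remote sequence first - possible because we know its name
  select emp_seq.nextval@remote_db into v_id from dual;

  -- then use the fetched value explicitly in the remote insert
  insert into super_emp@remote_db (id, first_name, last_name, hire_date)
  values (v_id, 'Carol', 'Danvers', trunc(sysdate));
end;
/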

We also can not directly alter an existing ID column into an IDENTITY column, although it is possible to modify an existing identity column (for example switching between GENERATED ALWAYS and GENERATED BY DEFAULT ON NULL).
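As a hedged sketch of the latter, switching the generation type of a column that already is an identity column could look like this (it will not work on a plain NUMBER column):

-- switch an existing identity column from GENERATED ALWAYS
-- to GENERATED BY DEFAULT ON NULL
alter table test modify (id generated by default on null as identity);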

Another minor inconvenience is that the system-generated sequence will still be there when the table is dropped. At least as long as the table is still in the recycle bin.

There were also some other very special bugs using identity columns. All have workarounds, but my experience is, that default on null is slightly less error prone.

 

Index contention and Scalable Sequences

Scalable sequences were secretly introduced in 12.1.0.1 but only documented in 18.1.

Richard Foote did a three part series about scalable sequences that covers all you need to know.

The basic problem has to do with index contention.

To give a very brief explanation: when we have an ID column that is filled using a sequence, the index will, over time, become unbalanced. Because new values are only added to the right side of the index, leaf block splits will happen there frequently. Sometimes it will be a 50-50 block split and the space in those blocks usually will not be filled up. This eventually leads to a heavily right-(un)balanced index tree.

Such a block split is a very resource-intensive operation and other sessions will need to wait for it. If you see a high number of “enq: TX – index contention” wait events (check MOS 873243.1), the reason could be those index block splits.

One workaround for the index contention problem in the past was to use a REVERSE KEY index. But this created other performance problems, like the CBO will not do any range scans on that index.

Scalable sequences are a slightly better solution to avoid those index contention issues (hot index blocks). Because they have the session ID in front of the number, values provided by a scalable sequence are distributed more evenly over the index. At least as long as multiple sessions do the inserts.

 

Export and Import

consistency issues

When you do an export of a database or a schema it is crucial to do a time consistent export using

exp ... consistent=Y ...

Why? Otherwise the sequence object, including its current value as START WITH, is exported first, and later the tables with their data. Which means that in between some session could call sequence.nextval and use up a value. You won’t notice the issue during import. But as soon as an insert in the imported schema happens, you will get a dup_val_on_index error, because the table already has an ID value that the sequence generator just provided.

sys warning

Consistent=Y does not work as SYS. So never export data as SYS! The reason is that sys can not do read only transactions. Using SYSTEM is fine.

datapump

For datapump the equivalent to consistent=Y is the flashback parameter.

expdp ... flashback_time=systimestamp ...

There is also a flashback_scn parameter. Both do a time consistent export.

And since 11.2 there is a legacy mode for datapump, which allows to use consistent=Y (it is rewritten into the flashback_time parameter).

 

Other ways to generate ordered numbers

Sometimes a sequence is not the best way to generate ordered numbers. For example when we want to sort entries from a detail table based upon their parent keys. Each detail record should start with 1 for each parent entry. A sequence is not the proper tool to get such values.

Alternatives are ROWNUM, the analytical function ROW_NUMBER() and certain ways to create lists in SQL, for example by using hierarchical queries with CONNECT BY.
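A minimal sketch using ROW_NUMBER; the order_items detail table and its columns are made up for the example. Each order gets its own numbering starting with 1.

select oi.order_id
      ,row_number() over (partition by oi.order_id
                          order by oi.item_id) as position_no
      ,oi.product_name
from order_items oi;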

 

 


OTN/ODC Appreciation Day 2018 – Thanks for the mutating table error!

This is not sarcasm!

What is the mutating table error?

I firmly believe the mutating table error is a good thing. For anybody who does not know what I am talking about, here is an example.

There are various ways how the error can happen. A before row trigger is probably the most typical scenario.

Since Halloween is getting close, I need a table to organize the personnel for my evil lab. I set up a table that holds each member, divided by department. Only one labhead per department is allowed. This integrity rule is enforced by table triggers.
[Image: maniac_mansion_scientist]

-- build demo
create table swe_test_mutating
  (id number not null primary key 
  ,name varchar2(1000) not null
  ,department varchar2(254)
  ,is_head_of_department varchar2(1) 
     constraint yn_ck check (is_head_of_department in ('Y','N'))
  );


-- trigger to make sure that there is only one head of department.
create or replace trigger swe_test_mutating_bri
  before insert or update on swe_test_mutating
  for each row 
declare
  v_dummy varchar2(1);
begin
  if :new.is_head_of_department = 'Y' then 
    -- check if there is at least one other person 
    -- for the same department
    -- which is also marked as HEAD
    begin
      select is_head_of_department
      into v_dummy
      from swe_test_mutating
      where id != :new.id -- must be different 
      and department = :new.department -- must be same department
      and is_head_of_department = 'Y' -- is head
      and rownum <= 1  -- enough if we find one
      ; 
      
      raise_application_error(-20000, 'There can only be one head of '||:new.department||'!');
    exception
      when no_data_found then
        -- ok. Single head of department only
        null;
    end;    
  end if;
end;
/

Trigger SWE_TEST_MUTATING_BRI compiled

add some data


-- add some personnel to the evil lab
insert into swe_test_mutating
values (1, 'The master', 'LAB05', 'Y');

1 row inserted.

insert into swe_test_mutating
values (2, 'The apprentice', 'LAB05', 'N');

1 row inserted.

insert into swe_test_mutating
values (3, 'Doc Holiday', 'LAB05', 'Y');

ORA-20000: There can only be one head of LAB05!
ORA-06512: at "EVIL_LAB.SWE_TEST_MUTATING_BRI", line 17
ORA-04088: error during execution of trigger 'EVIL_LAB.SWE_TEST_MUTATING_BRI'

This shows that the trigger worked for single row inserts. The logic inside the trigger prevented us from adding a second head for one department (LAB05).

Now a test with multi row inserts.

insert into swe_test_mutating
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual union all
select 6, 'Evil Clown', 'LAB07', 'Y' from dual
);

ORA-04091: table EVIL_LAB.SWE_TEST_MUTATING is mutating, trigger/function may not see it
ORA-06512: at "EVIL_LAB.SWE_TEST_MUTATING_BRI", line 8
ORA-04088: error during execution of trigger 'EVIL_LAB.SWE_TEST_MUTATING_BRI'

So what happened? We got an error (ORA-04091) saying we are not allowed to work with the same table that is currently being modified.

We would have gotten the same mutating table error even if the select part returned only one row. An insert..select is considered a multi-row insert, and then the mutating table error bites.

Why is the mutating table error a good thing?

There are multiple reasons

1) The error prevents developers from making stupid mistakes.

I like to compare it to a safety catch. On a pistol it prevents the developer from shooting their own foot. On a climbing rope it prevents the developer from plummeting to death if something breaks loose. If you remove the safety catch on purpose, then it is at your own risk. Only remove it if you are absolutely sure what you are doing.

Without the error it is easy to create endless loops. Think about what happens, if we do an update on the same table in an after update row trigger. This update would again trigger another update which would trigger yet another update and so on. Eventually we run out of disk space (archive/redo logs) or something else dramatic happens.

The mutating table error stops us (developers) from creating such an endless loop or making other mistakes (that mostly have to do with read consistency).

2) The Error and I have a little bit of history together.

In one of my first big projects almost all tables had validity columns (valid_from and valid_to). We used triggers to make sure that certain validity rules were also considered over relationships.

For example if the validity of a parent record was extended, then the validity of a detail record needed to be extended too – but only, if it didn’t clash with any of the other detail records. Very complex rules.

Those triggers always ended up with mutating or constraining table errors, so in the end the whole logic needed to be moved to after statement triggers. It was a hell to maintain.

Nowadays I would not put all that logic into triggers; instead I would use packaged low-level APIs that also make sure all the integrity rules are in place. The code would look so much cleaner that way.

I believe having gone through this experience made me a better developer.

3) The error is complex to understand.

I do understand it. Which gives me the chance to educate about it. Sharing and passing on knowledge is a fun and highly satisfying thing to do. As is writing such a blog post.

Error Darwinism

Interestingly, errors seem to comply with the laws of natural selection.

Here is what I noticed how the mutating table error has evolved over time.

  • It now (since 10g I believe) bites only if more than one row is inserted (or updated or deleted), which is a very hideous thing to do. Novice developers test using single row inserts. Experienced developers know they should use bulk inserts and test accordingly.
  • The little brother of the mutating table error was the constraining table error.
    AskTom: “A mutating table is a table that is being modified. A constraining table is one involved in declarative referential integrity with the mutating table.”

    Evolution took its toll on it. The constraining table error now is rarely seen in the wild. I asked around a bit, and there were no developers that found it recently. Most didn’t even know it existed. We might consider it to be extinct.

  • Compound triggers appeared in 11.1. They help to deal with the error in a quick and not so dirty way. So they are a kind of master predator for this error.
  • The recommendations about the error have changed. Certain tools, like Oracle SQL Developer, show additional information if an error happened. In the case of ORA-04091 those additional notes included something about “pragma autonomous transaction”. This hint was very misleading. Nowadays the misinformation is gone.
    It was like a mosquito (bug) infection in a swamp. The swamp is now dried up, so we see less mosquitos feeding upon that error.

How to avoid the error?

Anyone who tries to avoid the mutating table error by using the PRAGMA AUTONOMOUS TRANSACTION creates another bug. For a while the error message was misleading and trapped novice developers into thinking that this pragma is the right way to circumvent the issue. It is not. Bug count >= 2 after that pragma.

So how to correct it? We move the logic to the after statement trigger. In the past we needed 3 different table triggers to do it correctly. A before statement trigger to initialize a package variable, a row trigger to store the IDs of all records that were touched by the DML and an after statement trigger to do the actual work based upon those stored IDs. By using a compound trigger we can combine those three triggers and make the code look much cleaner.

Here is one way how to do it.

-- compound trigger to make sure that there is only one head of each evil lab (department).

create or replace trigger swe_test_mutating_comptrg
  for insert or update on swe_test_mutating
  compound trigger 

  -- define collection type
  type ID_t is table of swe_test_mutating.id%type 
    index by binary_integer;
  -- define variable to hold a list of IDs
  IDs ID_t;
  
before statement is
begin
   -- no initialization needed for compound trigger
   null;  
end before statement;

after each row is
begin
  -- check if we have a new head of department
  if :new.is_head_of_department = 'Y'
  and (:old.is_head_of_department is null -- insert
       or :old.is_head_of_department != 'Y' ) then -- update from non-head to head

     -- remember ID
     IDs(IDs.count + 1) := :new.id;
  end if;     
       
end after each row;

-- main logic
after statement is
  v_department swe_test_mutating.department%type;
begin
  if IDs.count > 0 then 
    -- check if there is at least one other person 
    -- for the same department
    -- which is also marked as HEAD
    for i in 1..IDs.count loop
      begin
        -- since we didn't store the department, we have to reread it from the table
        select t1.department
        into v_department
        from swe_test_mutating t1
        join swe_test_mutating t2 on t1.department = t2.department -- must be same department
        where t1.id = IDs(i)
        and t1.id != t2.id -- must be different id
        and t1.is_head_of_department = 'Y' -- remembered ID is head
        and t2.is_head_of_department = 'Y' -- second personell is also head
        and rownum <= 1  -- enough if we find one
        ; 
        
        raise_application_error(-20000, 'There can only be one head of '||v_department||'!');
      exception
        when no_data_found then
          -- ok. Single head of department only. 
          null;
      end;    
    end loop;
  end if;
end after statement;
end;
/

-- remove the old trigger
drop trigger swe_test_mutating_bri;

If we now do a multi-row insert we will not see the mutating table error. But instead the error message that we want to see (only 1 head allowed).

-- repeat the test!
insert into swe_test_mutating
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual union all
select 6, 'Evil Clown', 'LAB07', 'Y' from dual
);

ORA-20000: There can only be one head of LAB07!
ORA-06512: at "EVIL_LAB.SWE_TEST_MUTATING_COMPTRG", line 50
ORA-04088: error during execution of trigger 'EVIL_LAB.SWE_TEST_MUTATING_COMPTRG'

There are multiple possible optimisations for this code. We could also enforce the integrity rule by using a function-based unique index (see the sketch below). The point however was to show code that demonstrates how to work around the mutating table error.
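For reference, a sketch of such a function-based unique index: only rows flagged as head produce a non-NULL index key, so at most one head per department can exist.

create unique index swe_test_mutating_head_uq
  on swe_test_mutating (case when is_head_of_department = 'Y'
                             then department
                        end);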

Also note that table triggers are not only used to create error messages, like I did in the example with the “there can only be one” check. Often they are used to store aggregated (=redundant) data in some header tables. And many other scenarios are possible.

Addendum – integrity not guaranteed

The trigger as I wrote it first does not guarantee the intended data integrity. It successfully avoids the mutating table error. However it does not ensure that there will only be one labhead.

Here is the scenario that shows the problem: if we insert a labhead from two separate sessions without committing the first session, then we end up with two active labheads. Which is not what our business rule says.

-- In session A
insert into swe_test_mutating
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual 
);

-- do not commit in session A yet!

Now add another head of lab in a different session.

-- session B
insert into evil_lab.swe_test_mutating
(select 6, 'Evil Clown', 'LAB07', 'Y' from dual);
commit;

After the commit in session A we can check the content of the table.

-- session A
commit;

select * from swe_test_mutating;
ID	NAME	        DEPARTMENT	IS_HEAD_OF_DEPARTMENT
6	Evil Clown	LAB07	        Y
4	Mr. Sissorhands	LAB07	        Y
5	Nagging Nanny	LAB07	        N

Thanks to Alex Nuijten who pointed out this issue in the comments.

On a more abstract level: In a multi user environment, we need to make sure that all data that is needed for a decision is immutable until the transaction is finished. This also means we need to serialize the access to the same lab.

The solution is not so simple.

We could consider locking all the personnel, at least for the lab in question. But this will not prevent a rule violation via an insert.

The proper way is to lock the parent row (the lab) itself. Now in my simplified datamodel we do not have this part. So I first need to extend the datamodel and then do the row locking.

Here is the updated scenario

Improved datamodel with two tables and a FK

drop table swe_test_mutating;
drop table swe_test_mutating_labs;
drop table swe_test_mutating_personell;
-- build lab enhancement
create table swe_test_mutating_labs
   (name varchar2(254) not null primary key 
   ,description varchar2(4000));
   
create table swe_test_mutating_personell
  (id number not null primary key
  ,name varchar2(1000) not null
  ,department varchar2(254) references swe_test_mutating_labs(name)
  ,is_head_of_department varchar2(1) 
     constraint yn_ck check (is_head_of_department in ('Y','N'))
  );

-- We really should create an index on the FK to lessen the chance for a major deadlock scenario.
-- However dealing with the dead is daily business for the evil lab. 
-- So I don't care at the moment.

-- construct some labs
insert into swe_test_mutating_labs (name) 
select 'LAB'||to_char(level,'fm00') 
from dual connect by level <= 10;

commit;

The compound trigger also needed to be changed. The most important thing is that we need to lock the lab in question, before the duplicate check is made.
Improved compound trigger

-- compound trigger to make sure that there is only one head of each evil lab (department).
 
create or replace trigger swe_test_mutating_personell_comptrg
  for insert or update on swe_test_mutating_personell
  compound trigger
 
  -- define collection type
  type ID_t is table of swe_test_mutating_personell.id%type 
    index by binary_integer;
  -- define variable to hold a list of IDs
  IDs ID_t;
   
before statement is
begin
   -- no initialization needed for compound trigger
   null;  
end before statement;
 
after each row is
begin
  -- check if we have a new head of department
  if :new.is_head_of_department = 'Y'
  and (:old.is_head_of_department is null -- insert
       or :old.is_head_of_department != 'Y' ) then -- update from non-head to head
 
     -- remember ID
     IDs(IDs.count + 1) := :new.id;
  end if;     
        
end after each row;
 
-- main logic
after statement is
  v_department swe_test_mutating_personell.department%type;
begin
  if IDs.count > 0 then
    -- check if there is at least one other person 
    -- for the same department
    -- which is also marked as HEAD
    for i in 1..IDs.count loop
      begin
        -- since we didn't store the department, we have to reread it from the table
        select p1.department
        into v_department
        from swe_test_mutating_personell p1
        where p1.id = IDs(i)
        and p1.is_head_of_department = 'Y' -- inserted ID is a labhead
        ;
        
        -- lock the department
        select d.name 
        into v_department
        from swe_test_mutating_labs d
        where d.name = v_department
        for update of d.name;

        -- after we ensured exclusive access to this lab, 
        -- we can test if there is another labhead 
        select p2.department
        into v_department
        from swe_test_mutating_personell p2
        where p2.department = v_department -- must be same department
        and p2.id != IDs(i) -- must be different person
        and p2.is_head_of_department = 'Y' -- second personell is also head
        and rownum <= 1  -- enough if we find one
        ; 

        -- found one! 
        raise_application_error(-20000, 'There can only be one head of '||v_department||'!');
      exception
        when no_data_found then
          -- ok. Single head of department only. 
          null;
      end; 
    end loop;
  end if;
end after statement;
end;
/

We can now repeat the test using the new tables.

-- In session A
insert into swe_test_mutating_personell
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual
);

Before we commit in session A, we do an insert in session B

-- session B
insert into swe_test_mutating_personell
(select 6, 'Evil Clown', 'LAB07', 'Y' from dual);

This insert will now WAIT (it "hangs") until the first session finishes its transaction. Essentially we serialized the access to each lab. Only one session at a time is allowed to do changes with regards who is the head.

The result in session B depends whether session A does a COMMIT or a ROLLBACK.

session A: commit;
session B: ORA-20000: There can only be one head of LAB07!

session A: rollback;
session B: 1 row inserted.

But this again shows how much trouble we have to consider and work around if we decide to a) implement business rules inside database triggers and b) try to avoid the mutating table error.

Thank you mutating table error! We have come a good way since I first encountered you.

Basic SQL: IF..THEN..ELSE

How to write conditional logic in SQL

This is a question that sometimes is asked by programmers who just started using SQL.

[Figure: If-Then-Else-diagram.svg, by P. Kemp, own work created using Dia, CC0]

Introduction

Programmatic 3rdGL languages all have an IF..THEN..ELSE construct.

Pseudocode:

IF #condition# THEN #doSomething# ELSE #doAnotherThing#

Since SQL is a 4thGL language it has no need for conditional execution. This is sometimes surprising for beginners. IF..THEN..ELSE is very procedural thinking.

We do not tell SQL how to reach a certain result, instead we describe the wanted result.

Although a conditional execution is not needed, there are cases that come quite close. If we want to distinguish data dependent on other data, this can be done using conditional logic or conditional expressions.

In general expressions can not #doSomething# but they return a value (=data). SQL is all about data. A conditional expression in SQL is showing different data depending on other data.

SQL constructs

Here is a (non-complete) list of different ways we can write IF..THEN..ELSE in SQL. Be aware that most of those expressions are only ever useful in the SELECT clause of a query. The WHERE clause can do conditional logic simply by using AND+OR+NOT+() combinations. We do not need extra functional expressions there.
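Just to illustrate that point, a tiny sketch against the super_emp demo table from the sequences post: the "conditional" part of the WHERE clause is nothing but boolean operators.

select first_name, last_name
from super_emp
where (hire_date >= date '2000-01-01' and last_name = 'Parker')
   or (hire_date <  date '2000-01-01' and last_name != 'Kent');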

The functions towards the end of the list are a bit of an obscure choice for doing conditional logic. However they are useful to remember for some special scenarios.

  • CASE
  • DECODE(Oracle)
  • NULLIF
  • NVL or NVL2 or COALESCE
  • LNNVL
  • SIGN
  • ABS
  • GREATEST or LEAST

Each of those functions has advantages and disadvantages. I will try to give an indication of where their usage seems proper.

All functions have a “translated” syntax in the following form.

SQL syntax: procedural code

The following business case is used
Requirement:
If a person is older than 65 years it should get a pension.

Or to say it in more technical terms (specification):
If today the age of a person – based upon his/her birthdate – is equal to or greater than 65, then the retirement flag should be ‘Y’, else ‘N’.

CASE

CASE when a>b then x else z end: If a>b then x else z

The CASE function should always be the first choice. It is the best compromise between brevity and clarity of the expression.

case when add_months(birthday,12*65) <= trunc(sysdate)
     then 'Y'
     else 'N'
end

The add_months function will give us the day when the age of 65 is reached (retirement age). It will also consider some special calendar effects. For example a person born on 29th Feb. 1953 will get the flag already on 28th Feb. 2018.

We have to be very careful and check some special dates to make sure that our logic is the one we are looking for. For simplicity I assume that this logic for calculating the retirement candidates is correct.

CASE also has a second syntax (simple case expression). But this only allows comparing for equality.

CASE a when b then x else z end: If a=b then x else z

The syntax allows us to stack multiple CASE expressions. But most of the time this is not needed. Instead use several WHEN sections, and we can profit from short-circuit evaluation.

case when a>b then x 
     when a>c then y
     when d>f then z
     else q
end

Short-circuit evaluation here means that if a>b is true, the next conditions are not checked. Most importantly y, z and q are never calculated. And usually c, d and f are not evaluated either. There are a few special exceptions. See the forum thread "Oracle bug? Aggregates in CASE expressions negate short-circuit evaluation." for a discussion about the behaviour.

Code written using a CASE expression can sometimes get quite lengthy. But it is very close to natural language and therefore easy to maintain. CASE should be the first choice when conditional logic is needed.

DECODE

DECODE(a,b,x,c,y,z): if a=b then x elsif a=c then y else z

DECODE is nice if we need to check multiple values and provide an alternative value instead. I still like to use it in the ORDER BY clause to do rankings.
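
For example, a ranking in the ORDER BY clause could look like the following sketch (the tasks table and its status values are just assumptions for illustration):

select *
  from tasks
 order by decode(status, 'ERROR',   1,
                         'WARNING', 2,
                         'OK',      3,
                         4);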

DECODE is not part of ANSI SQL and can only be used in Oracle databases.

If the condition is more complex than a simple equality comparison, then a combination of DECODE and SIGN can be used. Especially if numeric values play a role.

decode(sign(trunc(sysdate) - add_months(birthday,12*65)),
        1, 'Y',
       -1, 'N',
        0, 'Y')

The expression “trunc(sysdate) - add_months(birthday,12*65)” returns a positive value if the person is 65 or older and a negative value if he/she is younger.

DECODE+SIGN was used very frequently before CASE was introduced in Oracle 8 or 9. You still find it sometimes in older code.

DECODE can also compare NULL values directly, because DECODE treats two NULLs as equal.
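
A tiny sketch (the commission column is just an assumption): the expression returns 'no commission' when commission is NULL.

decode(commission, null, 'no commission', 'has commission')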

NULLIF

NULLIF(a,b): If a=b then NULL else a

This function returns NULL if input a matches input b, otherwise a is returned.

The following (slightly artificial) example will return the birthdate for all persons that reached retirement age, but NULL for those who didn’t. The GREATEST function is used to help us level the values.

nullif(birthdate,
       greatest(add_months(trunc(sysdate)+1,-65*12),
                birthdate)
       )

It is not so useful for the chosen example. But it is very handy if we want to ignore some values, especially in combination with aggregate functions, since they ignore NULL values during aggregation.
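
A sketch of that pattern, assuming a hypothetical orders table with a status column: count only the orders that are not cancelled. COUNT ignores the NULLs produced by NULLIF.

select count(nullif(status, 'CANCELLED')) as not_cancelled_orders
  from orders;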

NVL or NVL2 or COALESCE

NVL(a,b): If a is null then b else a
NVL2(a,b,c): If a is null then c else b
COALESCE(a,b,c,d): 
   If a is not null then a 
      elsif b is not null then b 
         elsif c is not null then c 
            elsif d is not null then d

NVL is often used for providing a default value, in case no value was found.
If the birthday of a person is an optional field, we might want to consider setting a default (=standard) birthdate for our calculation.

nvl(birthdate,date '1980-01-01')

The NVL2 syntax is a little less well known, but it is very useful and should be remembered.
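
A quick sketch of NVL2, reusing the birthdate column: return one value when the column is filled and another when it is empty.

-- 'known' if a birthdate was entered, 'unknown' otherwise
nvl2(birthdate, 'known', 'unknown')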

COALESCE is very similar to NVL. It returns the first value that is not null. But it can be used for checking more than one value. A major advantage is that the second and later expressions are only checked if the first expression is NULL (short-circuit evaluation). This can give a performance advantage over NVL if the second expression is expensive. This performance advantage is why many programmers always prefer COALESCE over NVL. However there are also some special compiler optimizations that kick in for NVL and not for COALESCE. As a rule of thumb I would stick with NVL if we have a simple second value. I use COALESCE if more than one value needs to be checked or if the second value is complex (like a PL/SQL function or a subquery).

Return the most recent activity (latest date) for each shipped order:

coalesce(arrival_date, shipping_date, sent_date, creation_date) as latest_date

LNNVL

LNNVL(a=b): if not(a=b) or a is null or b is null then true

LNNVL is a strange and hard-to-understand function. It negates a condition. It is used internally by Oracle to rewrite certain queries (NOT IN into NOT EXISTS). Contrary to all the other functions listed here, LNNVL returns a boolean result and is therefore used directly in the WHERE clause.

One usage is if we want to make sure two values are different and still want a result even if one of the values is null. So it can ease working with NULL values.

This select returns a row if the values a and b are different, or if one of them is NULL.

select * from dual
where LNNVL(a=b);
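
A slightly more concrete sketch, assuming a hypothetical employees table with a nullable commission column: return everybody whose commission is not known to be at least 500.

select *
  from employees
 where lnnvl(commission >= 500);
-- returns rows where commission < 500 or commission is null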

SIGN

See decode.

The SIGN function can be used to rephrase a non-equality comparison (such as greater-than) into an equality comparison.

if a > b then

is (for numbers only) mathematically the same as

If SIGN(a-b) = 1 then

SIGN is a typical helper function for DECODE. Since DECODE can only compare for equality, SIGN helps to enhance that to do more complex comparisons.

ABS

ABS is sometimes used in combination with SIGN. In rare cases it avoids having to care about which of the two operands is the larger one before applying SIGN.

if a != b then

is (for numbers only) mathematically the same as

if sign(ABS(a-b)) = 1 then

Also much easier would be:

not(a=b)

ABS => Not really useful anymore.

GREATEST or LEAST

GREATEST(a,b,c): if a > b then a else b => result1; if result1 > c then result1 else c

See NULLIF example.

GREATEST gives us the maximum value from a list of values. LEAST gives us the minimum. GREATEST and LEAST can be used to harmonize certain values and then allow an equality comparison on them. As such they are similar to SIGN. However SIGN works only with numerical data, whereas GREATEST|LEAST can also be applied to strings and dates.
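
A small sketch of that “harmonize, then compare for equality” idea, reusing the birthdate column: DECODE can only compare for equality, so GREATEST levels all earlier dates to one fixed value first.

decode(greatest(birthdate, date '1960-01-01'),
       birthdate, 'born 1960 or later',
                  'born before 1960')
-- note: a NULL birthdate matches the first branch, because DECODE treats two NULLs as equal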

As with all functions we have to be careful about NULL values. Remember NULL means “UNKNOWN”. So if one of the values in the list is NULL, then GREATEST|LEAST will return NULL (UNKNOWN).

further considerations

Of course it is also possible to do conditional logic using DML commands.
The WHERE clause of the DML command matches the IF part; the SET part of an UPDATE matches the THEN part. For a DELETE command the THEN part is the deletion of the row.

For example a procedural requirement like
“if the data is older than 3 years then delete it”
can be translated into sql like this:

Delete from myData
where insert_date < add_months(trunc(sysdate),-3*12)

As we can see, the procedural requirement can be translated into SQL in a very elegant and straightforward way.
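
The same mapping works for an UPDATE. A sketch using the hypothetical persons table from above: “if a person is 65 or older, then set the retirement flag”.

update persons
   set retirement_flag = 'Y'
 where add_months(birthday, 12*65) <= trunc(sysdate);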

Another point to remember:

If you think a CASE expression is needed in the WHERE clause, you are most probably wrong. One notable exception to this rule is if you want to access a function-based index (FBI). If a function-based index uses a CASE expression, then you must use exactly the same expression inside your WHERE clause to be able to profit from this index.
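
A sketch of that exception, assuming a hypothetical orders table with a status column: the index and the query must use the identical CASE expression.

create index orders_active_fbi on orders (
  case when status in ('NEW','OPEN') then 'ACTIVE' end
);

select *
  from orders
 where case when status in ('NEW','OPEN') then 'ACTIVE' end = 'ACTIVE';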

Conclusion

SQL can do conditional logic. The first place to look for it is the WHERE clause. Conditional expressions can be done using the SQL functions CASE, NVL, COALESCE, NVL2, DECODE and NULLIF (in that order). Other functions can help to adapt conditional logic to the specific business requirements.