Scalable sequences – size check

Motivation

Recently I have been wondering where and when to use scalable sequences, especially whether it makes sense to use them in general for sequence-driven ID columns of larger tables. I know this sounds a bit like an early sign of CTD (compulsive tuning disorder), but at least I’m aware of it.

Scalable sequences offer a performance benefit in certain special situations (hot block, index contention). This post is not about those benefits.

I assume the reader has a basic understanding of what a scalable sequence is. If not, check out this older post of mine: All about sequences.

A scalable sequence provides larger numeric values than a normal sequence does.

A normal sequence => 3
A scalable sequence => 10176700000003

Here I only analyze and test the effects of those bigger numbers, especially for indexes.

Side note: The first 3 digits of the scalable sequence value are the instance number + 100. The next 3 digits are the session id. The remainder is the normal sequence value. Different sessions provide different numbers, so the sequence numbers are no longer monotonically increasing, at least not when looking across multiple sessions. This also means inserts are spread all over the index and not only on the right-hand side of the index.
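
For example, the value shown above can be decomposed like this:

-- decompose the example value into its three parts
select substr('10176700000003', 1, 3) as instance_part, -- instance number + 100
       substr('10176700000003', 4, 3) as session_part,  -- session id of the generating session
       substr('10176700000003', 7)    as sequence_part  -- the underlying, monotonically increasing counter
from dual;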

Bigger numbers need more storage

The number 3 needs 2 bytes, the number 10176700000003 needs 8 bytes of storage. Use the DUMP function to measure it. Values provided by scalable sequences might use up to 18 bytes of storage.
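
A quick check with DUMP (a minimal sketch; the Len part of the output is the storage size in bytes):

select dump(3) as small_value, dump(10176700000003) as big_value
from dual;

-- DUMP(3)              => Typ=2 Len=2: 193,4
-- DUMP(10176700000003) => Typ=2 Len=8: 199,11,18,68,1,1,1,4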

This affects multiple areas: the table segment grows, but indexes especially tend to be bigger. We can safely assume the sequences are used to provide values for ID columns, which are almost always supported by a unique index. But there are also foreign key columns holding those numbers, and there the supporting index will typically be non-unique.

Because the values are bigger, fewer index entries fit into a single index block, meaning we need more index blocks. Here I focus explicitly on the size of indexes and not on the size of columns. I believe a few additional bytes for a whole table row are usually not relevant. In the case of an index however, the whole entry is that single value plus a reference (index branch block) or a rowid (index leaf block).

For comparison reasons I also check the size of a reverse key index, since those were used in the past to combat index contention issues. Scalable sequences are the new kid in town that should replace reverse key indexes.

So let’s measure it.

Testscript 1 – single session insert

First we create a table with 3 number columns. Then we create a normal and a scalable sequence to provide the values. We add two normal indexes and a reverse key index on those 3 columns and populate the table with values originating from the sequences.

--------------------------------------------------------------------------------
-- Scenario: Test index size
--------------------------------------------------------------------------------
-- create a table with three columns to be filled with IDs
create table test_sequence_effects 
  (id_normal number, 
   id_scalable number,
   id_reverse number);

-- create a normal and a scalable sequence
create sequence test_sequence_normal;
create sequence test_sequence_scale scale;

-- create 3 indexes
create index test_sequence_normal_ix on test_sequence_effects(id_normal); 
create index test_sequence_scale_ix on test_sequence_effects(id_scalable);
create index test_sequence_reverse_ix on test_sequence_effects(id_reverse) reverse ;

-- insert 1 million datapoints
insert into test_sequence_effects 
select test_sequence_normal.nextval, test_sequence_scale.nextval, test_sequence_normal.nextval 
from dual connect by level <= 1000000;

commit;

-- create statistics
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_NORMAL_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_SCALE_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_REVERSE_IX',estimate_percent => 100);

Result after single session insert

We can have a look at the index statistics and the segment sizes.

-- check index stats
select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor 
from dba_indexes 
where index_name like 'TEST_SEQUENCE_%'; 
INDEX_NAME                 BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
TEST_SEQUENCE_NORMAL_IX         2        1999       1000000              4271
TEST_SEQUENCE_SCALE_IX          2        3379       1000000              4271
TEST_SEQUENCE_REVERSE_IX        2        2965       1000000            999996

Right now all three indexes still have the same blevel, meaning that for a single direct index access the performance should be identical. However when the number of entries increases, the ID values created by a scalable sequence will eventually add a new level to the btree before the same happens for IDs from a normal sequence. I tested when this happens. If you are curious check out the appendix section “Blevel 3 break point”, but try to guess the value before you check.

-- check index size
select segment_name, segment_type, segment_subtype, blocks, bytes/1024/1024 as size_in_MB
from user_segments
where segment_name like 'TEST_SEQUENCE_%';
SEGMENT_NAME             SEGMENT_TYPE SEGMENT_SUBTYPE BLOCKS SIZE_IN_MB
TEST_SEQUENCE_EFFECTS    TABLE        ASSM              4352         34
TEST_SEQUENCE_NORMAL_IX  INDEX        ASSM              2176         17
TEST_SEQUENCE_REVERSE_IX INDEX        ASSM              3072         24
TEST_SEQUENCE_SCALE_IX   INDEX        ASSM              3584         28
table and index size

The scalable index in this experiment is 65% larger than the normal index (28 MB vs. 17 MB). This increase can be attributed directly to the bigger numbers that need to be stored.

We have to keep in mind that this test scenario was run in a single session. In reality multiple sessions will run the inserts. This is where scalable sequences should give a performance boost, even at the cost of potentially slightly higher storage consumption.

Testscript 2 – parallel inserts from multiple sessions

Now we repeat the same experiment, but using multiple sessions instead of a single one.

I created a code block that repeatedly does single row inserts in batches of 10 and then waits 1 second. This was run in up to 100 parallel sessions using dbms_scheduler jobs, until approximately 1 million rows were inserted. The code for this test can be found in the appendix section “Testscript 2 – code”. Because of the 1 sec pause this will run for some time.

Index stats after parallel inserts

INDEX_NAME                 BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
TEST_SEQUENCE_NORMAL_IX         2        3649       1000198            943211
TEST_SEQUENCE_SCALE_IX          2        6577       1000198             32329
TEST_SEQUENCE_REVERSE_IX        2        3240       1000198            986633

The number of blocks increased for all indexes. But more importantly, we see that the clustering factor for the normal index went through the roof and is now close to the number of rows in the table, which is bad. It also shows that scalable sequences seem to keep the promise of avoiding index contention and hot blocks, even though this test was not set up to measure any performance implications.

If you want to know more about clustering factor, read appendix section “index clustering factor”.

Side note: There were a few more rows inserted than in the single session approach (0.02%). That was because of how the stopping mechanism for the parallel jobs worked. It does not influence the comparison at all.

Index sizes after parallel inserts

SEGMENT_NAME             SEGMENT_TYPE SEGMENT_SUBTYPE BLOCKS SIZE_IN_MB
TEST_SEQUENCE_EFFECTS    TABLE        ASSM              4480         35
TEST_SEQUENCE_NORMAL_IX  INDEX        ASSM              3840         30
TEST_SEQUENCE_REVERSE_IX INDEX        ASSM              3328         26
TEST_SEQUENCE_SCALE_IX   INDEX        ASSM              6784         53

So the size of the normal index rose from 17 to 30 MB (76% increase) and the scalable index from 28 to 53 MB (89%). The reason for both increases should be that we now see more 50-50 index block splits, resulting in a considerable number of blocks not being full and rarely being reused.

Overall the sizes didn’t explode, but the increase is still a noticeable amount.

Side note: It is possible to shrink the indexes after such parallel insert sessions, but the effect is not great. Shrink and coalesce should really be considered when frequent deletes are run against the table. Scalable sequences can potentially provide values that fill some of the gaps created by deletes. An index rebuild is not needed or advisable as long as more inserts are to come.
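
For reference, the reorganization commands mentioned here (using one of the test indexes from this post):

-- merge sparsely filled leaf blocks
alter index test_sequence_scale_ix coalesce;

-- like coalesce, but additionally gives freed space back to the segment
alter index test_sequence_scale_ix shrink space;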

Conclusion

Values created by scalable sequences use more storage space than values from normal sequences. For indexes we can expect a growth in index size between 50% and 100% compared to a normal index. The growth is partially because of the larger numbers, but also because there will be more not yet used space in many of the index blocks (caused by 50-50 index block splits instead of 90-10 splits).

The clustering factor for scalable sequences is always better than or equal to that of any alternative.

For all customers that I work with, such an increase in storage size would not be a problem at all, especially if there is a performance improvement in exchange.

For me the storage size increase is not an argument against scalable sequences.

Appendix

Testscript 2 – code

--------------------------------------------------------------------------------
-- Scenario: Test index size after parallel inserts
--------------------------------------------------------------------------------
-- cleanup logic
drop table test_sequence_effects purge;
drop sequence test_sequence_normal;
drop sequence test_sequence_scale;

execute dbms_scheduler.drop_program('test_sequence_do_inserts');
begin
 for j in (select job_name from all_scheduler_jobs where job_name like 'TEST_SEQUENCE%') loop
     dbms_scheduler.drop_job(j.job_name);
 end loop;    
end;
/
--------------------------------------------------------------------------------
-- create a table with three columns to be filled with IDs
create table test_sequence_effects 
  (id_normal number, 
   id_scalable number,
   id_reverse number);

-- create a normal and a scalable sequence
create sequence test_sequence_normal;
create sequence test_sequence_scale scale;

-- create 3 indexes
create index test_sequence_normal_ix on test_sequence_effects(id_normal); 
create index test_sequence_scale_ix on test_sequence_effects(id_scalable);
create index test_sequence_reverse_ix on test_sequence_effects(id_reverse) reverse ;

-- insert 1 million datapoints
-- run as 100 parallel scheduled jobs 
-- each job inserts a batch of 10 IDs using single row inserts 
-- each job waits for 1 second before the next batch is run
-- each job stops after the max number for the normal ID is reached.

--create a program for dbms_scheduler to be used in jobs later
begin
    DBMS_SCHEDULER.create_program (
    program_name   => 'test_sequence_do_inserts',
    program_type   => 'PLSQL_BLOCK',
    program_action => q'[declare
  last_id number;
  max_id number := 1000000;
  cnt_id number := 0;
  e_stop exception;
  pragma exception_init (e_Stop, -20022);
begin 
  for j in 1..max_id/10 loop
    -- single insert for 10 rows 
    for i in 1..10 loop
        insert into test_sequence_effects 
        values (test_sequence_normal.nextval -- ID_NORMAL
             , test_sequence_scale.nextval  -- ID_SCALABLE
             , test_sequence_normal.nextval -- ID_REVERSE
             )
        returning id_normal into last_id;
        cnt_id := cnt_id + 1;
        if last_id >= max_id then 
          commit;
          raise e_stop;
        end if;   
    end loop;
    commit; 
    -- pause for 1 sec  
    dbms_session.sleep(1);
  end loop;  
exception
 when e_stop then
   -- finish gracefully
   dbms_output.put_line('This session inserted '||to_char(cnt_id)||' rows');
   commit;
end;
]',
    enabled        => TRUE,
    comments       => 'Insert 10 rows, 1 sec pause.');


   -- activate the program. This does not run the logic yet.
   DBMS_SCHEDULER.enable (name => 'test_sequence_do_inserts');

end;
/

-- create multiple jobs
begin 
  for j in 1..100 loop
     dbms_scheduler.create_job
     (job_name      => 'test_sequence_do_inserts_job'||to_char(j,'FM000'),
      program_name  => 'test_sequence_do_inserts',
      start_date => systimestamp,
      enabled => true);

     dbms_scheduler.run_job (job_name => 'test_sequence_do_inserts_job'||to_char(j,'FM000'),
                              use_current_session => false);
   end loop;
end;
/



After executing this script the scheduled jobs run for quite some time.

We can check the number of rows in the table to see if it is finished.

select max(id_normal) last_value,count(*) row_count from test_sequence_effects;

other interesting selects to monitor the scheduled jobs

-- check scheduler jobs
select * from all_scheduler_programs;
select * from all_scheduler_jobs where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_running_jobs where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_job_log where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_job_run_details where job_name like 'TEST_SEQUENCE%';

And of course the statistics need to be gathered

-- compute statistics for all indexes
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_NORMAL_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_SCALE_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_REVERSE_IX',estimate_percent => 100);

Index clustering factor

What is the clustering factor of an index? Essentially it tells us how well the order of the index entries matches the order of the table data: how many table blocks need to be read if all rows are accessed in index order? The value itself should be compared with the number of blocks (minimum possible value) and rows (maximum possible value) in the table.
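
A minimal sketch to put those numbers side by side, using the indexes from the test scenario:

-- compare the clustering factor to its best case (table blocks) and worst case (num_rows)
select i.index_name,
       t.blocks as table_blocks,      -- minimum possible value
       i.clustering_factor,
       t.num_rows                     -- maximum possible value
from user_indexes i
join user_tables t on t.table_name = i.table_name
where i.index_name like 'TEST_SEQUENCE_%';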

Here are some links that might improve your understanding of the clustering factor:

http://dbaparadise.com/2018/05/what-is-index-clustering-factor/

Ask Tom: https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1032431852141

David Fitzjarrell writes in detail about the clustering factor and how Oracle calculates it

Blevel 3 break point

I ran some further tests to see when an index based upon a scalable sequence reaches blevel 3.

For a single session insert, blevel 3 was reached after approximately 38 million inserts.

INDEX_NAME                  BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
--------------------------- ------ ----------- ------------- -----------------
TEST_SEQUENCE_NORMAL_IX          2       81148      38200000            173362
TEST_SEQUENCE_REVERSE_IX         2      125912      38200000          38199995
TEST_SEQUENCE_SCALE_IX           3      129055      38200000            173362

For multiple sessions the same number of blocks/leaf blocks should be reached earlier. However the distribution would be considerably more even. Both effects partially cancel each other out. So far I didn’t measure it.

SQL*Plus basics – ways to measure time

Here is a quick overview of SQL*Plus commands that help to track and measure time.

set time on/off

This displays a prompt in front of each statement with the current time. Be aware that it is not the time when the statement was executed, but the time when the prompt line in SQL*Plus was created. This difference is usually not relevant when running scripts, just something to be aware of when manually typing and executing statements in SQL*Plus.

SQL> set time on;
10:56:02 SQL>
10:56:10 SQL> execute dbms_session.sleep(3);

PL/SQL procedure successfully completed.

10:56:23 SQL>

The prompts show that 13 seconds passed. However the specified sleep time was only 3 seconds. The reason: it took me approximately 10 seconds to type the statement.

set timing on/off

The result is the runtime of a SQL statement or an anonymous PL/SQL block in SQL*Plus.

SQL> set timing on;
SQL> execute dbms_session.sleep(3);

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.02
SQL>

After any SQL statement or anonymous PL/SQL block, SQL*Plus shows the elapsed time for that command.

The elapsed time is not shown after SQL*Plus commands (like SET or SPOOL).

The time is measured on the client. So it includes the time to send the command to the server, execute it there and move the resulting information from the database server back to the SQL*Plus client. When very exact measurements are needed, network delays will make this elapsed time fluctuate a little more than what is useful.

timing start+stop

Timing start+stop is very similar to set timing on/off. It starts a timer with timing start and shows the elapsed time after the timing stop command. This allows measuring the time needed to run multiple commands, whereas set timing on/off only measures a single command.

SQL> set time on;
11:14:37 SQL> timing start
11:14:45 SQL> execute dbms_session.sleep(3);

PL/SQL procedure successfully completed.

11:14:55 SQL> select * from dual;

D
-
X

11:15:00 SQL> execute dbms_session.sleep(1);

PL/SQL procedure successfully completed.

11:15:12 SQL> timing stop;
Elapsed: 00:00:31.48
11:15:17 SQL>

We can compare the measurement from timing with the times shown by set time on.
From 11:14:45 to 11:15:17 there are 32 seconds, which matches the elapsed time of 31.48 seconds shown by timing start+stop.

_DATE

This is a predefined SQL*Plus substitution variable that holds the current date and time. It uses the session’s NLS settings as display format.

SQL> prompt &_DATE
27-APR-22
SQL> alter session set nls_date_format = 'dd-mm-yyyy HH24:MI:SS';

Session altered.

SQL> prompt &_DATE
27-04-2022 11:22:20

SQL> set pagesize 10
SQL> ttitle "Important list of numbers &_DATE."
SQL> select level as num from dual connect by level <= 10;
Important list of numbers 27-04-2022 11:59:56
       NUM
----------
         1
         2
         3
         4
         5
         6

Important list of numbers 27-04-2022 11:59:56
       NUM
----------
         7
         8
         9
        10

10 rows selected.
SQL> ttitle OFF

Useful when one wants to add the day and/or time to the name of a log file, for example.
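
A minimal sketch (the file name and NLS format are just examples):

-- put a timestamp into the spool file name
alter session set nls_date_format = 'YYYYMMDD_HH24MISS';
spool install_&_DATE..log
-- the first dot terminates the substitution variable, the second one belongs to the file name
-- ... run the script ...
spool off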

sysdate/systimestamp

These are SQL pseudofunctions that return the current datetime. Sysdate delivers a date (including time up to the second), systimestamp additionally delivers fractional seconds and the time zone. The time is derived from the server’s time settings, so client settings do not influence the value. However client NLS settings affect the conversion to strings.

SQL> select sysdate from dual;

SYSDATE
---------
27-APR-22

SQL> alter session set nls_date_format='DD-MM-YYYY HH24:MI:SS';

Session altered.
SQL> select sysdate from dual;

SYSDATE
-------------------
27-04-2022 14:33:03

SQL> select to_char(sysdate,'FMDay, DDth Month YYYY') from dual;

TO_CHAR(SYSDATE,'FMDAY,DDTHMONTHYYYY')
-----------------------------------------
Wednesday, 27TH April 2022

SQL> select systimestamp from dual;

SYSTIMESTAMP
-----------------------------------------
27-APR-22 02.33.10.675396 PM +02:00

SQL> alter session set nls_timestamp_tz_format='DD-MM-YYYY HH24:MI:SSXFF';

Session altered.

SQL> select systimestamp from dual;

SYSTIMESTAMP
------------------------------------------
27-04-2022 14:33:18.033306

SQL> select to_char(systimestamp,'FMDay, DDth Month YYYY HH24:MI:SSXFF TZR') from dual;

TO_CHAR(SYSTIMESTAMP,'FMDAY,DDTHMONTHYYYYHH24:MI:SSXFFTZR')
------------------------------------------
Wednesday, 27TH April 2022 14:33:22.260310 +02:00

dbms_utility.get_time

A way to measure times fairly exactly (down to 1/100 of a second) in PL/SQL.

SQL>declare
2   time number;
3 begin
4   time:=dbms_utility.get_time;
5   dbms_session.sleep(3);
6   dbms_output.put_line('Elapsed in sec: '||to_char((dbms_utility.get_time - time)/100));
7 end;
8 /
Elapsed in sec: 3.04

PL/SQL procedure successfully completed.

SQL>

Be aware that the numbers returned by get_time are often negative integers, like -1618843773.

We can also use this to measure the time between several SQL*Plus calls by storing the first time in a SQL*Plus bind variable. That way it doesn’t need to be one single large PL/SQL block.

SQL> var stime number;
SQL> exec :stime := dbms_utility.get_time;

PL/SQL procedure successfully completed.

SQL> exec dbms_session.sleep(3);

PL/SQL procedure successfully completed.

SQL> select * from dual;

D
-
X

SQL> exec dbms_output.put_line('Elapsed in sec: '||to_char((dbms_utility.get_time - :stime)/100));
Elapsed in sec: 18.62

PL/SQL procedure successfully completed.

SQL>

Serveroutput needs to be switched on to see the result of dbms_output.

Recommendations

  • For crude but easy time measurements set time on and set timing on are solid.
  • For exact and comparable measurements on the database side use dbms_utility.get_time.
  • To show date and time in scripts, _DATE is sometimes a little easier to use than sysdate.
  • To add the full runtime of a script to the end of a spooled logfile, timing start+stop can be used.

lateral join – decorrelation gone wrong

A colleague made me aware of the following misbehaviour of Oracle’s optimizer. Shout-out to Christine S. who discovered the problem.

demo

create table a 
as select 1 x from dual;

create table b 
as select 1 x, 1 y from dual union all 
   select 1 x ,2 y from dual;

select a.x, c.y
from  a,
     lateral (select b.x, max (b.y) y from b where b.x = a.x) c
where a.x = c.x;

result

         X          Y
---------- ----------
         1          2

At first glance this looks like what we intended to see. However a closer inspection of the code reveals that select b.x, max (b.y) y from b is not valid syntax. We do get an error message when running it directly.

ORA-00937: not a single-group group function

We also get the same error message when we use the select with the lateral keyword to drive a materialized view.

create materialized view mv
as
select a.x, c.y
from a,
     lateral (select b.x, max (b.y) y from b where b.x = a.x) c
where a.x = c.x;
ERROR at line 5:
ORA-00937: not a single-group group function

The ORA-937 error message is correct and should be displayed.

That we get a result in the first case, is an error!

reason

Oracle transforms the lateral join into a decorrelated lateral join. During this transformation the syntax error disappears. We can see this in the outline section of the execution plan (see below). The transformation is handled differently when used in a materialized view.

select *
from dbms_xplan.display_cursor(sql_id=>'2c1nchvqtjt6s', format=>'+outline');
Plan hash value: 3244120894

----------------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                 |       |       |     7 (100)|          |
|*  1 |  HASH JOIN           |                 |     1 |    29 |     7  (15)| 00:00:01 |
|   2 |   TABLE ACCESS FULL  | A               |     1 |     3 |     3   (0)| 00:00:01 |
|   3 |   VIEW               | VW_DCL_A18161FF |     1 |    26 |     4  (25)| 00:00:01 |
|   4 |    HASH GROUP BY     |                 |     1 |     6 |     4  (25)| 00:00:01 |
|   5 |     TABLE ACCESS FULL| B               |     2 |    12 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$B1519A3D")
      DECORRELATE(@"SEL$2")
      OUTLINE_LEAF(@"SEL$13D91C5D")
      OUTER_JOIN_TO_INNER(@"SEL$71877063" "C"@"SEL$1")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$71877063")
      DECORRELATE(@"SEL$B1519A3D")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$B1519A3D")
      DECORRELATE(@"SEL$2")
      FULL(@"SEL$13D91C5D" "A"@"SEL$1")
      NO_ACCESS(@"SEL$13D91C5D" "C"@"SEL$1")
      LEADING(@"SEL$13D91C5D" "A"@"SEL$1" "C"@"SEL$1")
      USE_HASH(@"SEL$13D91C5D" "C"@"SEL$1")
      FULL(@"SEL$B1519A3D" "B"@"SEL$2")
      USE_HASH_AGGREGATION(@"SEL$B1519A3D")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("A"."X"="C"."X")

The name of the view VW_DCL_xxx indicates that it was used for a decorrelated lateral operation. For a normal lateral operation the name of the view would be VW_LAT_xxx.

opinion

I’m a big fan of lateral joins. For more insights into what a lateral join does, check this older blog post by the Optimizer Development Group: http://optimizermagic.blogspot.com/2007/12/outerjoins-in-oracle.html

Since 12c (or maybe a little earlier) there have been some optimizer enhancements to lateral joins, called decorrelation. In essence it means a lateral join, which can be seen as a kind of correlated subquery, is rewritten into an uncorrelated subquery under certain specific circumstances.

I believe this rewrite is almost always wrong. If the developer deliberately decides to use the lateral keyword (the same goes for cross apply and outer apply), the optimizer should not try to correct this.

Other authors have written about problems with decorrelation before.

The case demonstrated here is just another example where decorrelation has gone wrong. An optimizer hint should not change the result of a query, and even less so should it change whether the query is formulated correctly or not. There are exceptions to this rule where the intention of the hint is to change the outcome, for example ignore_row_on_dupkey_index or fresh_mv. But the (undocumented) DECORRELATE/NO_DECORRELATE hints are no such case.

workaround

There are two ways to disable the decorrelation and get the expected ORA-00937 error message.

a) use the NO_DECORRELATE hint.

SQL> select /*+ no_decorrelate(@"SEL$2") */
  2   a.x, c.y
  3  from  a,
  4       lateral (select b.x, max (b.y) y from b where b.x = a.x) c
  5  where a.x = c.x;
     lateral (select b.x, max (b.y) y from b where b.x = a.x) c
                     *
ERROR at line 4:
ORA-00937: not a single-group group function

b) switch the decorrelation feature off using the hidden (underscore) parameter _optimizer_ansi_join_lateral_enhance

SQL> alter session set "_optimizer_ansi_join_lateral_enhance"=false;

Session altered.

SQL> select a.x, c.y
  2  from  a,
  3       lateral (select b.x, max (b.y) y from b where b.x = a.x) c
  4  where a.x = c.x;
     lateral (select b.x, max (b.y) y from b where b.x = a.x) c
                     *
ERROR at line 3:
ORA-00937: not a single-group group function


conclusion

Although LATERAL joins are a nifty feature, always be aware that such features often come with bugs.

plsql function: who_called – simplify your exception handler!

Does your error logging logic still look like this?

example 1 – old school handler

procedure doSomething is
begin
   ...
exception
  when others then
     pk_log.error(p_module    => $$PLSQL_UNIT, 
                  p_submodule => 'doSomething', 
                  p_message   => 'Something not done!'||sqlerrm);
     raise;
end;

Here is what I prefer!

example 2 – compact handler

procedure doSomething is
begin
   ...
exception
  when others then
     pk_log.error('Something not done!');
     raise;
end;

Such code is way more convenient! Of course it stores the same information as the previous example 1.

To get there some parts have to be moved into the logging method (pk_log.error).

  1. sqlerrm
    is the error message
    When moving, it should also be improved (at least use dbms_utility.format_error_stack)
  2. $$PLSQL_UNIT
    is the name of the package (or of the standalone procedure or function)
  3. ‘doSomething’
    is the hardcoded name of the procedure
    It needs to be gathered dynamically at runtime inside the logging package.

The biggest issue is 3.) – finding the name of the module at runtime. The function who_called solves that. It also gathers the package name on the way, so 2.) is covered as well.

Side note: Is hardcoding the procedure name a problem?

Not really. However if the name changes one needs to change it in several lines. And in case one forgets to change the error handler, the compiler will not issue an error. The problem might then go unnoticed for some time and could be misleading when exceptions happen.

I have two versions of the who_called function. The first, standalone version is not recommended for real use; use the packaged version instead. However the first version demonstrates the implemented logic in a clear way.

Both functions will find the full name of the module that called the current code block. The default behaviour is that modules from the same package are ignored. This allows easy usage inside packages meant for logging, auditing or testing. Everything that is instrumentation code can usually profit from this little helper.

Usecase 1 – demo purposes – standalone function

This logic assumes that we use the function to find out who called the procedure that is currently running. So not the name of the procedure itself, but the name of the caller.

Installation code

-- Cleanup
-- drop function who_called;

-- Usecase 1: standalone function - simple demo version

-- check the call stack to find the calling module.
create or replace function who_called return varchar2
  is
    v_caller utl_call_stack.unit_qualified_name;
    v_name      varchar2(128);
begin
    -- print the call stack just for demo purposes
    for i in 1..utl_call_stack.dynamic_depth  loop
      sys.dbms_output.put_line(
         rpad('+',utl_call_stack.dynamic_depth-i,'+')||
         utl_call_stack.concatenate_subprogram( utl_call_stack.subprogram(i) )
         );
    end loop;

    
    -- step backwards through the call stack 
    --      1 = current module = who_called function
    --      2 = the module that calls who_called 
    --      3 = the caller!
    v_caller := utl_call_stack.subprogram(3);

    v_name := utl_call_stack.concatenate_subprogram(v_caller);

    return v_name;

end who_called;
/

example 3 usage

set serveroutput on

-- Testcase 1.1 anonymous block
declare
  procedure DoSomething as
  begin
    dbms_output.put_line('I want to doSomething!');
    dbms_output.put_line('Who called? '|| who_called);
  end;

  procedure DoMore as
  begin
    doSomething;
    dbms_output.put_line('I want to doMore!');
    dbms_output.put_line('Who called? '|| who_called);
  end;
begin
  doMore;
end;
/

output

I want to doSomething!
+++WHO_CALLED
++__anonymous_block.DOSOMETHING
+__anonymous_block.DOMORE
__anonymous_block
Who called? __anonymous_block.DOMORE
I want to doMore!
++WHO_CALLED
+__anonymous_block.DOMORE
__anonymous_block
Who called? __anonymous_block


PL/SQL procedure successfully completed.

Each time the function is called, it prints the full call stack to the screen. This is just for demonstration purposes.
The first time the function who_called is executed is from inside the submodule doSomething.
The call stack at this point looks like this

Position in stack   Module
1                   who_called
2                   doSomething
3                   doMore
4                   __anonymous_block

Line 5 in the anonymous block is the line with the who_called function (call stack position 1). It is part of the doSomething procedure (position 2). On position 3 of the call stack we always find the caller – here the procedure doMore that executed doSomething.

The second time the function is called is in the doMore procedure. And the call stack looks like this:

Position in stack   Module
1                   who_called
2                   doMore
3                   __anonymous_block

In this case the caller is the anonymous block.

The example clearly shows that we can fetch the name of any module in the call stack (including anonymous blocks). We just have to fetch the name from the proper level in the call stack.

Usecase 2 – error and trace logging – packaged version

This works a bit differently when used inside a larger package that instruments the code base. Typically we add error logging logic and debug (trace) logging calls. In that scenario we want to find out the name of the module where the trace or error logging call is made.

The logic assumes that we have a package used for this instrumentation purpose only, which also means that all calls from inside this package can be ignored. So essentially we want to find out the full name of the procedure (or function) where the log error or log trace call is done. The full name can be something like package.module.submodule.subsubmodule.

Installation code

When testing this, make sure you do not already have/own a package pk_log. The script would overwrite it.

If you like it, copy the code for the who_called function into your own instrumentation package.

-- Cleanup
-- drop function who_called;

-- who_called packaged version 
-- check the call stack to find the calling module.
-- calls from this package are not considered
create or replace package pk_log as
/* this is a very simplified example for error logging 
   only demonstration purposes of the WHO_CALLED function 
   your error logging logic must be better!
*/
  procedure error(message in varchar2);
  procedure trace(message in varchar2); 
end pk_log;
/

create or replace package body pk_log as
    function who_called return varchar2
      is
        -- author: Sven Weller, 2021, syntegris information solutions GmbH
        v_pkg    varchar2(128) := $$plsql_unit; -- name of the package that holds the who_called function
        v_caller utl_call_stack.UNIT_QUALIFIED_NAME;
        v_name varchar2(128);
    begin
        -- step backwards through the call stack 
        --      1 = current module = who_called function
        --      2 = the module that calls who_called 
        --      3 = the caller!
        -- we could start later (level 3 or even 4), if we knew exactly how many other submodules are always in the call stack 
        -- This might be risky if function inlining is possible, but that can be prevented by a pragma. Example in procedure error.
        for i in 2..utl_call_stack.dynamic_depth  loop

          v_caller := utl_call_stack.subprogram(i);

          v_name := utl_call_stack.concatenate_subprogram(v_caller);

          -- at lexical depth 1 we find the name of the package
          if v_caller(1) != v_pkg then
            -- In some scenarios, we might want to go even one step upward the call stack. Typically not for error logging purposes.
            -- In such cases we could simply fetch one more name from the call stack
            -- v_caller := utl_call_stack.subprogram(i+1);
            
            -- build the full name, but only if we found the caller
            v_name := utl_call_stack.concatenate_subprogram(v_caller);
            -- no need to go further through the call stack
            exit;
          end if;
        end loop;

        return v_name;

    end who_called;

    procedure writeMessage(message in varchar2, message_type in varchar2) as
    begin
      case message_type 
      when 'E' then
        sys.dbms_output.put_line('ERROR at ' || who_called);
      when 'T' then
        sys.dbms_output.put_line('Info from ' || who_called);
      else  
        sys.dbms_output.put_line('ERROR! Unknown Message Typ ' || message_type || ' in '||who_called);
      end case;        
      sys.dbms_output.put_line(message);

      if message_type = 'E' then 
        -- Add full error stack
        sys.dbms_output.put_line(dbms_utility.format_error_stack);
        -- Add line where the error happened
        -- Only the last line from the backtrace is relevant, all other lines are already in the error stack
        -- compare output to sys.dbms_output.put_line(dbms_utility.format_error_backtrace);
        sys.dbms_output.put_line(' in '||nvl(UTL_CALL_STACK.backtrace_unit(utl_call_stack.backtrace_depth),'anonymous block')
                               ||' at line '||sys.utl_call_stack.backtrace_line(utl_call_stack.backtrace_depth)
                               );
      end if;
    end writeMessage; 

    procedure error(message in varchar2) as
      pragma autonomous_transaction;
    begin
      pragma inline(writeMessage,'NO'); -- we don't want writeMessage to be inlined in case of plsql_optimize_level = 3.
      writeMessage(message,'E' );
    end error;
    procedure trace(message in varchar2) as
      pragma autonomous_transaction;
    begin
      writeMessage(message,'T' );
    end trace;   
end pk_log;
/

example 4 call package, log trace and log error

set serveroutput on

-- Testcase 2.1 anonymous block
declare
  procedure DoSomething as
  begin
    pk_log.trace('I want to doSomething!');
  end;

  procedure DoMore as
  begin
    doSomething;
    raise no_data_found;
  end;
begin
  doMore;
exception 
  when others then
    pk_log.error('I wanted to do more, but this happened!');
    raise;
end;
/

output

This is the dbms_output (the simplified trace and error logs).

Info from __anonymous_block.DOSOMETHING
I want to doSomething!
ERROR at __anonymous_block
I wanted to do more, but this happened!
ORA-01403: no data found
ORA-06512: at line 10

 in anonymous block at line 13

The output is followed by the exception from the final RAISE; of the block (output from SQL Developer).

Error starting at line : 4 in command -
declare
  procedure DoSomething as
  begin
    pk_log.trace('I want to doSomething!');
  end;

  procedure DoMore as
  begin
    doSomething;
    raise no_data_found;
  end;
begin
  doMore;
exception 
  when others then
    pk_log.error('I wanted to do more, but this happened!');
    raise;
end;
Error report -
ORA-01403: no data found
ORA-06512: at line 17
ORA-06512: at line 10
ORA-06512: at line 13
01403. 00000 -  "no data found"
*Cause:    No data was found from the objects.
*Action:   There was no data from the objects which may be due to end of fetch.

In SQL*Plus we get one more line, because of the additional raise in line 17.

There is something special in the code: the last line from the error backtrace is captured and returned – "in anonymous block at line 13".

This last line is missing in the error stack, which means it is usually lost when you only store dbms_utility.format_error_stack.

To add this last line, utl_call_stack.backtrace_line(utl_call_stack.backtrace_depth) is executed. All other lines from the error backtrace are already part of the error stack (depending on DB version), so those can be ignored.

Conclusion

For my purposes it works fine and the resulting code is very clean.

I currently use it in an audit package based upon Connor McDonald’s Audit Generator, which was massively reworked for some custom needs. Maybe I’ll publish that one day. But that is another story.

who_called advantages

  • Way less to type – fewer mistakes
  • Much easier to see what the error handler does
  • Concentrates the coding effort on the important bits – mostly the additional information that should be added to the error

who_called disadvantages

  • A tiny bit slower than hard coding the package and submodule name
    => For exception handling this is not relevant.
    => In huge debug or trace scenarios this time could add up if extreme performance is needed. Conditional compilation can be applied to remove instrumentation calls if you compile for extreme PL/SQL performance.
  • Since part of the logic is hidden, other developers might be tempted to add the module name to the message anyway
  • The module name is fetched at run time, not at compile time
    => This can be an issue, especially if optimization level 3 is used. Function inlining (check the compiler warnings) is a method the PL/SQL optimizer can choose to speed up logic. However it also means the function does not exist anymore at runtime, so utl_call_stack will not be able to report the name of the module correctly. Note that there is a compiler warning when inlining happens (which usually is a good thing).

Feel free to adapt this code snippet to your special needs. I’m convinced it will help you write simpler and more efficient error handlers.

My favorite top 10 new features in Oracle database 19c

Justification

This database version has been out for 2 years now. Why talk about new features now?

Many of my customers recently made the upgrade to the 19c database, usually from 12.1 or 12.2, at least one from an even older version. So I compiled a list of things that I enjoy using and now have available in a 19c environment.

Be aware, 19c is not a major new database release, it is just a rebranded 12.2.0.4. However it is the final (= long term support) release of the Oracle 12.2 database family. As such it already received a stack of backported features from 21c. And some of them look very useful to me.

This is a highly subjective compilation. My time spent is 75% development, 50% DBA stuff and 25% management (yes those areas overlap) – which might explain some of the preferences.

10 Gradual Database Password Rollover for Applications (19.12)

A new profile parameter PASSWORD_ROLLOVER_TIME allows changing a database account password without downtime for the application that uses this password.

See also: https://docs.oracle.com/en/database/oracle/oracle-database/21/nfcon/gradual-database-password-rollover-for-applications-222774864.html

This parameter was originally developed for 21c and was backported in version 19.12. It can be set in a profile, but the alter user syntax was also enhanced.

Essentially it means that for a certain time a user can log in with either the old or the new password. The maximum allowed time is 7 days.
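
A minimal sketch of the workflow (user, profile and password names are made up):

-- allow a rollover period of 1 day for all users with this profile
create profile app_profile limit password_rollover_time 1;
alter user app_user profile app_profile;

-- this starts the rollover period; the old and the new password both work now
alter user app_user identified by "New#Passw0rd1";

-- optionally end the period early, once all clients use the new password
alter user app_user expire password rollover period;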

Some (administrative) accounts can not use this, probably for security reasons:
ORA-28227: Gradual password rollover is not supported for administrative users.

Julian Dontcheff explains the parameter in more detail.

For security consideration check this post by Rodrigo Jorge: https://www.dbarj.com.br/en/2020/12/21c-gradual-database-password-rollover-brings-new-backdoor-opportunities/

9 Hybrid Partitioned Tables

Partitioned external tables were introduced in 12.2. Such tables use an external table driver (ORACLE_LOADER or ORACLE_DATAPUMP) to get the data from a file or even a cloud source (see also DBMS_CLOUD). Each partition can have a different file or even a different oracle directory as its source.

Now we can have tables that combine external partitions and normal internal partitions. They are called hybrid partitioned tables and are a special case of external partitioned tables.

The base idea is that actively used partitions stay in the database, while rarely used partitions can be externalized to cheaper storage systems outside the DB.

The interesting part is that external partitioned tables, when used in queries, can profit from certain partitioning operations like partition pruning and partition-wise joins.
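
A minimal DDL sketch (assuming a directory object ARCH_DIR and a matching csv file exist):

create table orders_hybrid
  (order_id   number,
   order_date date)
  external partition attributes (
    type oracle_loader
    default directory arch_dir
    access parameters (fields terminated by ',')
    reject limit unlimited
  )
  partition by range (order_date)
  (
   -- old data lives in a file outside the database
   partition p_archive values less than (date '2021-01-01')
     external location ('orders_archive.csv'),
   -- current data stays in a normal internal partition
   partition p_current values less than (maxvalue)
  );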

Of course the partitioning license is needed for this (or use 21c XE)!


8 MAX_IDLE_BLOCKER_TIME

In addition to MAX_IDLE_TIME there is a new parameter that can restrict the duration of a database session: MAX_IDLE_BLOCKER_TIME. Both are initialization parameters, but also resource plan directives.

Setting such a parameter to 0 means the session is unrestricted. Other values (for the initialization parameters) are in minutes.

MAX_IDLE_BLOCKER_TIME only limits idle sessions that hold resources others are waiting for. This is the much better option, because connection pools from application servers are usually idle when they are not in use. Those sessions should not be touched, which we achieve by setting MAX_IDLE_TIME to 0 and MAX_IDLE_BLOCKER_TIME to 30, for example.
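
As a sketch, that combination looks like this:

-- leave idle (pooled) sessions alone ...
alter system set max_idle_time = 0;
-- ... but disconnect sessions that are idle for 30 minutes while blocking others
alter system set max_idle_blocker_time = 30;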

https://docs.oracle.com/en/database/oracle/oracle-database/21/nfcon/details-max_idle_blocker_time-parameter-282450835.html

7 Bitmap Based Count Distinct SQL Function

Essentially this allows creating materialized views (MV) that support COUNT(DISTINCT ...) over multiple dimensions. Large aggregations in DWH environments are where this technique shines.

Use case

Imagine a statement as this

select count(distinct product) unique_#_of_products,
          count(distinct customer) unique_#_of_customers
from orders
where order_month = 202108;

If this statement is slow, we can improve it using a materialized view. However such a view is very difficult to write if we allow different levels of hierarchies and filters. For a time dimension the levels could be day, month, year.

The problem is that we can not aggregate a distinct count from a lower level to a higher level. The distinct customer count for a single month could be 3. If each month in a year has the same distinct customer count of 3, we still don’t know whether it was the same 3 customers every month or totally different customers. All we can deduce is that the distinct customer count for this year is at least 3 and at most 36 (= 3 * 12).

With 19c we got several new BITMAP_* functions that, combined, help us create a materialized view for such a scenario. It is a complex task.
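
A minimal sketch of the basic pattern, using the hypothetical orders table from above (see Connor McDonald’s post below for the full materialized view setup):

-- one bitmap per month and bucket; each set bit marks one distinct customer
with bm as (
  select order_month,
         bitmap_construct_agg(bitmap_bit_position(customer), 'RAW') as bmp
  from orders
  group by order_month, bitmap_bucket_number(customer)
)
-- counting the set bits gives the distinct count per month
select order_month, sum(bitmap_count(bmp)) as unique_#_of_customers
from bm
group by order_month;

-- to roll up to year level, the month bitmaps are first combined with bitmap_or_agg;
-- this is what makes the distinct count aggregable across hierarchy levels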

Connor McDonald explains how the logic works: https://connor-mcdonald.com/2020/08/17/faster-distinct-operations-in-19c/

Five new functions are needed to implement this: three scalar functions and two aggregate functions (those that end with _AGG).

Also see Dani Schnider’s take on the subject: https://danischnider.wordpress.com/2019/04/20/bitmap-based-countdistinct-functions-in-oracle-19c/

6 SQL Macros (19.7)

SQL Macros were announced for 20c.

There are two kinds of SQL macros – scalar and table macros. In 19c we only have TABLE macros (so far).

Scalar macros can be used in most clauses of a SQL statement (typically a select): the select list, the where clause, the order by clause. Table macros can only be used in the from clause.

I was quite sceptical about SQL macros, but I’m starting to see how useful they can be. Personally I think the scalar macros are more usable, but we don’t have them in 19c yet. The table macros are the fancier things, of course.

Macros are implemented using a PL/SQL function. This function is resolved at compile time (not at runtime!) and provides additional code that replaces the function call. We can see it as some kind of advanced text substitution.
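
A minimal table macro sketch, following the pattern from the documentation (function and parameter names are made up):

-- a parameterized "first n rows" view
create or replace function take (n number, t dbms_tf.table_t)
  return varchar2 sql_macro is
begin
  -- the returned text replaces the function call in the from clause at parse time
  return 'select * from t fetch first take.n rows only';
end take;
/

select * from take(3, user_tables);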

Links

SQL Macros – Creating parameterised views

5 JSON_MERGEPATCH

JSON_MERGEPATCH is a new SQL function that allows partial updates of JSON clobs.

https://docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/updating-json-document-json-merge-patch.html#GUID-31F88F28-3D92-489B-9CCD-BD1931B91F1F

It allows easy patching of JSON objects. However we can not as easily add entries to a JSON array, for example. To do so, the whole array needs to be replaced.
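
For example, "appending" a value to an array means patching with the complete new array (a minimal sketch):

select json_mergepatch('{"tags":["a","b"]}',
                       '{"tags":["a","b","c"]}') as patched
from dual;

-- {"tags":["a","b","c"]}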

The clob is copied during the mergepatch process. If performance or memory is relevant, keep that in mind.

Examples from the docs

UPDATE j_purchaseorder 
SET po_document =
  json_mergepatch(po_document, '{"Special Instructions":null}');

SELECT json_mergepatch(treat('{"Order":"0815","Special Instructions":"Handle with Care"}' as json), '{"Special Instructions":null}'
                       RETURNING CLOB PRETTY)
  FROM dual;
{
  "Order" : "0815"
}

4 Easy Connect Plus

Easy Connect Plus supports everything that previously required entries in tnsnames.ora or sqlnet.ora. So no tnsnames.ora is needed anymore, even when setting special options for the connection.

Syntax

Easy connect:


database_host[:port][/service_name]


Easy Connect Plus:

[[protocol:]//]host1{,host2}[:port1]{,host2:port2}[/[service_name][:server_type][/instance_name]][?parameter_name=value{&parameter_name=value}]

Important to note here is the way to specify the protocol, which can be TCPS or TCP for example. Multiple servers are supported, which is very useful in case of a Data Guard environment. And there is a way to add parameters.

Some common parameters could be:

  • failover
  • wallet_location
  • sdu
  • retry_count

What those parameters do can be seen in the white paper about Easy Connect Plus.

Example

tcps://salesserver1:1521/sales.us.example.com?wallet_location="/tmp/oracle"

Further readings

White paper from Oracle

Easy connect plus in scripts against environments protected by data guard (German article) by Marco Mischke:

https://www.robotron.de/unternehmen/aktuelles/blog/easy-connect-plus-in-version-19c

3 SQL/JSON Simplifications

Dot notation for JSON inside SQL.

Example

The following example is taken directly from LiveSQL (link below).

drop table t;
create table t (id number, col clob check (col IS JSON));
insert into t values(1, '{a:[1, "1", "cat", true, null, false, {}, ["dog","mouse"]], b:"json is fun"}');

-- The so-called JSON simplified syntax or 'dot notation' allows to use dots and square brackets to navigate into a SQL expression if it is known to be JSON.

select t.col.a from t t;

A
[1,"1","cat",true,null,false,{},["dog","mouse"]]

select t.col.b from t t;

B
json is fun

Also see treat( ... as json)

More on LiveSQL:

JSON Path expressions and simplified syntax

2 3 PDBs for Standard Edition 2 (19.5)

Multitenant for Standard Edition!

Starting with 19c, all databases including Standard Edition 2 can use up to 3 PDBs without any additional license.

That means, for example, we can now clone a PDB from database TEST to database INTEGRATION without unplugging the existing PDBs (as long as the total number excluding the PDB$SEED is three or less).

The parameter MAX_PDBS influences how many PDBs you are allowed to create. It should be set to 3 if you are on a Standard Edition and have the recent patches.
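
A minimal sketch of such a clone (all names are made up; the remote clone needs a database link into the source CDB):

-- on the INTEGRATION CDB
alter system set max_pdbs = 3;
create pluggable database int_pdb from test_pdb@cdb_test_link;
alter pluggable database int_pdb open;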

See also: https://mikedietrichde.com/2019/09/17/3-pdbs-included-with-oracle-19c-and-desupport-of-non-cdbs-with-20c/

I also remember that the limit was later increased further to 5. But I didn’t find any documentation for that, so I can only recommend using 3 PDBs if you are on SE2.

1 LISTAGG distinct

LISTAGG is a very useful aggregate function (with an analytic version too), mostly used in reporting environments. For example we can create a list of all employees that belong to a department.

All following code examples have been run on LiveSQL.

listagg(e.last_name,',') within group (order by e.last_name)

Since each employee is always in exactly one department, that is a fairly easy list to create.

How about if we want to see the list of jobs that are in each department?

listagg(j.job_title,',') within group (order by j.job_title)

Unfortunately this repeats all the job titles.

Listagg distinct avoids those duplicates.

listagg(distinct j.job_title,',') within group (order by j.job_title)

Thanks to @oraesque who suggested this feature in the Database Ideas Space.

Try it on LiveSQL: https://livesql.oracle.com/apex/livesql/file/content_HT1O85E4BHSBWN93G1B3M8SI2.html

Honorable mentions:

Some features did not make it into the list. I still like to make a quick mental note about them, but without further explanation.

  • Polymorphic table functions (18c)
  • dynamic sequence cache (19.10 – seems a bit shaky at the moment)
  • blockchain tables
  • dbms_hprof trace output in the database (18c)

10 little Oracle SQL features – that you are probably not using enough

This is a collection of some small features, patterns or snippets that I seem to use frequently and like to share. The features are listed in no specific order. However we can structure them into performance related [P], convenience [C] and enabler [E] features.

Enablers [E] are features that allow us to do something that is otherwise impossible or at least very hard to achieve. Sometimes even to avoid an error.


Convenience [C] features offer an alternative construct. Often they can be used instead of another option, sometimes making the code easier to change (maintainability).


Performance [P] features improve execution speed. Often they come with a drawback, or one should keep exceptions or rare race conditions in mind.

1. find multiple words using regexp_like [C]

Instead of writing multiple like conditions (or repeating the same select with a different like each time), we can write a single regexp_like expression.


Cumbersome:

where (str like '%word1%' OR str like '%word2%' OR ...)

Much easier:


where regexp_like(str,'(word1|word2|...)')

The pipe “|” inside the regular expression acts as the OR operator, and the parentheses () form the subexpression that is needed for that OR.

A very typical case is to query the v$parameter view for a set of parameter names.
example

select name, value 
from v$parameter 
where regexp_like(name,'(listen|dispatch|service)');
NAME                   VALUE
service_names          feno1pod
enable_dnfs_dispatcher FALSE
dispatchers            (PROTOCOL=TCP) (SERVICE=feno1podXDB)
max_dispatchers
query result

What I especially like about it is that the need for wildcards is gone and it is so easy to add more “words” to search for.

2. fetch first row only [C]

Do you still use rownum frequently? Why not use the row limiting clause instead?

I use it a lot, mostly for ad hoc queries. One advantage is that the need for an inner query that does the ordering disappears.


example

-- show the top 3 tables with the most rows
select table_name, num_rows
from user_tables
order by num_rows desc
fetch first 3 rows only;
TABLE_NAME            NUM_ROWS
IMP_SD_2744-12_45_18     19696
IMP_SD_2822-14_28_53     19692
IMP_SD_194-09_40_50      19545
Query result

3. use expression lists (a,b) [E]

I use this frequently in update statements, but also sometimes in joins. Instead of setting each column separately we can set multiple columns at once, simply by using parentheses and a comma between the columns.

example

update persons u
set (u.first_name, u.last_name) = (select a.firstname, a.name 
                                   from applicants a 
                                   where a.person_id = u.id)
where u.status = 'NEEDS REFRESH';

Or we can filter on a combination of columns: (a,b) in ((c,d),(d,c),(e,f)). The need for this should be rare, but it happens. A strange edge case is an equality comparison of two expression lists. For some reason the right side needs an additional set of parentheses.

(a,b) = ((c,d))

4. DeMorgans law [C,E]

This might be the single most important math/logic rule that SQL developers should know. It is usually applied inside the where clause, and knowing it by heart will prevent a lot of easy mistakes.

not(A ∪ B) = not(A) ∩ not(B)
not(A ∩ B) = not(A) ∪ not(B)
DeMorgan's laws

written as a SQL expression

not(A or B) = not(A) and not(B)

not(A and B) = not(A) or not(B)

It is easy to remember: OR changes to AND when the parentheses are resolved (and the other way round).
A and B here are full SQL expressions, for example x=y.


Why is that so important? SQL is a very logic based language. In daily speech we often use logical operators (AND/OR/NOT) differently from what is needed when they are applied in SQL. This can become confusing very easily. Knowing DeMorgan’s law helps to quickly check whether the logic is used correctly.

An example

Task: “Find all employees that are not working in sales and marketing.”

Converting this sentence 1:1 into SQL would result in this:

select * from emp
where not (department = 'SALES' and department = 'MARKETING'); 

Obviously what is meant is that we want to find those employees that work in some other department.

Applying DeMorgan’s law, we can reformulate our select statement. We also replace NOT (x=y) with x!=y.

select * from emp
where  department != 'SALES' 
    OR department != 'MARKETING'; 

A developer should now see that this condition is always true. For each row the department is either sales or not sales. And if it is sales, then it is not marketing. So the combination is always true (excluding NULL value considerations). Which is probably not what was wanted.

Row logic also plays a part here. The where clause is applied to a single row, but normal speech often uses boolean operations to combine data sets.

The correct query would be

select * from emp
where  not (department = 'SALES' 
             OR department = 'MARKETING'); 


or even better, use IN instead of OR

select * from emp
where  department not in ( 'SALES' , 'MARKETING'); 

Historic anecdote: De Morgan was not the first to discover this logic. Centuries before De Morgan, a guy called William of Ockham already wrote about it. He probably had it from Aristotle. Occam (the spelling changed over the centuries) is nowadays more famous for his razor.

5. rollup and grouping sets [C]

To get a total row for a query that uses sum or count, simply add rollup.


example rollup

select tablespace_name, count(*)
from dba_tables 
group by rollup (tablespace_name);
TABLESPACE_NAME   COUNT(*)
DATA                   362
DBFS_DATA                2
SAMPLESCHEMA            14
SYSAUX                1357
SYSTEM                1056
                       322
                      3113

The last line is the total number of tables. One problem here: tables without a tablespace also show NULL, just like the total row. The grouping() function helps to separate a normal row with a NULL data value from a superaggregate row.

If there are multiple columns then instead of rollup I use grouping sets. The full set (=total) in a grouping set expression can be expressed by ().

example grouping sets + grouping

select case when grouping(tablespace_name) = 1 then '-all-' else tablespace_name end as tablespace_name
     , case when grouping(cluster_name) = 1 then '-all-' else cluster_name end as cluster_name
     , count(*) 
from dba_tables 
group by grouping sets ((tablespace_name, cluster_name),());
TABLESPACE_NAME  CLUSTER_NAME          COUNT(*)
                                            322
DATA                                        362
SYSAUX                                     1356
SYSAUX           SMON_SCN_TO_TIME_AUX         1
SYSTEM                                     1020
SYSTEM           C_RG#                        2
SYSTEM           C_TS#                        2
SYSTEM           C_OBJ#                      17
SYSTEM           C_COBJ#                      2
SYSTEM           C_MLOG#                      2
SYSTEM           C_USER#                      2
SYSTEM           C_FILE#_BLOCK#               2
SYSTEM           C_OBJ#_INTCOL#               1
SYSTEM           C_TOID_VERSION#              6
DBFS_DATA                                    2
SAMPLESCHEMA                                14
-all-            -all-                     3113

Again the last line shows the total number of tables. Since I didn't want to see subtotals for tablespace or cluster, grouping sets is the perfect solution to add this total row.

Also notice that the first line has null values for both names, same as the total line would have. Using the grouping function allows us to find out which one is the total line and give it a meaningful text.

6. enhanced listagg [E]

In the more recent database versions, the very useful LISTAGG function got even better. For production code I nowadays always try to remember to add some safety protection, in case the result of the aggregation becomes too big. Otherwise we might get ORA-01489: result of string concatenation is too long.

Since 12.2 we can avoid the error by using the OVERFLOW clause

listagg (... on overflow truncate without count) ...

So when the maximum string size is reached (4000 bytes or 32k bytes, depending on the max_string_size parameter), we get usable text instead of the statement raising an error.

example

select count(*)
, listagg(table_name,', ' on overflow truncate) within group (order by tablespace_name desc, table_name) all_tables
from dba_tables;
COUNT(*)  ALL_TABLES
3113      AQ$_ALERT_QT_G, AQ$_ALERT_QT_H, AQ$_ALERT_QT, … many many more tables …, SDO_CRS_GEOGRAPHIC_PLUS_HEIGHT, SDO_CS_SRS, SDO_DATUMS, …(1304)

The three dots “…” are called an ellipsis and can be configured. WITHOUT COUNT avoids writing the number of truncated entries at the end of the list; WITH COUNT is the default when TRUNCATE is specified.

Although the overflow clause is very useful, the ultimate goal would be to give the developer more influence over it. Recently there was an interesting Twitter discussion around that topic.

Another useful enhancement (19c) is LISTAGG DISTINCT. Example on LiveSQL.

7. Using sys_context [C,P]

A sys_context is something like a global variable in other languages. The normal context is for the session, but it is also possible to use application wide contexts.

Oracle provides several “preconfigured” contexts for us. The most common are ‘USERENV’ and ‘APEX$SESSION’ (for apex developers). Contexts are also used for security policies with VPD.

Here are the contexts that I frequently like to use (a query sketch follows after the list):

  • sys_context('userenv','client_identifier')
    value set by dbms_application_info.set_client_info
  • sys_context('userenv','current_edition_name')
    when using edition based redefinition (ebr), this shows which edition I’m in. Always good to double check!
  • sys_context('userenv','current_user')
    similar to pseudocolumn user. The schema name that we connected with.
    In some Oracle versions (12.1) this was much faster than user; my recent test shows that this performance difference is now gone (19c)
  • sys_context('userenv','proxy_user')
    When doing a proxy connect then this is the name of the authenticated user, not the target schema
  • sys_context('userenv','os_user')
    useful when client_identifier is empty, for example the name of the Windows login account when using SQL Developer under Windows.
  • sys_context('apex$session','app_user')
    apex authenticated user
  • sys_context('apex$session','app_id')
    id of the current apex app
  • sys_context('trigger_ctl','audit_on')
    Is auditing enabled/disabled? Part of my custom adaptation/enhancement of Connor McDonald's Audit-Utility package
  • sys_context('userenv','sid')
    the session ID of the current database session
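
A quick sketch how to query them (all parameters shown are standard userenv parameters; the output obviously depends on the session):

select sys_context('userenv','current_user') as current_user
     , sys_context('userenv','sid')          as sid
     , sys_context('userenv','os_user')      as os_user
from dual;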

Side note: UPPER/lower case makes no difference for the context name and parameter, only for the values of the context.

Warning: when you start using your own custom contexts, be careful with granting the CREATE ANY CONTEXT privilege. It can be a security risk! Revoke it after it has been used.
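
For illustration, here is a minimal sketch of a custom context (the names mirror my trigger_ctl example above; the package is simplified):

-- the context is bound to one package that alone is allowed to set its values
create context trigger_ctl using trigger_ctl_pkg;

create or replace package trigger_ctl_pkg as
  procedure set_audit(p_value in varchar2);
end trigger_ctl_pkg;
/

create or replace package body trigger_ctl_pkg as
  procedure set_audit(p_value in varchar2) is
  begin
    -- dbms_session.set_context may only be called from the package
    -- named in the CREATE CONTEXT statement
    dbms_session.set_context('trigger_ctl','audit_on',p_value);
  end set_audit;
end trigger_ctl_pkg;
/

-- usage
exec trigger_ctl_pkg.set_audit('ON')
select sys_context('trigger_ctl','audit_on') from dual;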

8. hint /*+ driving_site */ [P]

I feel like this is a classic for me. In one of my longest running projects we had a lot of databases connected by database links. One of the most important tuning activities was to understand how to do remote joins in a performant way. The driving_site hint was extremely important for that.

General rule: when doing remote queries, avoid mixing local and remote tables in one select statement. If only remote tables are in the statement, always use the driving_site hint.
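
A minimal sketch (the db link remote_db and both tables are made up). The hint names the alias of a remote table and thereby tells the optimizer to execute the join on the remote site and to ship only the result over the link:

select /*+ driving_site(o) */ o.order_id, count(*) as item_count
from orders@remote_db o
join order_items@remote_db i on i.order_id = o.order_id
group by o.order_id;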

Nowadays the need for this hint diminishes, mostly because databases are connected less by DB links and more by web services (which does not really improve performance, but that is a different story).

9. join via using [C]

This only works reliably when the naming conventions of the data model fit to it.

example

We have a table PERSON and a table ADDRESS. The primary key in PERSON is PERS_ID. The relationship column (FK) in ADDRESS is also called PERS_ID. It wouldn’t work as well with ADR_PERS_ID for example.

Then we can do a quick join like this:

select * 
from person 
join address using (pers_id) 
where address.type='MAIN';

There are some slight differences compared to the normal way of joining with ON. Mainly, one can no longer differentiate from which table pers_id originates. For example, you can not refer to person.pers_id in the where clause anymore.
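
For example, qualifying the join column raises an error:

select person.pers_id
from person
join address using (pers_id);

ORA-25154: column part of USING clause cannot have qualifier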

Currently I use it exclusively for ad hoc queries, not in plsql code.

10. interval literals [C]

If I want to add some time to a date or a timestamp, I always use interval literals (unless it is a full day or a full month).

example
Let’s say we want to check tickets that were entered during the last hour. Using the date datatype we can easily calculate an hour by dividing a full day by 24. This is how many developers calculate time. And it is perfectly ok to do so.

select * from tickets 
where createddt >= sysdate - 1/24 ;

The term sysdate-1/24 represents one hour before “now”.

This is how I prefer to write it:

select * from tickets 
where createddt >= sysdate - interval '1' hour ;

I feel the code documents itself in a much better way.

Additionally requirements like this easily change.
Let’s change both expressions to 15 minutes.

sysdate - 15/24/60

sysdate - interval '15' minute

Which one is better to read and was quicker to change?

Warning: Sayan Malakshinov mentions in the comments that we should not use interval literals (year-to-month intervals) to add full months. I agree with that. Calendar arithmetic is complex; there are specific functions like add_months for that. So do not use interval '1' month.
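
A quick sketch why (the dates are arbitrary):

-- adding a year-to-month interval fails for end-of-month dates
select date '2021-01-31' + interval '1' month from dual;
ORA-01839: date not valid for month specified

-- add_months handles the calendar logic
select add_months(date '2021-01-31', 1) from dual;
28-FEB-21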

honorable mentions

Some more snippets or functions that didn’t quite make it into the list.

  • nvl2 [C]
  • the third parameter of to_char (nlsparams) [E]
  • query block hints [P]
  • hierarchical with clause [C,E]
  • lateral join [C]
  • alter sequence restart [C,E]
  • cross join [C]
  • lnnvl [C,E]
  • hex format “XXXX” [E]
  • sys.odcinumberlist [C] (see the sketch after this list)
  • analytic sum + correct window clause: rows between instead of range between [E]
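
As a small taste, here is a sketch using sys.odcinumberlist to build an ad hoc number collection in plain SQL:

select column_value as nr
from table(sys.odcinumberlist(1, 2, 3));

NR
1
2
3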

Summary

I hope I mentioned something that you didn't know before and that you enjoyed reading about.

Outlook

Here are some features/patterns that I’m currently not using myself, but where I feel that they might become quite important in the future.

  • SQL Macros
  • boolean in SQL (not implemented yet, but Oracle seems to be working on it – I expect some news about this very soon)
  • median and other statistical functions like cume_dist, percentile_xxx, percent_rank
  • match_recognize
  • with clause udf functions (still think they are overrated, but I might be wrong)
  • approx_count_distinct
  • analytical views

Audit column performance retested in 19c: “user” is fast now!

Motivation

In an older and quite successful blog post of mine I tried to find the “perfect” solution for audit columns. See: sequence-and-audit-columns-with-apex-5-and-12c.

One finding then was that sys_context('userenv','current_user') is considerably faster than the user pseudocolumn.

I recently noticed that this seems to have changed and decided to retest the behavior.

The old test was done against a 12.1.0.1 standard edition database.

The new test is done against a 19.5 enterprise edition database (on ATP, which runs on Exadata).

Test setup

Mass inserting 100000 records into empty tables. A very fast select was used to generate 100k rows of null values.

3 columns were filled with data:

  • ID by a sequence using a cache of 10000
  • created_by filled with the schema name => sys_context or user
  • created_on filled with sysdate

Test 1 (T1) was using sys_context as default value.

Test 2 (T2) was using user as default value.

Test 3 (T3) was using a trigger with sys_context.

Test 4 (T4) was using a trigger with user.

Each test was run once to warm up the database (and the tablespace). This warm-up run didn't count. Then each test was run 3 times. Only the execution speed of the inserts was measured.

Results

Test 1 – sys_context as default value

Elapsed:1.23 seconds
Elapsed:1.21 seconds
Elapsed:1.26 seconds

Average: 1.23 s

Test 2 – user as default value

Elapsed:1.32 seconds (This looks suspicious. Maybe the warm-up run didn't warm up enough)
Elapsed:1.16 seconds
Elapsed:1.19 seconds

Average: 1.22 s

Test 3 – sys_context in trigger

Elapsed:17.33 seconds
Elapsed:17.08 seconds
Elapsed:17.05 seconds

Average: 17.15 s

Test 4 – user in trigger

Elapsed:17.03 seconds
Elapsed:16.71 seconds
Elapsed:16.97 seconds

Average: 16.90 s

Comparison

My test shows that user was even a tiny bit faster than sys_context. This means the previous 12.1 recommendation to avoid “user” is now outdated. The difference is so small that you can choose whatever you want.

User is fast now!

The main message still is: if you can get rid of the trigger, then do it. The difference between the default value logic and the trigger is huge! But even with the trigger logic there now seems to be a slight performance advantage for user.

If you are still on 12.1 or 12.2 you should run your own tests and compare the results. If you are on 19c or higher I would now use user again.

Test scripts

Create statements

-- cleanup
drop table test_insert_perf_t1_sys_context purge;
drop table test_insert_perf_t2_user purge;
drop table test_insert_perf_t3_trigger purge;
drop sequence test_insert_perf_t3_trigger_seq;
drop table test_insert_perf_t4_trigger purge;
drop sequence test_insert_perf_t4_trigger_seq;

-- create objects
create table test_insert_perf_t1_sys_context
             (id number generated by default on null as identity (cache 10000) primary key
             ,created_by varchar2(128) default on null sys_context('userenv','current_user') not null 
             ,created_on date default on null sysdate not null);

create table test_insert_perf_t2_user
             (id number generated by default on null as identity (cache 10000) primary key
             ,created_by varchar2(128) default on null user not null 
             ,created_on date default on null sysdate not null);

create table test_insert_perf_t3_trigger
             (id number not null primary key
             ,created_by varchar2(128) not null 
             ,created_on date not null);

create sequence test_insert_perf_t3_trigger_seq cache 10000;          

create or replace trigger test_insert_perf_t3_trigger_bri
before insert on test_insert_perf_t3_trigger 
for each row
begin
  :new.id := test_insert_perf_t3_trigger_seq.nextval;
  :new.created_by := sys_context('userenv','current_user');
  :new.created_on := sysdate;
end;
/

create table test_insert_perf_t4_trigger
             (id number not null primary key
             ,created_by varchar2(128) not null 
             ,created_on date not null);

create sequence test_insert_perf_t4_trigger_seq cache 10000;          

create or replace trigger test_insert_perf_t4_trigger_bri
before insert on test_insert_perf_t4_trigger 
for each row
begin
  :new.id := test_insert_perf_t4_trigger_seq.nextval;
  :new.created_by := user;
  :new.created_on := sysdate;
end;
/

Run statements

-------------------------------------------------------------------
-- run tests (insert 100000 rows)
-------------------------------------------------------------------

-- Test 1 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t1_sys_context(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/

-- Test 2 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t2_user(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/


-- Test 3 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t3_trigger(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/


-- Test 4 --------------------------------------
set serveroutput on
set time on;
declare 
    v_time number;
begin
    v_time := dbms_utility.get_time;

    -- insert 100000 records
    insert into test_insert_perf_t4_trigger(id)
    (select n1.nr
     from (select cast(null as number) nr from dual connect by level <=100) n1
     cross join (select cast(null as number) nr from dual connect by level <=1000) n2 
    );

    sys.dbms_output.put_line('Elapsed:'||to_char((dbms_utility.get_time - v_time) / 100)||' seconds');
    rollback;

end;
/

Small demo about sequences and identity columns

A recent twitter discussion led me to (re)test the syntax a little more around auto-incrementing ID columns that work without triggers. I was especially curious about the LIMIT VALUE option, which I didn't know before.

identity options

This option is documented (as opposed to RESTART, which still seems not to have made it into the docs).

Demo 1 – Attach a sequence to a table via the DEFAULT option

The comments explain what I’m trying to do/show.

-- Create some sequences to be used later
create sequence mySeq1;
Sequence MYSEQ1 created.

create sequence mySeq2;
Sequence MYSEQ2 created.

-- Table with PK not tied to a sequence
create table t1 (id  number not null primary key
               , txt varchar2(30)); 
Table T1 created.

-- fill PK using a Sequence
insert into t1 values (mySeq1.nextval, 'Test1');
1 row inserted.

-- set the sequence to a new start value 55
alter sequence mySeq1 restart start with 55;
Sequence MYSEQ1 altered.

-- insert more data
insert into t1 values (mySeq1.nextval, 'Test2');
1 row inserted.

-- check data
select * from t1;
ID TXT
1 Test1
55 Test2

Unfortunately we can not easily change our PK into an identity column.

alter table t1 modify id generated as identity;

ORA-30673: column to be modified is not an identity column

alter table t1 modify id generated by default on null as identity;

ORA-30673: column to be modified is not an identity column

Other syntax variants, including LIMIT VALUE, also do not work.

However it is easily possible to use a default value for the column and populate it from an existing (user managed) sequence.

-- Use a default value column and a different sequence
alter table t1 modify id default on null mySeq2.nextval;
Table T1 altered.

-- Trying to insert a value gives a UK error, 
-- because we used a fresh sequence which gave us value 1,
-- which exists already.
insert into t1 (txt) values ('Test3');
ORA-00001: unique constraint (MYUSER.SYS_C00145531) violated

-- set the sequence to a different value using RESTART
alter sequence mySeq2 restart start with 70;
Sequence MYSEQ2 altered.

insert into t1 (txt) values ('Test3');
1 row inserted.

insert into t1 values (null, 'Test4');
1 row inserted.

insert into t1 values (default, 'Test5');
1 row inserted.

-- Check data
select * from t1;
ID TXT
1 Test1
55 Test2
70 Test3
71 Test4
72 Test5

So Demo 1 shows some commands around normal ID columns that are populated via a default value setting.

Demo 2 – Use identity column from the start

create table t2 (id  number generated always as identity primary key
               , txt varchar2(30)); 
Table T2 created.

-- Try to insert some rows
insert into t2 values (null, 'Test1');
ORA-32795: cannot insert into a generated always identity column

insert into t2 values (default, 'Test2');
1 row inserted.

insert into t2 (txt) values ('Test3');
1 row inserted.

-- Try to insert a record via plsql
declare 
  r t2%rowtype;
begin 
  r.txt := 'Test4';
  insert into t2 values r;
end;
/

ORA-32795: cannot insert into a generated always identity column

-- hide the ID column then try row insert by record again
alter table t2 modify id invisible;
Table T2 altered.

declare 
  r t2%rowtype;
begin 
  r.txt := 'Test5';
  insert into t2 values r;
end;
/
PL/SQL procedure successfully completed.

-- Check data
select * from t2;
TXT
Test2
Test3
Test5

The ID is missing? Well it is there.

select id, txt from t2;
ID TXT
1 Test2
2 Test3
3 Test5

Hiding the ID column is one possible way to make certain types of inserts work again. Not my preferred way though.

-- make column visible again
alter table t2 modify id visible;
Table T2 altered.

-- logical column order is now changed. This can impact DML that depends on column order!
select * from t2;
TXT ID
Test2 1
Test3 2
Test5 3
-- restore original column order by toggling other columns invisible
alter table t2 modify txt invisible;
Table T2 altered.

alter table t2 modify txt visible;
Table T2 altered.

-- modify to allow other data values (instead of generated ALWAYS)
alter table t2 modify id generated by default on null as identity ;
Table T2 altered.

-- Try inserts again that didn't work previously
insert into t2 values (null, 'Test1');
1 row inserted.

declare 
  r t2%rowtype;
begin 
  r.txt := 'Test4';
  insert into t2 values r;
end;
/
PL/SQL procedure successfully completed.

-- Check data
select * from t2;
ID TXT
1 Test2
2 Test3
3 Test5
4 Test1
5 Test4
-- add custom IDs
insert into t2 values (-1, 'Test6');
1 row inserted.

insert into t2 values (100, 'Test7');
1 row inserted.

-- Check data
select * from t2;
ID TXT
1 Test2
2 Test3
3 Test5
4 Test1
5 Test4
-1 Test6
100 Test7

So now we can insert other ID values into that column, even though we have an identity column. Eventually this will lead to a UK error once the value created by the Oracle managed sequence reaches 100.

This can be solved with the LIMIT VALUE clause

-- Check current high water mark (last value) of sequence identity column
select identity_column, data_default from user_tab_columns where table_name = 'T2' and column_name = 'ID';

IDENTITY_COLUMN  DATA_DEFAULT
YES              "MYUSER"."ISEQ$$_258212".nextval

-- using the sequence name from the previous select
select last_number, cache_size 
from user_sequences where sequence_name = 'ISEQ$$_258212';

LAST_NUMBER  CACHE_SIZE
24           20

-- reset Identity column to the highest value that is currently in the table 
alter table t2 modify ID  generated by default on null as identity start with limit value;
Table T2 altered. 

-- Check high water mark again 
select last_number, cache_size 
from user_sequences where sequence_name = 'ISEQ$$_258212';

LAST_NUMBER  CACHE_SIZE
101          20

-- insert 
insert into t2 values (default, 'Test8');
1 row inserted.

-- Check data
select * from t2;
ID TXT
1 Test2
2 Test3
3 Test5
4 Test1
5 Test4
-1 Test6
100 Test7
101 Test8

Success!

So Demo 2 showed how to work with an identity column, including the possibility to switch it from generated always as identity to generated by default on null. This opens up some ways to manipulate the data inside this managed ID column.

Remember: we can not directly alter the Oracle managed sequence that drives the identity column. But we can change its properties via ALTER TABLE MODIFY, as in the sketch below.
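
For example, a sketch (not part of the demo above) that increases the cache of the managed sequence, using the same MODIFY syntax as the LIMIT VALUE command:

alter table t2 modify id generated by default on null as identity cache 1000;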

cleanup code

-- cleanup
drop sequence mySeq1;
drop sequence mySeq2;
drop table t1 purge;
drop table t2 purge;

How to avoid misleading plsql warning PLW-07206: analysis suggests that the assignment to ‘V_DUMMY’ may be unnecessary

PLW-messages are warnings issued by the PL/SQL compiler.

The PLW-07206 warning tells us that we have some code (an assignment statement) that is not needed. Usually because the target of the assignment is not used later anywhere.

PLW-07206: analysis suggests that the assignment to <something> may be unnecessary

Problem demonstration

Example 1)

Here is a very reduced demonstration of the issue.

First we enable the PL/SQL warnings. I assume this setting is still in place for all following examples.

-- set plsql warning on
alter session set PLSQL_WARNINGS = 'ENABLE:ALL';

In SQL Developer we can set the warning level in the preferences, but this setting is not applied to code running in a SQL worksheet as a script.

Then a function is created.

create or replace function dummy_existsCheck return boolean 
authid definer
is
      v_dummy number(1);
begin
      select 1
      into v_dummy
      from all_objects
      where 1=2;
      return true;  
      -- Do something here if a record was found
exception  
     when no_data_found then
      -- Do something else here if NO record was found
      return false;
end dummy_existsCheck;
/  

Compiling the function shows the warning.

Function DUMMY_EXISTSCHECK compiled
LINE/COL ERROR

6/7 PLW-07206: analysis suggests that the assignment to 'V_DUMMY' may be unnecessary

Example 2)

Here is an example where the warning is correctly issued.

create or replace function dummy_existsCheck return boolean 
authid definer
is
   v_dummy varchar2(1);
begin
   v_dummy := 'B';
     
   -- Do something here without using v_dummy
   return null; 
end dummy_existsCheck;
/ 
Function DUMMY_EXISTSCHECK compiled
LINE/COL ERROR

6/5 PLW-07206: analysis suggests that the assignment to 'V_DUMMY' may be unnecessary

The difference between the two examples is that a SELECT statement in PL/SQL needs an INTO clause, and because of that the dummy variable is needed. So the warning in the first example is wrong.

The compiler is already aware of this situation. For some strange reason the warning goes away when we change the datatype of the variable from number to varchar2, which is also the suggested workaround.

Workaround for example 1)
create or replace function dummy_existsCheck return boolean 
authid definer
is
      v_dummy varchar(1);
begin
      select 'x'
      into v_dummy
      from all_objects
      where 1=2;
      return true;  
      -- Do something here if a record was found
exception  
     when no_data_found then
      -- Do something else here if NO record was found
      return false;
end dummy_existsCheck;
/  

Function DUMMY_EXISTSCHECK compiled

No warning issued.

Obviously this is a bug. Although a very minor one.

Example 3)

Here is another, slightly more complex example. The same message appears when we assign the result of a function. Because a function always has a return value, we need to assign it to some variable, even if the variable is not used later in the code.

create or replace function dummy_getID(p_table in varchar2) return number
authid definer
is
      v_result number(1);
begin
      select 1
      into v_result
      from user_tables
      where table_name = p_table;
      
      return v_result;
end dummy_getID;
/  

create or replace function dummy_existsCheck return boolean
authid definer
is
  v_tab varchar2(30) := 'DUMMY';
  v_dummy number(1);
begin
  v_dummy := dummy_getID(v_tab);
  return true;
exception
  when no_data_found then
    dbms_output.put_line('Table '||v_tab||' does not exist.');
    return false;
end dummy_existsCheck;
/ 

Function DUMMY_GETID compiled
Function DUMMY_EXISTSCHECK compiled
LINE/COL ERROR

7/3 PLW-07206: analysis suggests that the assignment to 'V_DUMMY' may be unnecessary

Workaround for example 3)

Again, changing the variable to a varchar2 data type avoids the warning.

create or replace function dummy_existsCheck return boolean
authid definer
is
  v_tab varchar2(30) := 'DUMMY';
  v_dummy varchar2(1);
begin
  v_dummy := dummy_getID(v_tab);
  return true;
exception
  when no_data_found then
    dbms_output.put_line('Table '||v_tab||' does not exist.');
    return false;
end dummy_existsCheck;
/ 
Function DUMMY_EXISTSCHECK compiled

Datatype check

Surprisingly, some other numeric data types also avoid the warning. It seems to happen mostly for constrained numeric types such as number(x) and integer.

The following data types did not produce the warning:

  • varchar2(1)
  • varchar(1)
  • char(1)
  • pls_integer
  • binary_integer
  • number
  • float

The following data types did produce the warning:

  • number(1)
  • number(38)
  • decimal
  • integer
  • integer(1)
  • simple_integer

Conclusion

Compiler warnings are a good thing. Wrong or misleading warnings are not.

PLW-07206 can (sometimes) be avoided by changing the data type of the assignment target from number(x) to some other data type.

Cleanup demo code
drop function dummy_existsCheck;
drop function dummy_getID;

Warning! Hidden traps – comments in SQL*Plus

I am a big fan of comments (see also CDD Manifesto a bit further down). Comments are simple ways to add information to objects/code without changing code.

Comments that work in SQL or PLSQL might behave differently in SQL*Plus. This post is to create awareness about such traps.

The basics

comments in SQL and PLSQL

Comments in SQL and PLSQL come in two flavors. The one line comment and the multi line comment.

-- this is a one line comment. 

The double hyphen “--” starts the one line comment. It ends at the end of the line.

/* this
is a
multi-line
comment */

A slash “/” followed by an asterisk “*” starts a multi-line comment. It is ended in the opposite way.

Comments do not stack.

-- this /* will
not work */

This also will not work.

/* line 1
   /* line 2
      /* line 3
      line 4 */
   line 5 */
line 6 */

The multi-line comment ends the first time “*/” is encountered. So lines 5 and 6 are not commented; instead line 5 will raise a syntax error.

In SQL a very special form of a comment is a hint. A hint tries to give the SQL optimizer additional instructions. A + indicates that this comment is a hint.

select /*+ warp_speed(5) */ * from big_table;

A hint could be compared to an annotation in other languages. The comment syntax is used to add some special meaning to the surrounding code. This post is not about such annotations.

comments in SQL*Plus

In general we run SQL statements, PLSQL blocks and SQL*Plus commands in a SQL*Plus script. Typical examples are SELECT statements, CREATE OR REPLACE PACKAGE commands and SET commands.

The normal SQL and PLSQL comments work in SQL*Plus as well. Additionally we have the REM(ARK) command to add comments.

SQL> REM this is a SQL*plus comment

We can also use PROMPT to have a kind of echo inside a SQL*Plus script. Although it is not a real comment, it is used in a similar way.

SQL> PROMPT *** heading ***
*** heading ***

Surprising differences

SQL*Plus interprets comments slightly differently than SQL or PL/SQL. Here are some examples. Try to guess what happens before reading the result. The examples do work in SQL Developer, but behave differently in SQL*Plus.

Example 1

generate some DDL commands…

select 'drop '||object_type||' '||object_name||';' 
from user_objects 
--where object_type = 'VIEW';
where object_type = 'TRIGGER';

Result 1 (real object names changed)
... many similar rows ...
'DROP'||OBJECT_TYPE||''||OBJECT_NAME||';'
drop SYNONYM PLSQL_PROFILER_DATA;
drop SYNONYM PLSQL_PROFILER_RUNNUMBER;
drop SYNONYM PLSQL_PROFILER_RUNS;
drop SYNONYM PLSQL_PROFILER_UNITS;
drop INDEX ABC_UK01;
drop PACKAGE MYPK;
drop PACKAGE BODY MYPK;
drop TABLE MYTAB;
drop DATABASE LINK DISTANT.WORLD.LOCAL;
drop DATABASE LINK REMOTE.WORK.LOCAL;
2243 rows selected.
SQL> SP2-0734: unknown command beginning "where obje…" - rest of line ignored.
SQL>

2243 rows selected? Yes there were some triggers, but not that many. When SQL*Plus finds a semicolon at the end of a line, it interprets this as the end of the command, even if that semicolon is commented out. In this example the statement produced a drop command for each object in the schema. But the intention was to only drop all triggers.

Example 2

Add plsql doc information…

create or replace function convertNum(p1 in varchar2) return number
is
/***************************
@usage: converts a string into a number. 
        If conversion is not possible, 0 is returned.
@author: Sven
@param : p1 : input string
@return : converted number
****************************/
begin
  return to_number(p1 default '0' on conversion error);
end convertNum;
/

Result 2
SQL>
create or replace function convertNum(p1 in varchar2) return number
is
/***************************
@usage: converts a string into a number.
SP2-0310: unable to open file "usage:.sql"
        If conversion is not possible, 0 is returned.
@author: Sven
SP2-0310: unable to open file "author:.sql"
@param : p1 : input string
SP2-0310: unable to open file "param.sql"
@return : converted number
SP2-0310: unable to open file "return.sql"
****************************/
begin
  return to_number(p1 default '0' on conversion error);
end convertNum;
  9  /

Function created.

SQL>

An @ at the start of a line is always interpreted as a call to a script, even if it is inside a multi-line comment. This lowers the usefulness of the “generate DB doc” functionality in SQL Developer a lot. Putting additional blanks before the @ will not change this behaviour.

Example 3

comment out optional params…

SQL> set serveroutput on -- size unlimited
Result 3
SP2-0158: unknown SET option "--"

This means the SET option is ignored. There is an error message, but it is a SQL*Plus (SP2) error message. This would not stop a script that has WHENEVER SQLERROR EXIT set at the beginning.
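
For comparison, a script that should stop on real errors typically sets this at the beginning; note that it catches ORA- errors, but, as seen above, not SP2- messages:

whenever sqlerror exit failure rollback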

Example 4

comment a statement…

select sysdate from dual; -- print current day

Result 4
SQL> select sysdate from dual; -- print current day
  2

Do you notice the 2 at the beginning? The statement is not terminated. The buffer waits for the next line, and the next command probably finishes the buffer and runs into a syntax error.

Warning

Personally I think those examples are bugs. But it is documented behaviour, so Oracle probably disagrees. There is an extra section in the SQL*Plus documentation dedicated to comments in scripts, where these and more problems are described.

Remember

If you intend to run some plsql code or a SQL statement from a SQL plus script, then

  • do not have semicolons in a comment, especially not at the end of a line
  • do not use @ in a comment, especially not at the beginning of a line
  • do not use -- comments inside SELECTs (hints seem to work)
  • avoid using -- or /* */ for SQL*Plus comments; use REMARK or PROMPT instead

More side effects

Comments near the beginning of a command can confuse SQL*Plus.

SQL> create /* testproc */ procedure test
2 is
3 begin null;
Warning: Procedure created with compilation errors.
SQL> end;
SP2-0042: unknown command "end" - rest of line ignored.
SQL> /

Blank lines in comments can break the code. This can be suppressed by using SET SQLBLANKLINES ON.

SQL> SET SQLBLANKLINES ON;
SQL> select sysdate from dual
2 /* this is a multi
3 line
4
5 comment */
6 ;
SYSDATE
25-AUG-20

SQL> SET SQLBLANKLINES OFF;
SQL> select sysdate from dual
2 /* this is a multi
3 line
4
SQL> comment */
2 ;
comment */
*
ERROR at line 1:
ORA-00969: missing ON keyword
SQL>

SQL*Plus tries to substitute an ampersand “&” in a comment with the value of a variable. This can be suppressed using SET DEFINE OFF. Not a big deal, unless we do not want to suppress it.

SQL> select sysdate from dual
2 -- current day at marks&spencers
3 ;
Enter value for spencers:
old 2: -- current day at marks&spencers
new 2: -- current day at marks
SYSDATE
25-AUG-20

A comment inside a PROMPT is not a comment, but will be printed to the screen.

SQL> prompt /* comment or not comment that is the question */
/* comment or not comment that is the question */

If you want a “;” to be printed to a spool file using PROMPT, use two “;;”.

SQL> prompt test;;
test;

Additional thoughts

Good developers can get a solid overview of unknown code just by reading the comments.

Set your syntax highlighting in a way that makes it easy to read comments. Unfortunately Oracle SQL Developer some time ago switched from green comments to grey comments. My recommendation is to change this setting.

Comment Driven Development (CDD) Manifesto

Comments are good!

Every code needs comments!

Comment first, code later!

Comments state code intention.

Comments describe why a technical decision was made

or not made.

Comments can give examples.

The world needs more comments!