Tackling “cursor: pin S wait on X” wait event issue

Problem description

I found the following interesting issue by accident during an performance analysis using ASH and real time SQL monitoring.

We (@gassenmj and myself) saw that all connections from an application server connection pool were suddenly “hanging” at the same SQL statement. The single session did have wait times ranging from a few seconds up to 30 seconds and more. Additionally some CPU throttleling was going on.

After a couple of minutes the system healed itself and went back to normal behaviour. Those were the symptoms. Although that was not directly related to the performance analysis we had planned, we decided to investigate further.

 

Problem indicators

Here follows a list of statements that helped to analyze the problem.

check session infos

select * from v$session;

select * from v$session_wait 
--where sid=sys_context('userenv','sid')
order by wait_time+seconds_in_wait desc;

We saw a high number of “cursor: pin S wait on X” events with a high wait time. At that time I didn’t know what the issue was, but it was obvious that there is some kind of problem. If you encounter a high number of sessions all with the same wait event, then you might want to investigate further. Especially if the system seems generally slow while this event happens.

check the active session history (ASH) report

Please note that accessing the ASH report requires an enterprise edition and a licensed diagnostics pack. Do not try to run any select against v$active_session_history in a standard edition!

If you are allowed to use ASH, then run the following selects for a quick check. ASH typically holds information only for the last 30 minutes. Then the data is consolidated into the AWR report. If possible I prefere ASH because it samples information every second.

-- ash info
select ash.sample_time, ash.session_id, ash.session_serial#, ash.sql_id,  
       ash.event, -- ash.p1, ash.p1text, ash.p2, ash.p2text, ash.p3, ash.p3text,
       ash.wait_time, ash.time_waited, 
       ash.blocking_session, ash.blocking_session_serial#,
       ash.in_parse --, ash.in_hard_parse, ash.in_sql_execution
from v$active_session_history ash
where ash.event = 'cursor: pin S wait on X'
order by ash.sql_id, ash.sample_time;

Eventually the data is moved into the dba_hist_active_sess_history view.

-- awr info
select sql_id, sample_time, session_id, wait_time, time_waited, module, machine
from dba_hist_active_sess_history
where event = 'cursor: pin S wait on X'
order by sql_id, sample_time;

We can consolidate the infomation over all sessions that seem to be connected to the issue. This summary statment also includes the session that seems to cause the issue.

-- ash summary
with sum_by_session as (
        select ash.sql_id, ash.session_id, 
               trunc(ash.sample_time,'HH') timeframe, min(ash.sample_time) min_time, max(ash.sample_time) max_time,
               sum(ash.wait_time) + sum(ash.time_waited)  total_wait,
               ash.event, ash.p1, ash.p1text, ash.p2, ash.p2text, ash.p3, ash.p3text, ash.in_hard_parse
        from v$active_session_history ash
        join v$active_session_history sqlids on sqlids.sql_id = ash.sql_id
        where (ash.event like 'cursor: pin S%' or ash.in_hard_parse = 'Y' )
        and sqlids.event = 'cursor: pin S wait on X'
        group by  ash.sql_id,  ash.session_id, ash.event, ash.p1, ash.p1text, ash.p2, ash.p2text, ash.p3, ash.p3text, ash.in_hard_parse
                 ,trunc(ash.sample_time,'HH')
         )        
select s.sql_id, to_char(s.timeframe,'dd-Mon-RR HH24') timeframe,
       to_char(min(s.min_time),'HH24:MI:SS')||'-'||to_char(max(s.max_time),'HH24:MI:SS') timeperiod, 
       round(sum(total_wait)/1000000,2)  total_wait_in_s,
       s.event, s.p1, s.p1text, s.p2, s.p2text, s.p3, s.p3text, s.in_hard_parse, 
       listagg(s.session_id,',') within group (order by s.session_id) as sids
from sum_by_session s              
group by s.sql_id,  s.event, s.p1, s.p1text, s.p2, s.p2text, s.p3, s.p3text, s.in_hard_parse, s.timeframe
order by s.sql_id, s.in_hard_parse desc, s.timeframe;

Here is some sample output from a test database. It shows a list of sessions (column SQLIDs) that are waiting for cursor: pin S wait on X.

SQL_ID	   TIMEFRAME	    TIMEPERIOD	   TOTAL_WAIT_IN_S 	EVENT			P1		P1TEXT	P2		P2TEXT	P3		P3TEXT	IN_HARD_PARSE	SIDS
a7s0bjm8467xg	22-mai-18 04	04:21:10-04:21:10	0,46	cursor: pin S wait on X	3494059951	idn	674309865472	value	21474836480	where	N	20,255,301
cq42gcnqfs5vd	22-mai-18 04	04:21:09-04:21:09	0,04				1		file#	31620		block#	1		blocks	Y	247
cq42gcnqfs5vd	22-mai-18 04	04:21:10-04:21:10	0,03				1		file#	31323		block#	1		blocks	Y	247
cq42gcnqfs5vd	22-mai-18 04	04:21:09-04:21:10	610,9	cursor: pin S wait on X	753669997	idn	1060856922112	value	21474836480	where	N	18,51,70,73,127,169,191,229,326,379,390,426,436,456
3320xrzqy5t5c	22-mai-18 12	12:21:13-12:21:13	0				1		file#	22750		block#	1		blocks	Y	269
3320xrzqy5t5c	22-mai-18 12	12:21:13-12:21:13	0,11	cursor: pin S wait on X	3991069868	idn	1155346202624	value	21474836480	where	N	314
76fxznczqskp8	22-mai-18 00	00:21:07-00:21:07	0				1		file#	589		block#	1		blocks	Y	15
76fxznczqskp8	22-mai-18 00	00:21:08-00:21:08	0,05				1		file#	31323		block#	1		blocks	Y	15
76fxznczqskp8	22-mai-18 04	04:21:08-04:21:08	0,04				1		file#	507		block#	1		blocks	Y	73
76fxznczqskp8	22-mai-18 00	00:21:07-00:21:08	404,66	cursor: pin S wait on X	1064061608	idn	64424509440	value	21474836480	where	N	8,67,90,151,167,193,221,253,314,351,364,367,419,456
76fxznczqskp8	22-mai-18 04	04:21:08-04:21:08	419,54	cursor: pin S wait on X	1064061608	idn	313532612608	value	21474836480	where	N	18,51,70,127,169,191,247,326,379,390,426,436,456

In this example we had many sessions with the “cursor: pin S wait on X” wait event for the cursor “cq42gcnqfs5vd”. Those sessions combined waited for more than 10 minutes! And we also see two lines for the session that created this cursor (hard_parse=Y). Two lines because the hard parse took considerable time and ASH samples every second. The P2 info changed during that (last two bytes changed). I assume this means that the hard parse was progressing.

In general the summary statement shows if this wait event happend recently and also that many sessions where waiting for this event.

Finding the problem statement

We can use v$sql or v$sqlarea.

select * --sql_id, sql_text, sql_fulltext 
from v$sql 
where sql_id = 'cq42gcnqfs5vd';

or

select sql_id, module,
       sql_fulltext,
       cpu_time, elapsed_time, concurrency_wait_time, 
       version_count, parse_calls, invalidations
from v$sqlarea
where sql_id = 'cq42gcnqfs5vd';

The column SQL_FULLTEXT will hold the statement. But this information might age out of the SGA. So the sooner after the issue you run this the better. The good news is that the hashed sql_id is stable. Meaning if the same statement is run later again, you will see the same sql_id. And since we are investigating an issue that happened because the exactly same statement was run several times at the same moment (many sessions where waiting), there is a good chance that we get lucky and see this command in v$sql.

historical comparison

Very useful are the event_histograms. It can give an indication how severe the problem is. If we see a recent LAST_UPDATE_TIME for one of the high numbers (>= 4096) then some action might be needed. Also the v$event_histogram view helps to monitor if the issue disappears or is lowered once it is resolved.

select * from v$event_histogram 
where event = 'cursor: pin S wait on X';
EVENT#	EVENT			WAIT_TIME_MILLI	WAIT_COUNT	LAST_UPDATE_TIME
282	cursor: pin S wait on X	1		393	22.05.18 07:09:40,837552 +02:00
282	cursor: pin S wait on X	2		72	06.05.18 20:21:14,069132 +02:00
282	cursor: pin S wait on X	4		302	19.05.18 20:52:37,773557 +02:00
282	cursor: pin S wait on X	8		2043	22.05.18 07:09:41,041724 +02:00
282	cursor: pin S wait on X	16		6586	22.05.18 01:07:19,804808 +02:00
282	cursor: pin S wait on X	32		14719	22.05.18 07:09:41,054201 +02:00
282	cursor: pin S wait on X	64		16058	22.05.18 12:21:14,725227 +02:00
282	cursor: pin S wait on X	128		7514	22.05.18 12:21:13,702598 +02:00
282	cursor: pin S wait on X	256		10496	22.05.18 07:09:40,366636 +02:00
282	cursor: pin S wait on X	512		11360	22.05.18 07:09:40,364821 +02:00
282	cursor: pin S wait on X	1024		2123	22.05.18 07:00:05,691792 +02:00
282	cursor: pin S wait on X	2048		2240	22.05.18 04:21:11,172316 +02:00
282	cursor: pin S wait on X	4096		284	03.05.18 12:22:09,853604 +02:00
282	cursor: pin S wait on X	8192		169	30.04.18 16:22:30,975458 +02:00
282	cursor: pin S wait on X	16384		180	30.04.18 16:22:30,985937 +02:00
282	cursor: pin S wait on X	32768		31	19.09.17 18:50:00,401702 +02:00
282	cursor: pin S wait on X	65536		2	04.05.16 22:09:14,572347 +02:00

I run this on May 22nd. We see mutex sleeps for up to 2 seconds (column wait_time_milli = 2048) at 4 a.m. And a high number of mutex sleeps for 1 second at around 7 a.m. Nothing too critical but so that it might be worth checking what is going on. More about mutex sleeps in the section that explains the problem.

We also see an entry for 16 seconds from 30th of April. This was the time before we fixed the problem. The event history shows that since that time, we never had such long sleeps anymore. Which proves that our fix is working.

What the event histogram does not show, is that the waits from May 22nd and from April 30th are from a different sql_ids. So it give us only a very crude monitoring.

Event detail and mutex analysis

select * 
from GV$MUTEX_SLEEP_HISTORY
where blocking_session = 247;

This should show several sesssions during the problematic time frame. The location is the most interesting column. It will probably be this location

kkslce [KKSCHLPIN2]

We can use this location to check out the general mutex sleep history.

select MUTEX_TYPE,LOCATION,REQUESTING_SESSION,BLOCKING_SESSION,SLEEP_TIMESTAMP
from GV$MUTEX_SLEEP_HISTORY
where location='kkslce [KKSCHLPIN2]'

And to get an impression about the severity of the issue, lets compare the event against other mutex sleeps.

select * from v$mutex_sleep
--where location='kkslce [KKSCHLPIN2]'
order by sleep time desc;

Example result

MUTEX_TYPE	LOCATION	SLEEPS	WAIT_TIME
Cursor Pin	kkslce [KKSCHLPIN2]	1555183	0
Library Cache	kglpndl1  95	116623	0
Library Cache	kglpin1   4	107578	0
Library Cache	kgllkdl1  85	105747	0
Library Cache	kglhdgn2 106	83336	0
Library Cache	kglllal3 111	45584	0
Library Cache	kgllkc1   57	41674	0
Library Cache	kglpnal1  90	37040	0
Library Cache	kglget2   2	12203	0
Library Cache	kgllldl2 112	11882	0
Cursor Pin	kksfbc [KKSCHLFSP2]	11303	0
...

A high number of sleeps here shows that we have some issue.

But enough about such statements. I will explain a little why they are relevant for the problem and also what kind of problem there is.

Some technical background information

There are several recommended publications that helped me to understand and solve the issue. See links to Tanel Poder and Jonathan Lewis articles in the link section at the end of this post.

About this wait event: doc about “cursor: pin S wait on X”

The doc explains a little bit what happens. This wait event occurs when a session requires a shared mutex pin, while another session holds an exclusive mutex pin. Mutex pins are essentially locks on memory objects.

It is a lock to create the execution plan for a statement. The mutex organizes access to a shared cursor. During a hard parse the cursor (=execution plan) is created. The hard parse sets the exclusive lock. Other sessions are checking if there is already a cursor for the SQL statement (based upon the hashed sql_id). This is done by a shared pin call (cursor pin S). However the first session (247 in my case) did not free up this ressource yet, because it is still hard parsing. That is why we see a cursor: pin S wait on X for all the other sessions.

Problem cause

The issue was caused by a select that was extremly expensive to (hard) parse but which was very fast to execute.

The hard parse needed several seconds (usually 9 but up to 24 seconds). Any consecutive statement call was a soft parse using bind variables and executed in way less than 1 second.

This statement was send by some java application at the exact same time to all sessions in the connection pool. So the exact same statement, but with different bind parameters needed to run.

Each database session did this:

  • First it builds a hash (SQL_ID) for the sql statement.
  • Then it searches in the SGA if an entry for this SQL_ID already exists in a memory lookup table.
  • Since another session, at the same time, already tries to create an execution plan (hard parse). Therefore the later session waits for the outcome of this. This is a mutex wait.
  • The mutex then goes to sleep for a short time.
  • It wakes up and since the hard parse is still running, it goes to sleep again. This time a little longer.

The problem with mutex sleeps is twofold.

  • the sleep times tend to increase exponentially. We can see this in the mutex_sleep_history and event_histogram views. It starts with 1 microsecond,
    but very soon reaches an order of several seconds.
  • When a mutex tried to access a latch and sleeps on it – the latch is not free for other sessions to grab it. This can throttle the CPU.

So the mutex sleeps get longer and longer while the hard parse is still running.
Eventually the next mutex sleep will be longer than what the original hard parse time was. So even if the blocking session finished, the waiting session still doesn’t execute because the mutex sleeps now for the next 8 or 16 seconds. This can cascade to other sessions. I have seen total wait times of more than 30 seconds because of this cascading effect.

The process is partially documented (shared pool check), although not in enough detail to cover this mutex problem scenario.

For more information about how mutex sleeps are coupled with CPU starvation, check out the first article in the links section by Tanel Poder.
 

Solutions

Two ways to address the problem immediatly spring to mind:

a) Avoid calling the same statement so many times. Maybe do a single call where all the relevant binds are added as an in-clause. Or as a subquery.

b) Improve the performance of the hard parse.

Best idea is probably to do both. Decrease the number of calls and improve hard parse times.

But often we can influence only the database side, not the caller side. Here are some ways to make the hard parse faster. There are various ways like SQL profiles, pinning the cursor, etc. available. But if you are a developer those are usually not your first choises.

Note that using SQL Baselines will probably not help (see https://jonathanlewis.wordpress.com/2014/11/23/baselines/).

As a developer think about the SQL statement first and if there is a way to help the optimizer.

Ask yourself and try out

Are all the subqueries/views really needed to get the result? Often when adding views, there are joins to tables included that are not needed. While the optimizer can use a feature called table elimination, it still needs time to consider this. Rewrite the SQL and instead of joining the view, join only the needed the base tables.

If there are lookup tables that are joined to the main queries FROM clause consider to move them to the SELECT part. This will not work, if the lookup is part of the filtering. However many lookup tables have a multi-language component. Which in turn often means, they are considered child tables for the optimizer. So that the driving table can not be correctly identified anymore. Changing that will change how the optimizer thinks about the other tables (also 1 less table for the permutations count). Sometimes moveing the lookup logic to plsql is also an option. Those values will stay while the whole process is running, so fetch them once, then use the IDs for filtering.

If larger repeating subqueries exist, consider subquery factoring (WITH clause).

Check if you have unnessecary DISTINCT operations. Sometimes this prevents optimizer possibilities like combining UNION ALL queries (join factorization). Typically this will not help to reduce hard parse times, but I’ve seen one example with lots of joined tables in the UNION ALL query, where it happend.

Hints might be a final option to use. Especially the leading hint is a way to massivly decrease the number of permutations that the optimizer needs to consider. While hints should not be the first choice, here we have a scenario where the developer has better information than the CBO.  An approach that I call “minified hinted outlines” can be used. Essentially create a plan and grab all the leading hints from the outline. Then put those leading hints into the statement. This will freeze the table order but still allows the optimizer to choose between access paths like index access vs. full table scan and between many other options. I blogged about it before: SQL parse time issue – use “minified hinted outlines”. It is a solid compromise to reduce hard parse time, without totally freezing the execution plan.

This is the hint we ended up with.

/* Do not remove the LEADING hints. 
The next statement has an extremly long hard parse time.
The hint minimized the hard parse time. Execution of the statement is very fast.

Without the hint we see major blocking issues (CURSOR: PIN S WAIT ON X) for multiple concurrent sessions 
*/
 select  /*+
          LEADING(@"SEL$9E38E278" "OLIS"@"SEL$11" "ERRORS"@"SEL$11" "EXTRAINFO"@"SEL$11" "WFI"@"SEL$2" "ILI"@"SEL$19" "PL"@"SEL$27" "TUBELOC"@"SEL$29" "RACK"@"SEL$31" "ICH"@"SEL$17" "SOL"@"SEL$25" "BAT"@"SEL$21" "CPD"@"SEL$23")  
          LEADING(@"SEL$FC92CB92" "OSM"@"SEL$12" "O"@"SEL$12" "S"@"SEL$12" "LI"@"SEL$12")  
          LEADING(@"SEL$5B04CACE" "SOA"@"SEL$13" "CE"@"SEL$13")  
          LEADING(@"SEL$AB668856" "POS"@"SEL$8" "SOA"@"SEL$7" "CWFI"@"SEL$6" "LI"@"SEL$6")  
          LEADING(@"SEL$6A1B73E7" "SOA"@"SEL$4" "CWFI"@"SEL$3" "LI"@"SEL$3")  
        */                       
         r.barcode
        ,r.orderid
...

 

Test and reproduce the issue

I tried to build a test script to reproduce such long hard parses. Essentially one needs to create a lot of joins with different filter conditions on all of the tables. Maybe add a couple of union all + distinct makes sure. And we could cheat slightly. By increasing the value of the hidden parameter _optimizer_max_permutations we increase the hard parse time if we join enough tables.

So far I didn’t have the time to completly rebuild the scenario. Mostly because I have multiple statements for a real customer, which I can use to test.

 

Side notes

This is a very rare edge case and the first time I see an effect where using evenly distributed bind variables is slower than not using them. But it must be mentioned, that both cases are far from the optimal execution time.

Forcing the hard parse by not using bind variables would have resulted in an average parsing time of 7-9 seconds. Which is incredibly slow. However it would not have caused any mutex sleeps and therefore no CPU throtteling.
 

Additional reads

Advertisements

Fetch module name from line number in package with pl/scope

Here is a small statement I am using to find the name of a submodule based upon error stack data (utl_call_stack.error_line(x) or dbms_utility.format_error_stack).

The function IDENTIFY_MODULE helps to diagnose errors. If an error happens in plsql the error stack returns only the name of a package (=unit) and the line number. Using this line number we can look up the pl/scope information and make a solid guess about the module name. For various reasons this is only a good guess, not a guarantee (see problem section below).

solution

The following code snippets will only work if the relevant packages were compiled with PLSCOPE_SETTINGS=’IDENTIFIERS:ALL’.

ALTER SESSION SET plscope_settings='identifiers:all';

SQL statement

fetch the nearest procedure or function

select * --type, name, signature, line, usage_id, usage_context_id
from all_identifiers
where object_name = :PACKAGE_NAME
and object_type= 'PACKAGE BODY'
and usage in ( 'DEFINITION','DECLARATION' )
and type in ('PROCEDURE','FUNCTION')
and line <= :LINE_NUMBER
order by line desc
fetch first 1 row only
;

This will find procedures and functions in our code that were declared just before the line of error.

There are many cases where we get some false positives.

But it is a good start.

plsql enrichment

Using a little bit of plsql we can make this logic more robust. And even get data about submodule hierachies.

Put this function in the instrumentation package of your choice (for example logger) and then use it to improve log information. How this is done in detail is out of scope for this post.

create or replace function identify_module (p_owner in varchar2, p_unit in varchar2, p_line in number) return varchar2
is
  /**************************************************************
  /* Name          :  identify_module
  **************************************************************
  * description    :  uses PLSCOPE, to get additional info about the module name of a source code line 
  *  This only works reliably for code compiled with optimizationlevel = 1
  *  higher optimization levels might move code lines. The line reported in error and backtrace stacks (run time) can differ from the lines stored in PL/scope or user_source (compile time).
  *
  * @author: Sven Weller, syntegris information solutions GmbH
  **************************************************************
  * parameter      
  * @param  : p_owner = schema name of unit
  * @param  : p_unit = package name
  * @param  : p_line = line of code, for which we would like to see the name of the modul
  * @return : concatenated submodule names  
  **************************************************************/
 cursor c_search_by_line (cv_owner in varchar2, cv_unit in varchar2, cv_line in number)
  is
    select /*+ first_rows(1) */ i.type, i.name, i.line, i.usage_id, i.usage_context_id, i.usage, i.signature
    from ALL_IDENTIFIERS i
    where i.owner = cv_owner
    and i.object_name = cv_unit
    and i.object_type = 'PACKAGE BODY'
    and i.line <= cv_line 
    -- context must be in same package body
    and i.usage_context_id in (select i2.usage_id from ALL_IDENTIFIERS i2 where i2.owner = cv_owner and i2.object_name = cv_unit and i2.object_type = 'PACKAGE BODY')
    order by line desc, usage_id asc
    ;
  
  cursor c_search_by_usage (cv_owner in varchar2, cv_unit in varchar2, cv_usage_id in number)
  is
    select /*+ first_rows(1) */ type, name, line, usage_id, usage_context_id, usage, signature
    from ALL_IDENTIFIERS
    where owner = cv_owner
    and object_name = cv_unit
    and OBJECT_TYPE = 'PACKAGE BODY'
    and usage_id = cv_usage_id 
    order by decode (usage, 'DEFINITION',1, 'DECLARATION', 2, 3), line desc,  usage_id asc
    ;

  r_result_byLine  c_search_by_line%rowtype;
  r_result_byUsage c_search_by_usage%rowtype;
  r_last_result    c_search_by_usage%rowtype;
  v_owner          all_identifiers.owner%type;
  v_modul_name     all_identifiers.name%type;
  v_first_type     all_identifiers.type%type;
  v_max_hierarchy  binary_integer := 5;
begin
  -- If owner is missing, use the current schema
  v_owner := coalesce(p_owner,sys_context('userenv','current_schema'));
  
  -- find the closest line and check its context.
  open c_search_by_line(v_owner, p_unit, p_line);
  fetch c_search_by_line into r_result_byLine;
  close c_search_by_line;

  if r_result_byLine.usage_context_id = 1 then
    -- we seem to be already in main package body.
    -- this can be either a problem during a parameter call 
    -- or the error happened in the initialisatzion part of the package
    case r_result_byLine.usage 
      when 'DEFINITION' then
        v_modul_name :=r_result_byLine.type||' '||p_unit||'.'||r_result_byLine.name;
      when 'DECLARATION' then
        v_modul_name :='declaration of '||r_result_byLine.type||' '||p_unit||'.'||r_result_byLine.name;
      else 
        v_modul_name :='body of '||p_unit;
    end case;
    
  else    
      r_result_byUsage := r_result_byLine;
      --r_result_byUsage.usage_context_id := r_result_byLine.usage_id;
      
      -- find module names
      <>
      loop 
        if r_result_byUsage.usage in ('DEFINITION', 'DECLARATION')  
           and r_result_byUsage.type in ('PROCEDURE','FUNCTION')
           and (r_last_result.signature != r_result_byUsage.signature or r_last_result.signature is null)
           then
             -- concat multiple submodule names
             v_modul_name := r_result_byUsage.name
                            ||case when v_modul_name is not null then '.'||v_modul_name end;
             v_first_type := coalesce(v_first_type, r_result_byUsage.type);
             -- remember result to compare if we get duplicate entries because of declaration->definition
             r_last_result := r_result_byUsage;
        end if;
        -- stop when package body level is reached
        exit when r_result_byUsage.usage_context_id in (0, 1) or v_max_hierarchy = 0;


        -- it seems to be a submodule, so do an additional call and fetch also the parent module
        open c_search_by_usage(p_owner, p_unit, r_result_byUsage.usage_context_id);
        fetch c_search_by_usage into r_result_byUsage;
        close c_search_by_usage;
    
        
        -- safety counter to prevent endless loops
        v_max_hierarchy := v_max_hierarchy - 1;
      end loop parent_modules;  
    
      -- add info about type (FUNCTION/PROCEDURE)
      if v_modul_name is not null then
        v_modul_name :=v_first_type||' '||p_unit||'.'||v_modul_name;
      --else   
      --  v_modul_name := '--no submodule found--';
      end if;
  end if;  
  return v_modul_name;
exception
  when no_data_found then
    return null;
end identify_module;
/

Example

Check the result for each line of some test package.
You can run this example yourself in LiveSQL .

The function had to be modified slightly to use USER_IDENTIFIERS instead of ALL_IDENTIFIERS to be able to run in LiveSQL.

select line, identify_module(user, name, line) , text 
from user_source
where name='TEST_PACKAGE_FUNC_PROC'
and type = 'PACKAGE BODY';

Result

LINE	IDENTIFY_MODULE(USER,NAME,LINE)	TEXT
1		"package body Test_Package_Func_Proc "
2		"as "
3	declaration of VARIABLE TEST_PACKAGE_FUNC_PROC.GLOBAL_VAR	" global_var number := 0;"
4	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" function test_func (in_val in number) return number  "
5	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" is "
6	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" begin "
7	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" return in_val; "
8	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" exception "
9	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	"  when others then "
10	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	"    RAISE; "
11	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" end; "
12	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	""
13	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" procedure test_proc  (in_val in number) "
14	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" is "
15	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"   procedure submodule( in_val in number) is"
16	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"   begin"
17	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"      dbms_output.put_line (in_val); "
18	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"   end;  "
19	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	" begin "
20	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"   submodule(in_val); "
21	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" exception "
22	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"  when others then "
23	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"    RAISE; "
24	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" end; "
25	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"begin"
26	body of TEST_PACKAGE_FUNC_PROC	"  global_var := 1;"
27	body of TEST_PACKAGE_FUNC_PROC	"end; "

The test package was copied and modifed based upon Hemant K Chitales “Function and Procedure in Package” LiveSQL demo.

Some problems

  • Currently only ment for package bodies
  • Compiler optimization can move code. That means the line number of an error at runtime is not the same line number as during compile time. PL/Scope only gives us compile time information. So would all_source.
    The only (bad) workaround is to compile with optimization level 1 and recreate the error, so that the correct line of error is shown.
  • Errors that happen in the declaration part of a package can not always be resolved. We might get a false positive for the previously declared object.
  • Does not inform when package body is wrapped.

Further readings

10 more SQL and PLSQL things you might not know

Since my previous blog posts about 10 Oracle SQL features you might not know and “10 Oracle plsql things you probably didn’t know
raised quite some interest, I decided to add some more unknown features. Read careful, I started to write this blog post on 1st of April, so there is an easter egg hidden somewhere in this post. If you are not sure, always test and verify for yourself.

10. NVL can also handle unusual datatypes

NVL can also handle unusual datatypes like BOOLEAN, COLLECTIONS and Advanced Datatypes (ADTs).

Example

set serveroutput on
declare
  b boolean;
begin
  if nvl(b,true) then
     dbms_output.put_line('TRUE');
  else
     dbms_output.put_line('FALSE');
  end if;
end;
/
TRUE

This is part of the sys.standard implementation.
But since boolean is only supported in PLSQL we can’t do much with that in SQL.

9. secret column name “rowlimit_$$_rownumber”

We shouldn’t use “rowlimit_$$_rownumber” or “rowlimit_$$_total” as a column name.

Here is what could happen:

select dummy as "rowlimit_$$_rownumber"
from dual
fetch first 3 rows only;

ERROR at line 1:

ORA-00918: column ambiguously defined

The reason for this can be found when we use the new 12c functionality to expand a query. Typically this is used for views, but it can also be applied to some other features. Here for the logic that does the LIMIT action.

Special thanks to OTN forum members Solomon Yakobson and padders who pointed at the issue in this thread.

What happens behind the scene is that the limit clause “fetch first 3 rows” is changed (expanded) into a subquery that adds a second column “rowlimit_$$_rownumber” to the query. This column uses the row_number analytic function and is later used to filter upon the relevant rows of the LIMIT clause. The error happens because we now have two columns with the same name.

And here is one way to see the expanded code.

set linesize 1000
set longc 1000
set long 1000
variable c clob
exec dbms_utility.expand_sql_text('select dummy from dual fetch first 3 rows only',:c)
print c
SELECT "A1"."DUMMY" "DUMMY"
FROM (  SELECT
            "A2"."DUMMY" "DUMMY",
            ROW_NUMBER() OVER (
                ORDER BY NULL
            ) "rowlimit_$$_rownumber"
        FROM "SYS"."DUAL" "A2"
    ) "A1"
WHERE "A1"."rowlimit_$$_rownumber"

“rowlimit_$$_total” has the same problem. It appears when we use PERCENT in the limit clause.

Example

select dummy as "rowlimit_$$_total"
from dual
fetch first 3 percent rows only;

ORA-00918: column ambiguously defined

And if we expand the working query we see the reason.

SELECT "A1"."DUMMY" "DUMMY"
FROM ( SELECT
            "A2"."DUMMY" "DUMMY",
            ROW_NUMBER() OVER(
                ORDER BY
                    NULL
            ) "rowlimit_$$_rownumber",
            COUNT(*) OVER() "rowlimit_$$_total"
        FROM "SYS"."DUAL" "A2"
    ) "A1"
WHERE "A1"."rowlimit_$$_rownumber"
        <= ceil("A1"."rowlimit_$$_total" * 3 / 100)

The PERCENT keyword requires to do a total count and uses this total count as a filter.

Fortunatly the chance that we by accident name our columns so is very very low.

8. Do you know all plsql pragmas?

Pragmas are instructions for the plsql compiler. There are many of them. Here is the list of pragmas I know or heared about. Not all of them are documented. Not all of them can be used by developers. Several can only be used as sys and come with additional restrictions, so they are only for Oracle internal purposes. Still they pique our curiosity.

The documented and not deprecated pragmas are in bold. At least we should know all of those.

  • autonomous_transaction

    One of the most misunderstood things in plsql.

    Defines that the plsql logic runs independently from the main transaction.

    Typical use case: To log away an error, even if the main transaction is rolled back.

    It is not a workaround for mutating table errors!

  • builtin

    Defines SQL builtin functions and operators.

    This is an internal pragma for usage in package sys.standard.

  • coverage

    This is a new pragma in 12.2.

    from the 12.2. doc: COVERAGE pragma

    The COVERAGE pragma marks PL/SQL source code to indicate that the code may not be feasibly tested for coverage. The pragma marks a specific code section. Marking infeasible code improves the quality of coverage metrics used to assess how much testing has been achieved.

  • deprecate

    Adds a compile time warning if the object is referenced. The message of the warning can be influenced. This new pragma was introduced in 12.2. We can add it to code that should be replaced. Useful in environments where multiple teams of developers call/reference the same code.

  • exception_init

    Combines a plsql exception with an exception number.

  • fipsflag

    Another internal pragma that is used in package sys.standard.

    I guess that the FIPSFLAG pragma has something to do with FIPS from NIST.

    From https://www.howtogeek.com/245859/why-you-shouldnt-enable-fips-compliant-encryption-on-windows/

    FIPS stands for “Federal Information Processing Standards.” It’s a set of government standards that define how certain things are used in the government–for example, encryption algorithms. FIPS defines certain specific encryption methods that can be used, as well as methods for generating encryption keys. It’s published by the National Institute of Standards and Technology, or NIST.

    It seems that US-government computers have a FIPSFLAG enabled. Applications that run on these machines need to be FISMA compliant to be working on those machines.

    Also interesting in that context:
    https://blogs.technet.microsoft.com/secguide/2014/04/07/why-were-not-recommending-fips-mode-anymore/

    https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.200.pdf

  • Update: I finally found some documentaion about FIPS.

    Identifying Extensions to SQL92 (FIPS Flagging)

    The Federal Information Processing Standard for SQL (FIPS 127-2) requires a way to identify SQL statements that use vendor-supplied extensions. Oracle provides a FIPS flagger to help you write portable applications.

    When FIPS flagging is active, your SQL statements are checked to see whether they include extensions that go beyond the ANSI/ISO SQL92 standard. If any non-standard constructs are found, then the Oracle Server flags them as errors and displays the violating syntax.

    The FIPS flagging feature supports flagging through interactive SQL statements submitted using Enterprise Manager or SQL*Plus. The Oracle Precompilers and SQL*Module also support FIPS flagging of embedded and module language SQL.

    When flagging is on and non-standard SQL is encountered, the following message is returned:

    ORA-00097: Use of Oracle SQL feature not in SQL92 level Level

    Where level can be either ENTRY, INTERMEDIATE, or FULL.

    So the FIPSFLAG is a way to inform how well a specific function complies to the SQL Standard.

    And we can turn this feature on or off by using SET in sqlplus.

    From the docs

    12.41.24 SET FLAGGER {OFF | ENTRY | INTERMED[IATE] | FULL}

    Checks to make sure that SQL statements conform to the ANSI/ISO SQL92 standard.

    If any non-standard constructs are found, the Oracle Database Server flags them as errors and displays the violating syntax. This is the equivalent of the SQL language ALTER SESSION SET FLAGGER command.

    You may execute SET FLAGGER even if you are not connected to a database. FIPS flagging will remain in effect across SQL*Plus sessions until a SET FLAGGER OFF (or ALTER SESSION SET FLAGGER = OFF) command is successful or you exit SQL*Plus.

    When FIPS flagging is enabled, SQL*Plus displays a warning for the CONNECT, DISCONNECT, and ALTER SESSION SET FLAGGER commands, even if they are successful.

    Btw: In SQL Developer we get an error message “SET FLAGGER ENTRY” is Obsolete.

    The alter session command still works.

    example

    ALTER SESSION SET FLAGGER = FULL;
    Session altered.
    
    select * from dual d1, dual d2
    where d1.dummy=d2.dummy;
    
    Error starting at line : 3 in command -
    select * from dual d1, dual d2
    where d1.dummy=d2.dummy
    Error at Command Line : 3 Column : 1
    Error report -
    SQL Error: ORA-00097: use of Oracle SQL feature not in SQL92 Full Level
    ORA-06550: line 3, column 15:
    PLS-01454: No operator may be used with values of data type CHAR
    
    ALTER SESSION SET FLAGGER = OFF;
    
    Error starting at line : 6 in command -
    ALTER SESSION SET FLAGGER = OFF
    Error report -
    ORA-00097: use of Oracle SQL feature not in SQL92 Full Level
    
    select * from dual d1, dual d2
    where d1.dummy=d2.dummy;
    
    D D
    - -
    X X
    
    

    At first I thought it might be useful to enforce writeing ANSI compliant SQL. Now I seriously doubt there is any kind of practical usage.
    end UPDATE.

  • inline

    Turns submodule inlining on or off. Submodule inlining is a plsql compiler feature implemented since 10g. The compiler can rewrite plsql code so that the resulting code runs faster. Among other options the compiler can add the code from inside a module directly at the point where that code is used (optimization level 3). This is called inlining. The performance advantage is that the expensive submodule call can be avoided. The disadvantage is that the same code is repeated everywhere where the submodule was originally. But we do not have to program this.

    So we as developers can follow the DRY (don’t repeat yourself) paradigm and the optimizer tunes this code for performance. This is the best of two worlds.

  • Because the compiler already does a good job, the pragma is usually not needed. In rare cases we might want to enforce inlining even if compiled with optimization level 2.

  • interface

    Gateway for internal oracle functions to c libraries.
    It is heavily used inside the sys.standard package spec.

    An example

      --#### interface pragmas
    
      --#### Note that for any ICD which maps directly to a PVM
      --#### Opcode MUST be mapped to pes_dummy.
      --#### An ICD which invokes another ICD by flipping operands is
      --#### mapped to pes_flip, and an ICD whose result is the inverse of
      --#### another ICD is mapped to pes_invert
      --#### New ICDs should be placed at the end of this list, and a
      --#### corresponding entry must be made in the ICD table in pdz7
    
      PRAGMA interface(c,length,"pes_dummy",1);
      PRAGMA interface(c,substr,"pes_dummy",1);
      PRAGMA interface(c,instr,"pesist",1);
      PRAGMA interface(c,UPPER,"pesupp",1);
      PRAGMA interface(c,LOWER,"peslow",1);
      PRAGMA interface(c,ASCII,"pesasc");
    
  • new_names

    This is an internal pragma that restricts the use of particular new entries in package standard. It is only valid in package standard.

  • poke_mon

    This is an internal pragma that can be added by database machine learning code. So it might appear by random somewhere in your code. If the schema is pokemon enabled you can use this pragma to train your modules to react faster or to eliminate invalid input data. The pragma was introduced in 19.1.4 using the multi lingual engine (MLE). So far it is only available in autonomous databases (cloud first). If your modules have collected enough power they can be combined to overtake and replace other packages during recompilation. The ultimate goal is to remove all bad performing code from the database.

  • restrict_references
    (RNPS, WNPS, WNDS, RNDS, TRUST)

    This is an outdated pragma. I can remember setting this in an Oracle 7.3 database.
    It informs the database about the intended scope of the module. An error is raised if this pragma is violated.

    RNPS = read no package state
    WNPS = write no package state
    RNDS = read no database state
    WNDS = write no database state
    TRUST = trust me, and don’t double check if all dependend objects do also behave correctly.

    This pragma shouldn’t be needed anymore. Instead make functions DETERMINISTIC.

  • serially_reusable

    Loose all state when the call is finished. Package variables, open cursors and other plsql state is reset when the package is declared with this pragma.

  • timestamp

    This pragma sets/modifies the timestamp value of a package. Valid only in SYS (and probably only for package standard).

  • udf

    Userdefined function

    This pragma can be used if a function is mostly referenced directly inside a SQL statement. It avoids some of the additional overhead during the switch from the SQL to the PLSQL engine. Especially a simplified (less expensive) datatype check is done.

    While the udf pragma is really a great performance feature it is currently very limited. For example the function can only have numeric parameters. If one parameter is a date, then the udf pragma will silently not work anymore, so we will not gain the performance benefit. If you want to improve that behaviour feel free to vote up this database enhancement idea by @LotharFlatz.

    Btw: There are some indications that udf for functions with varchar2 parameters seem to be working in 12.1 but not anymore in 12.2. I didn’t verify this.

How many of the documented pragmas did you know? And how many of the additional ones?
Did you catch them all?

7. LoC limit

There is a limit for how many lines of code (LoC) a plsql object can have.
The limit was increased to 226 DIANA (Descriptive Intermediate Attributed Notation for Ada) nodes (~6 million LoC) in Oracle 8i. Before that it was only about 3000 Lines of Code (215 Diana Nodes).

Nowadays there are other limits that are more likely to be encountered, before the LoC limit is reached. See also: https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/plsql-program-limits.html#GUID-00966B4C-B9A5-47D4-94AA-54AEBCC07CE9

Remember: compiler optimizations like inlining might increase your lines of code quite a bit.

6. datatype signtype

There is a datatype signtype. It allows only the numbers -1, 0 and 1.

Example:

set serveroutput on
declare
  v_val pls_integer;
  v_sign signtype;
begin
  for i in 1..10 loop
    v_val := round(dbms_random.value(-5,5));
    v_sign := sign(v_val);
    dbms_output.put_line(v_sign);
  end loop;
end;
/
-1
1
-1
-1
-1
1
0
-1
-1
0

PL/SQL procedure successfully completed.

But this is PLSQL only. In SQL we can not use this type.

create table test(id number, s signtype);

ORA-00902: invalid datatype

Interesting, but so far I never found a need to use it.

5. functions without begin..end

We can declare functions that do not have a function body (no begin..end block).
Example:

create or replace function kommaSepariert(ctx in varchar2)
  return varchar2 deterministic parallel_enable
  aggregate using kommaSepariert_ot;

The secret here is that this function is an user defined aggregation function that uses an object type. And the type body holds the function logic.
 

Here is the matching type definition
create or replace TYPE          "KOMMASEPARIERT_OT"                                          as object(
 str varchar2(4000),

 static function odciaggregateinitialize(
   sctx in out kommaSepariert_ot)
   return number,

  member function odciaggregateiterate(
    self in out kommaSepariert_ot,
    ctx in varchar2)
    return number,

  member function odciaggregateterminate(
    self in kommaSepariert_ot,
    returnval out varchar2,
    flags in number)
    return number,

  member function odciaggregatemerge(
    self in out kommaSepariert_ot,
    ctx2 kommaSepariert_ot)
    return number);
/

create or replace TYPE BODY          "KOMMASEPARIERT_OT" as

  static function odciaggregateinitialize(
    sctx in out kommaSepariert_ot)
    return number
  is
  begin
    sctx := kommaSepariert_ot(null);
    return odciconst.success;
  end;

  member function odciaggregateiterate(
    self in out kommaSepariert_ot,
    ctx in varchar2)
    return number
  is
  begin
    if self.str is not null then
      self.str := self.str ||',';
    end if;
    self.str := self.str || ctx;
    return odciconst.success;
  end;

  member function odciaggregateterminate(
    self in kommaSepariert_ot,
    returnval out varchar2,
    flags in number)
    return number
  is
  begin
    returnval := self.str;
    return odciconst.success;
  end;

  member function odciaggregatemerge(
    self in out kommaSepariert_ot,
    ctx2 kommaSepariert_ot)
    return number
  is
  begin
    if self.str is not null then
      self.str := self.str ||',';
    end if;
    self.str := self.str || ctx2.str;
    return odciconst.success;
  end;
end;
/

 

Such functions have been used in the past to combine strings. Nowadays we can use LISTAGG.
Here is a quick demonstration how it works

with testdata as(select 'abc' t from dual union all
                select 'def' t from dual union all
                select 'ghi' t from dual union all
                select 'jkl' t from dual)
select kommasepariert(t)
from testdata
;
KOMMASEPARIERT(T)
abc,def,ghi,jkl

4. The select clause can influence the number of rows returned

I’m not talking about using select DISTINCT (this is another cruel way where the select clause can change the number of rows returned).

Here is a more surprising situation. Consider those two slightly different queries.

with tbl as (select 1 val from dual union all  
            select 2 val from dual union all  
            select 3 val from dual )  
    SELECT  CASE  0  
            WHEN  0         THEN  'A'  
            WHEN  SUM (val) THEN  'B'  
            END  AS c  
    FROM    tbl;  
Result (3 rows)
A
A
A
with tbl as (select 1 val from dual union all  
            select 2 val from dual union all  
            select 3 val from dual )  
    SELECT  CASE  6  
            WHEN  0         THEN  'A'  
            WHEN  SUM (val) THEN  'B'  
            END  AS c  
    FROM    tbl;  
Result (only 1 row)
B

So 3 rows are returned if we check against 0 and 1 row is returned if we check against 6.

This is a side effect of two rules.
Rule 1: A select with an aggregate function doesn’t need a group by clause and then it is guaranteed to return a single row.
Rule 2: case statements use short-circuit evaluation.

In the first example the sum(val) was never evaluated, so no aggregation took place.

See also this otn thread where the situation was discussed.

I tested the behaviour in 12.1.0.2 SE, 12.2.0.1 SE and in 11.2.0.4 EE.
I also think this should be treated as a bug. Small changes as this to the select clause should not influence the number of rows returned.

3. Default behaviour for windowing clause in analytic functions

This is something I learned from the great Kim Berg Hansen (@Kibeha).

The default windowing clause is “RANGE BETWEEN unbounded preceding and current row”. This can sometimes lead to wrong surprising results. In most cases we should switch and use ROWS BETWEEN. It is something a developer needs to be aware of.
default_windowing
from SQL reference – Analytic Functions

Whenever the order_by_clause results in identical values for multiple rows, the function behaves as follows:

CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, and RANK return the same result for each of the rows.

ROW_NUMBER assigns each row a distinct value even if there is a tie based on the order_by_clause. The value is based on the order in which the row is processed, which may be nondeterministic if the ORDER BY does not guarantee a total ordering.

For all other analytic functions, the result depends on the window specification. If you specify a logical window with the RANGE keyword, then the function returns the same result for each of the rows. If you specify a physical window with the ROWS keyword, then the result is nondeterministic.

SUM is one of those “other” analytic functions.

Consider the following example. We have a table with a list of transactions. And we want to see the transaction value but also a cumulative sum for those values.

with testdata as 
   (select 1 trans_id, 10 transaction_value, trunc(sysdate-10) transaction_day from dual union all
    select 2 trans_id, 20 transaction_value, trunc(sysdate-8) transaction_day from dual union all
    select 3 trans_id, -10 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 4 trans_id, 30 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 5 trans_id, 100 transaction_value, trunc(sysdate) transaction_day from dual 
   )
select trans_id, transaction_day as trans_day, transaction_value as trans_value,  
       sum(transaction_value) over (order by transaction_day) cumulative_sum
from testdata
order by trans_id;   

Result

TRANS_ID TRANS_DAY	TRANS_VALUE	CUMULATIVE_SUM
1	 24.04.18 	10	        10
2	 26.04.18 	20	        30
3	 02.05.18 	-10	        50
4	 02.05.18 	30	        50
5	 04.05.18 	100	        150

As you can see the transaction 3 and 4 have the same cumulative sum. The reason is that our order criteria in the analytic window function does not separate those two rows.

There are two possible solutions. Either make sure that the order is not ambiquious. Or use “rows between”.

with testdata as 
   (select 1 trans_id, 10 transaction_value, trunc(sysdate-10) transaction_day from dual union all
    select 2 trans_id, 20 transaction_value, trunc(sysdate-8) transaction_day from dual union all
    select 3 trans_id, -10 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 4 trans_id, 30 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 5 trans_id, 100 transaction_value, trunc(sysdate) transaction_day from dual 
   )
select trans_id, transaction_day as trans_day, transaction_value as trans_value,  
       sum(transaction_value) over (order by transaction_day 
       rows between unbounded preceding and current row) cumulative_sum
from testdata
order by trans_id;   
TRANS_ID TRANS_DAY	TRANS_VALUE	CUMULATIVE_SUM
1	 24.04.18 	10	        10
2	 26.04.18 	20	        30
3	 02.05.18 	-10	        20
4	 02.05.18 	30	        50
5	 04.05.18 	100	        150

2. batched commits

The performance of many small commits can be improved when doing batched commits.

Instead of writing

commit;

we can do

commit work write batch;

commit_batch

Here are two real world examples where this was tested.

a) I recommended using batched commits to a colleague of mine, who tried to tune a set of java logic that run in highly parallel mode. The goal was to do 1 select + 2 inserts + commit in 1000 parallel sessions per second.

Switching to batched commits was so hugely successful, that they now raised the performance requirement to 2500 concurrent sessions per second. Which also means now the ball is passed back to the java developers to come up with a better mode to execute lots of small checks against the db.

b) I also tested batched commits in a different and more general context.

Most of our code has code instrumentation logic. That means we can turn on debugging with a certain trace level and while the code is running a lot of tracing information is written into a logging table. The instrumentation call (like logger.log_trace) uses an autonomous function to do so. Essentially it is a single insert followed by a commit. That also means that a lot of commits are executed. Which can put stress on the log writer and the storage system.

So I compared what happens when we do a commit vs. a batched commit while writing lots of tracing data.

The batched commit was orders of magnitude faster than the normal commit.
I plan to write a separate article to show the exact measurements.

UPDATE: I finally managed to recheck this behaviour. Under plsql a commit will always do a kind of batched nowait commit. I tested this in 12.2.0.1 (SE). It is now also documented as C.Neumüller correctly pointed out. (https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/static-sql.html#GUID-56EC1B31-CA06-4460-A098-49ABD4706B9C).

Note:

The default PL/SQL commit behavior for nondistributed transactions is BATCH NOWAIT if the COMMIT_LOGGING and COMMIT_WAIT database initialization parameters have not been set.

In older db versions this documentation note was missing, but have been traces by several DBAs who found that commit behaviour in plsql was different from normal commits.

I assume that my previous test (which was on older hardware) was influenced by some other factors, that I wasn’t aware of. I now suspect some disc activities throttled the log writes on the storage system.

Batched commits are still useful for NON-Plsql environments. Like typical .net or java middle tier appliactions that send their statements via connection pool to the database.

Appologies to all readers who got the impression that you could now simply turn a magic key on, and speed up your application.

Writing into logging tables can be speeded up by other means. A blog post about that is currently in the making. Stay tuned!

Another thing to remember is that this different batch mode influences performances comparisons between plsql and java programs heavily in favor of the plsql world.
end UPDATE.

So what is the disadvantage? Why not always use batched commits?
To be clear: For normal situations stay with the normal commit. But if you run into issues where the log writer is not fast enough then this can be a possibility.

The drawback (as I understand it) is that in the case of a database crash, you might not only loose the currently unfinished transactions but also some transactions, that were commited already, but which the logwriter didn’t finalize yet. Typically all transactions from the last 3 seconds are at risk.

1. “CRASH” is a reserved plsql word

I have no idea why.

Oracle 18.1 PL/SQL Reserved Words and Keywords

plsql_reserved_crash