First thoughts about Oracle 18XE db

On the 20th october the new free Oracle database version 18XE (Express Edition) was anounced (https://blogs.oracle.com/oracle-database/oracle-database-18c-xe-now-available).

I had a first quick look at the database but mostly at the documentation. And here are my thoughts about it.

Restrictions

  • Up to 12 GB of user data (previously 11 GB)
  • Up to 2 GB of database RAM (previously 1 GB) – that is SGA + PGA combined.
  • Up to 2 CPU threads (previously 1 CPU – but multiple threads?!)
  • Up to 3 Pluggable Databases (this is new)

Not much of an improvement.

Previously there was a restriction of only 1 XE per computer (not per VM). The license guide does not mention this restriction anymore. I think it means that we can have multiple XEs on the same machine, but they need to be in different VMs. In the area of cloud based installations, we might not even know if another XE is on the same server. So this is very good news.

missing and included features

The following list does not claim to be complete. But those are features that I find interesting to consider.

multi tenant (up to 3 PDBs)
no plug / unplug into all other databases
partitioning
 no sharding
in-memory column store
compression
 no active session history (ASH) and other performance related diagnostics
spatial & graph support
encryption and redaction
Database Vault, FGA
 RAC

 

In general everything that works is allowed! This is different from 11gXE!

 

Thoughts

Server/VM limitations

11g XE was limited to one installation per machine. This was specified in the 11gXE license agreement.

Any use of the Oracle Database Express Edition is subject to the following limitations;
1. Express Edition is limited to a single instance on any server;

18cXE is limited to one installation per VM (my interpretation).

The restriction that was previously in the license guide is gone, however the install guide for linux, still has this passage

4.2 Oracle Database XE Installation and Execution Restrictions

Only one installation of Oracle Database XE can be performed on a single environment. …
To run more than one Oracle Database instance or install more than one copy of the database software, upgrade to Oracle Database 18c Personal Edition, Oracle Database 18c Standard Edition 2, or Oracle Database 18c Enterprise Edition.

The word server was replaced by environment. I interpret that so, that on a single server with multiple environments (e.g. VM images) we can have multiple XE installations.

The other restricting factor is that the oracle SID is fixed to XE: I first thought it is hardcoded, but I seen one trustworthy report, where the claim was made that this can be changed. But even if that is possible, it is not clear whether that is allowed.

This might be problematic for upgrade scenarios (from 18XE to 19XE). We can not have two active DBs one old 18XE and the new 19XE on the same environment. Maybe not even if one is shut down (because of the only one copy of the db software issue).

There is a section in the installation guide, that seems to support this impression: Exporting and Importing Data for non-APEX Users. It describes an upgrade scenario via datapump. It specifically mentions to deinstall 11cXE.

2. Deinstall Oracle Database XE 11.2 if installation of Oracle Database XE 18c is planned on the same system. See Deinstalling Oracle Database XE for more information

There certainly is some clarification with regards to possible upgrade scenarios needed in the future. For now, only one XE installation on a single virtual box/machine.

Ressource restrictions

The 2 CPU threads might be one of the more limiting factors. It feels like a downgrade from the previous version. With hyperthreading a CPU could support easily up to 4 or even 8 threads. This is now limited to two threads. Oracle tries to make sure that we can not trick the limitations by useing advanced hardware.

This reminds my of the restriction introduced when switching to SE2 (standard edition 2). SE2 is limited to 16 (user) threads. Which essentially means, we should disable hyperthreading and just use normal CPU power.

Especially for web based applications the thread limit seems severe. In APEX we typcially have connection pools that serve up to 10 concurrent connections. Sometimes multiple connections pools (apex, apex_al, apex_pu, apex_rt). We need to rethink the default apex and ords settings. A thread on a CPU usually does not resemble a user process 1:1 but it can serve as a crude first estimation.

The 12 GB user data means we can not store many document files in the database. For most well structured data this should be reasonably enough. As long as no video, audio or very large photo collections are used. This still leaves the option to store documents as BFILE directly in the file system. This would avoid the data being used for the limited user table space. I’m not sure if temp and undo space is included in that limit. Main question seems to be, what exactly is measured.

Also note that system tablespace is not included in that 12 GB limit.  Unfortunatly I can’t find the source anymore where I read about this.

This query might help measureing how much space is used.

select nvl(tablespace_name,' - Total - ') tablespace_name,
round(sum(bytes)/1024/1024/1024,1) as size_in_GB
from dba_segments
where tablespace_name not in ('SYSAUX','SYSTEM')
group by rollup(tablespace_name)
order by size_in_GB desc;

I think that restriction is quite ok. Although the improvement from 11gXE is only marginal.

In-memory

Since we are only able to use 2 GB RAM in-memory column storage is not really useful. It can be nice to test out this feature or maybe use it on very tiny databases.
It might also be useful in combination with partitioning. I’m not sure if that is possible, but if we load only the relevant (e.g. current) partitions into memory we might be able to stay within the 2GB limit even when data sizes grow.

More of a marketing stunt, that in-memory option is included.

Compression

There is basic and advanced compression. Could be highly useful, especially to save up on disk space. However if you ever plan to upgrade to a Standard Edition, don’t depend on compression. This also is an EE feature.

On the other hand compression might be used to lower the pressure on the 12 GB user data limit. Using advanced compression it should be easily possible to double the size of the user data without reaching the limit.

APEX

APEX is not preinstalled. This is in line with the current 18c db situation. In 11g/12cR1 APEX was preinstalled in the CDB, which was generally considered bad practice. It was changed with 18c (i believe) and XE behaves the same way. Still it would be nice to have an option during install that gives us ORDS and APEX in a PDB.

In general APEX (and ORDS) update their versions more frequently than the database,
This might change with the new release cycles of the database however. 19XE is supposed to come out with the 19.3 db version.

upgrade thoughts

We can not directly upgrade from 11gXE to 18cXE. Nor will we be able to upgrade from 18cXE to 19XE. The way to upgrade is to export the data and import it into the new DB.

multitenant

We got 3 PDBs to use. This is very nice. Remember SE2 (standard edition 2) only allows one plugged PDB (+the seed PDB). I still would only put one PDB into major use. But it might become much easier to clone a PDB to try out a few things there, without the need to shut the original PDB down.

Unfortunatly it is not possible to plug the XE PDB into a SE2 database. It is possible to plug it into an EE database or even into a Oracle Cloud Db. I believe the SE restriction has to do with the problem that several EE features are available in the XE, which are not available in SE/SE2.

database links

So far I do not know if it is is possible to create a DB link on an XE. I totally expect that it will work. The whole CDB/PDB concept uses DB links.

backup

It is now possible to use RMAN to backup a XE. Recommended!

Tuning and diagnostic tools

At first I heared that ASH and AWR reports will not be possible in XE. But in this thread from July 2018 Gerald Venzl explained otherwise

Will it be legal to use all these options like DIAGNOSTIC+TUNING, ADDM, AWR, ASH in production systems?

–> Yes

Meanwhile Gerald clarified that the feature did not make it into 18XE. It is now planned for 19XE.

The license guide has a detailed chapter showing which options are available and which not.

It would be great to have it. Performance troubleshooting is a topic on any production system. I don’t expect XE to be an exception.

 

production ready and other usages

Can we use it for production environments?

From a pure license standpoint it is ok. See the quote from the faq

Can I use Oracle 18c XE in production?

Oracle Database Express Edition does not restrict in which environment it can be deployed. However, Oracle Database Express Edition is not supported and does not receive any patches, including security patches. Oracle recommends to run production deployments on fully supported Oracle Database editions or Cloud Services.

So – yes we can! But should we use it in production? Yes and no. For non-critical applications (like if you want to manage your local club data via a XE + apex) that is certainly fine. Although the cloud based alternatives are interesting.  However we will not receive any security fixes and don’t have oracle support (we are not paying for it, so that is fair). This might be a risk. And that’s why I wouldn’t recommend XE for mission critical stuff.

The other thing people already speculated, is to use an XE for testing purposes. If it is feature testing, then this we can already do with an EE (enterprise edition) using the OTN Developer  license. Just recently the oracle prebuild developer-vm was upgraded to 18c. If you just want the database then you can use a vagrant-box to to provision the installation: Oracle DB 18.3 . Since XE will not receive any patches, we will not be able to test the newest version of a feature on it. So for pure testing purposes it would be a bad choice.

It might be an exellent tool to demonstrate the capability of an oracle database to critical customers however. If you are an ISV you could even use it to give your potential customer a trial version of your software running on an XE.

Conclusion

I’m very excited that 18cXE finally made it. I’m also happy that a lot of EE features are available. For demos, talks and presentations I will probably stay with the preconfigured Developer-VM and an EE version of the database. For long-running small side projects, I consider to setup one or two XEs. Additionally it is a great counter argument for all those “Oracle DB is (too) expensive” opinions.

useful links

XE Quick Start: https://www.oracle.com/database/technologies/appdev/xe/quickstart.html

XE licensing information: https://docs.oracle.com/en/database/oracle/oracle-database/18/xelic/licensing-information.html#GUID-0F2574A6-360F-4237-8098-17B02FFC3BB3

XE Forum: https://community.oracle.com/community/groundbreakers/database/developer-tools/oracle_database_express_edition_xe

Oracle Vagrant-Boxes: https://blogs.oracle.com/developers/announcing-the-oracle-vagrant-boxes-github-repository

Jonathan Lewis about compression: https://www.red-gate.com/simple-talk/sql/oracle/compression-oracle-basic-table-compression/

Advertisements

OTN/ODC Appreciation Day 2018 – Thanks for the mutating table error!

This is not sarcasm!

What is the mutating table error?

I firmly believe the mutating table error is a good thing. For anybody who does not know what I am talking about here is an example.

There are various ways how the error can happen. A before row trigger is probably the most typical scenario.

Since Halloween is getting close, I need a table to organize the personell for my evil lab. I setup a table that holds each member devided by departments. Only one labhead per department is allowed. This integrity rule is enforced by table triggers.
maniac_mansion_scientist

-- build demo
create table swe_test_mutating
  (id number not null primary key 
  ,name varchar2(1000) not null
  ,department varchar2(254)
  ,is_head_of_department varchar2(1) 
     constraint yn_ck check (is_head_of_department in ('Y','N'))
  );


-- trigger to make sure that there is only one head of department.
create or replace trigger swe_test_mutating_bri
  before insert or update on swe_test_mutating
  for each row 
declare
  v_dummy varchar2(1);
begin
  if :new.is_head_of_department = 'Y' then 
    -- check if there is at least one other person 
    -- for the same department
    -- which is also marked as HEAD
    begin
      select is_head_of_department
      into v_dummy
      from swe_test_mutating
      where id != :new.id -- must be different 
      and department = :new.department -- must be same department
      and is_head_of_department = 'Y' -- is head
      and rownum <= 1  -- enough if we find one
      ; 
      
      raise_application_error(-20000, 'There can only be one head of '||:new.department||'!');
    exception
      when no_data_found then
        -- ok. Single head of department only
        null;
    end;    
  end if;
end;
/

Trigger SWE_TEST_MUTATING_BRI compiled

add some data


-- add some personell to the evil lab
insert into swe_test_mutating
values (1, 'The master', 'LAB05', 'Y');

1 row inserted.

insert into swe_test_mutating
values (2, 'The apprentice', 'LAB05', 'N');

1 row inserted.

insert into swe_test_mutating
values (3, 'Doc Holiday', 'LAB05', 'Y');

ORA-20000: There can only be one head of LAB05!
ORA-06512: at "EVIL_LAB.SWE_TEST_MUTATING_BRI", line 17
ORA-04088: error during execution of trigger 'EVIL_LAB.SWE_TEST_MUTATING_BRI'

This shows, the trigger worked for single row inserts. The logic inside the trigger prevented us from adding a second head for one department (LAB05).

Now a test with multi row inserts.

insert into swe_test_mutating
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual union all
select 6, 'Evil Clown', 'LAB07', 'Y' from dual
);

ORA-04091: table EVIL_LAB.SWE_TEST_MUTATING is mutating, trigger/function may not see it
ORA-06512: at "EVIL_LAB.SWE_TEST_MUTATING_BRI", line 8
ORA-04088: error during execution of trigger 'EVIL_LAB.SWE_TEST_MUTATING_BRI'

So what happend? We got an error (ORA-04091) saying, we are not allowed to work with the same table that is currently modified.

We would have gotten the same mutating error even if the select part has only one row. If there is an insert..select then it is considered a multi row insert and then the mutating table error bites.

Why is the mutating table error a good thing?

There are multiple reasons

1) The error prevents developers from making stupid mistakes.

I like to compare it to a safety catch. On a pistol it prevents the developer from shooting their own foot. On a climbing rope it prevents the developer from plummeting to death, if something breaks loose. If you remove the safety catch on purpose, then it is your own risk. Only remove it if you are absolutly sure what you are doing.

Without the error it is easy to create endless loops. Think about what happens, if we do an update on the same table in an after update row trigger. This update would again trigger another update which would trigger yet another update and so on. Eventually we run out of disk space (archive/redo logs) or something else dramatic happens.

The mutating table error stops us (developers) from creating such an endless loop or making other mistakes (that mostly have to do with read consistency).

2) The Error and I have a little bit of history together.

In one of my first big projects almost all tables had validity columns (valid_from and valid_to). We used triggers to make sure that certain validity rules were also considered over relationships.

For example if the validity of a parent record was extended, then the validity of a detail record needed to be extended too – but only, if it didn’t clash with any of the other detail records. Very complex rules.

Those triggers always ended up with mutating or constraining table errors, so in the end the whole logic needed to be moved to after statement triggers. It was a hell to maintain.

Nowadays I would not put all those logic into triggers, instead I would use packaged low level APIs that also make sure have all the integrity rules in place. The code would look so much cleaner with that.

I believe having gone through this experience, made me a better developer.

3) The error is complex to understand.

I do understand it. Which gives me the chance to educate about it. Sharing and passing on knowledge is a fun and highly satisfying thing to do. As is writing such a blog post.

Error Darwinism

Interestingly errors seem to comply to the laws of natural selection.

Here is what I noticed how the mutating table error has evolved over time.

  • It now (since 10g I believe) bites only if more than one row is inserted (or updated or deleted. Which is a very hideous thing to do. Novice developers test using single row inserts. Experienced developers know they should use bulk inserts and test accordingly.
  • The little brother of the mutating table error was the constraining table error.
    AskTom: “A mutating table is a table that is being modified. A constraining table is one involved in declaritive referential integrity with the mutating table.”

    Evolution took its toll on it. The constraining table error now is rarly seen in the wild. I asked around a bit, and there were no developers that found it recently. Most didn’t even know it existed. We might consider it to be extinct.

  • Compound triggers appeared in 11.1. They help to deal with the error in a quick and not so dirty way. So they are a kind of master predator for this error.
  • The recommendation about the errors have changed. Certain tools, like Oracle SQL developer show additional information if an error happend. In the case of the ORA-04091 those additional notes included something about “pragma autonomous transaction”. This hint was very misleading. Nowadays the misinformation is gone.
    It was like a mosquito (bug) infection in a swamp. The swamp is now dried up, so we see less mosquitos feeding upon that error.

How to avoid the error?

Anyone who tries to avoid the mutating table error by using the PRAGMA AUTONOMOUS TRANSACTION creates another bug. For a while the error message was misleading and trapped novice developers into thinking that this pragma is the right way to circumvent the issue. It is not. Bug count >= 2 after that pragma.

So how to correct it? We move the logic to the after statement trigger. In the past we needed 3 different table triggers to do it correctly. A before statement trigger to initialize a package variable, a row trigger to store the IDs of all records that were touched by the DML and an after statement trigger to do the actual work based upon those stored IDs. By using a compound trigger we can combine those three triggers and make the code look much cleaner.

Here is one way how to do it.

-- compound trigger to make sure that there is only one head of each evil lab (department).

create or replace trigger swe_test_mutating_comptrg
  for insert or update on swe_test_mutating
  compound trigger 

  -- define collection type
  type ID_t is table of swe_test_mutating.id%type 
    index by binary_integer;
  -- define variable to hold a list of IDs
  IDs ID_t;
  
before statement is
begin
   -- no initialization needed for compound trigger
   null;  
end before statement;

after each row is
begin
  -- check if we have a new head of department
  if :new.is_head_of_department = 'Y'
  and (:old.is_head_of_department is null -- insert
       or :new.is_head_of_department != 'Y' ) then

     -- remember ID
     IDs(IDs.count + 1) := :new.id;
  end if;     
       
end after each row;

-- main logic
after statement is
  v_department swe_test_mutating.department%type;
begin
  if IDs.count > 0 then 
    -- check if there is at least one other person 
    -- for the same department
    -- which is also marked as HEAD
    for i in 1..IDs.count loop
      begin
        -- since we didn't store the department, we have to reread it from the table
        select t1.department
        into v_department
        from swe_test_mutating t1
        join swe_test_mutating t2 on t1.department = t2.department -- must be same department
        where t1.id = IDs(i)
        and t1.id != t2.id -- must be different id
        and t1.is_head_of_department = 'Y' -- remembered ID is head
        and t2.is_head_of_department = 'Y' -- second personell is also head
        and rownum <= 1  -- enough if we find one
        ; 
        
        raise_application_error(-20000, 'There can only be one head of '||v_department||'!');
      exception
        when no_data_found then
          -- ok. Single head of department only. 
          null;
      end;    
    end loop;
  end if;
end after statement;
end;
/

-- remove the old trigger
drop trigger swe_test_mutating_bri;

If we now do a multi-row insert we will not see the mutating table error. But instead the error message that we want to see (only 1 head allowed).

— repeat the test!
insert into swe_test_mutating
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual union all
select 6, 'Evil Clown', 'LAB07', 'Y' from dual
);

ORA-20000: There can only be one head of LAB07!
ORA-06512: at "EVIL_LAB.SWE_TEST_MUTATING_COMPTRG", line 50
ORA-04088: error during execution of trigger 'EVIL_LAB.SWE_TEST_MUTATING_COMPTRG'

There are multiple optimisations for this code possible. We also could enfore the integrity rule by using a function based unique constraint. The point however was to show how code that demonstrates how to work around the mutating table error.

Also note, that the usage of table triggers is not only to create error messages, like I did in the example with the “there can only be one” check. Often they are used to store aggregated (=redundant) data in some header tables. And many other scenarios are possible.

Addemdum – integrity not guaranteed

The trigger as I wrote it first does not guarantee the intended data integrity. It avoids successfully the mutating table error. However it does not ensure, that there will only be one labhead.

Here is the scenario to show the problem. If we insert a single labhead from two separate sessions without commiting the first session. Then we end up with two active labheads. Which is not what our business rule says.

-- In session A
insert into swe_test_mutating
(
select 4, 'Mr. Sissorhands', 'LAB07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'LAB07', 'N' from dual 
);

-- do not commit in session A yet!

Now add another head of lab in a different session.

-- session B
insert into evil_lab.swe_test_mutating
(select 6, 'Evil Clown', 'LAB07', 'Y' from dual);
commit;

After commit in session A we can check the results of the table

-- session A
commit;

select * from swe_test_mutating;
ID	NAME	        DEPARTMENT	IS_HEAD_OF_DEPARTMENT
6	Evil Clown	LAB07	        Y
4	Mr. Sissorhands	LAB07	        Y
5	Nagging Nanny	LAB07	        N

Thanks to Alex Nuijten who pointed out this issue in the comments.

On a more abstract level: In a multi user environment, we need to make sure that all data that is needed for a decision is immutable until the transaction is finished. This also means we need to serialize the access to the same lab.

The solution is not so simple.

We can consider to lock all the personell, at least for the lab in question. But this will not prevent a rule violation via an insert.

The proper way is to lock the parent row (the lab) itself. Now in my simplified datamodel we do not have this part. So I first need to extend the datamodel and then do the row locking.

Here is the updated scenario

Improved datamodel with two tables and a FK

drop table swe_test_mutating;
drop table swe_test_mutating_labs;
drop table swe_test_mutating_personell;
-- build lab enhancement
create table swe_test_mutating_labs
   (name varchar2(254) not null primary key 
   ,description varchar2(4000));
   
create table swe_test_mutating_personell
  (id number not null primary key
  ,name varchar2(1000) not null
  ,department varchar2(254) references swe_test_mutating_labs(name)
  ,is_head_of_department varchar2(1) 
     constraint yn_ck check (is_head_of_department in ('Y','N'))
  );

-- We really should create an index on the FK to lessen the chance for a major deadlock scenario.
-- However dealing with the dead is daily business for the evil lab. 
-- So I don't care at the moment.

-- construct some labs
insert into swe_test_mutating_labs (name) 
select 'LAB'||to_char(level,'fm00') 
from dual connect by level <= 10;

commit;

The compound trigger also needed to be changed. The most important thing is that we need to lock the lab in question, before the duplicate check is made.
Improved compound trigger

-- compound trigger to make sure that there is only one head of each evil lab (department).
 
create or replace trigger swe_test_mutating_personell_comptrg
  for insert or update on swe_test_mutating_personell
  compound trigger
 
  -- define collection type
  type ID_t is table of swe_test_mutating_personell.id%type 
    index by binary_integer;
  -- define variable to hold a list of IDs
  IDs ID_t;
   
before statement is
begin
   -- no initialization needed for compound trigger
   null;  
end before statement;
 
after each row is
begin
  -- check if we have a new head of department
  if :new.is_head_of_department = 'Y'
  and (:old.is_head_of_department is null -- insert
       or :new.is_head_of_department != 'Y' ) then
 
     -- remember ID
     IDs(IDs.count + 1) := :new.id;
  end if;     
        
end after each row;
 
-- main logic
after statement is
  v_department swe_test_mutating_personell.department%type;
begin
  if IDs.count > 0 then
    -- check if there is at least one other person 
    -- for the same department
    -- which is also marked as HEAD
    for i in 1..IDs.count loop
      begin
        -- since we didn't store the department, we have to reread it from the table
        select p1.department
        into v_department
        from swe_test_mutating_personell p1
        where p1.id = IDs(i)
        and p1.is_head_of_department = 'Y' -- inserted ID is a labhead
        ;
        
        -- lock the department
        select d.name 
        into v_department
        from swe_test_mutating_labs d
        where d.name = v_department
        for update of d.name;

        -- after we ensured exclusive access to this lab, 
        -- we can test if there is another labhead 
        select p2.department
        into v_department
        from swe_test_mutating_personell p2
        where p2.department = v_department -- must be same department
        and p2.id != IDs(i) -- must be different person
        and p2.is_head_of_department = 'Y' -- second personell is also head
        and rownum <= 1  -- enough if we find one
        ; 

        -- found one! 
        raise_application_error(-20000, 'There can only be one head of '||v_department||'!');
      exception
        when no_data_found then
          -- ok. Single head of department only. 
          null;
      end; 
    end loop;
  end if;
end after statement;
end;
/

We can now repeat the test using the new tables.

— In session A
insert into swe_test_mutating_personell
(
select 4, 'Mr. Sissorhands', 'lab07', 'Y' from dual union all
select 5, 'Nagging Nanny', 'lab07', 'N' from dual
);

Before we commit in session A, we do an insert in session B

— session B
insert into swe_test_mutating_personell
(select 6, 'Evil Clown', 'lab07', 'Y' from dual);

This insert will now WAIT (it "hangs") until the first session finishes its transaction. Essentially we serialized the access to each lab. Only one session at a time is allowed to do changes with regards who is the head.

The result in session B depends whether session A does a COMMIT or a ROLLBACK.

session A: commit;
session B: ORA-20000: There can only be one head of lab07!

session A: rollback;
session B: 1 row inserted.

But this again shows how much trouble we have to avoid and to consider if we decide to a) implement business rules inside database triggers and b) try to avoid the mutating table error.

Thank you mutating table error! We have come a good way since I first encountered you.

fun with calendars and dates

Intro

1911_Ottoman_Calendar
By Unknown – 1911 Ottoman calendar, Public Domain, Link

Calendar and date arithmetic is not easy. And not because of the database, but because the stuff exists in the real word. The database does an excellent job to reflect many or most of the strange attributes that calendar systems have.

Here is a collection of surprising, quirky or interesting things around usage of date and time values in an Oracle database. A lot of those things have nothing to do with the database, but instead with the complexity of the world.

By examining those edge cases we might learn something about the general workings of dates, calendars and the Oracle database.

5 Fun Facts

1) We live in the year 2561

alter session set nls_calendar = 'THAI BUDDHA';
select to_char(sysdate,'YYYY') as current_year from dual;

CURRENT_YEAR
2561

q.e.d.

Reason of cause is that there are other calendars out there in the world, not only the default Gregorian one. And according to the Thai Buddha calendar we are now in 2561.

2) persons existed that were born on the 30th of february

…but only in Sweden some time ago.

I got this from a source which doesn’t exist anymore. However here is a site that seem to have the same information stored:
the-ultimate-guide-to-the-datetime-datatypes

Why is 1753 the earliest date for datetime?
Good question. It is for historical reasons. In what we sometimes refer to as the western world, we have had two calendars in modern time: the Julian and the Gregorian calendars. These calendars were a number of days apart (depending on which century you look at), so when a culture that used the Julian calendar moved to the Gregorian calendar, they dropped from 10 to 13 days. Great Britain made this shift in 1752 (1752-09-02 were followed by 1752-09-14). An educated guess why Sybase selected 1753 as earliest date is that if you were to store an earlier date than 1753, you would also have to know which country and also handle this 10-13 day jump. So they decided to not allow dates earlier than 1753. Note, however that other countries did the shift later than 1752. Turkey, for instance, did it as late as 1927.

Being Swedish, I find it a bit amusing that Sweden had the weirdest implementation. They decided to skip the leap day over a period of 40 years (from 1700 to 1740), and Sweden would be in sync with the Gregorian calendar after 1740 (but meanwhile not in sync with anyone). However, in 1704 and 1708 the leap day wasn’t skipped for some reason, so in 1712 which was a leap year, they inserted yet an extra day (imagine being born in Feb 30!) and then did the shift over a day like everyone else, in 1753.

Does the database know about this? Lets find out:

alter session set nls_date_language='SWEDISH';
alter session set nls_territory='SWEDEN';
select to_date('30-02-1712','dd-mm-YYYY') from dual;
ORA-01839: date not valid for month specified

Using a date literal instead of TO_DATE doesn’t help either.

select to_char(date '1712-02-30') from dual;
ORA-01847: day of month must be between 1 and last day of month

So this doesn’t work. Why? Because the 30th of february only existed in the Julian Calendar System. The Oracle Database doesn’t support a Julian Calendar – we can not choose calendar systems, that are not in use anymore.

Btw. the 29th of februar 1712 does exist. Even in Sweden.

select to_char(date '1712-02-29','fmddth Month YYYY') from dual;
29th Februari 1712

The article also briefly mentioned that Turkey was the last country to switch to the Gregorian Calendar. Before that, they had the Rumi Calendar, which is a variation of Julian. It differs mostly about the starting year.

Turkey did the switch as late as 1927. After the 15th February 1332 AH (Rumi) they skipped 13 days, so that the next day was the 1st March 1927 AD (Gregorian). And they switched the year too.

Again, if we ask the database then it does not know about this gap.

set pagesize 100
alter session set nls_date_language='TURKISH';
alter session set nls_territory='TURKEY';
-- 28 days later...
with twentyeight as (select level lv from dual connect by level <=28)
select to_char(date '1927-02-10' + lv,'fmddth Month YYYY')
from twentyeight;
11th Şubat 1927
12th Şubat 1927
13th Şubat 1927
14th Şubat 1927
15th Şubat 1927
16th Şubat 1927
17th Şubat 1927
18th Şubat 1927
19th Şubat 1927
20th Şubat 1927
21st Şubat 1927
22nd Şubat 1927
23rd Şubat 1927
24th Şubat 1927
25th Şubat 1927
26th Şubat 1927
27th Şubat 1927
28th Şubat 1927
1st Mart 1927
2nd Mart 1927
3rd Mart 1927
4th Mart 1927
5th Mart 1927
6th Mart 1927
7th Mart 1927

Since the leaped days existed in the Gregorian Calendar, there is no reason to leave them out.

Interestingly in Turkey there should still be living persons that were born in or before 1332!
(older than 92 years)

3) There are no days between 4 Oct 1582 and 15 Oct 1582

select to_char(date '1582-10-4','ddth Mon YYYY') from dual;  
04th Oct 1582

+1 day

select to_char(date '1582-10-4' + 1,'ddth Mon YYYY') from dual; 
15th Oct 1582

04th Oct 1582 + 1 = 15th Oct 1582 ?

After the previous chapter the reason should be obvious.
At this time the Gregorian calendar leapt 10 days in order to catch up with the astronomical year.

From http://www.unc.edu/~rowlett/units/dictY.html#year

year (a or y or yr) [2]
a traditional unit of time usually equal to 365 or 366 days. We need a whole number of days for the calendar year used in ordinary life. Ancient astronomers knew that the year [1] is approximately 365 days long, and we now know the correct figure is approximately 365.242 days. If we use 365 as the number of days in every calendar year, the extra 0.242 day adds up quickly and causes large errors in predicting the seasons. To solve this problem, the Roman emperor Julius Caesar decreed in 46 BC that the calendar year should have 365 days generally, but that every fourth year should have an extra, or 366th, day. The longer year is called a leap year. In this Julian calendar, four years equal exactly 1461 days, so the average Julian year is exactly 365.25 days.

This was a big step toward accuracy in the calendar, but the Julian year is too long by 0.008 day, or a little over 11 minutes. By the time of the Renaissance, these 11-minute errors had accumulated to a total error of about 10 days (since the Council of Nicaea in 325 AD, which set the rules for deciding when Easter should be celebrated). The spring equinox was occurring near March 11 instead of March 21. In 1582, Pope Gregory XIII decreed that 10 days should be dropped from the calendar: the day after 1582 October 4 was October 15. To reduce future errors, the pope further decreed that years divisible by 100 are not leap years unless they are also divisible by 400. Thus 2000 and 2400 are leap years, but 2100, 2200, and 2300 are not. It took many years, but the Gregorian calendar has now been accepted as the civil calendar in all countries of the world.

With the Gregorian adjustment, there are exactly 146 097 days in every 400 years, and the average Gregorian year is exactly 365.2425 days. The Gregorian year is still too long, but by less than half a minute. It will take thousands of years for this error to accumulate to 1 day, so the calendar year and the tropical year are in good enough agreement to last us a long time.

And this time, the database with its default Gregorian calendar knows about it!

Database considers days that seem to be inside this gap, as if they were after the leaped days.

select to_char(date '1582-10-5','ddth Mon YYYY') from dual; 
15th Oct 1582

4) The north pole and the south pole are on different time zones

From wikipedia: https://simple.wikipedia.org/wiki/Time_zone

In the poles, the time is UTC in the North Pole and UTC+12 in the South Pole.

(and no this is not the “South Pole, Illinois, USA” nor the “North Pole, New South Wales, Australia”)

So in case you happen to live or work at the south pole, set your session time zone to UTC+12.

Let me simulate that for you:

col sessiontimezone format A20
col dbtimezone format A20
-- check current settings
select sessiontimezone, tz_offset(sessiontimezone),
       dbtimezone, tz_offset(dbtimezone)
from dual;
SESSIONTIMEZONE      TZ_OFFS DBTIMEZONE           TZ_OFFS
-------------------- ------- -------------------- -------
Europe/Berlin        +02:00  +02:00               +02:00

Currently I’m in Frankfurt which is the same time zone as Berlin, so 2 hours before UTC.
Since I have to do some quick work at the south pole, let’s change my session settings to reflect that.

-- change to UTC+12
alter session set time_zone ='+12:00';
Session altered.
-- how late is it here currently?
select sysdate, current_date from dual;
SYSDATE           CURRENT_DATE     
----------------- -----------------
17.07.18 10:51:12 17.07.18 20:51:12

So here at the south pole, it is almost 9 p.m. whereas the database is still in Frankfurt at 11 a.m.

After enjoying The Great White Silence I need to return.

-- change back to original
alter session set time_zone=local;
Session altered.
-- doublecheck settings
select sessiontimezone, tz_offset(sessiontimezone),
       dbtimezone, tz_offset(dbtimezone)
from dual;
SESSIONTIMEZONE      TZ_OFFS DBTIMEZONE           TZ_OFFS
-------------------- ------- -------------------- -------
Europe/Berlin        +02:00  +02:00               +02:00

It is good to be back! Timezone travelling is exhausting.

5) The word calendar comes from the Latin word Kalendae which meant the first day of the month

A calendar is essentially a list of first month days.

Getting the first day of a month is easy.

trunc(sysdate,'MM') => 1st of the current month.

But what about the first day of a week? And which week are we in?

This is complex for two major reasons. There is quite some confusion about when does a week start and what is the first week in the year. And with confusion I mean, different countries have different rules for that.

https://www.calendar-week.org/

The good thing is, the Oracle database has all this knowledge. A developer just has to understand it. As usual NLS settings will influence the behaviour.

Different date formats refer to those NLS settings.

‘D’ returns the number of the day inside the week.

‘WW’ returns the number of the week inside the year. Week 1 starts on the first day of the year and continues to the seventh day of the year.
‘IW’ returns the number of the ISO-week.

‘YYYY’ refers to the year
‘IYYY’ refers the ISO-year.

some examples

What weekday is the first day of the week?

NLS_TERRITORY influences what the first day of a week is.

alter session set NLS_DATE_LANGUAGE = 'AMERICAN';

alter session set NLS_TERRITORY = 'AMERICA';
select to_char(trunc(sysdate,'D'),'Day') as "First Day in America" from dual;
alter session set NLS_TERRITORY = 'GERMANY';
select to_char(trunc(sysdate,'D'),'Day') as "First Day in Germany" from dual;
alter session set NLS_TERRITORY = 'IRAQ';
select to_char(trunc(sysdate,'D'),'Day') as "First Day in Iraq"  from dual;
select to_char(trunc(sysdate,'IW'),'Day') as "ISO First Day" from dual;

result

Session altered.
Session altered.

First Day in America
---------
Sunday   

Session altered.

First Day in Germany
---------
Monday   

Session altered.

First Day in Iraq
---------
Saturday 

ISO First Day
---------
Monday   

NLS_TERRITORY does not directly influence what the first week in the year is! To get the week we always use formatter options (IW or WW) that directly decide, which week logic needs to be applied.

However NLS_TERRITORY sets an ISO-Week flag. Unfortunately this is not well documented. I believe this flag is partially responsible for one of the behaviours mentioned in the “Stranger Things” section.

5 Stranger Things – bug or feature?

Stranger Things logo

1) sysdate returns a date, but systimestamp does not return a timestamp

systimestamp returns a timestamp with timezone. Localtimestamp would be the function that returns a timestamp. In many cases systimestamp still is the best option to use!

These are the DATATYPES : pseudocolumns dealing with points in time.

  • DATE : sysdate, current_date
  • TIMESTAMP : localtimestamp
  • TIMESTAMP WITH TIME ZONE : systimestamp, current_timestamp
  • TIMESTAMP WITH LOCAL TIME ZONE

example
First modify the session so that session timezone and db timezone differs.

ALTER SESSION SET TIME_ZONE='+10:00';

show nls
DB_TIMEZONE +01:00
SESSION_TIMEZONE +10:00
SESSION_TIMEZONE_OFFSET +10:00

Check the pseudocolumns and convert them to timestamp if needed (using cast). The same would happen when the value is stored in a timestamp column. The database implicitly does the conversion.

select to_char(sysdate,'DD-MM-RR HH24:MI:SS') as "sysdate",
       to_char(localtimestamp,'DD-MM-RR HH24:MI:SS') as "localTS",
       to_char(current_timestamp,'DD-MM-RR HH24:MI:SS TZH') as "currentTS",
       to_char(systimestamp,'DD-MM-RR HH24:MI:SS TZH') as "sysTS",
       to_char(cast(current_timestamp as timestamp),'DD-MM-RR HH24:MI:SS') as "currentTS_converted",
       to_char(cast(systimestamp as timestamp),'DD-MM-RR HH24:MI:SS') as "sysTS_converted"
from dual;

Result (timestamp):

sysdate	              16-07-18 15:01:58
localTS	              16-07-18 23:01:58
currentTS             16-07-18 23:01:58 +10	
sysTS	              16-07-18 15:01:58 +02
currentTS_converted   16-07-18 23:01:58
sysTS_converted       16-07-18 15:01:58

Note that localtimestamp and current_timestamp returns (and stores) a different hour than sysdate. Systimestamp returns the same hour after conversion and therefore matches sysdate.

Better would be to have a column that stores the timezone information too. Either timestamp with time zone or timestamp with local time zone. Since TZ info is stored, it does not matter anymore which datatype has the source.


select to_char(sysdate,'DD-MM-RR HH24:MI:SS') as "sysdate",
       to_char(localtimestamp,'DD-MM-RR HH24:MI:SS') as "localTS",
       to_char(current_timestamp,'DD-MM-RR HH24:MI:SS TZH') as "currentTS",
       to_char(systimestamp,'DD-MM-RR HH24:MI:SS TZH') as "sysTS",
       to_char(cast(localtimestamp as timestamp with local time zone),'DD-MM-RR HH24:MI:SS [TZR]') as "localTS_converted",
       to_char(cast(current_timestamp as timestamp with local time zone),'DD-MM-RR HH24:MI:SS [TZR]') as "currentTS_converted",
       to_char(cast(systimestamp as timestamp with local time zone),'DD-MM-RR HH24:MI:SS [TZR]') as "sysTS_converted"
from dual;

Result (timestamp with local time zone):

sysdate	              16-07-18 15:13:26
localTS	              16-07-18 23:13:26
currentTS             16-07-18 23:13:26 +10	
sysTS	              16-07-18 15:13:26 +02
localTS_converted     16-07-18 23:13:26 [+10:00]
currentTS_converted   16-07-18 23:13:26 [+10:00]
sysTS_converted       16-07-18 23:13:26 [+10:00]

As we can see the result is always the same point In Time.

2) Micro Intervals

And of cause there are also datatypes that deal with durations.

  • NUMBER
  • INTERVAL YEAR TO MONTH
  • INTERVAL DAY TO SECOND

Why are there two interval types? Because those are independend measurement systems for a duration of time. And there is no correct conversion possible from one into the other.

Consider this: 29 days (interval day to second) could be slightly less, slightly more or sometimes even exactly one 1 month (interval year to month). We can only do this conversion if we additionally have some reference point, like the starting day. Without this reference point in time, we can’t correctly convert from one system of measurement into the other.

We can convert from number to interval using the functions numtoDSinterval or numtoYMinterval.

Let’s do some workday calculations.

My typical day of work is from (ok I might understate a bit) is from 9:30 to 18:30.

select to_date('18:30','HH24:MI') - to_date('9:30','HH24:MI') from dual;
0.375 

A date minus a date returns a number. This is the result in days. So I worked for 0.375 days. Lets see that in interval format.

select numtoDSinterval(0.375, 'day') from dual;
+00 09:00:00.000000

Looks about right!

Can we also see that in a Month interval ? The MONTHS_BETWEEN functions gives us the result in number (= months) for the difference between two dates.

select months_between(to_date('19:30','HH24:MI')
                     ,to_date('10:30','HH24:MI')) 
from dual;
0

Hm… 9 hours equal 0 months…

Let’s assume somebody has a large shift requiring to work for more than 24 hours.

-- how many months did he work?
select months_between(to_date('2018-07-02 19:30','YYYY-MM-DD HH24:MI')
                     ,to_date('2018-07-01 07:15','YYYY-MM-DD HH24:MI'))
from dual;
0.0487231182795698924731182795698924731183

And as an interval…

select numtoYMinterval(0.0487, 'MONTH') from dual;
+00-01

Hmm… 0.0488 Months => 0 Years 1 Month ?

Is it always rounded up?

So what would a larger result be:

select numtoYMinterval(1.0487, 'MONTH') from dual;
+00-01

Hmm… 1.0487 Months => 0 Years 1 Month

This time it is not rounded up, but rounded down.

How about some more micro intervals?

numtoYMinterval(0.001, 'MONTH') => +00-00
numtoYMinterval(0.009, 'MONTH') => +00-00
numtoYMinterval(0.01,  'MONTH') => +00-01
numtoYMinterval(0.49,  'MONTH') => +00-01
numtoYMinterval(0.5,   'MONTH') => +00-01
numtoYMinterval(1.01,  'MONTH') => +00-01
numtoYMinterval(1.5,   'MONTH') => +00-02
numtoYMinterval(2.01,  'MONTH') => +00-02
numtoYMinterval(2.5,   'MONTH') => +00-03

Result

For very small values (0.01-0.49) numtoYMinterval rounds up, instead of rounding down. For larger values, the rounding is consistent. Those values are rounded down, as expected.

This feature was implemented somewhere between Oracle DB version 10 and 11. It took me a while, but now it is accepted as a unpublished bug (26244914).

Recommendation: Avoid numtoYMinterval. There are several strange edge cases with that function.

3) date format “J” – Julian

We can use the J format mask to spell out numbers (Jsp).
J is called Julian day; the number of days since January 1, 4712 BC.

Let’s play around with it a little.

select to_char(sysdate,'J') from dual;

2458317

Of cause when you execute it, you will get a different (larger) number. Time passed.

We use this number and substract it from sysdate, then add the number of days we are interested in. This time using “sp” = spelling in the format parameter.

select to_char((sysdate-(2458317-5)),'Jsp') from dual
Five

Cool! This way we can spell out any number. Unfortunatly this only works in English, not in other languages.

Important side note: The calculation must be like this

sysdate-(2458317-5)

Although mathematically the same, this would have resulted in an error

(sysdate-2458317)+5
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

Problem is that the resulting date value is invalid (before the beginning of time).

To avoid hardcoding 2458317, we can calculate it from sysdate.

select to_char(sysdate-(to_number(to_char(sysdate,'J'))-123),'Jsp') spell_number from dual;
One Hundred Twenty-Three

Ok let’s try a large number…

select to_char(sysdate-(to_number(to_char(sysdate,'J'))-1721116),'Jsp') spell_number from dual
One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-One

And one more…

select to_char(sysdate-(to_number(to_char(sysdate,'J'))-1721117),'Jsp') spell_number from dual
000000000000000000000000000000000000000000000000000000000000000000000000000000

Wat!? So spelling out One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-One is ok, but One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-Two not…

Hold on a minute. Even the first number was not spelled correctly!
1721116 != One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-One

As so often there is a reason for that. The difference between both numbers is exactly 365. Which is our first clue. We need to understand which dates are we talking about.

The following rows depend heavily on the database version it is running.

This is from a 12.2.0.1 DB. For readability, I removed several hundred rows from the output.

with testdata as (select level lv, 1721340-300+level days# from dual connect by level <= 400)
select lv, days#, 
       to_char(sysdate-(to_number(to_char(sysdate,'J'))-days#),'YYYY-MM-DD A.D.') dy,
       to_char(sysdate-(to_number(to_char(sysdate,'J'))-days#),'J') Julian,
       to_char(sysdate-(to_number(to_char(sysdate,'J'))-days#),'Jsp') spelled_number
from testdata;
LV	DAYS#	DY		JULIAN	SPELLED_NUMBER
1	1721041	0001-12-15 B.C.	1721041	One Million Seven Hundred Twenty-One Thousand Forty-One
2	1721042	0001-12-16 B.C.	1721042	One Million Seven Hundred Twenty-One Thousand Forty-Two
3	1721043	0001-12-17 B.C.	1721043	One Million Seven Hundred Twenty-One Thousand Forty-Three
4	1721044	0001-12-18 B.C.	1721044	One Million Seven Hundred Twenty-One Thousand Forty-Four
5	1721045	0001-12-19 B.C.	1721045	One Million Seven Hundred Twenty-One Thousand Forty-Five
6	1721046	0001-12-20 B.C.	1721046	One Million Seven Hundred Twenty-One Thousand Forty-Six
7	1721047	0001-12-21 B.C.	1721047	One Million Seven Hundred Twenty-One Thousand Forty-Seven
8	1721048	0001-12-22 B.C.	1721048	One Million Seven Hundred Twenty-One Thousand Forty-Eight
9	1721049	0001-12-23 B.C.	1721049	One Million Seven Hundred Twenty-One Thousand Forty-Nine
10	1721050	0001-12-24 B.C.	1721050	One Million Seven Hundred Twenty-One Thousand Fifty
11	1721051	0001-12-25 B.C.	1721051	One Million Seven Hundred Twenty-One Thousand Fifty-One
12	1721052	0001-12-26 B.C.	1721052	One Million Seven Hundred Twenty-One Thousand Fifty-Two
13	1721053	0001-12-27 B.C.	1721053	One Million Seven Hundred Twenty-One Thousand Fifty-Three
14	1721054	0001-12-28 B.C.	1721054	One Million Seven Hundred Twenty-One Thousand Fifty-Four
15	1721055	0001-12-29 B.C.	1721055	One Million Seven Hundred Twenty-One Thousand Fifty-Five
16	1721056	0001-12-30 B.C.	1721056	One Million Seven Hundred Twenty-One Thousand Fifty-Six
17	1721057	0001-12-31 B.C.	1721057	One Million Seven Hundred Twenty-One Thousand Fifty-Seven
18	1721058	0001-01-01 B.C.	1720693	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Three
19	1721059	0001-01-02 B.C.	1720694	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Four
20	1721060	0001-01-03 B.C.	1720695	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Five
21	1721061	0001-01-04 B.C.	1720696	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Six
22	1721062	0001-01-05 B.C.	1720697	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Seven
23	1721063	0001-01-06 B.C.	1720698	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Eight
24	1721064	0001-01-07 B.C.	1720699	One Million Seven Hundred Twenty Thousand Six Hundred Ninety-Nine
25	1721065	0001-01-08 B.C.	1720700	One Million Seven Hundred Twenty Thousand Seven Hundred

... snip 1721066-1721112 no major change there...

73	1721113	0001-02-25 B.C.	1720748	One Million Seven Hundred Twenty Thousand Seven Hundred Forty-Eight
74	1721114	0001-02-26 B.C.	1720749	One Million Seven Hundred Twenty Thousand Seven Hundred Forty-Nine
75	1721115	0001-02-27 B.C.	1720750	One Million Seven Hundred Twenty Thousand Seven Hundred Fifty
76	1721116	0001-02-28 B.C.	1720751	One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-One
77	1721117	0000-00-00 0000	0000000	000000000000000000000000000000000000000000000000000000000000000000000000000000
78	1721118	0001-03-01 B.C.	1720752	One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-Two
79	1721119	0001-03-02 B.C.	1720753	One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-Three
80	1721120	0001-03-03 B.C.	1720754	One Million Seven Hundred Twenty Thousand Seven Hundred Fifty-Four

... snip 1721120-1721420 no major change there...

381	1721421	0001-12-29 B.C.	1721055	One Million Seven Hundred Twenty-One Thousand Fifty-Five
382	1721422	0001-12-30 B.C.	1721056	One Million Seven Hundred Twenty-One Thousand Fifty-Six
383	1721423	0001-12-31 B.C.	1721057	One Million Seven Hundred Twenty-One Thousand Fifty-Seven
384	1721424	0001-01-01 A.D.	1721424	One Million Seven Hundred Twenty-One Thousand Four Hundred Twenty-Four
385	1721425	0001-01-02 A.D.	1721425	One Million Seven Hundred Twenty-One Thousand Four Hundred Twenty-Five
386	1721426	0001-01-03 A.D.	1721426	One Million Seven Hundred Twenty-One Thousand Four Hundred Twenty-Six
387	1721427	0001-01-04 A.D.	1721427	One Million Seven Hundred Twenty-One Thousand Four Hundred Twenty-Seven
388	1721428	0001-01-05 A.D.	1721428	One Million Seven Hundred Twenty-One Thousand Four Hundred Twenty-Eight
389	1721429	0001-01-06 A.D.	1721429	One Million Seven Hundred Twenty-One Thousand Four Hundred Twenty-Nine
390	1721430	0001-01-07 A.D.	1721430	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty
391	1721431	0001-01-08 A.D.	1721431	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-One
392	1721432	0001-01-09 A.D.	1721432	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Two
393	1721433	0001-01-10 A.D.	1721433	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Three
394	1721434	0001-01-11 A.D.	1721434	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Four
395	1721435	0001-01-12 A.D.	1721435	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Five
396	1721436	0001-01-13 A.D.	1721436	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Six
397	1721437	0001-01-14 A.D.	1721437	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Seven
398	1721438	0001-01-15 A.D.	1721438	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Eight
399	1721439	0001-01-16 A.D.	1721439	One Million Seven Hundred Twenty-One Thousand Four Hundred Thirty-Nine
400	1721440	0001-01-17 A.D.	1721440	One Million Seven Hundred Twenty-One Thousand Four Hundred Forty

Two strange things to observe. If we look closely we can see that the year 1 BC repeats itself two times. For example line 17 and line 383 both show the 31st of December 1 BC. I now say 1 BC[1] and 1 BC[2] to distinguish the entries for those years. Second observation is that the 29th of February 1 BC[2] exists and is kind of undefined (0000-00-00 0000 0000000). The 29th of februar in year 1 BC[1] does not exist in the data. This is not shown in the sample set, but you can easily modify the query to see for yourself.

In older DB versions the second 1 BC occurence would be year 0. For which some of the date functions also return the error message

ORA-01841: (full) year must be between -4713 and +9999, and not be 0

I believe that there was some half baked bug fix at oracle. Year 0 was aliased as Year 1 BC. This also explains, why there seems to be a leap day. Year 0 would have a leap day according to the logical rules (as defined by Pope Gregory XIII). In older database versions any day from the year 0 resulted in 0000-00-00 0000. Which also created problems. Fortunatly the issue only seems to appear when playing around with the J(ulian) format. And even then only for some very large values. The chance to get bugs in real world systems is extremly slim.

some more examples
Date based comparison

select case when 1721057 = 1721423  
            then 'equal' else 'unequal' end as "check" 
from dual;
unequal

As expected those two numbers are not equal.
Compare the same numbers, but added to the same date.

select case when to_date('01-01-4712 BC','DD-MM-YYYY AD') + 1721057 
               = to_date('01-01-4712 BC','DD-MM-YYYY AD') + 1721423 
            then 'equal' else 'unequal' end as "check"
from dual;
equal

Now they are equal.

consecutive day check

Negative years within a date literal are BC years. Here follows a shot list of small date calculations around the problematic years and days.

select to_char(date '-0001-12-31','DD-Mon-YYYY AD') from dual;
31-Dec-0001 BC
select to_char(date '-0001-12-31' + 1,'DD-Mon-YYYY AD') from dual;
01-Jan-0001 AD

31-Dec-0001 BC + 1 = 01-Jan-0001 AD

select to_char(date '-0001-01-01' - 1,'DD-Mon-YYYY AD') from dual;
31-Dec-0002 BC

01-Jan-0001 BC – 1 = 31-Dec-0002 BC

select to_char(date '-0002-12-31' + 1,'DD-Mon-YYYY AD') from dual;
01-Jan-0001 BC

31-Dec-0002 BC + 1 = 01-Jan-0001 BC

select to_char(date '-0001-02-28' + 1,'DD-Mon-YYYY AD') from dual;
01-Mar-0001 BC

28-Feb-0001 BC = 01-Mar-0001 BC

Seems about right.

At least here Oracle adhers to ISO 8601 regulation “every date must be consecutive”.

And some more edge cases

select to_char(date '-4712-1-1','YYYY A.D.') from dual;
4712 B.C.
select to_char(date '-1000000','YYYY A.D.') from dual;
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

So there is no 1 Million Years B.C. in the Oracle DB.

4) The first ISO-year day for 1st Januar 2021 is in 2019

select to_char(trunc(date '2021-01-01','IYYY'),'YYYY-MM-DD') from dual;

2019-12-30

Now this looks strange! It happens in all supported DB versions.
According to the docs IYYY is a supported format model for TRUNC. It is specified as “Year containing the calendar week, as defined by the ISO 8601 standard“.
Note that the doc mentions the calendar week. The 1st Januar 2021 is in calendar week (IW) 53. That week still is part of year 2020. And the first day (Monday) of the first ISO week of 2020 is 30th of December 2019.

So it seems as if

TRUNC(:date,'IYYY') 

really does:

TRUNC(
      TRUNC(
            TRUNC(:date,'IW')  -- first day of the iso-week
     ,'YEAR') -- first day of the calendar YEAR regardless of iso
,'IW') -- first day of iso-week

I think this is illogical behaviour and as such should be considered a bug. However it is correct in terms of the ISO 8601 – which calls it the “ISO week-numbering year”. So most likely it will not be fixed.

Remember: Always truncate useing ‘YEAR’ or ‘YYYY’, never use ‘IYYY’ it is just to confusing.

5) lateral SQL injection

Consider the following code. Is it safe?

create or replace procedure date_proc(p_date in date) is
  stmt varchar2(200);
  res  varchar2(100);
begin
  stmt:='select object_name from all_objects where created = ''' || p_date ||'''';
  dbms_output.enable;
  dbms_output.put_line(stmt);
  execute immediate stmt into res;
  dbms_output.put_line('result:'||res);
exception
  when no_data_found then
    dbms_output.put_line('result: no objects found');
end;
/

set serveroutput on
execute date_proc(sysdate);

Dynamic sql is always at risk for sql injection. However since that code makes sure the value is a correct date it doesn’t seem possible to tamper with the code.

This is just some demo code. It has other issues as well (like removing time from the input).

Afaik David Litchfield was the first to describe the potential security issue with this. He called it the Lateral SQL injection. I prefer the name SQL injection by NLS.

See also: http://www.hexatier.com/lateral-sql-injection-in-oracle-database/

Consider the following NLS setting. Any client can modify its NLS environment.

alter session set nls_date_format = 'DD/MM/YYYY"'' or 1=1"--';

Then run the statement again!

set serveroutput on
execute date_proc(sysdate);
select object_name from all_objects where created = '16/07/2018 ' or 1=1--'
ORA-01422: exact fetch returns more than requested number of rows

Ok this happens, because the code doesn’t expect multiple rows to be returned. We can modify that.

alter session set nls_date_format = 'DD/MM/YYYY"'' or 1=1 and rownum=1"--';

Fortunatly the possibilites via NLS format are limited. For example the length of the injected code can only be very small. When trying more complex modifications we get

ORA-01801: date format is too long for internal buffer

However sometimes this can potentially be used as a first step for more serious hacking.

A similar exploit is possible using the NUMBER datatype (via nls_numeric_characters).

The more interesting question is, how to correct code like this.
One option is to use bind variables.

create or replace procedure date_proc(p_date in date) is
  stmt varchar2(200);
  res  varchar2(100);
begin
  stmt:='select object_name from all_objects where created = :dat';
  dbms_output.enable;
  dbms_output.put_line(stmt);
  execute immediate stmt into res using p_date ;
  dbms_output.put_line('result:'||res);
exception
  when no_data_found then
    dbms_output.put_line('result: no objects found');
end;
/

Other options include validating any concatenated input for example by using dbms_assert.

Conclusion

Calendars are difficult. Oracle implemented calendar functionality very thoroughly.
It is the duty of a developer/DBA to understand the complexities around calendars, time zones, time conversions and Time Machines.

If you found any movie references – keep them!

plsql collection lookup with multiple keys

problem scenario

A recent question on otn asked how to access a plsql collection using multiple keys.

It is a typical scenario to build a second collection, to be able to access the main data in the first collection. The second collection fulfills the same role as an index on a table. It allows fast access to the main data record. I usually call the second collection an “index collection”.

Here is an article by Steven Feuerstein who explains the concept of index collections in more detail: https://blogs.oracle.com/oraclemagazine/on-the-pga-and-indexing-collections

But the forum question was not how to do a simple key => value lookup, but instead have two keys (based upon record set values) and use them to access the main collection.

solution

There are two general ways.
Combine the keys into a single key or build a nested collection. I will show a quick example for both ways.

Way 1) Combined key

We can consolidate the two keys into one single key. Typically using some delimiter. And then use that combined key for the index_collection.

combinedKey := key1||':'||key2;

Of cause we need to make sure the delimiter is some value that does not exist in any of the keys.

Example
If you try to copy and run this example please note that the syntax highlighter removed the label “build_index”
You might want to add this again, just after the “– index data” comment.

set serveroutput on
declare
  cursor c is (select trunc((level+2)/3) lvkey, chr(ASCII('A')+mod(level*2,3)) letter, round(dbms_random.value(1,100)) val 
               from dual 
               connect by level <=10
                );
  type tabdata_t is table of c%rowtype index by binary_integer;   
  tabdata tabdata_t;
  
  type keylookup_t is table of binary_integer index by varchar2(100);
  keylookup keylookup_t;
begin
  -- load data
  open c;
  fetch c 
  bulk collect into tabdata; -- notice no limit clause here. For larger record sets you need to use a loop and LIMIT!
  close c;
  
  -- index data
  <>
  for i in 1..tabdata.count loop
    keylookup(to_char(tabdata(i).lvKey)||':'||tabdata(i).letter) := i;
  end loop build_index;
  -- index is now complete. 
 
   -- Test the index first
  dbms_output.put_line('Index 1B=>'|| to_char(keylookup('1:B')));
  dbms_output.put_line('Index 3A=>'|| to_char(keylookup('3:A')));

  -- now fetch the data using the index
  dbms_output.put_line('Data 1B=>'|| tabdata(keylookup('1:B')).val);
  dbms_output.put_line('Data 3A=>'|| tabdata(keylookup('3:A')).val);
end;
/

Output

Index 1B=>2
Index 3A=>9
Data 1B=>32
Data 3A=>67

PL/SQL procedure successfully completed.

Way 2) collection of collection

A collection can be part of another collection. Nesting collections means we could do a lookup using two (or more) key values where each key is used for one collection.

lookup(key1) => col2
this returns a collection (e.g. col2). We can access the elements of the collection using the second key.

col2(key2) => element

or the short form:
lookup(key1)(key2) => element

For readability we could add a record layer in between, but that is not neccessary.
lookup(key1).list(key2) => element

Example
Notice the definition of the index collection is in line 14 with the matching type definitions in line 11 and 12.

set serveroutput on
declare
  cursor c is (select trunc((level+2)/3) lvkey, chr(ASCII('A')+mod(level*2,3)) letter, round(dbms_random.value(1,100)) val 
               from dual 
               connect by level <=10
                );
  type tabdata_t is table of c%rowtype index by binary_integer;   
  
  tabdata tabdata_t;
  
  type letterlookup_t is table of binary_integer index by varchar2(10);
  type keylookup_t is table of letterlookup_t index by binary_integer; -- number not allowed!
  
  keylookup keylookup_t;
  empty_key letterlookup_t;
begin
  -- load data
  open c;
  fetch c 
  bulk collect into tabdata; -- notice no limit clause here. For larger record sets you need to use a loop and LIMIT!
  close c;
  
  -- index data
  <>
  for i in 1..tabdata.count loop
    if keylookup.exists(tabdata(i).lvKey) then
      if keylookup(tabdata(i).lvKey).exists(tabdata(i).letter) then 
        -- same key twice?
        -- maybe add the values, maybe raise an error
        raise dup_val_on_index;
      else  
        dbms_output.put_line('build index KEY='||tabdata(i).lvKey||',+letter='||tabdata(i).letter);
        keylookup(tabdata(i).lvKey)(tabdata(i).letter) := i;
      end if;      
    else -- key not in index yet
      dbms_output.put_line('build index +KEY='||tabdata(i).lvKey||',+letter='||tabdata(i).letter);
      keylookup(tabdata(i).lvKey) := empty_key;
      keylookup(tabdata(i).lvKey)(tabdata(i).letter) := i;
    end if;      
      
  end loop build_index;
  -- index is now complete. 
 
  -- Lets access the data using some combinations of keyLv and letters
  -- Test the index first
  dbms_output.put_line('Index 1B=>'|| to_char(keylookup(1)('B')));
  dbms_output.put_line('Index 3A=>'|| to_char(keylookup(3)('A')));
  --dbms_output.put_line('1F='|| keylookup(1)('F')); -- this will raise NO_DATA_FOUND
  
  -- now fetch the data using the index
  dbms_output.put_line('Data 1B=>'|| tabdata(keylookup(1)('B')).val);
  dbms_output.put_line('Data 3A=>'|| tabdata(keylookup(3)('A')).val);
  
end;
/

Output
build index +KEY=1,+letter=C
build index KEY=1,+letter=B
build index KEY=1,+letter=A
build index +KEY=2,+letter=C
build index KEY=2,+letter=B
build index KEY=2,+letter=A
build index +KEY=3,+letter=C
build index KEY=3,+letter=B
build index KEY=3,+letter=A
build index +KEY=4,+letter=C
Index 1B=>2
Index 3A=>9
Data 1B=>62
Data 3A=>34

PL/SQL procedure successfully completed.

If we try to access a collection that does not exist we get an error (usually no_data_found). Since the index collections are always sparse, this is something to keep in mind. If you are not sure if the key combination is already indexed, then either check for existence or react on the NO_DATA_FOUND.

comparison

It is not easy to compare the two approaches. The first way looks slightly less complex. It depends also how familar other developers are with collections and especially with nested collections. For many the double parenthesis syntax “myCol()()” is a little complicated at the beginning.

For very complex scenarios the first way might be the better way. It depends on data distribution (the more sparsly populated the key combinations are, the better is this way) and on how many keys (=dimensions) we have.

I once measured the performance in a system where we needed 5 dimensions (5 different keys) to access some data. The combined key lookup was faster than building a complex collection of collection of collection of collection. But the additional time to concat the key values in the end also was a large performance burden.

So in my specific case
lookup('A:B:C:D:E') >>> lookup('A')('B')('C')('D')('E')

I do not think this performance experience is representativ.

conclusion

It is possible to use two keys as an index collection to lookup data in the main collection. Nested collections is a tool that every plsql developer should know about. Only when we know our tools, we can decide when to use them or when not.

dbms_scheduler 12c – run EXTERNAL_SCRIPT

Introduction

With 12c we have several new job types for our scheduler jobs. One of them is EXTERNAL_SCRIPT. The other new job types are SQL_SCRIPT and BACKUP_SCRIPT.

From Oracle 12.2 plsql packages and type reference

‘EXTERNAL_SCRIPT’

This specifies that the job is an external script that uses the command shell of the computer running the job. For Windows this is cmd.exe and for UNIX based systems the sh shell, unless a different interpreter is specified by prefixing the first line of the script with #!.

In the past we could run an external script using the EXECUTABLE job type. This type is still available.

‘EXECUTABLE’

This specifies that the job is going to be run outside the database using an external executable. External jobs are anything that can be executed from the command line of the operating system. Anydata arguments are not supported with a job or program type of EXECUTABLE. The job owner must have the CREATE EXTERNAL JOB system privilege before the job can be enabled or run.

In general both options could do the same thing. Execute something on the host OS.
So why should we change anything? Is there a difference?

Let’s find out.

schedule an EXTERNAL_SCRIPT

setup credentials first

We can create credentials using a dbms package or via SQL Developer.

dbms_credential.create_credential(credential_name   =>  'ORACLE_OS_CREDS',
                                  username          =>  'oracle',
                                  password          =>  'oracle',
                                  comments          =>  'run scripts using oracle OS account');

In the developer VM box, the password is always oracle. That’s why I included it here. You need to use your own correct password.

For real world environments I suggest to create a specific OS account that is only allowed to execute the script and to do anything that needs to be done for this specific task but not more. This account might need the “Log On As Batch Job” Right under windows (support note #2065024.1).

For demonstration purposes I stay here with the oracle credentials.

setup a scheduled job to run a linux script

Of cause this works under windows too, but I did test it only using Oracle Linux.

Here I setup 4 slightly different examples how to run an EXTERNAL_SCRIPT job. After that we check and compare the output.

These are our for slightly different test scenarios.

  1. run a simple bash script.
  2. run a script, that has an error
  3. run a script with an error, but an exitcode=0
  4. same as 3. and use undocumented FAIL_ON_SCRIPT_ERROR argument

To test what happens if the script itself has an error, I added a change directory command pointing to a non existent directory.
This command will result in an error.

cd /abcd/efgh/ijk

Here is the command to schedule the 4 jobs. Each job has a slightly different name. The differences between one and the previous job are marked.

-- First test a script that does not produce an error
declare
  v_jobname     varchar2(200);
  v_good_script clob;
begin
   v_jobname := upper('DEMO_SCHEDULED_EXTERNAL_SCRIPT_NO_ERROR');

  -- the following line breaks are important.
  -- Do NOT remove them, they are part of the linux script.
  v_good_script :=  '#!/bin/bash
echo "Job ok!"';

   dbms_scheduler.create_job(job_name   => v_jobname,
                              job_type   => 'EXTERNAL_SCRIPT',
                              job_action => v_good_script,
                              credential_name => 'ORACLE_OS_CREDS',
                              enabled    => false,
                              auto_drop  => false
                              );
   -- run the job
   dbms_scheduler.enable(v_jobname);         

end;
/

PL/SQL procedure successfully completed.

-- Now test a script that does produce an error
declare
  v_jobname    varchar2(200);
  v_bad_script clob;
begin
   v_jobname := upper('DEMO_SCHEDULED_EXTERNAL_SCRIPT_WITH_ERROR');

   v_bad_script :=  '#!/bin/bash
cd /abcd/efgh/ijk';

   dbms_scheduler.create_job(job_name   => v_jobname,
                              job_type   => 'EXTERNAL_SCRIPT',
                              job_action => v_bad_script,
                              credential_name => 'ORACLE_OS_CREDS',
                              enabled    => false,
                              auto_drop  => false
                              );
   dbms_scheduler.enable(v_jobname);
end;
/

PL/SQL procedure successfully completed.

-- Now test a script that does produce an error but uses exit 0
declare
  v_jobname    varchar2(200);
  v_bad_script clob;
begin
   v_jobname := upper('DEMO_SCHEDULED_EXTERNAL_SCRIPT_WITH_ERROR_EXIT0');

   v_bad_script :=  '#!/bin/bash
cd /abcd/efgh/ijk
exit 0';

   dbms_scheduler.create_job(job_name   => v_jobname,
                              job_type   => 'EXTERNAL_SCRIPT',
                              job_action => v_bad_script,
                              credential_name => 'ORACLE_OS_CREDS',
                              enabled    => false,
                              auto_drop  => false
                              );

   dbms_scheduler.enable(v_jobname);
end;
/

PL/SQL procedure successfully completed.

-- run script using attribute FAIL_ON_ERROR
declare
  v_jobname    varchar2(200);
  v_bad_script clob;
begin
   v_jobname := upper('DEMO_SCHEDULED_EXTERNAL_SCRIPT_WITH_ERROR_FAILONERROR');

   v_bad_script :=  '#!/bin/bash
cd /abcd/efgh/ijk
exit 0';

   dbms_scheduler.create_job(job_name   => v_jobname,
                              job_type   => 'EXTERNAL_SCRIPT',
                              job_action => v_bad_script,
                              credential_name => 'ORACLE_OS_CREDS',
                              enabled    => false,
                              auto_drop  => false
                              );
  -- Make sure script errors result in a job error and are noticed.
   dbms_scheduler.set_attribute(
            name => v_jobname,
            attribute => 'FAIL_ON_SCRIPT_ERROR',
            value => true);                              

   -- run the job
   dbms_scheduler.enable(v_jobname);
end;
/

PL/SQL procedure successfully completed.

Note that all anonymous blocks executed successfully. This only means that we could create and start a scheduled job. It does not tell us, what the result of the job was.

To find the job result, we need to check the data dictionary view all_scheduler_job_run_details. The information is in there but only AFTER the job finished.

Results

The data dictionary gives us information about the result of the scheduler runs in the view ALL_SCHEDULER_JOB_RUN_DETAILS.

select replace(job_name,'DEMO_SCHEDULED_EXTERNAL_SCRIPT') as job_name,
       log_date, status, output, error#, errors, additional_info
from all_Scheduler_job_run_details
where job_name like 'DEMO_SCHEDULED_EXTERNAL_SCRIPT%'
--and log_date >= sysdate - interval '5' minute;

For presentation purposes I flipped rows and columns. So the next picture shows the columns from the DD view as rows.
result_run_details_transposed

Let’s go through the results step by step.

The first testcase did not have an error. status of the run = SUCCEEDED. The output column also shows the stdev output which is nice. So there is no need to spool the output into an extra file, just to be able to see later what happend. The same column is also used for jobs of type SQL_SCRIPT to return dbms_output.
 
The second testcase had an error. The job status correctly reported that the script errored with exit code = 1 (column error#). And we see what kind of error happen in the errors column.

"/tmp/job_2078996_3568888_script: line 2: cd: /abcd/efgh/ijk: No such file or directory
"

We also see that the script itself was created as a file in the /tmp folder using job and log id for the file name “job_2078996_3568888_script”.
Also note that the additional_info column says

"EXTERNAL_LOG_ID="job_2078995_3568886",
ORA-27369: job of type EXECUTABLE failed with exit code: Operation not permitted
"

This is slightly misleading, since the job_type was EXTERNAL_SCRIPT and not EXECUTABLE. And “Operation not permitted” could lead us suspecting some issue with privs (might be the case when “cd” doesn’t work) or with the credentials (definitly not the case here).

The third testcase had an error in the script, however it finished with exit 0 (=success). It makes sense that in this case the job run status is also marked as SUCCEEDED. However the error “no such file or directory” still can be found in the errors column.

The forth testcase uses a new feature. I’m not sure if that is already there in the 12.1 db version, all my tests were done under 12.2. The all_scheduler_jobs view has a new column FAIL_ON_SCRIPT_ERROR. It defaults to FALSE. We can set this as an attribute for the scheduled job.

   dbms_scheduler.set_attribute(
            name => v_jobname,
            attribute => 'FAIL_ON_SCRIPT_ERROR',
            value => true);   

This is currently undocumented, but it works, as above demo shows. I believe it is a documentation bug. The new column in the view is documented, but FAIL_ON_SCRIPT_ERROR is not in the list of allowed attributes.

The result is, that even with EXIT=0, the job status goes to FAILED. We also see a different error number 27382 instead of 1. 27382 seems to be the ORA-Error number that we also see in the additional_details column. This column now says “ORA-27382: job type EXTERNAL_SCRIPT has errors in the job action”. Not a major difference, but slightly better additional_details when we use this attribute.

Quite some interesting behaviour. So if we want to run a script that always exits with 0, we can still make our job go to FAILED if there are any errors inside that script.

I would guess that we find the same behaviour for the other new job_types SQL_SCRIPT and BACKUP_SCRIPT. I didn’t test it yet.

Comparison between EXTERNAL_SCRIPT and EXECUTABLE

Executable is the old way.

It requires to configure and start an external agent. Often this is a fairly complex task. It involves configuration of the listener, tnsnames.ora, extproc.ora and some other parts. Those are usually not under control of a developer. Additionally all external jobs that are executed using this agent run with the same OS privileges. Using credentials gives us a little more control.

Starting from 12.1 such external jobs can alternativly run with credentials. Same as I already showed for running external_scripts. So the credentials argument is only a half-baked one.

It is also difficult to track down an error in case something goes wrong. This is where EXTERNAL_SCRIPT seem much better then EXECUTABLE. I have to admit I did only some very short tests about this, but I never found a disadvantage for running a scheduled job as EXTERNAL_SCRIPT instead of EXECUTABLE.

So if you have a choice go with the newer option.

Integrate it into plsql including status check

How to setup a plsql procedure that is able to run an external_script, but also returns an exception, in case the script runs into errors?

Lets assume the external script is something like this:

# set environment
export PATH=/usr/local/bin:$PATH
export ORACLE_SID=XE
export ORAENV_ASK=NO
. /usr/local/bin/oraenv

CLASSPATH=fonts
#CLASSPATH=$CLASSPATH:$APEX_HOME/utilities
CLASSPATH=$CLASSPATH:.

export CLASSPATH

cd /opt/jasper/report1

java -jar runJasperReport.jar
exit

So this executes some jar file. I don’t want to wait until the java logic is finished, but I want to be informed if something basic goes wrong, like if the jar file couldn’t be found.

Here is an example that I used in some APEX application. After the job is started, it hangs around for a couple of seconds and checks if anything surprising did happen. The code to start the job itself is not included. But you can assume that it is done in the same packaged procedure. The script is in the v_script variable.

Do not copy it 1:1, but understand it and adapt it to your needs.

...
    v_jobname             varchar2(128);
    r_job_details         all_scheduler_job_run_details%rowtype;
    r_job                 all_scheduler_jobs%rowtype;
    c_max_check_job_tries constant binary_integer :=3;
    v_message             varchar2(1000);
    v_script              clob;
begin
...
   -- check if the scheduled job did sucessfully start.
   -- In case some error happend after starting the job, we might not notice that, since it was started in the background.
   -- e.G. ORA-27369: Job of Typ EXECUTABLE not successfull with Exit-Code: No such file or directory
   -- In such a case check scheduler data dictionary almost immediatly after job creation.
   apex_util.pause(p_seconds => 0.5);
   <>
   for i in 1..c_max_check_job_tries loop 
      -- constant is set to 3, so max. wait time = 3.5 seconds.
   
        -- read info about scheduled job
        -- started/running jobs are in scheduler_jobs, fnished jobs including results are in scheduler_job_run_details
        begin
          select * into r_job from all_scheduler_jobs where job_name = upper(v_jobname);
          logger.trace(p_message=>'Job "'||v_jobname||'" created with state='||r_job.state );
       exception
         when no_data_found then 
           -- Job might have stopped already, check run details!
           r_job.state := 'NOT FOUND';
       end;    

       if r_job.state in ('RUNNING','SCHEDULED') then
         -- RUNNING+SCHEDULED => Looks ok, Job runs,just to make sure wait for another few seconds
         apex_util.pause(p_seconds =>  1);
         
       else
         begin
           -- for any other state check details
           select * into r_job_details from all_scheduler_job_run_details where job_name = upper(v_jobname);
           logger.trace(p_message=>'Job Details "'||v_jobname||'" with status='||r_job_details.status );
           if r_job_details.status= 'FAILED' or r_job.state= 'FAILED'  then
                v_message := 'Job "'|| v_jobname||'" with Error!';
                if r_job_details.additional_info like '%ORA-27369%' or r_job_details.additional_info like '%ORA-27382%' then
                  v_message := v_message ||' Jobaction='||v_script;
                end if;  
                pk_logging.pr_logError(p_message=>v_message);
                apex_error.add_error(   p_message => v_message,
                                        p_additional_info => r_job_details.additional_info,
                                        p_display_location => apex_error.c_inline_in_notification --apex_error.c_on_error_page
                                        );
                -- step out of loop and raise an error using the OS error message                
                Raise_application_error(-20001, r_job_details.errors);
           end if;      
         exception
           when no_data_found then
             -- Job not started yet or just about to finish...
             -- consider to wait a few sec first. Then raise an error if job still not there.
             -- last try?
             if i=c_max_check_job_tries then
                logger.error(p_message=>'Job "'||v_jobname||'" was not started!');
                Raise_application_error(-20001, 'Warning! Job "'|| v_jobname||'" wasn't started (yet)! Check application log!');
             else
                -- wait 1 second until job is hopefully created
                apex_util.pause(p_seconds =>  1);
             end if;
         end;
       end if;   
   end loop get_job_info;   
   if r_job.state = 'SCHEDULED' then
      -- still scheduled? inform user
      Raise_application_error(-20001, 'Warning! Job "'|| v_jobname||'" needs longer than expected to start. Please monitor closely and informa administrator!');
   end if;  
...

Security considerations

Running external jobs is always something where we need to take extra care – so that we do not put holes into our security defense system.

I carefully watch out for two major security risks:
1) If we add something dynamically to the script that we are executing, like an extra parameter, make sure to sanitize all the inputs. Otherwise we could get some kind of injection problem.

2) The agent/account that runs our script should only get the least needed privileges. So it shoud NOT be running under user oracle, like I did in the demo.
Create a separate account for that. Name the account to something that logically points to the task that it is supposed to do.

conclusion

The new job_type EXECUTE_SCRIPT is useful.

Some of the hurdles that developers face when trying to run a host command are lowered. The script itself does not need to be deployed on the database server.

Reacting to errors in the script is possible, but we need to check the correct columns and use the proper settings.

cleanup demo jobs

-- cleanup
-- remove all the jobs
execute dbms_scheduler.drop_job(job_name => 'DEMO_SCHEDULED_EXTERNAL_SCRIPT_NO_ERROR');
execute dbms_scheduler.drop_job(job_name => 'DEMO_SCHEDULED_EXTERNAL_SCRIPT_WITH_ERROR');
execute dbms_scheduler.drop_job(job_name => 'DEMO_SCHEDULED_EXTERNAL_SCRIPT_WITH_ERROR_EXIT0');
execute dbms_scheduler.drop_job(job_name => 'DEMO_SCHEDULED_EXTERNAL_SCRIPT_WITH_ERROR_FAILONERROR');

Tackling “cursor: pin S wait on X” wait event issue

Problem description

I found the following interesting issue by accident during an performance analysis using ASH and real time SQL monitoring.

We (@gassenmj and myself) saw that all connections from an application server connection pool were suddenly “hanging” at the same SQL statement. The single session did have wait times ranging from a few seconds up to 30 seconds and more. Additionally some CPU throttleling was going on.

After a couple of minutes the system healed itself and went back to normal behaviour. Those were the symptoms. Although that was not directly related to the performance analysis we had planned, we decided to investigate further.

 

Problem indicators

Here follows a list of statements that helped to analyze the problem.

check session infos

select * from v$session;

select * from v$session_wait 
--where sid=sys_context('userenv','sid')
order by wait_time+seconds_in_wait desc;

We saw a high number of “cursor: pin S wait on X” events with a high wait time. At that time I didn’t know what the issue was, but it was obvious that there is some kind of problem. If you encounter a high number of sessions all with the same wait event, then you might want to investigate further. Especially if the system seems generally slow while this event happens.

check the active session history (ASH) report

Please note that accessing the ASH report requires an enterprise edition and a licensed diagnostics pack. Do not try to run any select against v$active_session_history in a standard edition!

If you are allowed to use ASH, then run the following selects for a quick check. ASH typically holds information only for the last 30 minutes. Then the data is consolidated into the AWR report. If possible I prefere ASH because it samples information every second.

-- ash info
select ash.sample_time, ash.session_id, ash.session_serial#, ash.sql_id,  
       ash.event, -- ash.p1, ash.p1text, ash.p2, ash.p2text, ash.p3, ash.p3text,
       ash.wait_time, ash.time_waited, 
       ash.blocking_session, ash.blocking_session_serial#,
       ash.in_parse --, ash.in_hard_parse, ash.in_sql_execution
from v$active_session_history ash
where ash.event = 'cursor: pin S wait on X'
order by ash.sql_id, ash.sample_time;

Eventually the data is moved into the dba_hist_active_sess_history view.

-- awr info
select sql_id, sample_time, session_id, wait_time, time_waited, module, machine
from dba_hist_active_sess_history
where event = 'cursor: pin S wait on X'
order by sql_id, sample_time;

We can consolidate the infomation over all sessions that seem to be connected to the issue. This summary statment also includes the session that seems to cause the issue.

-- ash summary
with sum_by_session as (
        select ash.sql_id, ash.session_id, 
               trunc(ash.sample_time,'HH') timeframe, min(ash.sample_time) min_time, max(ash.sample_time) max_time,
               sum(ash.wait_time) + sum(ash.time_waited)  total_wait,
               ash.event, ash.p1, ash.p1text, ash.p2, ash.p2text, ash.p3, ash.p3text, ash.in_hard_parse
        from v$active_session_history ash
        join v$active_session_history sqlids on sqlids.sql_id = ash.sql_id
        where (ash.event like 'cursor: pin S%' or ash.in_hard_parse = 'Y' )
        and sqlids.event = 'cursor: pin S wait on X'
        group by  ash.sql_id,  ash.session_id, ash.event, ash.p1, ash.p1text, ash.p2, ash.p2text, ash.p3, ash.p3text, ash.in_hard_parse
                 ,trunc(ash.sample_time,'HH')
         )        
select s.sql_id, to_char(s.timeframe,'dd-Mon-RR HH24') timeframe,
       to_char(min(s.min_time),'HH24:MI:SS')||'-'||to_char(max(s.max_time),'HH24:MI:SS') timeperiod, 
       round(sum(total_wait)/1000000,2)  total_wait_in_s,
       s.event, s.p1, s.p1text, s.p2, s.p2text, s.p3, s.p3text, s.in_hard_parse, 
       listagg(s.session_id,',') within group (order by s.session_id) as sids
from sum_by_session s              
group by s.sql_id,  s.event, s.p1, s.p1text, s.p2, s.p2text, s.p3, s.p3text, s.in_hard_parse, s.timeframe
order by s.sql_id, s.in_hard_parse desc, s.timeframe;

Here is some sample output from a test database. It shows a list of sessions (column SQLIDs) that are waiting for cursor: pin S wait on X.

SQL_ID	   TIMEFRAME	    TIMEPERIOD	   TOTAL_WAIT_IN_S 	EVENT			P1		P1TEXT	P2		P2TEXT	P3		P3TEXT	IN_HARD_PARSE	SIDS
a7s0bjm8467xg	22-mai-18 04	04:21:10-04:21:10	0,46	cursor: pin S wait on X	3494059951	idn	674309865472	value	21474836480	where	N	20,255,301
cq42gcnqfs5vd	22-mai-18 04	04:21:09-04:21:09	0,04				1		file#	31620		block#	1		blocks	Y	247
cq42gcnqfs5vd	22-mai-18 04	04:21:10-04:21:10	0,03				1		file#	31323		block#	1		blocks	Y	247
cq42gcnqfs5vd	22-mai-18 04	04:21:09-04:21:10	610,9	cursor: pin S wait on X	753669997	idn	1060856922112	value	21474836480	where	N	18,51,70,73,127,169,191,229,326,379,390,426,436,456
3320xrzqy5t5c	22-mai-18 12	12:21:13-12:21:13	0				1		file#	22750		block#	1		blocks	Y	269
3320xrzqy5t5c	22-mai-18 12	12:21:13-12:21:13	0,11	cursor: pin S wait on X	3991069868	idn	1155346202624	value	21474836480	where	N	314
76fxznczqskp8	22-mai-18 00	00:21:07-00:21:07	0				1		file#	589		block#	1		blocks	Y	15
76fxznczqskp8	22-mai-18 00	00:21:08-00:21:08	0,05				1		file#	31323		block#	1		blocks	Y	15
76fxznczqskp8	22-mai-18 04	04:21:08-04:21:08	0,04				1		file#	507		block#	1		blocks	Y	73
76fxznczqskp8	22-mai-18 00	00:21:07-00:21:08	404,66	cursor: pin S wait on X	1064061608	idn	64424509440	value	21474836480	where	N	8,67,90,151,167,193,221,253,314,351,364,367,419,456
76fxznczqskp8	22-mai-18 04	04:21:08-04:21:08	419,54	cursor: pin S wait on X	1064061608	idn	313532612608	value	21474836480	where	N	18,51,70,127,169,191,247,326,379,390,426,436,456

In this example we had many sessions with the “cursor: pin S wait on X” wait event for the cursor “cq42gcnqfs5vd”. Those sessions combined waited for more than 10 minutes! And we also see two lines for the session that created this cursor (hard_parse=Y). Two lines because the hard parse took considerable time and ASH samples every second. The P2 info changed during that (last two bytes changed). I assume this means that the hard parse was progressing.

In general the summary statement shows if this wait event happend recently and also that many sessions where waiting for this event.

Finding the problem statement

We can use v$sql or v$sqlarea.

select * --sql_id, sql_text, sql_fulltext 
from v$sql 
where sql_id = 'cq42gcnqfs5vd';

or

select sql_id, module,
       sql_fulltext,
       cpu_time, elapsed_time, concurrency_wait_time, 
       version_count, parse_calls, invalidations
from v$sqlarea
where sql_id = 'cq42gcnqfs5vd';

The column SQL_FULLTEXT will hold the statement. But this information might age out of the SGA. So the sooner after the issue you run this the better. The good news is that the hashed sql_id is stable. Meaning if the same statement is run later again, you will see the same sql_id. And since we are investigating an issue that happened because the exactly same statement was run several times at the same moment (many sessions where waiting), there is a good chance that we get lucky and see this command in v$sql.

historical comparison

Very useful are the event_histograms. It can give an indication how severe the problem is. If we see a recent LAST_UPDATE_TIME for one of the high numbers (>= 4096) then some action might be needed. Also the v$event_histogram view helps to monitor if the issue disappears or is lowered once it is resolved.

select * from v$event_histogram 
where event = 'cursor: pin S wait on X';
EVENT#	EVENT			WAIT_TIME_MILLI	WAIT_COUNT	LAST_UPDATE_TIME
282	cursor: pin S wait on X	1		393	22.05.18 07:09:40,837552 +02:00
282	cursor: pin S wait on X	2		72	06.05.18 20:21:14,069132 +02:00
282	cursor: pin S wait on X	4		302	19.05.18 20:52:37,773557 +02:00
282	cursor: pin S wait on X	8		2043	22.05.18 07:09:41,041724 +02:00
282	cursor: pin S wait on X	16		6586	22.05.18 01:07:19,804808 +02:00
282	cursor: pin S wait on X	32		14719	22.05.18 07:09:41,054201 +02:00
282	cursor: pin S wait on X	64		16058	22.05.18 12:21:14,725227 +02:00
282	cursor: pin S wait on X	128		7514	22.05.18 12:21:13,702598 +02:00
282	cursor: pin S wait on X	256		10496	22.05.18 07:09:40,366636 +02:00
282	cursor: pin S wait on X	512		11360	22.05.18 07:09:40,364821 +02:00
282	cursor: pin S wait on X	1024		2123	22.05.18 07:00:05,691792 +02:00
282	cursor: pin S wait on X	2048		2240	22.05.18 04:21:11,172316 +02:00
282	cursor: pin S wait on X	4096		284	03.05.18 12:22:09,853604 +02:00
282	cursor: pin S wait on X	8192		169	30.04.18 16:22:30,975458 +02:00
282	cursor: pin S wait on X	16384		180	30.04.18 16:22:30,985937 +02:00
282	cursor: pin S wait on X	32768		31	19.09.17 18:50:00,401702 +02:00
282	cursor: pin S wait on X	65536		2	04.05.16 22:09:14,572347 +02:00

I run this on May 22nd. We see mutex sleeps for up to 2 seconds (column wait_time_milli = 2048) at 4 a.m. And a high number of mutex sleeps for 1 second at around 7 a.m. Nothing too critical but so that it might be worth checking what is going on. More about mutex sleeps in the section that explains the problem.

We also see an entry for 16 seconds from 30th of April. This was the time before we fixed the problem. The event history shows that since that time, we never had such long sleeps anymore. Which proves that our fix is working.

What the event histogram does not show, is that the waits from May 22nd and from April 30th are from a different sql_ids. So it give us only a very crude monitoring.

Event detail and mutex analysis

select * 
from GV$MUTEX_SLEEP_HISTORY
where blocking_session = 247;

This should show several sesssions during the problematic time frame. The location is the most interesting column. It will probably be this location

kkslce [KKSCHLPIN2]

We can use this location to check out the general mutex sleep history.

select MUTEX_TYPE,LOCATION,REQUESTING_SESSION,BLOCKING_SESSION,SLEEP_TIMESTAMP
from GV$MUTEX_SLEEP_HISTORY
where location='kkslce [KKSCHLPIN2]'

And to get an impression about the severity of the issue, lets compare the event against other mutex sleeps.

select * from v$mutex_sleep
--where location='kkslce [KKSCHLPIN2]'
order by sleep time desc;

Example result

MUTEX_TYPE	LOCATION	SLEEPS	WAIT_TIME
Cursor Pin	kkslce [KKSCHLPIN2]	1555183	0
Library Cache	kglpndl1  95	116623	0
Library Cache	kglpin1   4	107578	0
Library Cache	kgllkdl1  85	105747	0
Library Cache	kglhdgn2 106	83336	0
Library Cache	kglllal3 111	45584	0
Library Cache	kgllkc1   57	41674	0
Library Cache	kglpnal1  90	37040	0
Library Cache	kglget2   2	12203	0
Library Cache	kgllldl2 112	11882	0
Cursor Pin	kksfbc [KKSCHLFSP2]	11303	0
...

A high number of sleeps here shows that we have some issue.

But enough about such statements. I will explain a little why they are relevant for the problem and also what kind of problem there is.

Some technical background information

There are several recommended publications that helped me to understand and solve the issue. See links to Tanel Poder and Jonathan Lewis articles in the link section at the end of this post.

About this wait event: doc about “cursor: pin S wait on X”

The doc explains a little bit what happens. This wait event occurs when a session requires a shared mutex pin, while another session holds an exclusive mutex pin. Mutex pins are essentially locks on memory objects.

It is a lock to create the execution plan for a statement. The mutex organizes access to a shared cursor. During a hard parse the cursor (=execution plan) is created. The hard parse sets the exclusive lock. Other sessions are checking if there is already a cursor for the SQL statement (based upon the hashed sql_id). This is done by a shared pin call (cursor pin S). However the first session (247 in my case) did not free up this ressource yet, because it is still hard parsing. That is why we see a cursor: pin S wait on X for all the other sessions.

Problem cause

The issue was caused by a select that was extremly expensive to (hard) parse but which was very fast to execute.

The hard parse needed several seconds (usually 9 but up to 24 seconds). Any consecutive statement call was a soft parse using bind variables and executed in way less than 1 second.

This statement was send by some java application at the exact same time to all sessions in the connection pool. So the exact same statement, but with different bind parameters needed to run.

Each database session did this:

  • First it builds a hash (SQL_ID) for the sql statement.
  • Then it searches in the SGA if an entry for this SQL_ID already exists in a memory lookup table.
  • Since another session, at the same time, already tries to create an execution plan (hard parse). Therefore the later session waits for the outcome of this. This is a mutex wait.
  • The mutex then goes to sleep for a short time.
  • It wakes up and since the hard parse is still running, it goes to sleep again. This time a little longer.

The problem with mutex sleeps is twofold.

  • the sleep times tend to increase exponentially. We can see this in the mutex_sleep_history and event_histogram views. It starts with 1 microsecond,
    but very soon reaches an order of several seconds.
  • When a mutex tried to access a latch and sleeps on it – the latch is not free for other sessions to grab it. This can throttle the CPU.

So the mutex sleeps get longer and longer while the hard parse is still running.
Eventually the next mutex sleep will be longer than what the original hard parse time was. So even if the blocking session finished, the waiting session still doesn’t execute because the mutex sleeps now for the next 8 or 16 seconds. This can cascade to other sessions. I have seen total wait times of more than 30 seconds because of this cascading effect.

The process is partially documented (shared pool check), although not in enough detail to cover this mutex problem scenario.

For more information about how mutex sleeps are coupled with CPU starvation, check out the first article in the links section by Tanel Poder.
 

Solutions

Two ways to address the problem immediatly spring to mind:

a) Avoid calling the same statement so many times. Maybe do a single call where all the relevant binds are added as an in-clause. Or as a subquery.

b) Improve the performance of the hard parse.

Best idea is probably to do both. Decrease the number of calls and improve hard parse times.

But often we can influence only the database side, not the caller side. Here are some ways to make the hard parse faster. There are various ways like SQL profiles, pinning the cursor, etc. available. But if you are a developer those are usually not your first choises.

Note that using SQL Baselines will probably not help (see https://jonathanlewis.wordpress.com/2014/11/23/baselines/).

As a developer think about the SQL statement first and if there is a way to help the optimizer.

Ask yourself and try out

Are all the subqueries/views really needed to get the result? Often when adding views, there are joins to tables included that are not needed. While the optimizer can use a feature called table elimination, it still needs time to consider this. Rewrite the SQL and instead of joining the view, join only the needed the base tables.

If there are lookup tables that are joined to the main queries FROM clause consider to move them to the SELECT part. This will not work, if the lookup is part of the filtering. However many lookup tables have a multi-language component. Which in turn often means, they are considered child tables for the optimizer. So that the driving table can not be correctly identified anymore. Changing that will change how the optimizer thinks about the other tables (also 1 less table for the permutations count). Sometimes moveing the lookup logic to plsql is also an option. Those values will stay while the whole process is running, so fetch them once, then use the IDs for filtering.

If larger repeating subqueries exist, consider subquery factoring (WITH clause).

Check if you have unnessecary DISTINCT operations. Sometimes this prevents optimizer possibilities like combining UNION ALL queries (join factorization). Typically this will not help to reduce hard parse times, but I’ve seen one example with lots of joined tables in the UNION ALL query, where it happend.

Hints might be a final option to use. Especially the leading hint is a way to massivly decrease the number of permutations that the optimizer needs to consider. While hints should not be the first choice, here we have a scenario where the developer has better information than the CBO.  An approach that I call “minified hinted outlines” can be used. Essentially create a plan and grab all the leading hints from the outline. Then put those leading hints into the statement. This will freeze the table order but still allows the optimizer to choose between access paths like index access vs. full table scan and between many other options. I blogged about it before: SQL parse time issue – use “minified hinted outlines”. It is a solid compromise to reduce hard parse time, without totally freezing the execution plan.

This is the hint we ended up with.

/* Do not remove the LEADING hints. 
The next statement has an extremly long hard parse time.
The hint minimized the hard parse time. Execution of the statement is very fast.

Without the hint we see major blocking issues (CURSOR: PIN S WAIT ON X) for multiple concurrent sessions 
*/
 select  /*+
          LEADING(@"SEL$9E38E278" "OLIS"@"SEL$11" "ERRORS"@"SEL$11" "EXTRAINFO"@"SEL$11" "WFI"@"SEL$2" "ILI"@"SEL$19" "PL"@"SEL$27" "TUBELOC"@"SEL$29" "RACK"@"SEL$31" "ICH"@"SEL$17" "SOL"@"SEL$25" "BAT"@"SEL$21" "CPD"@"SEL$23")  
          LEADING(@"SEL$FC92CB92" "OSM"@"SEL$12" "O"@"SEL$12" "S"@"SEL$12" "LI"@"SEL$12")  
          LEADING(@"SEL$5B04CACE" "SOA"@"SEL$13" "CE"@"SEL$13")  
          LEADING(@"SEL$AB668856" "POS"@"SEL$8" "SOA"@"SEL$7" "CWFI"@"SEL$6" "LI"@"SEL$6")  
          LEADING(@"SEL$6A1B73E7" "SOA"@"SEL$4" "CWFI"@"SEL$3" "LI"@"SEL$3")  
        */                       
         r.barcode
        ,r.orderid
...

 

Test and reproduce the issue

I tried to build a test script to reproduce such long hard parses. Essentially one needs to create a lot of joins with different filter conditions on all of the tables. Maybe add a couple of union all + distinct makes sure. And we could cheat slightly. By increasing the value of the hidden parameter _optimizer_max_permutations we increase the hard parse time if we join enough tables.

So far I didn’t have the time to completly rebuild the scenario. Mostly because I have multiple statements for a real customer, which I can use to test.

 

Side notes

This is a very rare edge case and the first time I see an effect where using evenly distributed bind variables is slower than not using them. But it must be mentioned, that both cases are far from the optimal execution time.

Forcing the hard parse by not using bind variables would have resulted in an average parsing time of 7-9 seconds. Which is incredibly slow. However it would not have caused any mutex sleeps and therefore no CPU throtteling.
 

Additional reads

Fetch module name from line number in package with pl/scope

Here is a small statement I am using to find the name of a submodule based upon error stack data (utl_call_stack.error_line(x) or dbms_utility.format_error_stack).

The function IDENTIFY_MODULE helps to diagnose errors. If an error happens in plsql the error stack returns only the name of a package (=unit) and the line number. Using this line number we can look up the pl/scope information and make a solid guess about the module name. For various reasons this is only a good guess, not a guarantee (see problem section below).

solution

The following code snippets will only work if the relevant packages were compiled with PLSCOPE_SETTINGS=’IDENTIFIERS:ALL’.

ALTER SESSION SET plscope_settings='identifiers:all';

SQL statement

fetch the nearest procedure or function

select * --type, name, signature, line, usage_id, usage_context_id
from all_identifiers
where object_name = :PACKAGE_NAME
and object_type= 'PACKAGE BODY'
and usage in ( 'DEFINITION','DECLARATION' )
and type in ('PROCEDURE','FUNCTION')
and line <= :LINE_NUMBER
order by line desc
fetch first 1 row only
;

This will find procedures and functions in our code that were declared just before the line of error.

There are many cases where we get some false positives.

But it is a good start.

plsql enrichment

Using a little bit of plsql we can make this logic more robust. And even get data about submodule hierachies.

Put this function in the instrumentation package of your choice (for example logger) and then use it to improve log information. How this is done in detail is out of scope for this post.

create or replace function identify_module (p_owner in varchar2, p_unit in varchar2, p_line in number) return varchar2
is
  /**************************************************************
  /* Name          :  identify_module
  **************************************************************
  * description    :  uses PLSCOPE, to get additional info about the module name of a source code line 
  *  This only works reliably for code compiled with optimizationlevel = 1
  *  higher optimization levels might move code lines. The line reported in error and backtrace stacks (run time) can differ from the lines stored in PL/scope or user_source (compile time).
  *
  * @author: Sven Weller, syntegris information solutions GmbH
  **************************************************************
  * parameter      
  * @param  : p_owner = schema name of unit
  * @param  : p_unit = package name
  * @param  : p_line = line of code, for which we would like to see the name of the modul
  * @return : concatenated submodule names  
  **************************************************************/
 cursor c_search_by_line (cv_owner in varchar2, cv_unit in varchar2, cv_line in number)
  is
    select /*+ first_rows(1) */ i.type, i.name, i.line, i.usage_id, i.usage_context_id, i.usage, i.signature
    from ALL_IDENTIFIERS i
    where i.owner = cv_owner
    and i.object_name = cv_unit
    and i.object_type = 'PACKAGE BODY'
    and i.line <= cv_line 
    -- context must be in same package body
    and i.usage_context_id in (select i2.usage_id from ALL_IDENTIFIERS i2 where i2.owner = cv_owner and i2.object_name = cv_unit and i2.object_type = 'PACKAGE BODY')
    order by line desc, usage_id asc
    ;
  
  cursor c_search_by_usage (cv_owner in varchar2, cv_unit in varchar2, cv_usage_id in number)
  is
    select /*+ first_rows(1) */ type, name, line, usage_id, usage_context_id, usage, signature
    from ALL_IDENTIFIERS
    where owner = cv_owner
    and object_name = cv_unit
    and OBJECT_TYPE = 'PACKAGE BODY'
    and usage_id = cv_usage_id 
    order by decode (usage, 'DEFINITION',1, 'DECLARATION', 2, 3), line desc,  usage_id asc
    ;

  r_result_byLine  c_search_by_line%rowtype;
  r_result_byUsage c_search_by_usage%rowtype;
  r_last_result    c_search_by_usage%rowtype;
  v_owner          all_identifiers.owner%type;
  v_modul_name     all_identifiers.name%type;
  v_first_type     all_identifiers.type%type;
  v_max_hierarchy  binary_integer := 5;
begin
  -- If owner is missing, use the current schema
  v_owner := coalesce(p_owner,sys_context('userenv','current_schema'));
  
  -- find the closest line and check its context.
  open c_search_by_line(v_owner, p_unit, p_line);
  fetch c_search_by_line into r_result_byLine;
  close c_search_by_line;

  if r_result_byLine.usage_context_id = 1 then
    -- we seem to be already in main package body.
    -- this can be either a problem during a parameter call 
    -- or the error happened in the initialisatzion part of the package
    case r_result_byLine.usage 
      when 'DEFINITION' then
        v_modul_name :=r_result_byLine.type||' '||p_unit||'.'||r_result_byLine.name;
      when 'DECLARATION' then
        v_modul_name :='declaration of '||r_result_byLine.type||' '||p_unit||'.'||r_result_byLine.name;
      else 
        v_modul_name :='body of '||p_unit;
    end case;
    
  else    
      r_result_byUsage := r_result_byLine;
      --r_result_byUsage.usage_context_id := r_result_byLine.usage_id;
      
      -- find module names
      <>
      loop 
        if r_result_byUsage.usage in ('DEFINITION', 'DECLARATION')  
           and r_result_byUsage.type in ('PROCEDURE','FUNCTION')
           and (r_last_result.signature != r_result_byUsage.signature or r_last_result.signature is null)
           then
             -- concat multiple submodule names
             v_modul_name := r_result_byUsage.name
                            ||case when v_modul_name is not null then '.'||v_modul_name end;
             v_first_type := coalesce(v_first_type, r_result_byUsage.type);
             -- remember result to compare if we get duplicate entries because of declaration->definition
             r_last_result := r_result_byUsage;
        end if;
        -- stop when package body level is reached
        exit when r_result_byUsage.usage_context_id in (0, 1) or v_max_hierarchy = 0;


        -- it seems to be a submodule, so do an additional call and fetch also the parent module
        open c_search_by_usage(p_owner, p_unit, r_result_byUsage.usage_context_id);
        fetch c_search_by_usage into r_result_byUsage;
        close c_search_by_usage;
    
        
        -- safety counter to prevent endless loops
        v_max_hierarchy := v_max_hierarchy - 1;
      end loop parent_modules;  
    
      -- add info about type (FUNCTION/PROCEDURE)
      if v_modul_name is not null then
        v_modul_name :=v_first_type||' '||p_unit||'.'||v_modul_name;
      --else   
      --  v_modul_name := '--no submodule found--';
      end if;
  end if;  
  return v_modul_name;
exception
  when no_data_found then
    return null;
end identify_module;
/

Example

Check the result for each line of some test package.
You can run this example yourself in LiveSQL .

The function had to be modified slightly to use USER_IDENTIFIERS instead of ALL_IDENTIFIERS to be able to run in LiveSQL.

select line, identify_module(user, name, line) , text 
from user_source
where name='TEST_PACKAGE_FUNC_PROC'
and type = 'PACKAGE BODY';

Result

LINE	IDENTIFY_MODULE(USER,NAME,LINE)	TEXT
1		"package body Test_Package_Func_Proc "
2		"as "
3	declaration of VARIABLE TEST_PACKAGE_FUNC_PROC.GLOBAL_VAR	" global_var number := 0;"
4	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" function test_func (in_val in number) return number  "
5	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" is "
6	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" begin "
7	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" return in_val; "
8	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" exception "
9	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	"  when others then "
10	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	"    RAISE; "
11	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	" end; "
12	FUNCTION TEST_PACKAGE_FUNC_PROC.TEST_FUNC	""
13	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" procedure test_proc  (in_val in number) "
14	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" is "
15	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"   procedure submodule( in_val in number) is"
16	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"   begin"
17	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"      dbms_output.put_line (in_val); "
18	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	"   end;  "
19	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC.SUBMODULE	" begin "
20	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"   submodule(in_val); "
21	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" exception "
22	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"  when others then "
23	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"    RAISE; "
24	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	" end; "
25	PROCEDURE TEST_PACKAGE_FUNC_PROC.TEST_PROC	"begin"
26	body of TEST_PACKAGE_FUNC_PROC	"  global_var := 1;"
27	body of TEST_PACKAGE_FUNC_PROC	"end; "

The test package was copied and modifed based upon Hemant K Chitales “Function and Procedure in Package” LiveSQL demo.

Some problems

  • Currently only ment for package bodies
  • Compiler optimization can move code. That means the line number of an error at runtime is not the same line number as during compile time. PL/Scope only gives us compile time information. So would all_source.
    The only (bad) workaround is to compile with optimization level 1 and recreate the error, so that the correct line of error is shown.
  • Errors that happen in the declaration part of a package can not always be resolved. We might get a false positive for the previously declared object.
  • Does not inform when package body is wrapped.

Further readings

10 more SQL and PLSQL things you might not know

Since my previous blog posts about 10 Oracle SQL features you might not know and “10 Oracle plsql things you probably didn’t know
raised quite some interest, I decided to add some more unknown features. Read careful, I started to write this blog post on 1st of April, so there is an easter egg hidden somewhere in this post. If you are not sure, always test and verify for yourself.

10. NVL can also handle unusual datatypes

NVL can also handle unusual datatypes like BOOLEAN, COLLECTIONS and Advanced Datatypes (ADTs).

Example

set serveroutput on
declare
  b boolean;
begin
  if nvl(b,true) then
     dbms_output.put_line('TRUE');
  else
     dbms_output.put_line('FALSE');
  end if;
end;
/
TRUE

This is part of the sys.standard implementation.
But since boolean is only supported in PLSQL we can’t do much with that in SQL.

9. secret column name “rowlimit_$$_rownumber”

We shouldn’t use “rowlimit_$$_rownumber” or “rowlimit_$$_total” as a column name.

Here is what could happen:

select dummy as "rowlimit_$$_rownumber"
from dual
fetch first 3 rows only;

ERROR at line 1:

ORA-00918: column ambiguously defined

The reason for this can be found when we use the new 12c functionality to expand a query. Typically this is used for views, but it can also be applied to some other features. Here for the logic that does the LIMIT action.

Special thanks to OTN forum members Solomon Yakobson and padders who pointed at the issue in this thread.

What happens behind the scene is that the limit clause “fetch first 3 rows” is changed (expanded) into a subquery that adds a second column “rowlimit_$$_rownumber” to the query. This column uses the row_number analytic function and is later used to filter upon the relevant rows of the LIMIT clause. The error happens because we now have two columns with the same name.

And here is one way to see the expanded code.

set linesize 1000
set longc 1000
set long 1000
variable c clob
exec dbms_utility.expand_sql_text('select dummy from dual fetch first 3 rows only',:c)
print c
SELECT "A1"."DUMMY" "DUMMY"
FROM (  SELECT
            "A2"."DUMMY" "DUMMY",
            ROW_NUMBER() OVER (
                ORDER BY NULL
            ) "rowlimit_$$_rownumber"
        FROM "SYS"."DUAL" "A2"
    ) "A1"
WHERE "A1"."rowlimit_$$_rownumber"

“rowlimit_$$_total” has the same problem. It appears when we use PERCENT in the limit clause.

Example

select dummy as "rowlimit_$$_total"
from dual
fetch first 3 percent rows only;

ORA-00918: column ambiguously defined

And if we expand the working query we see the reason.

SELECT "A1"."DUMMY" "DUMMY"
FROM ( SELECT
            "A2"."DUMMY" "DUMMY",
            ROW_NUMBER() OVER(
                ORDER BY
                    NULL
            ) "rowlimit_$$_rownumber",
            COUNT(*) OVER() "rowlimit_$$_total"
        FROM "SYS"."DUAL" "A2"
    ) "A1"
WHERE "A1"."rowlimit_$$_rownumber"
        <= ceil("A1"."rowlimit_$$_total" * 3 / 100)

The PERCENT keyword requires to do a total count and uses this total count as a filter.

Fortunatly the chance that we by accident name our columns so is very very low.

8. Do you know all plsql pragmas?

Pragmas are instructions for the plsql compiler. There are many of them. Here is the list of pragmas I know or heared about. Not all of them are documented. Not all of them can be used by developers. Several can only be used as sys and come with additional restrictions, so they are only for Oracle internal purposes. Still they pique our curiosity.

The documented and not deprecated pragmas are in bold. At least we should know all of those.

  • autonomous_transaction

    One of the most misunderstood things in plsql.

    Defines that the plsql logic runs independently from the main transaction.

    Typical use case: To log away an error, even if the main transaction is rolled back.

    It is not a workaround for mutating table errors!

  • builtin

    Defines SQL builtin functions and operators.

    This is an internal pragma for usage in package sys.standard.

  • coverage

    This is a new pragma in 12.2.

    from the 12.2. doc: COVERAGE pragma

    The COVERAGE pragma marks PL/SQL source code to indicate that the code may not be feasibly tested for coverage. The pragma marks a specific code section. Marking infeasible code improves the quality of coverage metrics used to assess how much testing has been achieved.

  • deprecate

    Adds a compile time warning if the object is referenced. The message of the warning can be influenced. This new pragma was introduced in 12.2. We can add it to code that should be replaced. Useful in environments where multiple teams of developers call/reference the same code.

  • exception_init

    Combines a plsql exception with an exception number.

  • fipsflag

    Another internal pragma that is used in package sys.standard.

    I guess that the FIPSFLAG pragma has something to do with FIPS from NIST.

    From https://www.howtogeek.com/245859/why-you-shouldnt-enable-fips-compliant-encryption-on-windows/

    FIPS stands for “Federal Information Processing Standards.” It’s a set of government standards that define how certain things are used in the government–for example, encryption algorithms. FIPS defines certain specific encryption methods that can be used, as well as methods for generating encryption keys. It’s published by the National Institute of Standards and Technology, or NIST.

    It seems that US-government computers have a FIPSFLAG enabled. Applications that run on these machines need to be FISMA compliant to be working on those machines.

    Also interesting in that context:
    https://blogs.technet.microsoft.com/secguide/2014/04/07/why-were-not-recommending-fips-mode-anymore/

    https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.200.pdf

  • Update: I finally found some documentaion about FIPS.

    Identifying Extensions to SQL92 (FIPS Flagging)

    The Federal Information Processing Standard for SQL (FIPS 127-2) requires a way to identify SQL statements that use vendor-supplied extensions. Oracle provides a FIPS flagger to help you write portable applications.

    When FIPS flagging is active, your SQL statements are checked to see whether they include extensions that go beyond the ANSI/ISO SQL92 standard. If any non-standard constructs are found, then the Oracle Server flags them as errors and displays the violating syntax.

    The FIPS flagging feature supports flagging through interactive SQL statements submitted using Enterprise Manager or SQL*Plus. The Oracle Precompilers and SQL*Module also support FIPS flagging of embedded and module language SQL.

    When flagging is on and non-standard SQL is encountered, the following message is returned:

    ORA-00097: Use of Oracle SQL feature not in SQL92 level Level

    Where level can be either ENTRY, INTERMEDIATE, or FULL.

    So the FIPSFLAG is a way to inform how well a specific function complies to the SQL Standard.

    And we can turn this feature on or off by using SET in sqlplus.

    From the docs

    12.41.24 SET FLAGGER {OFF | ENTRY | INTERMED[IATE] | FULL}

    Checks to make sure that SQL statements conform to the ANSI/ISO SQL92 standard.

    If any non-standard constructs are found, the Oracle Database Server flags them as errors and displays the violating syntax. This is the equivalent of the SQL language ALTER SESSION SET FLAGGER command.

    You may execute SET FLAGGER even if you are not connected to a database. FIPS flagging will remain in effect across SQL*Plus sessions until a SET FLAGGER OFF (or ALTER SESSION SET FLAGGER = OFF) command is successful or you exit SQL*Plus.

    When FIPS flagging is enabled, SQL*Plus displays a warning for the CONNECT, DISCONNECT, and ALTER SESSION SET FLAGGER commands, even if they are successful.

    Btw: In SQL Developer we get an error message “SET FLAGGER ENTRY” is Obsolete.

    The alter session command still works.

    example

    ALTER SESSION SET FLAGGER = FULL;
    Session altered.
    
    select * from dual d1, dual d2
    where d1.dummy=d2.dummy;
    
    Error starting at line : 3 in command -
    select * from dual d1, dual d2
    where d1.dummy=d2.dummy
    Error at Command Line : 3 Column : 1
    Error report -
    SQL Error: ORA-00097: use of Oracle SQL feature not in SQL92 Full Level
    ORA-06550: line 3, column 15:
    PLS-01454: No operator may be used with values of data type CHAR
    
    ALTER SESSION SET FLAGGER = OFF;
    
    Error starting at line : 6 in command -
    ALTER SESSION SET FLAGGER = OFF
    Error report -
    ORA-00097: use of Oracle SQL feature not in SQL92 Full Level
    
    select * from dual d1, dual d2
    where d1.dummy=d2.dummy;
    
    D D
    - -
    X X
    
    

    At first I thought it might be useful to enforce writeing ANSI compliant SQL. Now I seriously doubt there is any kind of practical usage.
    end UPDATE.

  • inline

    Turns submodule inlining on or off. Submodule inlining is a plsql compiler feature implemented since 10g. The compiler can rewrite plsql code so that the resulting code runs faster. Among other options the compiler can add the code from inside a module directly at the point where that code is used (optimization level 3). This is called inlining. The performance advantage is that the expensive submodule call can be avoided. The disadvantage is that the same code is repeated everywhere where the submodule was originally. But we do not have to program this.

    So we as developers can follow the DRY (don’t repeat yourself) paradigm and the optimizer tunes this code for performance. This is the best of two worlds.

  • Because the compiler already does a good job, the pragma is usually not needed. In rare cases we might want to enforce inlining even if compiled with optimization level 2.

  • interface

    Gateway for internal oracle functions to c libraries.
    It is heavily used inside the sys.standard package spec.

    An example

      --#### interface pragmas
    
      --#### Note that for any ICD which maps directly to a PVM
      --#### Opcode MUST be mapped to pes_dummy.
      --#### An ICD which invokes another ICD by flipping operands is
      --#### mapped to pes_flip, and an ICD whose result is the inverse of
      --#### another ICD is mapped to pes_invert
      --#### New ICDs should be placed at the end of this list, and a
      --#### corresponding entry must be made in the ICD table in pdz7
    
      PRAGMA interface(c,length,"pes_dummy",1);
      PRAGMA interface(c,substr,"pes_dummy",1);
      PRAGMA interface(c,instr,"pesist",1);
      PRAGMA interface(c,UPPER,"pesupp",1);
      PRAGMA interface(c,LOWER,"peslow",1);
      PRAGMA interface(c,ASCII,"pesasc");
    
  • new_names

    This is an internal pragma that restricts the use of particular new entries in package standard. It is only valid in package standard.

  • poke_mon

    This is an internal pragma that can be added by database machine learning code. So it might appear by random somewhere in your code. If the schema is pokemon enabled you can use this pragma to train your modules to react faster or to eliminate invalid input data. The pragma was introduced in 19.1.4 using the multi lingual engine (MLE). So far it is only available in autonomous databases (cloud first). If your modules have collected enough power they can be combined to overtake and replace other packages during recompilation. The ultimate goal is to remove all bad performing code from the database.

  • restrict_references
    (RNPS, WNPS, WNDS, RNDS, TRUST)

    This is an outdated pragma. I can remember setting this in an Oracle 7.3 database.
    It informs the database about the intended scope of the module. An error is raised if this pragma is violated.

    RNPS = read no package state
    WNPS = write no package state
    RNDS = read no database state
    WNDS = write no database state
    TRUST = trust me, and don’t double check if all dependend objects do also behave correctly.

    This pragma shouldn’t be needed anymore. Instead make functions DETERMINISTIC.

  • serially_reusable

    Loose all state when the call is finished. Package variables, open cursors and other plsql state is reset when the package is declared with this pragma.

  • timestamp

    This pragma sets/modifies the timestamp value of a package. Valid only in SYS (and probably only for package standard).

  • udf

    Userdefined function

    This pragma can be used if a function is mostly referenced directly inside a SQL statement. It avoids some of the additional overhead during the switch from the SQL to the PLSQL engine. Especially a simplified (less expensive) datatype check is done.

    While the udf pragma is really a great performance feature it is currently very limited. For example the function can only have numeric parameters. If one parameter is a date, then the udf pragma will silently not work anymore, so we will not gain the performance benefit. If you want to improve that behaviour feel free to vote up this database enhancement idea by @LotharFlatz.

    Btw: There are some indications that udf for functions with varchar2 parameters seem to be working in 12.1 but not anymore in 12.2. I didn’t verify this.

How many of the documented pragmas did you know? And how many of the additional ones?
Did you catch them all?

7. LoC limit

There is a limit for how many lines of code (LoC) a plsql object can have.
The limit was increased to 226 DIANA (Descriptive Intermediate Attributed Notation for Ada) nodes (~6 million LoC) in Oracle 8i. Before that it was only about 3000 Lines of Code (215 Diana Nodes).

Nowadays there are other limits that are more likely to be encountered, before the LoC limit is reached. See also: https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/plsql-program-limits.html#GUID-00966B4C-B9A5-47D4-94AA-54AEBCC07CE9

Remember: compiler optimizations like inlining might increase your lines of code quite a bit.

6. datatype signtype

There is a datatype signtype. It allows only the numbers -1, 0 and 1.

Example:

set serveroutput on
declare
  v_val pls_integer;
  v_sign signtype;
begin
  for i in 1..10 loop
    v_val := round(dbms_random.value(-5,5));
    v_sign := sign(v_val);
    dbms_output.put_line(v_sign);
  end loop;
end;
/
-1
1
-1
-1
-1
1
0
-1
-1
0

PL/SQL procedure successfully completed.

But this is PLSQL only. In SQL we can not use this type.

create table test(id number, s signtype);

ORA-00902: invalid datatype

Interesting, but so far I never found a need to use it.

5. functions without begin..end

We can declare functions that do not have a function body (no begin..end block).
Example:

create or replace function kommaSepariert(ctx in varchar2)
  return varchar2 deterministic parallel_enable
  aggregate using kommaSepariert_ot;

The secret here is that this function is an user defined aggregation function that uses an object type. And the type body holds the function logic.
 

Here is the matching type definition
create or replace TYPE          "KOMMASEPARIERT_OT"                                          as object(
 str varchar2(4000),

 static function odciaggregateinitialize(
   sctx in out kommaSepariert_ot)
   return number,

  member function odciaggregateiterate(
    self in out kommaSepariert_ot,
    ctx in varchar2)
    return number,

  member function odciaggregateterminate(
    self in kommaSepariert_ot,
    returnval out varchar2,
    flags in number)
    return number,

  member function odciaggregatemerge(
    self in out kommaSepariert_ot,
    ctx2 kommaSepariert_ot)
    return number);
/

create or replace TYPE BODY          "KOMMASEPARIERT_OT" as

  static function odciaggregateinitialize(
    sctx in out kommaSepariert_ot)
    return number
  is
  begin
    sctx := kommaSepariert_ot(null);
    return odciconst.success;
  end;

  member function odciaggregateiterate(
    self in out kommaSepariert_ot,
    ctx in varchar2)
    return number
  is
  begin
    if self.str is not null then
      self.str := self.str ||',';
    end if;
    self.str := self.str || ctx;
    return odciconst.success;
  end;

  member function odciaggregateterminate(
    self in kommaSepariert_ot,
    returnval out varchar2,
    flags in number)
    return number
  is
  begin
    returnval := self.str;
    return odciconst.success;
  end;

  member function odciaggregatemerge(
    self in out kommaSepariert_ot,
    ctx2 kommaSepariert_ot)
    return number
  is
  begin
    if self.str is not null then
      self.str := self.str ||',';
    end if;
    self.str := self.str || ctx2.str;
    return odciconst.success;
  end;
end;
/

 

Such functions have been used in the past to combine strings. Nowadays we can use LISTAGG.
Here is a quick demonstration how it works

with testdata as(select 'abc' t from dual union all
                select 'def' t from dual union all
                select 'ghi' t from dual union all
                select 'jkl' t from dual)
select kommasepariert(t)
from testdata
;
KOMMASEPARIERT(T)
abc,def,ghi,jkl

4. The select clause can influence the number of rows returned

I’m not talking about using select DISTINCT (this is another cruel way where the select clause can change the number of rows returned).

Here is a more surprising situation. Consider those two slightly different queries.

with tbl as (select 1 val from dual union all  
            select 2 val from dual union all  
            select 3 val from dual )  
    SELECT  CASE  0  
            WHEN  0         THEN  'A'  
            WHEN  SUM (val) THEN  'B'  
            END  AS c  
    FROM    tbl;  
Result (3 rows)
A
A
A
with tbl as (select 1 val from dual union all  
            select 2 val from dual union all  
            select 3 val from dual )  
    SELECT  CASE  6  
            WHEN  0         THEN  'A'  
            WHEN  SUM (val) THEN  'B'  
            END  AS c  
    FROM    tbl;  
Result (only 1 row)
B

So 3 rows are returned if we check against 0 and 1 row is returned if we check against 6.

This is a side effect of two rules.
Rule 1: A select with an aggregate function doesn’t need a group by clause and then it is guaranteed to return a single row.
Rule 2: case statements use short-circuit evaluation.

In the first example the sum(val) was never evaluated, so no aggregation took place.

See also this otn thread where the situation was discussed.

I tested the behaviour in 12.1.0.2 SE, 12.2.0.1 SE and in 11.2.0.4 EE.
I also think this should be treated as a bug. Small changes as this to the select clause should not influence the number of rows returned.

3. Default behaviour for windowing clause in analytic functions

This is something I learned from the great Kim Berg Hansen (@Kibeha).

The default windowing clause is “RANGE BETWEEN unbounded preceding and current row”. This can sometimes lead to wrong surprising results. In most cases we should switch and use ROWS BETWEEN. It is something a developer needs to be aware of.
default_windowing
from SQL reference – Analytic Functions

Whenever the order_by_clause results in identical values for multiple rows, the function behaves as follows:

CUME_DIST, DENSE_RANK, NTILE, PERCENT_RANK, and RANK return the same result for each of the rows.

ROW_NUMBER assigns each row a distinct value even if there is a tie based on the order_by_clause. The value is based on the order in which the row is processed, which may be nondeterministic if the ORDER BY does not guarantee a total ordering.

For all other analytic functions, the result depends on the window specification. If you specify a logical window with the RANGE keyword, then the function returns the same result for each of the rows. If you specify a physical window with the ROWS keyword, then the result is nondeterministic.

SUM is one of those “other” analytic functions.

Consider the following example. We have a table with a list of transactions. And we want to see the transaction value but also a cumulative sum for those values.

with testdata as 
   (select 1 trans_id, 10 transaction_value, trunc(sysdate-10) transaction_day from dual union all
    select 2 trans_id, 20 transaction_value, trunc(sysdate-8) transaction_day from dual union all
    select 3 trans_id, -10 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 4 trans_id, 30 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 5 trans_id, 100 transaction_value, trunc(sysdate) transaction_day from dual 
   )
select trans_id, transaction_day as trans_day, transaction_value as trans_value,  
       sum(transaction_value) over (order by transaction_day) cumulative_sum
from testdata
order by trans_id;   

Result

TRANS_ID TRANS_DAY	TRANS_VALUE	CUMULATIVE_SUM
1	 24.04.18 	10	        10
2	 26.04.18 	20	        30
3	 02.05.18 	-10	        50
4	 02.05.18 	30	        50
5	 04.05.18 	100	        150

As you can see the transaction 3 and 4 have the same cumulative sum. The reason is that our order criteria in the analytic window function does not separate those two rows.

There are two possible solutions. Either make sure that the order is not ambiquious. Or use “rows between”.

with testdata as 
   (select 1 trans_id, 10 transaction_value, trunc(sysdate-10) transaction_day from dual union all
    select 2 trans_id, 20 transaction_value, trunc(sysdate-8) transaction_day from dual union all
    select 3 trans_id, -10 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 4 trans_id, 30 transaction_value, trunc(sysdate-2) transaction_day from dual union all
    select 5 trans_id, 100 transaction_value, trunc(sysdate) transaction_day from dual 
   )
select trans_id, transaction_day as trans_day, transaction_value as trans_value,  
       sum(transaction_value) over (order by transaction_day 
       rows between unbounded preceding and current row) cumulative_sum
from testdata
order by trans_id;   
TRANS_ID TRANS_DAY	TRANS_VALUE	CUMULATIVE_SUM
1	 24.04.18 	10	        10
2	 26.04.18 	20	        30
3	 02.05.18 	-10	        20
4	 02.05.18 	30	        50
5	 04.05.18 	100	        150

2. batched commits

The performance of many small commits can be improved when doing batched commits.

Instead of writing

commit;

we can do

commit work write batch;

commit_batch

Here are two real world examples where this was tested.

a) I recommended using batched commits to a colleague of mine, who tried to tune a set of java logic that run in highly parallel mode. The goal was to do 1 select + 2 inserts + commit in 1000 parallel sessions per second.

Switching to batched commits was so hugely successful, that they now raised the performance requirement to 2500 concurrent sessions per second. Which also means now the ball is passed back to the java developers to come up with a better mode to execute lots of small checks against the db.

b) I also tested batched commits in a different and more general context.

Most of our code has code instrumentation logic. That means we can turn on debugging with a certain trace level and while the code is running a lot of tracing information is written into a logging table. The instrumentation call (like logger.log_trace) uses an autonomous function to do so. Essentially it is a single insert followed by a commit. That also means that a lot of commits are executed. Which can put stress on the log writer and the storage system.

So I compared what happens when we do a commit vs. a batched commit while writing lots of tracing data.

The batched commit was orders of magnitude faster than the normal commit.
I plan to write a separate article to show the exact measurements.

UPDATE: I finally managed to recheck this behaviour. Under plsql a commit will always do a kind of batched nowait commit. I tested this in 12.2.0.1 (SE). It is now also documented as C.Neumüller correctly pointed out. (https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/static-sql.html#GUID-56EC1B31-CA06-4460-A098-49ABD4706B9C).

Note:

The default PL/SQL commit behavior for nondistributed transactions is BATCH NOWAIT if the COMMIT_LOGGING and COMMIT_WAIT database initialization parameters have not been set.

In older db versions this documentation note was missing, but have been traces by several DBAs who found that commit behaviour in plsql was different from normal commits.

I assume that my previous test (which was on older hardware) was influenced by some other factors, that I wasn’t aware of. I now suspect some disc activities throttled the log writes on the storage system.

Batched commits are still useful for NON-Plsql environments. Like typical .net or java middle tier appliactions that send their statements via connection pool to the database.

Appologies to all readers who got the impression that you could now simply turn a magic key on, and speed up your application.

Writing into logging tables can be speeded up by other means. A blog post about that is currently in the making. Stay tuned!

Another thing to remember is that this different batch mode influences performances comparisons between plsql and java programs heavily in favor of the plsql world.
end UPDATE.

So what is the disadvantage? Why not always use batched commits?
To be clear: For normal situations stay with the normal commit. But if you run into issues where the log writer is not fast enough then this can be a possibility.

The drawback (as I understand it) is that in the case of a database crash, you might not only loose the currently unfinished transactions but also some transactions, that were commited already, but which the logwriter didn’t finalize yet. Typically all transactions from the last 3 seconds are at risk.

1. “CRASH” is a reserved plsql word

I have no idea why.

Oracle 18.1 PL/SQL Reserved Words and Keywords

plsql_reserved_crash

Basic SQL: IF..THEN..ELSE

How to write conditional logic in SQL

This is a question that sometimes is asked by programmers who just started useing SQL.

If-Then-Else-diagram.svg
By P. Kemp – own work created using Dia, CC0, Link

Introduction

Programatic 3rdGL languages all have an IF..THEN..ELSE construct.

Pseudocode:

IF #condition# THEN #doSomething# ELSE #doAnotherThing#

Since SQL is a 4thGL language it has no need for conditional execution. This is sometimes surprising for beginners. IF..THEN..ELSE is very procedural thinking.

We do not tell SQL how to reach a certain result, instead we describe the wanted result.

Although a conditional execution is not needed, there are cases that come quite close. If we want to distinguish data dependend on other data, this can be done using conditional logic or conditional expressions.

In general expressions can not #doSomething# but they return a value (=data). SQL is all about data. A conditional expression in SQL is showing different data depending on other data.

SQL constructs

Here is a (non complete) list of different ways how we can write IF..THEN..ELSE in SQL. Be aware that most of those expressions are only ever useful in the SELECT clause of a query. The WHERE clause can do conditional logic simply by using AND+OR+NOT+() combinations. We do not need extra functional expressions there.

The functions towards the end of the list are a bit of an obscure choice for doing conditional logic. However they are useful to remember for some special scenarios.

  • CASE
  • DECODE(Oracle)
  • NULLIF
  • NVL or NVL2 or COALESCE
  • LNNVL
  • SIGN
  • ABS
  • GREATEST or LEAST

Each of those functions have advantages and disadvantages. I will try to give an indication where the usage seems proper.

All functions have a “translated” syntax in the following form.

SQL syntax: procedural code

The following business case is used
Requirement:
If a person is older than 65 years it should get a pension.

Or to say it in more technical terms (specification):
If today the age of a person – based upon its birthdate – is equal or greater than 65, then the retirement flag should be ‘Y’ else ‘N’.

CASE

CASE when a>b then x else z end: If a>b then x else z

The CASE function should always be the first choice. It is the best compromise between brevity and clarity of the expression.

case when add_months(birthday,12*65) >= trunc(sysdate)
     then 'Y'
     else 'N'
end

The add_months function will give us the day when the age of 65 is reached (retirement age). It will also consider some special calendar effects. For example a person born on 29th Feb. 1953 will get the flag already on 28th Feb. 2017.

We have to be very careful and check some special dates to make sure that our logic is the one we are looking for. For simplicity I assume that this logic for calculating the retirement candidates is correct.

CASE also has a second syntax (simple case expression). But this allows only to compare for equality.

CASE a when b then x else z end: If a=b then x else z

The syntax allows to stack multple case statements. But most of the times this is not needed. Instead use several WHEN sections. And we can profit from short-circuit evaluation.

case when a>b then x 
     when a>c then y
     when d

Short-circuit evaluation here means that if a>b=true the next conditions are not checked. Most importantly y, z and q are never calculated. And usually c,d and f also not. There are a few special exceptions. See this forum thread Oracle bug? Aggregates in CASE expressions negate short-circuit evaluation. for a discussion about the behaviour.

Code written using a CASE statement can sometimes get quite lengthy. But it is very close to natural language and therefore easy to maintain. CASE should be the first choice when conditional logic is needed.

DECODE

DECODE(a,b,x,c,y,z): if a=b then x elsif a=c then y else z

DECODE is nice if we need to check multiple values and provide an alternative value instead. I still like to use it in the ORDER BY clause to do rankings.

DECODE is not part of ANSI SQL and can only be used in Oracle databases.

If the condition is more complex than a simple equality comparison, then a combination of DECODE and SIGN can be used. Especially if numeric values play a role.

decode(sign(add_months(birthday,12*65) - trunc(sysdate),
        1, 'Y'
       -1, 'N'
        0, 'Y'))

The expression “add_months(birthday,12*65) – trunc(sysdate)” returns a positive value if the person is older than 65 and a negative value if he/she is younger.

DECODE+SIGN was used very frequently before CASE was introduced in Oracle 8 or 9. You still find it sometimes in older code.

DECODE also can compare with NULL values.

NULLIF

NULLIF(a,b): If a=b then NULL else a

This function returns a NULL value if input A matches Input B otherwise A is returned.

The following (slightly constructed) example will return the birthdate for all persons that reached retirement age. But NULL for those who didn’t. The GREATEST function is used to help us to level the values.

nullif(birthdate,
       greatest(add_months(trunc(sysdate)+1,-65*12),
                birthdate)
       )

It is not so useful for the choosen example. But it is very handy if we want to ignore some values. Especially in combination with aggregation functions, since they ignore NULL values during the aggregation.

NVL or NVL2 or COALESCE

NVL(a,b): If a is null then b else a
NVL2(a,b,c): If a is null then c else b
COALESCE(a,b,c,d): 
   If a is not null then a 
      elsif b is not null then b 
         elsif c is not null then c 
            elsif d is not null then d

NVL is often used for providing a default value, in case no value was found.
If the birthday of a person is an optional field, we might want to consider setting a default (=standard) age, for our calculation.

nvl(birthdate,date '1980-01-01')

The NVL2 syntax is a little less known but it is very useful and should be remembered.

COALECSE is very similar to NVL. It returns the first value that is not null. But it can be used for checking more than one value. A major advantage is that the second and later expressions are only checked if the first expression is NULL (short-circuit evaluation). This can give an performance advantage over NVL if the second expression is expensive. This performance advantage is why many programmers always prefer COALESCE over NVL. However there are also some special compiler optimizations that do kick in for NVL and not for COALESCE. As a rule of thumb I would stick with NVL if we have a simple second value. I use COALESCE if more than one value needs to be checked or if the second value is complex (like a plsql function or a subquery).

return the most recent activity (lastest date) for each shipped order

coalesce(arrival_date, shipping_date, sent_date, creation_date) as latest_date

LNNVL

LNNVL(a=b): if not(a=b) or a is null or b is null then true

LNNVL is a strange and hard to understand function. It negates a condition. It is used internally by Oracle to rewrite certain queries (not in into not exists). Contrary to all other functions LNNVL returns a boolean result and can be used in the where clause .

One usage is if we want to make sure two values are different and still want a result even if one of the values is null. So it can ease the working with NULL values.

This select returns something if the values a and b are different.

select * from dual
where LNNVL(a=b);

SIGN

See decode.

The SIGN function can be used to rephrase a non-equal comparison into an equality comparison.

if a > b then

is (for numbers only) mathematically the same as

If SIGN(a-b) = 1 then

SIGN is a typical helper function for DECODE. Since DECODE can only compare for equality, SIGN helps to enhance that to do more complex comparisons.

ABS

ABS is sometimes used in combination with SIGN. In rare cases it avoids sorting the input data for the SIGN function.

if a != b then

is (for numbers only) mathematically the same as

if sign(ABS(a-b)) = 1 then

Also much easier would be:

not(a=b)

ABS => Not really useful anymore.

GREATEST or LEAST

GREATEST(a,b,c): if a > b then a else b => result1; if result1 > c then result1 else c)

See NULLIF example.

GREATEST give us the maximum value from a list of values. LEAST gives us the minimum. GREATEST and LEAST can be used to harmonize certain values and then to allow an equality comparison for them. As such they are similiar to SIGN. However SIGN works only with numerical data, whereas GREATEST|LEAST can also be applied to strings.

As with all functions we have to be careful about NULL values. Remember NULL means “UNKNOWN”. So if one of the values in the list is NULL, then GREATEST|LEAST will return NULL (UNKNOWN).

further considerations

Of cause it is also possible to do conditional logic using DML commands.
The WHERE clause of the DML command matches the IF part, the SET part of an update, matches the THEN part. For a delete command the THEN part is to delete the object.

For example a procedural requirement like
“if the data is older than 3 years then delete it”
can be translated into sql like this:

Delete from myData
where insert_date < add_months(trunc(sysdate),-3*12)

As we can see the procedural requirement can be translated into SQL in a very elegant and straight forward way.

Another point to remember:

If you think a CASE expression is needed in the where clause you are most probably wrong. One notable exception to this rule is, if you want to access a function based index (fbi). If a function based index uses a CASE function, then you must use exactly the same function inside your where clause, to be able to profit from this index.

Conclusion

SQL can do conditional logic. The first place to look for it is the WHERE clause. Conditional expressions can be done using the SQL functions CASE, NVL, COALESCE, NVL2, DECODE and NULLIF (in that order). Other functions can help to adapt conditional logic to the specific business requirements.

The EBnoR Manifesto

The “EBnoR” Manifesto

about

Edition based not only redefinition

Author: Sven-Uwe Weller

suw_logo

ceo syntegris information solutions GmbH

Germany

 

 


Mission Statement

Edition based redefinition

EBR – is not well known.

It is barely used.

Edition based not only redefinition

EBnoR – will change that.

EBR is a tool with unique possibilities.

EBnoR deals with cases

beyond

the originally intended scope.

EBR is difficult to use.

EBnoR is easy to use.

EBnoR avoids complexities.

EBnoR is powered by the strengths of EBR.

 

 

 


EBR Basics

Edition based redefinition is a unique feature of the Oracle database that is available since version 11.2. It can be used at no additional license costs in all database editions (EE, SE, SE1, SE2, XE).

The term “edition” is used to describe a set of plsql based objects. Among those object types are plsql packages, views and synonyms. A complete list is here in the Oracle documentation.

A different edition can hold different code for the same plsql object. In essence each edition resembles a version of your database application code. EBR allows to store and run different versions of your application at the same time. The different code versions do exist inside the database at the same time.

This opens up a whole set of some very special possibilities that usually do not exist in other environments. EBR allows to do an upgrade to a new application release without shutting down the application. The new code version is installed in a new edition. Currently running sessions will still work with the old release. Deployment problems can be tested using the new release. Once those issues are solved, then the new edition is made available (as the new default edition). Only when a user ends his (database) session and reconnects then the new code version from the now new default edition will be used.

distinction to VCS

EBR should not be confused with a version control system like SVN or GIT. Although there are some similarities it serves a different purpose. A version control system supports the development and deployment process for a team of programmers. They can store and merge code there. Forks and code branches can be used to support development that temporarily goes into different directions. Code merges allow to combine branches again. This is something that EBR can and will not do. EBR is about running the different code versions in parallel. Application end users profit from EBR, not developers. At best an edition resembles a release in the version control system.

Sven says: A VCS saves and documents code, EBR executes code.

versions of data

Data resides in tables. Tables are not editionable objects. That means two things. Changes to table structures are not editionable directly. And the data itself is not subject to code release changes. That is a good thing. The data and the code that works with the data is separated.

Creating a new edition means that code is duplicated. Code can be duplicated easily,  duplicating data is a much bigger issue.

Sometimes a code change requires to change existing data. This increases the complexity to run the old and the new code version at the same time. EBR offers solutions using cross-edition triggers and editioning views. How to use cross edition triggers is described in Bryn Llewellyn’s excellent white paper from 2009 (http://www.oracle.com/technetwork/articles/grid/edition-based-redefinition-1-133045.pdf). Read and understand this paper first before you go on and read this manifesto.

“Executive Overview

Large, mission critical applications built on Oracle Database 11g Release 1 and earlier versions are often unavailable for tens of hours while the application’s database objects are patched or upgraded. Oracle Database 11g Release 2 introduces edition-based redefinition, a revolutionary new capability that allows online application upgrade with uninterrupted availability of the application. When the installation of the upgrade is complete, the pre-upgrade application and the post-upgrade application can be used at the same time. Therefore an existing session can continue to use the pre-upgrade application until its user decides to end it; and all new sessions can use the post-upgrade application. As soon as no sessions are any longer using the pre-upgrade application, it can be retired. In other words, the application as a whole enjoys hot rollover from the pre-upgrade version to the post-upgrade version.

This whitepaper explains how edition-based redefinition works, and how to write online application upgrade scripts using this capability, at the level of detail needed by engineers who will write such scripts.”


the current status

EBR is in the market for a long time now. Still many developers and DBAs do not know about it. If they know about it, they are not using it actively. I identified a set of problem areas that are responsible for this current status of EBR. I see possible solutions. Using EBR the focus should not only be on redefining the code, instead EBR can open up a set of new and unique possibilities. Zero downtime application upgrades are only one of them. I call the set of solutions “Edition based not only redefinition”  or short EBnoR.

problem areas under the influence of intelligent people (developers and DBAs)

the deployment dilemma shortly before GO-live it is too late to think about EBR
the design change gamble keep changes required by EBR at a minimum, design with EBR in mind
the zero downtime promise Perfectness – if even possible – is expensive. Don’t go for perfectness, almost perfect is good enough.
the cross edition complexity Cross edition triggers are needed for extrem cases. Huge effort for only a tiny result.

problem areas influencing decision makers (architects and project managers)

the cross plattform idependency nonsense EBR is only available for Oracle databases
the marketing confusion Editions are not license editions

EBnoR does offer solutions to each of those problem areas. Using small sidesteps from Oracles recommend path will lead to major improvements in development time, flexibility and ultimately results in a quicker learning curve.

The “deployment dilemma”

The advantages of EBR shine during one of the later steps in the software development lifecycle – the deployment phase. The dilemma is that at such a late point the additional programming effort to use EBR competes with adding more features or bug fixing of the application.

Oracle propagates that the advantage of using EBR to do application upgrades with minimal downtime is so big, that it is worth the additional effort and time. This is true in very rare cases only and mostly only for very large companies than can effort to spent money on this additional effort.

Problems

The problem is the decision point. Budget is reserved for a specific change in one application. There never is enough budget. There never is additional budget. Especially not shortly before the next application upgrade will go live. At this point there is not enough budget and not enough time.

EBR is a tool not an application itself. As such most companies hesitate to set up an extra project or a major software change just for using a tool.

Solution

The decision needs to be moved from budget owners to the developers and to an early point during the software life cycle. Only then will EBR find more integrations.

EBnoR: Start with EBR from day 1 in the project!

Use editions already during development!

The “design change gamble”

EBR requires some changes to the data design and to the schema.

One requirement is to add a view layer on top of all tables. Among others this allows to use cross edition triggers to do data changes. This is certainly a solid approach, however it is a major change to the data model. As such this will face critic from various other parties involved.

Typical arguments that are raised against such a change are:

  • Performance will suffer because of the view
  • The views do not confer to our naming conventions
  • The views hide complexity
  • Some tools work (better) with tables instead of views
  • Enabling EBR can not be undone

The gamble is that EBR will take the blame if implemented at a final stage during development. If any issue arises, e.g. performance in the production environment drops, some will argue that the reason is in the additional view layer. Just because that was the only thing that was changed. It might be hard to prove otherwise, as long as comparable performance data is missing.

Editions can be dropped. Switching on editioning in the database has no implications on using the feature or removing editioning features.

Solution

EBnoR: Avoid major design changes!

Use a view layer if possible. If not, still use EBR without this extra layer.

Test early while using editioning!

The “zero downtime promise”

Absolute zero downtime is very hard to reach. Even a simple ALTER TABLE statement will lock the full table. During that the table is not available. Sessions will wait (which is a different term for downtime)  or even break.

Focus on changes that have a minimal impact on downtime. For example adding a column is usually no problem. Even if the column is added to the old application without adding the new code it should not break anything. Write your application in such a way that added columns will not break existing code. This is easy to do. If the column is mandatory always provide a default value. Updating the data in this column might need more time. Dropping a column should be avoided because this usually breaks old application versions.

If the normal downtime window is a few days, then using EBR can change that window to less than an hour by focusing on the DML changes (tables mostly). For a normal user such a small downtime window is the same as zero. He will often not even notice such a short break. This does not mean that the new application version needs to be available after an hour. But just that after an hour the user is able to continue to work with the old version. After that downtime window the new edition can be installed, tested and then a gradual switch over of user sessions can be done.

Zero downtime is expensive. Even with EBR.

Solution

EBnoR: Aim for very low but not for zero downtime.

Don’t promise zero to users!

The “crossedition complexity”

Bryns paper describes how to use cross edition triggers to handle data changes between one application version and the next one. There is a major intrinsic problem with that. Let us first investigate how cross edition triggers work.

Cross edition triggers allow to manipulate data in one edition and add transformation logic so that the same dataset is correctly represented in the next or previous edition.

  • “A forward crossedition trigger is fired by application DML issued by sessions using the pre-upgrade edition. Such a trigger is used to implement transformations from the old representation forwards into the new representation.“
  • “A reverse crossedition trigger is fired by application DML issued by sessions using the post-upgrade edition. Such a trigger is used to implement transformations from the new representation backwards into the old representation.”

The problem is not how to write a trigger, but in the complexity of forward and backward data manipulations.

Sometimes the data manipulation needed to go to the next application version is simple, but can not be automated easily. Manual data cleansing actions can be such a case.

Some scenarios do not have a simple 1 to 1 transformation rule from data in the edition A and data in edition B. To do the forward transformation is usually part of the project. Very often it is much harder to provide the backward transformation. Sometimes this is only possible while keeping the old data structure thereby duplicating and replicating data. This defies the original reason for change.

The EBR solution is to have this change only temporarily. Therefore old editions should be dropped very soon after the application was switched to the new code (and data) version.

Problems

  • Often requires to store several versions of the same data. Usually in different columns.
  • Need to automate data transformation rules
  • Huge additional effort not related to a better application
  • Multiple editions require lots of cross edition triggers

Solution

Start slow and avoid cross edition data transformations!

This needs some explanation and examples.

Example

For example if a design decision was made to trim the time component from all date values in a column. So that instead of a range comparison an equal comparison can be made.

Old code:

dateCol >= trunc(:searchDate) and dateCol < trunc(:searchDate) + 1

New code:

dateCol = trunc(:searchDate)

The typical change includes a DML statement that updates the column using TRUNC.

update myTable
set dateCol = trunc(dateCol)
where dateCol != trunc(dateCol);

This change can not be undone because information (the time component) is lost.

So to write a reverse crossedition trigger to mimic the exact same source data is difficult.

One solution could involve keeping the old column including time precision and adding a second column that holds only the truncated date value. Then add (or change) a editioning view in the old and new edition that serves as an api to the relevant column. A forward crossedition trigger would truncate the data that is inserted using the old application version. But without knowing where the time component is coming from, it would be impossible to write a reverse edition trigger that adds the time correctly. A default logic would be needed. Maybe depending on some other values. Like the time should be between the time from previous and following rows. So we need several new objects, a fairly complex code to do the reverse change, the data model is more complex than the simple update solution but above all data would be duplicated.

But why care? In this specific example the old application code might still work, even if the time component is truncated. So there a two possible easy ways to avoid complexity

  1. run the update several times, for example after each batch load of new data
  2. add a normal db trigger in the old edition that truncates all newly inserted date values
:new.dateCol := trunc(:old.dateCol)

Yes this would require a change in the old edition and as such would change old application code. If this can not be done because of organisational hurdles, then create an intermediate “preparation” edition and add the trigger there.

Consider a step by step approach. There are many situations out there, where we can get away without the need to implement additional cross edition logic. Avoiding this helps to get EBR started and become acquainted with the feature. Once we got used to it, we can add more – eventually even using cross edition logic for specific use cases.

EBnoR: Do not depend on editioning views, do not use cross edition triggers.

Avoid deleting objects, don’t hesitate to add objects (especially columns).

If you need to delete objects this will influence all editions. After careful consideration just do it. After all, rules are there to be broken. Announce it as a major application version which will take a small downtime. Do not let the zero downtime paradigm make you write complex code. Complex code in the end costs more in terms of maintainability and testing effort.

learning curve considerations

A learning curve comparison shows how a reduced approach will ease the first steps into the feature.

  • plsql code versions: start using EBR already during development. Then the developers get used to working with editions. Building a new release and deploying it in a specific edition then is nothing strange anymore. Do development and tests in the same database but in different editions. Each step in the development lifecycle deserves a new edition.
  • design changes: avoid changes in the data model that would influence old editions in a negative way. For example avoid dropping columns. Also write your code in a way it does not break when a new column is added to a table. Model your data first before you start to build a new application.
  • low or zero downtime: often a tiny downtime is ok. If the aim is to be fast, but not to be perfect then the additional effort suddenly is way reduced.  Do not aim for perfectness!
    Example: Adding a column to a table requires that the table is locked for a moment. Especially if the new column is not null and is filled with some extra values. Other session might wait for this. If the table is very very large such a lock can take some time. If it takes a  minute thats fine. It will not be zero downtime, but almost every project can effort to wait a minute during a new deployment.  If it takes half a day, then you must invest effort to optimise it.
  • cross edition data: avoid using cross edition triggers. They serve only a very specific very special purpose. Build your data changes in a way that you do not depend on cross edition triggers. The need for cross edition triggers is the major contributor to why there is a increased programming effort when EBR is used.

The “cross platform independency nonsense”

“EBR can not be used in our company because we want to stay database vendor independent”

Such an argument can often be heard from java developers or other programmers outside of the database world.

An analogy

You brought a Porsche 911. This is an expensive but very fast car. However you choose not to drive it in 5th and 6th gear because there are other cars out there that do not have a 5th and sth gear. Does this make sense?

In my opinion the claim for database independency is often made to hide the incompetency or inability of the developer to learn about the unique features that their specific database allows. In general it is easy to structure code in such a way that vendor specific features are encapsulated and properly modularized. This will made future changes to another database less problematic.

This is also true for EBR. If you decide to enable and use editions you can do it in such a way that all the specific implementation details are hidden from the normal code. The developers and the DBAs should be “edition aware”. But after some initial setup, there is no extra effort that EBR requires. In general you would simply use a connect string that connects to the edition of your choice. There is almost no need to consider editions directly in the code.

There are a few exceptions. Like if you want to start a scheduled job then you proabably want to start that job in the same edition as the session that is currently running and creates that job. This is possible using a job class using a service that is tied to an edition. Or more general: Everytime you have a kind of client access to your database, you want to make sure that you can influence the edition that is used at connection time.

Solution

Modularize and structure your code well!

Immediately counter the argument by using strong analogies.

The “marketing confusion”

I think the EBR name is suboptimal (nicely spoken). First of all the term “edition” is confusing, because it is also used as a name for database editions (EE, SE, XE). The R stands for “redefinition”. The term is a poor description of adding a new code release to your application.

I admit I’m not creative enough to suggest a better name, but as an analyst I’m able to pinpoint a flaw when I see one.

Nowadays every “thing” or “technology” has a smart speaking name and a funny logo.  The name usually is also an abbreviation, and one could think this is only of minor importance. It is not. This is the name that feature/tool goes by. This name and logo is used in slides, presentations, and social media channels.

some examples 

Node JS packages: https://www.npmjs.com

Here are just a few of them. But every of those javascript packages has a small but recognisable logo added.

ebnor_npm_packages

Architecture components for Spark: https://www.linkedin.com/pulse/hadoop-summit-2015-takeaway-lambda-architecture-laurent-bride

ebnor_hadoop_summit_logos

Just look at all those small but important looking pics in this simple architecture diagram. It makes you feel as if you missed something important, if you don’t know them all.

Problems

  • EBR is hard to pronounce and to spell out.
  • The term “edition” is already established in the Oracle namespace. And usually it stands close to “license costs”.
  • A nice enchanting logo is missing.

Suggestions

  • EBnoR / EbanoR = Edition based not only Redefinition
  • Edi
  • ebar
  • PRETI = plsql runtime environment and test integration
  • or something completely different = OSCD (could be in use already by some other organisation)

Solution

Oracle, get some logo wizards and marketing people to work on that!

 

 

 

 


beyond EBR

The EBR feature allows to do more than the standard model intended by the Oracle development team.

The standard model is

  1. install a new application version in a new edition in production
  2. test if the deployment went well
  3. switch the current edition to the new edition
  4. after some short grace period drop the old edition

During this whole time the impact on connected users is minimal.

This is great!

But there is more…

Using EBR in certain ways can open up new possibilities and ease the upgrade pain.

New possibilities include

restore and run old code

Assume a problem is reported, where you suspect that the reason is gone meanwhile. But the problem existed inside some older code version. Useing EBR it is now possible to quickly prove that assumption. Create a new edition in the environment where you want to reproduce the issue (usually DEV, but might be some integration or QA test database). Fetch the old plsql code from your versioning system and add it into this new edition.

Run the test in both editions to see if the problem occurs with the old code but not with the new code.

After that test simple throw away the whole edition (drop edition cascade).

develop and test in the same database

Build releases in ORA$BASE

=> test in edition TST$BETA

=> develop in edition DEV$ALPHA.

This is something I discovered when actively developing and working with EBR. It is one of the best things to do. It might even save license costs because you can eliminate a complete test database. Although some test databases do not need additional licenses. It does not mean you can save up upon tests. But certain types of tests can profit hugely from EBR. Testers use one edition, developers use another (child) edition. Developers can actively analyse bugs that testers found without the need to move data from the test to the dev database. And developers can fix bugs without disturbing tests running in the same database.

Build the test environment automatically from your versioning system.

provide backward compatibility

This is often useful for API development (see also ThickDB | SmartDB paradigm).

APIs provide a specific interface (usually views and plsql packages). APIs expose a limited set of objects and methods for a specific task. APIs do not expose data directly. This is were EBR shines. All those objects are editionalble objects!

Often we have multiple and different consumers using the same api. If API functionality is enhanced, then we can provide the newer version of the API in a new edition. If we keep the old edition instead of dropping it almost immediately, consumers can still use their old code without the need to immediatly switch to the new version.

As long as cross edition triggers can be avoided, there are no major problems with keeping multiple API versions for a longer period of time.

Tipp: All editioned objects should be compiled in each edition even if there was no change. This actualizes a separate code version in each new edition.

 


Recommendation

the EBnoR (Edition Based not only Redefinition) approach

  • Start using EBR on day 1 in the project
  • Aim for low but not zero downtime
  • Avoid (complex) cross edition logic
  • Develop and test in the same database in different editions

 

 


Some final words

I had planned to publish this manifesto for quite a long time time (several years now). Since I started to put down those thoughts, EBR was enhanced. New recommendations have been made by Oracle plsql development. Other people, including the AskTom Team, encountered similar things with EBR. Some of those ideas and recommendations match with EBnoR. Still I expect EBnoR to be very controversial. This is intended.

For sake of brevety I skipped a larger part of examples and direct code demos.

EBR also still has issues where I see no immediate solution. For example editions are database level objects. As a developer I would prefere them to be schema level objects. The problems connected to this are beyond the scope of even EBnoR.