10 little Oracle SQL features – that you are probably not using enough

This is a collection of some small features, patterns or snippets, that I seem to use frequently and like to share. The features are written in no specific order. However we can structure the features into performance related [P], convenience [C] or enablers [E].

Enablers [E] would be features that allow us to do something, that is otherwise impossible or at least very hard to achieve. Sometimes even to avoid an error.


Convenience [C] features are things that offer an alternative construct. Often they can be used instead of another option. Sometimes making the code easier to change (maintainability).


Performance [P] features improve execution speed. Often they come with a drawback or one should keep exceptions or rare race conditions in mind.

1. find multiple words using regexp_like [C]

instead of writing multiple like conditions (or repeating the same select using like multiple times), we can write a single regexp_like expression.


Cumbersome:

where (str like '%word1%' OR str like '%word2%' OR ...)

Much easier:


where regexp_like(str,'(word1|word2|...)')

The pipe “|” inside the regular expression acts as the OR operator and the parenthesis () form the subexpression that is needed for that OR.

A very typical case is to query the v$parameter table for a set of parameter names.
example

select name, value 
from v$parameter 
where regexp_like(name,'(listen|dispatch|service)');
NAMEVALUE
service_namesfeno1pod
enable_dnfs_dispatcherFALSE
dispatchers(PROTOCOL=TCP) (SERVICE=feno1podXDB)
max_dispatchers
query result

What I especially like about it, is that the need for wildcards is gone and it is so easy to add more “words” to search for.

2. fetch first row only [C]

Do you still use rownum frequently? Why not use to the row limiting clause instead?

I use it a lot, mostly for ad hoc queries. One advantage is that the need to create an inner query that does the correct ordering disappears.


example

-- show the top 3 tables with the most rows
select table_name, num_rows
from user_tables
order by num_rows desc
fetch first 3 rows only;
TABLE_NAMENUM_ROWS
IMP_SD_2744-12_45_1819696
IMP_SD_2822-14_28_5319692
IMP_SD_194-09_40_5019545
Query result

3. use expression lists (a,b) [E]

I use this frequently in update statements but also sometimes in joins. Instead of setting each column separately we can set multiple columns at once. Simply by using parenthesis and a comma between the columns.

example

update persons u
set (u.first_name, u.last_name) = (select a.firstname, a.name 
                                   from applicants a 
                                   where a.person_id = u.id)
where u.status = 'NEEDS REFRESH';

Or we can filter on a combination of columns (a,b) in ((c,d),(d,c),(e,f)). The need for this should be rare, but it happens. A strange edge case is when we do an equality comparison of two expression lists. For some reason the right side needs an additional set of ().

(a,b) = ((c,d))

4. DeMorgans law [C,E]

This might be the single most important math/logic rule, that SQL developers should know. It is usually applied inside the where clause and knowing it by heart will prevent a lot of easy mistakes.

{\displaystyle {\begin{aligned}{\overline {A\cup B}}&={\overline {A}}\cap {\overline {B}},\\{\overline {A\cap B}}&={\overline {A}}\cup {\overline {B}},\end{aligned}}}
demorgans law

written as a SQL expression

not(A or B) = not(A) and not(B)

not(A and B) = not(A) or not(B)

It is easy to remember. OR changes to AND when the parenthesis are resolved (and the other way round).
A and B here are full SQL expressions, for example x=y.


Why is that so important? SQL is a very logic based language. In daily speach we often use logical operators (AND/OR/NOT) differently than what is needed when they should be applied in SQL. This can become confusing very easily. Knowing DeMorgans law helps to quickly check if the logic is used correctly.

An example

Task: “Find all employees that are not working in sales and marketing.”

Converting this sentence 1:1 into SQL would result in this:

select * from emp
where not (department = 'SALES' and department = 'MARKETING'); 

Obviously what is ment, is that we want to find those employees that are working in some other department.

Applying Damorgans Law, we can reformulate our select statement. Also we replace NOT (x=y) with x!=y .

select * from emp
where  department != 'SALES' 
    OR department != 'MARKETING'; 

A developer should understand now, that this condition will always be true. For each row the department is either sales or not sales. And if it is sales, then it is not marketing. So the combination is always true (excluding NULL value considerations). Which is probably not what is wanted.

Here the row logic also plays a part. The where clause is applied to a single row, but normal speach often uses the boolean operation to combine data sets.

The correct query would be

select * from emp
where  not (department = 'SALES' 
             OR department = 'MARKETING'); 


or even better use IN instead or OR

select * from emp
where  department not in ( 'SALES' , 'MARKETING'); 

Historic anecdote: Da Morgan was not the first one to discover this logic. Centuries before Da Morgan, a guy called William of Ockam already wrote about it. He probably had it from Aristotle. Occam (the spelling changed over the centuries) nowadays is more famous for his razor.

5. rollup and grouping sets [C]

To get a total row for a query that uses sum or count, simply add rollup.


example rollup

select tablespace_name, count(*)
from dba_tables 
group by rollup (tablespace_name);
TABLESPACE_NAME	COUNT(*)
DATA	        362
DBFS_DATA	2
SAMPLESCHEMA	14
SYSAUX	        1357
SYSTEM	        1056
	        322
	        3113

The last line is the total number of tables. One problem here are tables without a tablespace (null). The grouping() function helps to separate a normal row with a data value NULL from a superaggregate row (19c grouping).

If there are multiple columns then instead of rollup I use grouping sets. The full set (=total) in a grouping set expression can be expressed by ().

example grouping sets + grouping

select case when grouping(tablespace_name) = 1 then '-all-' else tablespace_name end as tablespace_name
     , case when grouping(cluster_name) = 1 then '-all-' else cluster_name end as cluster_name
     , count(*) 
from dba_tables 
group by grouping sets ((tablespace_name, cluster_name),());
TABLESPACE_NAME	CLUSTER_NAME	COUNT(*)
		                322
DATA		                362
SYSAUX		                1356
SYSAUX	SMON_SCN_TO_TIME_AUX	1
SYSTEM		                1020
SYSTEM	C_RG#	                2
SYSTEM	C_TS#	                2
SYSTEM	C_OBJ#	                17
SYSTEM	C_COBJ#	                2
SYSTEM	C_MLOG#	                2
SYSTEM	C_USER#	                2
SYSTEM	C_FILE#_BLOCK#	        2
SYSTEM	C_OBJ#_INTCOL#	        1
SYSTEM	C_TOID_VERSION#	        6
DBFS_DATA		        2
SAMPLESCHEMA		        14
-all-	-all-	                3113

Again the last line shows the total number of tables. Since I didn’t want to see subtotals for tablespace or cluster grouping sets is the perfect solution t add this total row.

Also notice that the first line has all null values for the names, same as the total line would have. Using the grouping function allows to find out which is the total line and give it a meaningful text.

6. enhanced listagg [E]

In the more recent database versions, the very useful LISTAGG command got even better. For production code I nowadays always try to remember to add some safety protection in place in case result of the aggregation becomes big. Otherwise it could happen to get ORA-01489: result of string concatenation is too long.

Since 12.2 we can avoid the error by using the OVERFLOW clause

listagg (... on overflow truncate without count) ...

So instead of the error message, when the maximum string size is reached (4000 bytes or 32k bytes depending on max_string_size parameter) we get usable text without the statement raising an error.

example

select count(*)
, listagg(table_name,', ' on overflow truncate) within group (order by tablespace_name desc, table_name) all_tables
from dba_tables;
COUNT(*)ALL_TABLES
3113AQ$_ALERT_QT_G, AQ$_ALERT_QT_H, AQ$_ALERT_QT, … many many more tables …, SDO_CRS_GEOGRAPHIC_PLUS_HEIGHT, SDO_CS_SRS, SDO_DATUMS, …(1304)

The three dots “…” are called an ellipsis and can be configured. without count would avoid writing the total number of entries to the end of the list. with count is the default if truncate is specified.

Although the overflow clause is very usfull, the ultimate goal would be to give the developer more influence over it. Recently there was an interesting twitter discussion around that topic.

Other useful enhancements (19c) were LISTAGG distinct. example on LiveSQL

7. Using sys_context [C,P]

A sys_context is something like a global variable in other languages. The normal context is for the session, but it is also possible to use application wide contexts.

Oracle provides several “preconfigured” contexts for us. The most common are ‘USERENV’ and ‘APEX$SESSION’ (for apex developers). Contexts are also used for security policies with VPD.

Here are the contexts that I frequently like to use

  • sys_context('userenv','client_identifier')
    value set by dbms_application_info.set_client_info
  • sys_context('userenv','current_edition_name')
    when using edition based redefinition (ebr), this shows which edition I’m in. Always good to double check!
  • sys_context('userenv','current_user')
    similar to pseudocolumn user. The schema name that we connected with.
    In some oracle versions (12.1) much faster than user, my recent test shows that this performance difference is now gone (19c)
  • sys_context('userenv','proxy_user')
    When doing a proxy connect then this is the name of the authenticated user, not the target schema
  • sys_context('userenv','os_user')
    useful when client_identifier is empty, for example the name of the Windows login account when using SQL Developer under Windows.
  • sys_context('apex$session','app_user')
    apex authenticated user
  • sys_context('apex$session','app_id')
    id of the current apex app
  • sys_context('trigger_ctl','audit_on')
    Is auditing enabled/disabled? Part of my custom adaptation/enhancement of Connor McDonalds Audit-Utility package
  • sys_context('userenv','sid')
    the session ID of the current database session

Side note: UPPER/lower case does not make a difference for contexts. Only for the values of the context.

Warning: when you start using your own custom contexts, be careful with the grant create any context privilege. It can be a security risk! Revoke it after it was used.

8. hint /*+ driving_site */ [P]

I feel like this is a classic for me. In one of my longest running projects we had a lot of databases connected by database links. One of the most important tuning activities was to understand how to do remote joins in a performant way. The driving_site hint was extremely important for that.

General rule: When doing remote queries, avoid mixing local and remote tables in one select statement. If only remote tables are in the statement always use the driving_site hint.

Nowadays the need for this hint diminishes, mostly because databases are less connected by DB links, but more by webservices (which does not really improve performance, but that is a different story)

9. join via using [C]

This only works reliably when the naming conventions of the data model fit to it.

example

We have a table PERSON and a table ADDRESS. The primary key in PERSON is PERS_ID. The relationship column (FK) in ADDRESS is also called PERS_ID. It wouldn’t work as well with ADR_PERS_ID for example.

Then we can do a quick join like this:

select * 
from person 
join address using (pers_id) 
where address.type='MAIN';

There are some slight differences compared to the normal way using ON. Mostly one then can not differentiate anymore from which table the pers_id originates. For example you can not refer to person.pers_id in the where clause anymore.

Currently I use it exclusively for ad hoc queries, not in plsql code.

10. interval literals [C]

If I want to add some time to a date or a timestamp, I always use interval literals (unless it is a full day or a full month).

example
Let’s say we want to check tickets that were entered during the last hour. Using the date datatype we could easily calculate an hour by dividing a full day / 24. This is how many developers calculate time. And it is perfectly ok to do so.

select * from tickets 
where createddt >= sysdate - 1/24 ;

The term sysdate-1/24 resembles one hour before “now”.

This is how I prefer to write it:

select * from tickets 
where createddt >= sysdate - interval '1' hour ;

I feel the code documents itself in a much better way.

Additionally requirements like this easily change.
Let’s change both expressions to 15 minutes.

sysdate - 15/24/60

sysdate - interval '15' minute

Which one is better to read and was quicker to change?

Warning: Sayan Malakshinov in the comments mentions that we should not use interval literals (ymintervals) to add full months. I agree with that. Calendar arithmetic is complex – there are specific functions like add_months for that. So do not use interval '1' month.

honorable mentions

Some more snippets or functions that didn’t quite make it into the list.

  • nvl2 [C]
  • the third parameter of to_char (nlsparams) [E]
  • query block hints [P]
  • hierarchical with clause [C,E]
  • lateral join [C]
  • alter sequence restart [C,E]
  • cross join [C]
  • lnnvl [C,E]
  • hex format “XXXX” [E]
  • sys.odcinumberlist [C]
  • analytic sum + correct window clause: rows between instead of range between [E]

Summary

I hope I mentioned something that you didn’t know about before and liked reading about.

Outlook

Here are some features/patterns that I’m currently not using myself, but where I feel that they might become quite important in the future.

  • SQL Macros
  • boolean in SQL (not implemented yet, but Oracle seems to be working on it – I expect some news about this very soon)
  • median and other statistical functions like cume_dist, percentile_xxx, percent_rank
  • match_recognize
  • with clause udf functions (still think they are overrated, but I might be wrong)
  • approx_count_distinct
  • analytical views

8 thoughts on “10 little Oracle SQL features – that you are probably not using enough

  1. Hi Swen,

    just my 2 cents:
    > 4. DeMorgans law [C,E]
    DeMorgans law is for binary logic, but SQL uses ternary logic, so you always need to analyze if there can be NULLs.

    >select * from emp
    >where not (department = ‘SALES’ and department = ‘MARKETING’);
    department can’t be ‘SALES’ and ‘MARKETING at the same time, so this query will return no rows. You need OR here.

    >9. join via using [C]
    Natural joins have a lot of cons, so I wouldn’t suggest it at all.

    >10. interval literals [C]
    Oh, intervals is a pretty big topic with a number of nuances 🙂
    for example:

    SQL> select sysdate-interval'1' month from dual;
    select sysdate-interval'1' month from dual
                  *
    ERROR at line 1:
    ORA-01839: date not valid for month specified
    

    or

    SQL> select dump(timestamp'2021-01-01 00:00:00'-1) dmp1 from dual;
    
    DMP1
    ---------------------------------
    Typ=13 Len=8: 228,7,12,31,0,0,0,0
    
    SQL> select dump(timestamp'2021-01-01 00:00:00'-interval'1'day) dmp2 from dual;
    
    DMP2
    -----------------------------------------------------------
    Typ=187 Len=20: 228,7,12,31,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0
    
    • Hi Sayan,

      Thanks for your valuable feedback.

      1) You are right about the tenary logic and I (very briefly) also mentioned NULL.
      The following example was exactly to demonstrate how DeMorgans law helps to identify such logical errors.

      2) USING is not a NATURAL JOIN. I’m not suggesting to use natural joins.
      The main problem with NATURAL JOIN is this: I can change the outcome of a query, by adding another column that now happens to be in both tables. This won’t happen with USING.

      3) Oh yes, there are many side effects with date operations.
      And yes ymintervals are usually not recommended. However dsintervals work pretty well. Your last example is such a case:
      timestamp – number = date, but timestamp – dsinterval = timestamp.
      I certainly like to stay with the original datatype.

      Regards
      Sven (with “v” not “w” )

  2. Hello Sven,

    Just a short remark regarding SYS_CONTEXT:

    The exact equivalent of USER is SYS_CONTEXT(‘USERENV’,’SESSION_USER’).

    This may differ from SYS_CONTEXT(‘USERENV’,’CURRENT_USER’) for example if used inside
    a definer-rights subprogram.

    An optimizer problem related to using SYS_CONTEXT is that, when used as replacements for bind variables, they do not allow for bind variable peeking, as shown by Connor McDonald here:

    https://connor-mcdonald.com/2016/10/20/taking-a-peek-at-sys_context/

    I think that this is still the case in the later/current Oracle versions.

    Cheers & Best Regards,
    Iudith Mentzel

  3. Hello!

    Just a comment on regexp_like. Yes way simpler to write an expression this way, however it can take much more CPU to do a regexp_like than it would take to do the LIKE operation. Depending on the volume of data and user load, this may or may not be an issue. But I have been to shops where the use of regexp_like is basically forbidden because of the CPU consumption. Yes it could be used but you had to prove that it was really best way to get the data. Some expression are just unwieldy when trying to use the good old LIKE operator. Test test test and test again!

    Have a great day – Ric Van Dyke

    • Hello Ric,

      the simple expression that I showed, has no problem with CPU consumption. I would even assume that the alternative using multiple LIKE comparisons combined by OR will be a tiny bit slower.

      You are right to point out that certain other regular expressions can lead to excessive backtracking of the regexp engine. And these are expressions should be watched out for because they can burn your CPU like hell. Mostly it is the greedy quantifier (*) but the OR operator is not completely out of that problematic behaviour.

      I would not recommend to use an expression as this: “.*(abc|def){0,10}=.*” The danger of excessive backtracking is very high there. So at least not without proper testing.

      Regards Sven

  4. hello there and thank you for your information – I’ve definitely picked up something new from right here. I did however expertise a few technical issues using this web site, since I experienced to reload the website lots of times previous to I could get it to load correctly. I had been wondering if your web host is OK? Not that I’m complaining, but sluggish loading instances times will often affect your placement in google and can damage your quality score if advertising and marketing with Adwords. Well I am adding this RSS to my e-mail and can look out for much more of your respective interesting content. Ensure that you update this again very soon..

  5. For the regex word hunter: If Oracle’s regex support were fully Perl-compatible you’d be able to use “\b” to detect word boundaries. So you could use \b(this|that|the|other)\b and only get whole words (with “words” defined by \w, that is, alphanumerics and underscore).

    But alas, Oracle’s regexes don’t support \b. However they do support \w and its negation \W. So here’s the alternative that I’ve been using for years:
    (^|\W)(this|that|the|other)(\W|$)
    to find words that are delimited at each end by some non-word character, or else by either end of the string.

    • Hi Jason,

      thanks for you comment. Very good advice about word boundaries. In the context of my REGEXP_LIKE example I did mention “word” but I really meant any snippet of text. When we consider how an alternative search using LIKE looks like, we easily see the huge advantage REGEXP_LIKE gives us. The ability to use even more complex expressions – like the one you mentioned – is a welcomed additional bonus. Of cause we could also consider further alternatives like useing oracle text and the CONTAINS operator if the search logic becomes even more complex.

      Regards
      Sven

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.