This is a collection of some small features, patterns or snippets, that I seem to use frequently and like to share. The features are written in no specific order. However we can structure the features into performance related [P], convenience [C] or enablers [E].
Enablers [E] would be features that allow us to do something, that is otherwise impossible or at least very hard to achieve. Sometimes even to avoid an error.
Convenience [C] features are things that offer an alternative construct. Often they can be used instead of another option. Sometimes making the code easier to change (maintainability).
Performance [P] features improve execution speed. Often they come with a drawback or one should keep exceptions or rare race conditions in mind.
1. find multiple words using
instead of writing multiple like conditions (or repeating the same select using
like multiple times), we can write a single
where (str like '%word1%' OR str like '%word2%' OR ...)
The pipe “|” inside the regular expression acts as the OR operator and the parenthesis () form the subexpression that is needed for that OR.
A very typical case is to query the v$parameter table for a set of parameter names.
select name, value from v$parameter where regexp_like(name,'(listen|dispatch|service)');
What I especially like about it, is that the need for wildcards is gone and it is so easy to add more “words” to search for.
2. fetch first row only [C]
Do you still use
rownum frequently? Why not use to the row limiting clause instead?
I use it a lot, mostly for ad hoc queries. One advantage is that the need to create an inner query that does the correct ordering disappears.
-- show the top 3 tables with the most rows select table_name, num_rows from user_tables order by num_rows desc fetch first 3 rows only;
3. use expression lists (a,b) [E]
I use this frequently in update statements but also sometimes in joins. Instead of setting each column separately we can set multiple columns at once. Simply by using parenthesis and a comma between the columns.
update persons u set (u.first_name, u.last_name) = (select a.firstname, a.name from applicants a where a.person_id = u.id) where u.status = 'NEEDS REFRESH';
Or we can filter on a combination of columns
(a,b) in ((c,d),(d,c),(e,f)). The need for this should be rare, but it happens. A strange edge case is when we do an equality comparison of two expression lists. For some reason the right side needs an additional set of ().
(a,b) = ((c,d))
4. DeMorgans law [C,E]
This might be the single most important math/logic rule, that SQL developers should know. It is usually applied inside the where clause and knowing it by heart will prevent a lot of easy mistakes.
written as a SQL expression
not(A or B) = not(A) and not(B)
not(A and B) = not(A) or not(B)
It is easy to remember. OR changes to AND when the parenthesis are resolved (and the other way round).
A and B here are full SQL expressions, for example x=y.
Why is that so important? SQL is a very logic based language. In daily speach we often use logical operators (AND/OR/NOT) differently than what is needed when they should be applied in SQL. This can become confusing very easily. Knowing DeMorgans law helps to quickly check if the logic is used correctly.
Task: “Find all employees that are not working in sales and marketing.”
Converting this sentence 1:1 into SQL would result in this:
select * from emp where not (department = 'SALES' and department = 'MARKETING');
Obviously what is ment, is that we want to find those employees that are working in some other department.
Applying Damorgans Law, we can reformulate our select statement. Also we replace
NOT (x=y) with x
select * from emp where department != 'SALES' OR department != 'MARKETING';
A developer should understand now, that this condition will always be true. For each row the department is either sales or not sales. And if it is sales, then it is not marketing. So the combination is always true (excluding NULL value considerations). Which is probably not what is wanted.
Here the row logic also plays a part. The where clause is applied to a single row, but normal speach often uses the boolean operation to combine data sets.
The correct query would be
select * from emp where not (department = 'SALES' OR department = 'MARKETING');
or even better use IN instead or OR
select * from emp where department not in ( 'SALES' , 'MARKETING');
Historic anecdote: Da Morgan was not the first one to discover this logic. Centuries before Da Morgan, a guy called William of Ockam already wrote about it. He probably had it from Aristotle. Occam (the spelling changed over the centuries) nowadays is more famous for his razor.
5. rollup and grouping sets [C]
To get a total row for a query that uses sum or count, simply add
select tablespace_name, count(*) from dba_tables group by rollup (tablespace_name);
TABLESPACE_NAME COUNT(*) DATA 362 DBFS_DATA 2 SAMPLESCHEMA 14 SYSAUX 1357 SYSTEM 1056 322 3113
The last line is the total number of tables. One problem here are tables without a tablespace (null). The
grouping() function helps to separate a normal row with a data value NULL from a superaggregate row (19c grouping).
If there are multiple columns then instead of
rollup I use
grouping sets. The full set (=total) in a grouping set expression can be expressed by ().
grouping sets +
select case when grouping(tablespace_name) = 1 then '-all-' else tablespace_name end as tablespace_name , case when grouping(cluster_name) = 1 then '-all-' else cluster_name end as cluster_name , count(*) from dba_tables group by grouping sets ((tablespace_name, cluster_name),());
TABLESPACE_NAME CLUSTER_NAME COUNT(*) 322 DATA 362 SYSAUX 1356 SYSAUX SMON_SCN_TO_TIME_AUX 1 SYSTEM 1020 SYSTEM C_RG# 2 SYSTEM C_TS# 2 SYSTEM C_OBJ# 17 SYSTEM C_COBJ# 2 SYSTEM C_MLOG# 2 SYSTEM C_USER# 2 SYSTEM C_FILE#_BLOCK# 2 SYSTEM C_OBJ#_INTCOL# 1 SYSTEM C_TOID_VERSION# 6 DBFS_DATA 2 SAMPLESCHEMA 14 -all- -all- 3113
Again the last line shows the total number of tables. Since I didn’t want to see subtotals for tablespace or cluster
grouping sets is the perfect solution t add this total row.
Also notice that the first line has all null values for the names, same as the total line would have. Using the
grouping function allows to find out which is the total line and give it a meaningful text.
6. enhanced listagg [E]
In the more recent database versions, the very useful LISTAGG command got even better. For production code I nowadays always try to remember to add some safety protection in place in case result of the aggregation becomes big. Otherwise it could happen to get ORA-01489: result of string concatenation is too long.
Since 12.2 we can avoid the error by using the
listagg (... on overflow truncate without count) ...
So instead of the error message, when the maximum string size is reached (4000 bytes or 32k bytes depending on
max_string_size parameter) we get usable text without the statement raising an error.
select count(*) , listagg(table_name,', ' on overflow truncate) within group (order by tablespace_name desc, table_name) all_tables from dba_tables;
|3113||AQ$_ALERT_QT_G, AQ$_ALERT_QT_H, AQ$_ALERT_QT, … many many more tables …, SDO_CRS_GEOGRAPHIC_PLUS_HEIGHT, SDO_CS_SRS, SDO_DATUMS, …(1304)|
The three dots “…” are called an ellipsis and can be configured.
without count would avoid writing the total number of entries to the end of the list.
with count is the default if
truncate is specified.
Although the overflow clause is very usfull, the ultimate goal would be to give the developer more influence over it. Recently there was an interesting twitter discussion around that topic.
Other useful enhancements (19c) were LISTAGG distinct. example on LiveSQL
A sys_context is something like a global variable in other languages. The normal context is for the session, but it is also possible to use application wide contexts.
Oracle provides several “preconfigured” contexts for us. The most common are ‘USERENV’ and ‘APEX$SESSION’ (for apex developers). Contexts are also used for security policies with VPD.
Here are the contexts that I frequently like to use
value set by dbms_application_info.set_client_info
when using edition based redefinition (ebr), this shows which edition I’m in. Always good to double check!
similar to pseudocolumn
user. The schema name that we connected with.
In some oracle versions (12.1) much faster than user, my recent test shows that this performance difference is now gone (19c)
When doing a proxy connect then this is the name of the authenticated user, not the target schema
useful when client_identifier is empty, for example the name of the Windows login account when using SQL Developer under Windows.
apex authenticated user
id of the current apex app
Is auditing enabled/disabled? Part of my custom adaptation/enhancement of Connor McDonalds Audit-Utility package
the session ID of the current database session
Side note: UPPER/lower case does not make a difference for contexts. Only for the values of the context.
Warning: when you start using your own custom contexts, be careful with the
grant create any context privilege. It can be a security risk! Revoke it after it was used.
8. hint /*+ driving_site */ [P]
I feel like this is a classic for me. In one of my longest running projects we had a lot of databases connected by database links. One of the most important tuning activities was to understand how to do remote joins in a performant way. The
driving_site hint was extremely important for that.
General rule: When doing remote queries, avoid mixing local and remote tables in one select statement. If only remote tables are in the statement always use the
Nowadays the need for this hint diminishes, mostly because databases are less connected by DB links, but more by webservices (which does not really improve performance, but that is a different story)
9. join via
This only works reliably when the naming conventions of the data model fit to it.
We have a table PERSON and a table ADDRESS. The primary key in PERSON is PERS_ID. The relationship column (FK) in ADDRESS is also called PERS_ID. It wouldn’t work as well with ADR_PERS_ID for example.
Then we can do a quick join like this:
select * from person join address using (pers_id) where address.type='MAIN';
There are some slight differences compared to the normal way using ON. Mostly one then can not differentiate anymore from which table the pers_id originates. For example you can not refer to person.pers_id in the where clause anymore.
Currently I use it exclusively for ad hoc queries, not in plsql code.
10. interval literals [C]
If I want to add some time to a date or a timestamp, I always use interval literals (unless it is a full day or a full month).
Let’s say we want to check tickets that were entered during the last hour. Using the date datatype we could easily calculate an hour by dividing a full day / 24. This is how many developers calculate time. And it is perfectly ok to do so.
select * from tickets where createddt >= sysdate - 1/24 ;
sysdate-1/24 resembles one hour before “now”.
This is how I prefer to write it:
select * from tickets where createddt >= sysdate - interval '1' hour ;
I feel the code documents itself in a much better way.
Additionally requirements like this easily change.
Let’s change both expressions to 15 minutes.
sysdate - 15/24/60
sysdate - interval '15' minute
Which one is better to read and was quicker to change?
Warning: Sayan Malakshinov in the comments mentions that we should not use interval literals (ymintervals) to add full months. I agree with that. Calendar arithmetic is complex – there are specific functions like
add_months for that. So do not use
interval '1' month.
Some more snippets or functions that didn’t quite make it into the list.
- the third parameter of
- query block hints [P]
- hierarchical with clause [C,E]
- lateral join [C]
alter sequence restart[C,E]
- cross join [C]
- hex format “XXXX” [E]
- analytic sum + correct window clause:
rows betweeninstead of
I hope I mentioned something that you didn’t know about before and liked reading about.
Here are some features/patterns that I’m currently not using myself, but where I feel that they might become quite important in the future.
- SQL Macros
- boolean in SQL (not implemented yet, but Oracle seems to be working on it – I expect some news about this very soon)
- median and other statistical functions like cume_dist, percentile_xxx, percent_rank
- with clause udf functions (still think they are overrated, but I might be wrong)
- analytical views