Basic SQL: IF..THEN..ELSE

How to write conditional logic in SQL

This is a question that sometimes is asked by programmers who just started useing SQL.

If-Then-Else-diagram.svg
By P. Kemp – own work created using Dia, CC0, Link

Introduction

Programatic 3rdGL languages all have an IF..THEN..ELSE construct.

Pseudocode:

IF #condition# THEN #doSomething# ELSE #doAnotherThing#

Since SQL is a 4thGL language it has no need for conditional execution. This is sometimes surprising for beginners. IF..THEN..ELSE is very procedural thinking.

We do not tell SQL how to reach a certain result, instead we describe the wanted result.

Although a conditional execution is not needed, there are cases that come quite close. If we want to distinguish data dependend on other data, this can be done using conditional logic or conditional expressions.

In general expressions can not #doSomething# but they return a value (=data). SQL is all about data. A conditional expression in SQL is showing different data depending on other data.

SQL constructs

Here is a (non complete) list of different ways how we can write IF..THEN..ELSE in SQL. Be aware that most of those expressions are only ever useful in the SELECT clause of a query. The WHERE clause can do conditional logic simply by using AND+OR+NOT+() combinations. We do not need extra functional expressions there.

The functions towards the end of the list are a bit of an obscure choice for doing conditional logic. However they are useful to remember for some special scenarios.

  • CASE
  • DECODE(Oracle)
  • NULLIF
  • NVL or NVL2 or COALESCE
  • LNNVL
  • SIGN
  • ABS
  • GREATEST or LEAST

Each of those functions have advantages and disadvantages. I will try to give an indication where the usage seems proper.

All functions have a “translated” syntax in the following form.

SQL syntax: procedural code

The following business case is used
Requirement:
If a person is older than 65 years it should get a pension.

Or to say it in more technical terms (specification):
If today the age of a person – based upon its birthdate – is equal or greater than 65, then the retirement flag should be ‘Y’ else ‘N’.

CASE

CASE when a>b then x else z end: If a>b then x else z

The CASE function should always be the first choice. It is the best compromise between brevity and clarity of the expression.

case when add_months(birthday,12*65) >= trunc(sysdate)
     then 'Y'
     else 'N'
end

The add_months function will give us the day when the age of 65 is reached (retirement age). It will also consider some special calendar effects. For example a person born on 29th Feb. 1953 will get the flag already on 28th Feb. 2017.

We have to be very careful and check some special dates to make sure that our logic is the one we are looking for. For simplicity I assume that this logic for calculating the retirement candidates is correct.

CASE also has a second syntax (simple case expression). But this allows only to compare for equality.

CASE a when b then x else z end: If a=b then x else z

The syntax allows to stack multple case statements. But most of the times this is not needed. Instead use several WHEN sections. And we can profit from short-circuit evaluation.

case when a>b then x 
     when a>c then y
     when d

Short-circuit evaluation here means that if a>b=true the next conditions are not checked. Most importantly y, z and q are never calculated. And usually c,d and f also not. There are a few special exceptions. See this forum thread Oracle bug? Aggregates in CASE expressions negate short-circuit evaluation. for a discussion about the behaviour.

Code written using a CASE statement can sometimes get quite lengthy. But it is very close to natural language and therefore easy to maintain. CASE should be the first choice when conditional logic is needed.

DECODE

DECODE(a,b,x,c,y,z): if a=b then x elsif a=c then y else z

DECODE is nice if we need to check multiple values and provide an alternative value instead. I still like to use it in the ORDER BY clause to do rankings.

DECODE is not part of ANSI SQL and can only be used in Oracle databases.

If the condition is more complex than a simple equality comparison, then a combination of DECODE and SIGN can be used. Especially if numeric values play a role.

decode(sign(add_months(birthday,12*65) - trunc(sysdate),
        1, 'Y'
       -1, 'N'
        0, 'Y'))

The expression “add_months(birthday,12*65) – trunc(sysdate)” returns a positive value if the person is older than 65 and a negative value if he/she is younger.

DECODE+SIGN was used very frequently before CASE was introduced in Oracle 8 or 9. You still find it sometimes in older code.

DECODE also can compare with NULL values.

NULLIF

NULLIF(a,b): If a=b then NULL else a

This function returns a NULL value if input A matches Input B otherwise A is returned.

The following (slightly constructed) example will return the birthdate for all persons that reached retirement age. But NULL for those who didn’t. The GREATEST function is used to help us to level the values.

nullif(birthdate,
       greatest(add_months(trunc(sysdate)+1,-65*12),
                birthdate)
       )

It is not so useful for the choosen example. But it is very handy if we want to ignore some values. Especially in combination with aggregation functions, since they ignore NULL values during the aggregation.

NVL or NVL2 or COALESCE

NVL(a,b): If a is null then b else a
NVL2(a,b,c): If a is null then c else b
COALESCE(a,b,c,d): 
   If a is not null then a 
      elsif b is not null then b 
         elsif c is not null then c 
            elsif d is not null then d

NVL is often used for providing a default value, in case no value was found.
If the birthday of a person is an optional field, we might want to consider setting a default (=standard) age, for our calculation.

nvl(birthdate,date '1980-01-01')

The NVL2 syntax is a little less known but it is very useful and should be remembered.

COALECSE is very similar to NVL. It returns the first value that is not null. But it can be used for checking more than one value. A major advantage is that the second and later expressions are only checked if the first expression is NULL (short-circuit evaluation). This can give an performance advantage over NVL if the second expression is expensive. This performance advantage is why many programmers always prefer COALESCE over NVL. However there are also some special compiler optimizations that do kick in for NVL and not for COALESCE. As a rule of thumb I would stick with NVL if we have a simple second value. I use COALESCE if more than one value needs to be checked or if the second value is complex (like a plsql function or a subquery).

return the most recent activity (lastest date) for each shipped order

coalesce(arrival_date, shipping_date, sent_date, creation_date) as latest_date

LNNVL

LNNVL(a=b): if not(a=b) or a is null or b is null then true

LNNVL is a strange and hard to understand function. It negates a condition. It is used internally by Oracle to rewrite certain queries (not in into not exists). Contrary to all other functions LNNVL returns a boolean result and can be used in the where clause .

One usage is if we want to make sure two values are different and still want a result even if one of the values is null. So it can ease the working with NULL values.

This select returns something if the values a and b are different.

select * from dual
where LNNVL(a=b);

SIGN

See decode.

The SIGN function can be used to rephrase a non-equal comparison into an equality comparison.

if a > b then

is (for numbers only) mathematically the same as

If SIGN(a-b) = 1 then

SIGN is a typical helper function for DECODE. Since DECODE can only compare for equality, SIGN helps to enhance that to do more complex comparisons.

ABS

ABS is sometimes used in combination with SIGN. In rare cases it avoids sorting the input data for the SIGN function.

if a != b then

is (for numbers only) mathematically the same as

if sign(ABS(a-b)) = 1 then

Also much easier would be:

not(a=b)

ABS => Not really useful anymore.

GREATEST or LEAST

GREATEST(a,b,c): if a > b then a else b => result1; if result1 > c then result1 else c)

See NULLIF example.

GREATEST give us the maximum value from a list of values. LEAST gives us the minimum. GREATEST and LEAST can be used to harmonize certain values and then to allow an equality comparison for them. As such they are similiar to SIGN. However SIGN works only with numerical data, whereas GREATEST|LEAST can also be applied to strings.

As with all functions we have to be careful about NULL values. Remember NULL means “UNKNOWN”. So if one of the values in the list is NULL, then GREATEST|LEAST will return NULL (UNKNOWN).

further considerations

Of cause it is also possible to do conditional logic using DML commands.
The WHERE clause of the DML command matches the IF part, the SET part of an update, matches the THEN part. For a delete command the THEN part is to delete the object.

For example a procedural requirement like
“if the data is older than 3 years then delete it”
can be translated into sql like this:

Delete from myData
where insert_date < add_months(trunc(sysdate),-3*12)

As we can see the procedural requirement can be translated into SQL in a very elegant and straight forward way.

Another point to remember:

If you think a CASE expression is needed in the where clause you are most probably wrong. One notable exception to this rule is, if you want to access a function based index (fbi). If a function based index uses a CASE function, then you must use exactly the same function inside your where clause, to be able to profit from this index.

Conclusion

SQL can do conditional logic. The first place to look for it is the WHERE clause. Conditional expressions can be done using the SQL functions CASE, NVL, COALESCE, NVL2, DECODE and NULLIF (in that order). Other functions can help to adapt conditional logic to the specific business requirements.