plsql collection lookup with multiple keys

problem scenario

A recent question on otn asked how to access a plsql collection using multiple keys.

It is a typical scenario to build a second collection, to be able to access the main data in the first collection. The second collection fulfills the same role as an index on a table. It allows fast access to the main data record. I usually call the second collection an “index collection”.

Here is an article by Steven Feuerstein who explains the concept of index collections in more detail: https://blogs.oracle.com/oraclemagazine/on-the-pga-and-indexing-collections

But the forum question was not how to do a simple key => value lookup, but instead have two keys (based upon record set values) and use them to access the main collection.

solution

There are two general ways.
Combine the keys into a single key or build a nested collection. I will show a quick example for both ways.

Way 1) Combined key

We can consolidate the two keys into one single key. Typically using some delimiter. And then use that combined key for the index_collection.

combinedKey := key1||':'||key2;

Of cause we need to make sure the delimiter is some value that does not exist in any of the keys.

Example
If you try to copy and run this example please note that the syntax highlighter removed the label “build_index”
You might want to add this again, just after the “– index data” comment.

set serveroutput on
declare
  cursor c is (select trunc((level+2)/3) lvkey, chr(ASCII('A')+mod(level*2,3)) letter, round(dbms_random.value(1,100)) val 
               from dual 
               connect by level <=10
                );
  type tabdata_t is table of c%rowtype index by binary_integer;   
  tabdata tabdata_t;
  
  type keylookup_t is table of binary_integer index by varchar2(100);
  keylookup keylookup_t;
begin
  -- load data
  open c;
  fetch c 
  bulk collect into tabdata; -- notice no limit clause here. For larger record sets you need to use a loop and LIMIT!
  close c;
  
  -- index data
  <>
  for i in 1..tabdata.count loop
    keylookup(to_char(tabdata(i).lvKey)||':'||tabdata(i).letter) := i;
  end loop build_index;
  -- index is now complete. 
 
   -- Test the index first
  dbms_output.put_line('Index 1B=>'|| to_char(keylookup('1:B')));
  dbms_output.put_line('Index 3A=>'|| to_char(keylookup('3:A')));

  -- now fetch the data using the index
  dbms_output.put_line('Data 1B=>'|| tabdata(keylookup('1:B')).val);
  dbms_output.put_line('Data 3A=>'|| tabdata(keylookup('3:A')).val);
end;
/

Output

Index 1B=>2
Index 3A=>9
Data 1B=>32
Data 3A=>67

PL/SQL procedure successfully completed.

Way 2) collection of collection

A collection can be part of another collection. Nesting collections means we could do a lookup using two (or more) key values where each key is used for one collection.

lookup(key1) => col2
this returns a collection (e.g. col2). We can access the elements of the collection using the second key.

col2(key2) => element

or the short form:
lookup(key1)(key2) => element

For readability we could add a record layer in between, but that is not neccessary.
lookup(key1).list(key2) => element

Example
Notice the definition of the index collection is in line 14 with the matching type definitions in line 11 and 12.

set serveroutput on
declare
  cursor c is (select trunc((level+2)/3) lvkey, chr(ASCII('A')+mod(level*2,3)) letter, round(dbms_random.value(1,100)) val 
               from dual 
               connect by level <=10
                );
  type tabdata_t is table of c%rowtype index by binary_integer;   
  
  tabdata tabdata_t;
  
  type letterlookup_t is table of binary_integer index by varchar2(10);
  type keylookup_t is table of letterlookup_t index by binary_integer; -- number not allowed!
  
  keylookup keylookup_t;
  empty_key letterlookup_t;
begin
  -- load data
  open c;
  fetch c 
  bulk collect into tabdata; -- notice no limit clause here. For larger record sets you need to use a loop and LIMIT!
  close c;
  
  -- index data
  <>
  for i in 1..tabdata.count loop
    if keylookup.exists(tabdata(i).lvKey) then
      if keylookup(tabdata(i).lvKey).exists(tabdata(i).letter) then 
        -- same key twice?
        -- maybe add the values, maybe raise an error
        raise dup_val_on_index;
      else  
        dbms_output.put_line('build index KEY='||tabdata(i).lvKey||',+letter='||tabdata(i).letter);
        keylookup(tabdata(i).lvKey)(tabdata(i).letter) := i;
      end if;      
    else -- key not in index yet
      dbms_output.put_line('build index +KEY='||tabdata(i).lvKey||',+letter='||tabdata(i).letter);
      keylookup(tabdata(i).lvKey) := empty_key;
      keylookup(tabdata(i).lvKey)(tabdata(i).letter) := i;
    end if;      
      
  end loop build_index;
  -- index is now complete. 
 
  -- Lets access the data using some combinations of keyLv and letters
  -- Test the index first
  dbms_output.put_line('Index 1B=>'|| to_char(keylookup(1)('B')));
  dbms_output.put_line('Index 3A=>'|| to_char(keylookup(3)('A')));
  --dbms_output.put_line('1F='|| keylookup(1)('F')); -- this will raise NO_DATA_FOUND
  
  -- now fetch the data using the index
  dbms_output.put_line('Data 1B=>'|| tabdata(keylookup(1)('B')).val);
  dbms_output.put_line('Data 3A=>'|| tabdata(keylookup(3)('A')).val);
  
end;
/

Output
build index +KEY=1,+letter=C
build index KEY=1,+letter=B
build index KEY=1,+letter=A
build index +KEY=2,+letter=C
build index KEY=2,+letter=B
build index KEY=2,+letter=A
build index +KEY=3,+letter=C
build index KEY=3,+letter=B
build index KEY=3,+letter=A
build index +KEY=4,+letter=C
Index 1B=>2
Index 3A=>9
Data 1B=>62
Data 3A=>34

PL/SQL procedure successfully completed.

If we try to access a collection that does not exist we get an error (usually no_data_found). Since the index collections are always sparse, this is something to keep in mind. If you are not sure if the key combination is already indexed, then either check for existence or react on the NO_DATA_FOUND.

comparison

It is not easy to compare the two approaches. The first way looks slightly less complex. It depends also how familar other developers are with collections and especially with nested collections. For many the double parenthesis syntax “myCol()()” is a little complicated at the beginning.

For very complex scenarios the first way might be the better way. It depends on data distribution (the more sparsly populated the key combinations are, the better is this way) and on how many keys (=dimensions) we have.

I once measured the performance in a system where we needed 5 dimensions (5 different keys) to access some data. The combined key lookup was faster than building a complex collection of collection of collection of collection. But the additional time to concat the key values in the end also was a large performance burden.

So in my specific case
lookup('A:B:C:D:E') >>> lookup('A')('B')('C')('D')('E')

I do not think this performance experience is representativ.

conclusion

It is possible to use two keys as an index collection to lookup data in the main collection. Nested collections is a tool that every plsql developer should know about. Only when we know our tools, we can decide when to use them or when not.

6 thoughts on “plsql collection lookup with multiple keys

  1. Hi Sven,

    Thanks for this. One question: why an explicit cursor?

    FOR REC IN C LOOP

    END LOOP;

    Wouldn’t that work just as well, while letting the language open and close the implicit cursor automatically, and with no worries about the size of the returned collection?

    Best regards, Stew

    • Hi Stew,

      thanks for the question.

      For this example a simple SELECT … BULK COLLECT INTO would work too.

      A FOR loop would do an implicit bulk fetch, but here we need access to the collection afterwards. Using a cursor + fetch bulk collect we gain access to the resulting collection. The assumption is that we need to work with this collection later on.

      I also often use explicit cursors to be able to define record variables based upon the cursor. And then I can access values from those records even outside the for loop. In an when others exception handler for example.

      You are absolutely right in mentioning to consider the size of the collections. I have a comment inside the code to point this out, but I didn’t explain how to do that correctly. This mostly effects the “main” collection. That’s why I didn’t cover it in detail. My focus was on demonstrating how to create a slightly more sophisticated index collection.

      Regards
      Sven

  2. I don’t understand why you say you need to access the collection afterwards. What is wrong with the following code?

    declare
    cursor c is
    select level field1, level*2 field2, level*3 field3 from dual
    connect by level <= 9;
    l_prev_rec c%rowtype;
    type tt_numarray is table of number index by pls_integer;
    type ttt_numarray is table of tt_numarray index by pls_integer;
    ltt_numarray ttt_numarray;
    begin
    for rec in c loop
    if rec.field1 = l_prev_rec.field1 and rec.field2 = l_prev_rec.field2 then
    raise_application_error(-20001, 'Same values twice for field1 and field2');
    end if;
    ltt_numarray(rec.field1)(rec.field2) := rec.field3;
    l_prev_rec := rec;
    end loop;
    dbms_output.put_line(ltt_numarray(4)(8));
    end;
    /

  3. Hi Stew,

    the use for the index collection is to give us faster access to the main collection. It can be compared to an index and a table in the sql world.
    What you demonstrate is comparable to an index-organized table (IOT). This is a possible and totally legit scenario. So there is nothing wrong with the way you demonstrate. Especially if the index based access is the only access we want.

    I would be careful about performance when the original collection is large (in the sense of lot of data per element). Like the example only returns a simple value. Typically this would be a record. Which could hold even other collections.

    So this line duplicates data from the implicit collection (build by the FOR loop) into the target (IOT) collection:
    ltt_numarray(rec.field1)(rec.field2) := rec;

    Furthermore I like to structure the plsql code in independend steps.
    1) Load the data
    2) build the index
    3) manipulate the data
    4) store the manipulations

    It vastly improves maintainablility of the code – sometimes at cost of performance. Useful when we add things like LIMIT clauses and then need to work with the data chunks that are returned.

    regards
    Sven

  4. Hello Sven, thank you very much for your so interesting publications.

    One question, please. Can you please explain, what is the reason of your using
    type tabdata_t is table of c%rowtype index by binary_integer;
    why don’t use simple nested table for such cases
    type tabdata_t is table of c%rowtype;
    May be there are some valued reasons for this?

    • Hello Dmitri,

      thanks for your interest in my blog. You are asking why I use an associative array (table of index by) and not a nested table (table of).

      Both types of collections could have been used in the example. The main difference is how we want to loop through the elements of the collection.

      In general I loop differently through the collections.
      For an associative array I loop using

      for i in 1..collection.count loop
        doSomething;
      end loop;
      

      For an nested table I typically loop using

      i := collection.first
      loop 
        doSomething;
        exit when i = collection.last;
        i := collection.next(i);
      end loop;
      

      Note that this kind of loop is not really needed, because in the example the collection will always be dense.

      But this is also my main reason. If the collection is dense, then I use an associative array. If the collection can be sparse I use a nested table or an associative array indexed by varchar2 (index lookup collections).

      so the short answer is: personal preference for index by collections.

      regards
      Sven

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.