Scalable sequences – size check

Motivation

In recent times I speculate where and when to use scalable sequences. Especially if it makes sense to use them in general for sequence driven ID columns of larger tables. I know this sounds a bit like an early sign of CTD (compulsive tuning disorder), but at least I’m aware of it.

Scalable sequences offer a performance benefit in certain special situations (hot block, index contention). This post is not about those benefits.

I assume the reader has a basic understanding what a scalable sequence is. If not, check out this older post of mine: All about sequences .

A scalable sequence provides larger numeric values than a normal sequence does.

A normal sequence => 3
A scalable sequence => 10176700000003

Here I only analyze and test the effects of those bigger numbers, especially for indexes.

Side note: The first 3 digits of the scalable sequence are the instance number + 100. The second 3 digits are the session id. The remainder is the normal sequence value. Different sessions provide different numbers and by this the sequence numbers are not monotonously increasing anymore. At least not when looking at multiple sessions. Which also means inserts are done all over the index and not only on the right hand side of the index.

Bigger numbers need more storage

The number 3 needs 2 bytes, the number 10176700000003 needs 8 bytes of storage. Use the DUMP function to measure it. Values provided by scalable sequences might use up to 18 bytes of storage.

This effects multiple areas, table segment size but also indexes tend to be bigger. We can safely assume the sequences are used to provide values for ID columns which are almost always supported by an unique index. But there are also foreign key columns using those numbers and there the index will not be unique for similar ID values.

Because values are bigger less index references fit into a single index block – meaning we need more index blocks. Here I focus explictly on the size of indexes and not on the size of columns. I believe that a few additional bytes for a whole table row is usually not relevant. In case of an index however the whole row is that single value plus references (index branch block) or rowids (index leaf block).

For comparison reasons I also check the size of an reverse key index since they were used in the past to combat index contention issues. Scalable sequences are the new kid in town that should replace reverse key indexes.

So let’s measure it.

Testscript 1 – single session insert

First we create a table with 3 number columns. Then we create a normal and a scalable sequence that provide some values. We add two normal and a reverse key index on those 3 columns and populate the table with values originating from the sequences.

------------------------------------------------------------------------------- Scenario: Test index size
--------------------------------------------------------------------------------
-- create a table with two columns files with IDs
create table test_sequence_effects 
  (id_normal number, 
   id_scalable number,
   id_reverse number);

-- create a normal and a scalable sequence
create sequence test_sequence_normal;
create sequence test_sequence_scale scale;

-- create 3 indexes
create index test_sequence_normal_ix on test_sequence_effects(id_normal); 
create index test_sequence_scale_ix on test_sequence_effects(id_scalable);
create index test_sequence_reverse_ix on test_sequence_effects(id_reverse) reverse ;

-- insert 1 million datapoints
insert into test_sequence_effects 
select test_sequence_normal.nextval, test_sequence_scale.nextval, test_sequence_normal.nextval 
from dual connect by level <= 1000000;

commit;

-- create statistics
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_NORMAL_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_SCALE_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_REVERSE_IX',estimate_percent => 100);

result after single session insert

We can have a look at the index statistics and the segment sizes.

-- check index stats
select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor 
from dba_indexes 
where index_name like 'TEST_SEQUENCE_%'; 
INDEX_NAMEBLEVELLEAF_BLOCKSDISTINCT_KEYSCLUSTERING_FACTOR
TEST_SEQUENCE_NORMAL_IX2199910000004271
TEST_SEQUENCE_SCALE_IX2337910000004271
TEST_SEQUENCE_REVERSE_IX 229651000000999996

Now all three indexes still have the same blevel, meaning for a single direct index access the performance should be identical. However when the number of entries increases eventually the ID values created by a scalable sequence might create a new level in the btree before the same happens for IDs from a normal sequence. I tested when this will happen. If you are curious check out the appendix section “blevel3 break point”, but try to guess the value before you check.

-- check index size
select segment_name, segment_type, segment_subtype, blocks, bytes/1024/1024 as size_in_MB
from user_segments
where segment_name like 'TEST_SEQUENCE_%';
SEGMENT_NAMESEGMENT_TYPESEGMENT_SUBTYPEBLOCKSSIZE_IN_MB
TEST_SEQUENCE_EFFECTS TABLE ASSM 4352 34
TEST_SEQUENCE_NORMAL_IX INDEX ASSM 2176 17
TEST_SEQUENCE_REVERSE_IX INDEX ASSM 3072 24
TEST_SEQUENCE_SCALE_IX INDEX ASSM 3584 28
table and index size

The scaleable index in this experiment is 65% larger than the normal index. The cause for this increment can be attributed directly to the bigger numbers that needs to be stored.

We have to keep in mind that the test scenario was run in a single session. In reality multiple sessions will be used to run the insert(s). This is where scalable sequences should give a performance boost, even at the cost of potentially slightly higher storage consumption.

Testscript 2 – parallel inserts from multiple sessions

Ok now repeat the same experiment but using multiple sessions instead of a single session.

I created a code block that repeatetly does single row inserts in blocks of 10 and waits 1 second. This was run in up to 100 parallel sessions using dbms_scheduler jobs until approximately 1 million rows were inserted. The code for this test can be found in the appendix section “Testscript 2 – code”. Because of the 1 sec. pause this will run for some time.

Index stats after parallel inserts

INDEX_NAMEBLEVELLEAF_BLOCKSDISTINCT_KEYSCLUSTERING_FACTOR
TEST_SEQUENCE_NORMAL_IX23649 1000198 943211
TEST_SEQUENCE_SCALE_IX26577 1000198 32329
TEST_SEQUENCE_REVERSE_IX23240 1000198 986633

The number of blocks increased for all indexes. But more importantly we see that the clustering factor for the normal index went through the roof and is now close to the number of rows in the table. Which is bad. It also shows that scalable sequences seem to keep the promise of avoiding index contention and hot blocks, even if this test was not set up to measure any performance implications.

If you want to know more about clustering factor, read appendix section “index clustering factor”.

Side note: There were a few more rows inserted than in the single session approach (0.02%). That was because how the stopping mechanism worked for the parallel jobs. It does not influence the comparison at all.

Index sizes after parallel inserts

SEGMENT_NAMESEGMENT_TYPESEGMENT_SUBTYPEBLOCKSSIZE_IN_MB
TEST_SEQUENCE_EFFECTSTABLEASSM4480 35
TEST_SEQUENCE_NORMAL_IXINDEXASSM3840 30
TEST_SEQUENCE_REVERSE_IXINDEXASSM3328 26
TEST_SEQUENCE_SCALE_IXINDEXASSM6784 53

So the size of a normal index rose from 17 to 30MB (76% increase) and for the scaleable index from 28 to 53MB (83%). The reason for both increaments in size should be because now we see more 50-50 index block splits resulting in a considerable amount of blocks not being full and rarely being reused.

Overall the size didn’t explode, but it is still a noticable amount.

Side note: It is possible to shrink the indexes after such parallel insert sessions, but the effect is not great. Shrink and coalesce really should be considered when frequent deletes are run against the table. Scalable sequences potentially can provide values that might fill some of the gaps created by deletes. An index rebuild is not needed or advisable as long as more inserts are to come.

Conclusion

Values created by scalable sequences are using more storage space than values from normal sequences. For indexes we can expect a growth in index size between 50% and 100% compared to a normal index. The growth is partially because of larger numbers but also because there will be more not yet used space in many of the index blocks (caused by 50-50 index block splits instead of 90-10).

The clustering factor for scalable sequences always is better or equal than any alternative.

For all customers that I work with such an increment in storage size would not be any problem at all, especially if there is a performance improvement.

For me the storage size increase is not an argument against scalable sequences.

Appendix

Testscript 2 – code

--------------------------------------------------------------------------------
-- Scenario: Test index size after parallel inserts
--------------------------------------------------------------------------------
-- cleanup logic
drop table test_sequence_effects purge;
drop sequence test_sequence_normal;
drop sequence test_sequence_scale;

execute dbms_scheduler.drop_program('test_sequence_do_inserts');
begin
 for j in (select job_name from all_scheduler_jobs where job_name like 'TEST_SEQUENCE%') loop
     dbms_scheduler.drop_job(j.job_name);
 end loop;    
end;
/
--------------------------------------------------------------------------------
-- create a table with three columns to be filled with IDs
create table test_sequence_effects 
  (id_normal number, 
   id_scalable number,
   id_reverse number);

-- create a normal and a scalable sequence
create sequence test_sequence_normal;
create sequence test_sequence_scale scale;

-- create 3 indexes
create index test_sequence_normal_ix on test_sequence_effects(id_normal); 
create index test_sequence_scale_ix on test_sequence_effects(id_scalable);
create index test_sequence_reverse_ix on test_sequence_effects(id_reverse) reverse ;

-- insert 1 million datapoints
-- run as 100 parallel scheduled jobs 
-- each job inserts a batch of 10 IDs using single row inserts 
-- each job waits for 10 seconds before the next batch is run
-- each job stops after the max number for the normal ID is reached.

--create a program for dbms_scheduler to be used in jobs later
begin
    DBMS_SCHEDULER.create_program (
    program_name   => 'test_sequence_do_inserts',
    program_type   => 'PLSQL_BLOCK',
    program_action => q'[declare
  last_id number;
  max_id number := 1000000;
  cnt_id number := 0;
  e_stop exception;
  pragma exception_init (e_Stop, -20022);
begin 
  for j in 1..max_id/10 loop
    -- single insert for 10 rows 
    for i in 1..10 loop
        insert into test_sequence_effects 
        values (test_sequence_normal.nextval -- ID_NORMAL
             , test_sequence_scale.nextval  -- ID_SCALABLE
             , test_sequence_normal.nextval -- ID_REVERSE
             )
        returning id_normal into last_id;
        cnt_id := cnt_id + 1;
        if last_id >= max_id then 
          commit;
          raise e_stop;
        end if;   
    end loop;
    commit; 
    -- pause for 1 sec  
    dbms_session.sleep(1);
  end loop;  
exception
 when e_stop then
   -- finish gracefully
   dbms_output.put_line('This session inserted '||to_char(cnt_id)||' rows');
   commit;
end;
]',
    enabled        => TRUE,
    comments       => 'Insert 10 rows, 1 sec pause.');


   -- activate the program. This does not run the logic yet.
   DBMS_SCHEDULER.enable (name => 'test_sequence_do_inserts');

end;
/

-- create multiple jobs
begin 
  for j in 1..100 loop
     dbms_scheduler.create_job
     (job_name      => 'test_sequence_do_inserts_job'||to_char(j,'FM000'),
      program_name  => 'test_sequence_do_inserts',
      start_date => systimestamp,
      enabled => true);

     dbms_scheduler.run_job (job_name => 'test_sequence_do_inserts_job'||to_char(j,'FM000'),
                              use_current_session => false);
   end loop;
end;
/



After executing this script the scheduled job runs for quite some time.

We can check the number of rows in the table to see if it is finished.

select max(id_normal) last_value,count(*) row_count from test_sequence_effects;

other interesting selects to monitor the scheduled jobs

-- check scheduler jobs
select * from all_scheduler_programs;
select * from all_scheduler_jobs where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_running_jobs where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_job_log where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_job_run_details where job_name like 'TEST_SEQUENCE%';

And of cause the statistics need to be gathered

-- compute statistics for all indexes
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_NORMAL_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_SCALE_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_REVERSE_IX',estimate_percent => 100);

Index clustering factor

What is the clustering factor of an index? Essentially it tells us how good the index and the table data match. How many table blocks need to be read if each block is accessed via the index? The value itself needs to be compared with the number of blocks (min value) and rows (max value) in the table.

Here are some links that might improve your understanding of the clustering factor:

http://dbaparadise.com/2018/05/what-is-index-clustering-factor/

Ask Tom: https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1032431852141

David Fitzjarrell in detail about the clustering factor and Oracles calculation of it

Blevel 3 break point

I run some further tests to see when the index based upon a scalable sequence will reach blevel3.

For a single session insert, blevel 3 was reached after approximately 38 million inserts.

INDEX_NAME                  BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
--------------------------- ------ ----------- ------------- -----------------
TEST_SEQUENCE_NORMAL_IX          2       81148      38200000            173362
TEST_SEQUENCE_REVERSE_IX         2      125912      38200000          38199995
TEST_SEQUENCE_SCALE_IX           3      129055      38200000            173362

For multiple sessions the same amount of blocks/leaf blocks should be reached earlier. However the distribution would be considerably more even. Both effects will partially cancel each other out. So far I didn’t measure it.