Scalable sequences – size check

Motivation

In recent times I speculate where and when to use scalable sequences. Especially if it makes sense to use them in general for sequence driven ID columns of larger tables. I know this sounds a bit like an early sign of CTD (compulsive tuning disorder), but at least I’m aware of it.

Scalable sequences offer a performance benefit in certain special situations (hot block, index contention). This post is not about those benefits.

I assume the reader has a basic understanding what a scalable sequence is. If not, check out this older post of mine: All about sequences .

A scalable sequence provides larger numeric values than a normal sequence does.

A normal sequence => 3
A scalable sequence => 10176700000003

Here I only analyze and test the effects of those bigger numbers, especially for indexes.

Side note: The first 3 digits of the scalable sequence are the instance number + 100. The second 3 digits are the session id. The remainder is the normal sequence value. Different sessions provide different numbers and by this the sequence numbers are not monotonously increasing anymore. At least not when looking at multiple sessions. Which also means inserts are done all over the index and not only on the right hand side of the index.

Bigger numbers need more storage

The number 3 needs 2 bytes, the number 10176700000003 needs 8 bytes of storage. Use the DUMP function to measure it. Values provided by scalable sequences might use up to 18 bytes of storage.

This effects multiple areas, table segment size but also indexes tend to be bigger. We can safely assume the sequences are used to provide values for ID columns which are almost always supported by an unique index. But there are also foreign key columns using those numbers and there the index will not be unique for similar ID values.

Because values are bigger less index references fit into a single index block – meaning we need more index blocks. Here I focus explictly on the size of indexes and not on the size of columns. I believe that a few additional bytes for a whole table row is usually not relevant. In case of an index however the whole row is that single value plus references (index branch block) or rowids (index leaf block).

For comparison reasons I also check the size of an reverse key index since they were used in the past to combat index contention issues. Scalable sequences are the new kid in town that should replace reverse key indexes.

So let’s measure it.

Testscript 1 – single session insert

First we create a table with 3 number columns. Then we create a normal and a scalable sequence that provide some values. We add two normal and a reverse key index on those 3 columns and populate the table with values originating from the sequences.

------------------------------------------------------------------------------- Scenario: Test index size
--------------------------------------------------------------------------------
-- create a table with two columns files with IDs
create table test_sequence_effects 
  (id_normal number, 
   id_scalable number,
   id_reverse number);

-- create a normal and a scalable sequence
create sequence test_sequence_normal;
create sequence test_sequence_scale scale;

-- create 3 indexes
create index test_sequence_normal_ix on test_sequence_effects(id_normal); 
create index test_sequence_scale_ix on test_sequence_effects(id_scalable);
create index test_sequence_reverse_ix on test_sequence_effects(id_reverse) reverse ;

-- insert 1 million datapoints
insert into test_sequence_effects 
select test_sequence_normal.nextval, test_sequence_scale.nextval, test_sequence_normal.nextval 
from dual connect by level <= 1000000;

commit;

-- create statistics
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_NORMAL_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_SCALE_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_REVERSE_IX',estimate_percent => 100);

result after single session insert

We can have a look at the index statistics and the segment sizes.

-- check index stats
select index_name, blevel, leaf_blocks, distinct_keys, clustering_factor 
from dba_indexes 
where index_name like 'TEST_SEQUENCE_%';

INDEX_NAME	BLEVEL	LEAF_BLOCKS	DISTINCT_KEYS	CLUSTERING_FACTOR
TEST_SEQUENCE_NORMAL_IX	2	1999	1000000	4271
TEST_SEQUENCE_SCALE_IX	2	3379	1000000	4271
TEST_SEQUENCE_REVERSE_IX	2	2965	1000000	999996

Now all three indexes still have the same blevel, meaning for a single direct index access the performance should be identical. However when the number of entries increases eventually the ID values created by a scalable sequence might create a new level in the btree before the same happens for IDs from a normal sequence. I tested when this will happen. If you are curious check out the appendix section “blevel3 break point”, but try to guess the value before you check.

-- check index size
select segment_name, segment_type, segment_subtype, blocks, bytes/1024/1024 as size_in_MB
from user_segments
where segment_name like 'TEST_SEQUENCE_%';

SEGMENT_NAME	SEGMENT_TYPE	SEGMENT_SUBTYPE	BLOCKS	SIZE_IN_MB
TEST_SEQUENCE_EFFECTS	TABLE	ASSM	4352	34
TEST_SEQUENCE_NORMAL_IX	INDEX	ASSM	2176	17
TEST_SEQUENCE_REVERSE_IX	INDEX	ASSM	3072	24
TEST_SEQUENCE_SCALE_IX	INDEX	ASSM	3584	28

table and index size

The scaleable index in this experiment is 65% larger than the normal index. The cause for this increment can be attributed directly to the bigger numbers that needs to be stored.

We have to keep in mind that the test scenario was run in a single session. In reality multiple sessions will be used to run the insert(s). This is where scalable sequences should give a performance boost, even at the cost of potentially slightly higher storage consumption.

Testscript 2 – parallel inserts from multiple sessions

Ok now repeat the same experiment but using multiple sessions instead of a single session.

I created a code block that repeatetly does single row inserts in blocks of 10 and waits 1 second. This was run in up to 100 parallel sessions using dbms_scheduler jobs until approximately 1 million rows were inserted. The code for this test can be found in the appendix section “Testscript 2 – code”. Because of the 1 sec. pause this will run for some time.

Index stats after parallel inserts

INDEX_NAME	BLEVEL	LEAF_BLOCKS	DISTINCT_KEYS	CLUSTERING_FACTOR
TEST_SEQUENCE_NORMAL_IX	2	3649	1000198	943211
TEST_SEQUENCE_SCALE_IX	2	6577	1000198	32329
TEST_SEQUENCE_REVERSE_IX	2	3240	1000198	986633

The number of blocks increased for all indexes. But more importantly we see that the clustering factor for the normal index went through the roof and is now close to the number of rows in the table. Which is bad. It also shows that scalable sequences seem to keep the promise of avoiding index contention and hot blocks, even if this test was not set up to measure any performance implications.

If you want to know more about clustering factor, read appendix section “index clustering factor”.

Side note: There were a few more rows inserted than in the single session approach (0.02%). That was because how the stopping mechanism worked for the parallel jobs. It does not influence the comparison at all.

Index sizes after parallel inserts

SEGMENT_NAME	SEGMENT_TYPE	SEGMENT_SUBTYPE	BLOCKS	SIZE_IN_MB
TEST_SEQUENCE_EFFECTS	TABLE	ASSM	4480	35
TEST_SEQUENCE_NORMAL_IX	INDEX	ASSM	3840	30
TEST_SEQUENCE_REVERSE_IX	INDEX	ASSM	3328	26
TEST_SEQUENCE_SCALE_IX	INDEX	ASSM	6784	53

So the size of a normal index rose from 17 to 30MB (76% increase) and for the scaleable index from 28 to 53MB (83%). The reason for both increaments in size should be because now we see more 50-50 index block splits resulting in a considerable amount of blocks not being full and rarely being reused.

Overall the size didn’t explode, but it is still a noticable amount.

Side note: It is possible to shrink the indexes after such parallel insert sessions, but the effect is not great. Shrink and coalesce really should be considered when frequent deletes are run against the table. Scalable sequences potentially can provide values that might fill some of the gaps created by deletes. An index rebuild is not needed or advisable as long as more inserts are to come.

Conclusion

Values created by scalable sequences are using more storage space than values from normal sequences. For indexes we can expect a growth in index size between 50% and 100% compared to a normal index. The growth is partially because of larger numbers but also because there will be more not yet used space in many of the index blocks (caused by 50-50 index block splits instead of 90-10).

The clustering factor for scalable sequences always is better or equal than any alternative.

For all customers that I work with such an increment in storage size would not be any problem at all, especially if there is a performance improvement.

For me the storage size increase is not an argument against scalable sequences.

Appendix

Testscript 2 – code

--------------------------------------------------------------------------------
-- Scenario: Test index size after parallel inserts
--------------------------------------------------------------------------------
-- cleanup logic
drop table test_sequence_effects purge;
drop sequence test_sequence_normal;
drop sequence test_sequence_scale;

execute dbms_scheduler.drop_program('test_sequence_do_inserts');
begin
 for j in (select job_name from all_scheduler_jobs where job_name like 'TEST_SEQUENCE%') loop
     dbms_scheduler.drop_job(j.job_name);
 end loop;    
end;
/
--------------------------------------------------------------------------------
-- create a table with three columns to be filled with IDs
create table test_sequence_effects 
  (id_normal number, 
   id_scalable number,
   id_reverse number);

-- create a normal and a scalable sequence
create sequence test_sequence_normal;
create sequence test_sequence_scale scale;

-- create 3 indexes
create index test_sequence_normal_ix on test_sequence_effects(id_normal); 
create index test_sequence_scale_ix on test_sequence_effects(id_scalable);
create index test_sequence_reverse_ix on test_sequence_effects(id_reverse) reverse ;

-- insert 1 million datapoints
-- run as 100 parallel scheduled jobs 
-- each job inserts a batch of 10 IDs using single row inserts 
-- each job waits for 10 seconds before the next batch is run
-- each job stops after the max number for the normal ID is reached.

--create a program for dbms_scheduler to be used in jobs later
begin
    DBMS_SCHEDULER.create_program (
    program_name   => 'test_sequence_do_inserts',
    program_type   => 'PLSQL_BLOCK',
    program_action => q'[declare
  last_id number;
  max_id number := 1000000;
  cnt_id number := 0;
  e_stop exception;
  pragma exception_init (e_Stop, -20022);
begin 
  for j in 1..max_id/10 loop
    -- single insert for 10 rows 
    for i in 1..10 loop
        insert into test_sequence_effects 
        values (test_sequence_normal.nextval -- ID_NORMAL
             , test_sequence_scale.nextval  -- ID_SCALABLE
             , test_sequence_normal.nextval -- ID_REVERSE
             )
        returning id_normal into last_id;
        cnt_id := cnt_id + 1;
        if last_id >= max_id then 
          commit;
          raise e_stop;
        end if;   
    end loop;
    commit; 
    -- pause for 1 sec  
    dbms_session.sleep(1);
  end loop;  
exception
 when e_stop then
   -- finish gracefully
   dbms_output.put_line('This session inserted '||to_char(cnt_id)||' rows');
   commit;
end;
]',
    enabled        => TRUE,
    comments       => 'Insert 10 rows, 1 sec pause.');


   -- activate the program. This does not run the logic yet.
   DBMS_SCHEDULER.enable (name => 'test_sequence_do_inserts');

end;
/

-- create multiple jobs
begin 
  for j in 1..100 loop
     dbms_scheduler.create_job
     (job_name      => 'test_sequence_do_inserts_job'||to_char(j,'FM000'),
      program_name  => 'test_sequence_do_inserts',
      start_date => systimestamp,
      enabled => true);

     dbms_scheduler.run_job (job_name => 'test_sequence_do_inserts_job'||to_char(j,'FM000'),
                              use_current_session => false);
   end loop;
end;
/

After executing this script the scheduled job runs for quite some time.

We can check the number of rows in the table to see if it is finished.

select max(id_normal) last_value,count(*) row_count from test_sequence_effects;

other interesting selects to monitor the scheduled jobs

-- check scheduler jobs
select * from all_scheduler_programs;
select * from all_scheduler_jobs where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_running_jobs where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_job_log where job_name like 'TEST_SEQUENCE%';
select * from all_scheduler_job_run_details where job_name like 'TEST_SEQUENCE%';

And of cause the statistics need to be gathered

-- compute statistics for all indexes
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_NORMAL_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_SCALE_IX',estimate_percent => 100);
execute dbms_stats.gather_index_stats(user, 'TEST_SEQUENCE_REVERSE_IX',estimate_percent => 100);

Index clustering factor

What is the clustering factor of an index? Essentially it tells us how good the index and the table data match. How many table blocks need to be read if each block is accessed via the index? The value itself needs to be compared with the number of blocks (min value) and rows (max value) in the table.

Here are some links that might improve your understanding of the clustering factor:

http://dbaparadise.com/2018/05/what-is-index-clustering-factor/

Ask Tom: https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1032431852141

Demystifying Oracle’s Clustering Factor

David Fitzjarrell in detail about the clustering factor and Oracles calculation of it

Blevel 3 break point

I run some further tests to see when the index based upon a scalable sequence will reach blevel3.

For a single session insert, blevel 3 was reached after approximately 38 million inserts.

INDEX_NAME                  BLEVEL LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR
--------------------------- ------ ----------- ------------- -----------------
TEST_SEQUENCE_NORMAL_IX          2       81148      38200000            173362
TEST_SEQUENCE_REVERSE_IX         2      125912      38200000          38199995
TEST_SEQUENCE_SCALE_IX           3      129055      38200000            173362

For multiple sessions the same amount of blocks/leaf blocks should be reached earlier. However the distribution would be considerably more even. Both effects will partially cancel each other out. So far I didn’t measure it.

	svenweller on ORDS 22.x – How to setup…
	Alexander on ORDS 22.x – How to setup…
	svenweller on ORDS 22.x – How to setup…
	svenweller on dbms_stats quickie: show globa…
	svenweller on ORDS 22.x – How to setup…

svenweller

Notes about Oracle SQL, PLSQL and APEX

Scalable sequences – size check

Motivation

Bigger numbers need more storage

Testscript 1 – single session insert

result after single session insert

Testscript 2 – parallel inserts from multiple sessions

Conclusion

Appendix

Testscript 2 – code

Index clustering factor

Blevel 3 break point

Leave a comment Cancel reply

Motivation

Bigger numbers need more storage

Testscript 1 – single session insert

result after single session insert

Testscript 2 – parallel inserts from multiple sessions

Conclusion

Appendix

Testscript 2 – code

Index clustering factor

Blevel 3 break point

Share this:

Related

Leave a comment Cancel reply