SQL Quickie: Column usage

A question directed at me recently was if we can somehow monitor how a table is accessed useing which where conditions. The goal was to decide about index creation for such a table.

Fortunatly Oracle already collects some relevant information for this.

The sys.col_usage$ table is used by the dbms_stats package to decide which columns might profit by adding histograms. If we have DBA privs, we can access this information:

-- column usage 
select o.owner, o.object_name,  c.column_name,
       u.*
from dba_objects o
join dba_tab_columns c on c.owner = o.owner and c.table_name = o.object_name 
left join SYS.COL_USAGE$ u  on o.object_id = u.obj# and u.intcol# = c.column_id
where o.object_type = 'TABLE'
and c.owner = 'LARRY' -- enter appropriate schema here
and c.table_name = 'CLOUDNO9' -- enter appropriate table here
order by c.table_name, c.column_id;

And this is how the result could look like

OWNER	OBJECT_NAME	COLUMN_NAME	OBJ#	INTCOL#	EQUALITY_PREDS	EQUIJOIN_PREDS	NONEQUIJOIN_PREDS	RANGE_PREDS	LIKE_PREDS	NULL_PREDS	TIMESTAMP
LARRY	CLOUDNO9	COL1		68200	1	725	5	0	0	1	0	10.03.2016 15:42
LARRY	CLOUDNO9	COL2									
LARRY	CLOUDNO9	COL3									
LARRY	CLOUDNO9	COL4									
LARRY	CLOUDNO9	COL5									
LARRY	CLOUDNO9	COL6		68200	6	0	1	0	0	0	0	19.04.2012 13:33
LARRY	CLOUDNO9	COL7		68200	7	0	1	0	0	0	0	19.04.2012 13:33
LARRY	CLOUDNO9	COL8		68200	8	0	1	0	0	0	2	25.06.2014 10:28
LARRY	CLOUDNO9	COL9		68200	9	326	3	0	0	1	0	24.02.2016 09:13
LARRY	CLOUDNO9	COL10		68200	10	316	1	0	0	0	0	24.02.2016 09:13
LARRY	CLOUDNO9	COL11		68200	11	315	1	0	0	0	0	24.02.2016 09:13
LARRY	CLOUDNO9	ID		68200	12	891	1674	0	0	0	0	10.03.2016 17:57
LARRY	CLOUDNO9	FK1_ID		68200	13	2236	3	1	0	0	0	10.03.2016 19:43
LARRY	CLOUDNO9	FK2_ID		68200	14	1	508	0	0	0	3	10.03.2016 17:57
LARRY	CLOUDNO9	CHANGED									
LARRY	CLOUDNO9	CHANGED_BY									
LARRY	CLOUDNO9	COL12		68200	17	0	161	0	0	0	161	10.03.2016 14:57

The different columns are statistic counts with some obvious meaning

  • EQUALITY_PREDS = equality predicates
    for example a where condition as this

    col1 = 'ABC'
  • EQUIJOIN_PREDS = column used by an equi join
    for example a join as this

    ...from a join b on a.id = b.a_id

    IN / NOT IN comparisons often also count as equi joins

    ...from a 
    where a.id not in (select b.a_id from b)
  • NONEQUIJOIN_PREDS = column used by a non equi join
    for example a join as this

    ...from a join b on a.col1 between b.col1 and b.col2
  • RANGE_PREDS = column used by a range comparison
    for example a where condition as this

    where a.col_date >= trunc(sysdate)-5
  • LIKE_PREDS = column used by a like comparison
    for example a where condition as this

    where a.col1 like 'AMORE%'

    Any kind of like expression counts.

  • NULL_PREDS = column compared with IS NULL
    for example a where condition as this

    where a.col1 is null

    IS NOT NULL is not tracked. And only for columns that are optional.

  • TIMESTAMP = tells us when this column was last used using one of those methods

Doing such statements the information is first stored and collected somewhere in the SGA. We can force it into this table by flushing it using

execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

Recommendation

We can add a simple index check to the previous select.
This helps to quickly identify columns that might profit from an index.
Obvious candidates would be columns where no index exists but with a high count of EQUALITY_PREDS+EQUIJOIN_PREDS, but also RANGE_PREDS.

-- column usage + index check
select o.owner, o.object_name,  c.column_name,
       case when exists (select null 
                          from dba_ind_columns i 
                          where i.table_owner = c.owner 
                          and i.table_name = c.table_name 
                          and i.column_name = c.column_name) 
        then 'YES' else 'NO' 
       end  as index_exists,
       u.*
from dba_objects o
join dba_tab_columns c on c.owner = o.owner and c.table_name = o.object_name 
left join SYS.COL_USAGE$ u  on o.object_id = u.obj# and u.intcol# = c.column_id
where o.object_type = 'TABLE'
and c.owner = 'LARRY' 
and c.table_name = 'CLOUDNO9'
order by c.table_name, c.column_id;

And if you want to quickly scan your whole schema try this version.

-- column usage without index for the whole schema 
select * 
from (
  -- column usage + index check
  select o.owner, c.table_name,  c.column_name, 
         case when exists (select null 
                            from dba_ind_columns i 
                            where i.table_owner = c.owner 
                            and i.table_name = c.table_name 
                            and i.column_name = c.column_name) 
          then 'YES' else 'NO' 
         end  as index_exists,
         c.num_distinct, c.nullable, 
         u.*
  from dba_objects o
  join dba_tab_columns c on c.owner = o.owner and c.table_name = o.object_name 
  left join SYS.COL_USAGE$ u  on o.object_id = u.obj# and u.intcol# = c.column_id
  where o.object_type = 'TABLE'
  and c.owner = 'LARRY' 
  --and c.table_name = 'XXX'
  )
where index_exists = 'NO' 
and equality_preds+equijoin_preds+range_preds > 100 
and equality_preds+equijoin_preds+range_preds > like_preds+null_preds
and num_distinct > 1
order by equality_preds+equijoin_preds+range_preds desc, table_name, intcol#;

References

Maria Colgan mentioned this table in her blog about column histograms:
https://blogs.oracle.com/optimizer/entry/how_does_the_method_opt

Ulrike Schwinn wrote a nice article for the German community about table and column monitoring:
https://apex.oracle.com/pls/apex/germancommunities/dbacommunity/tipp/1261/index.html