ODA Quickie – How to solve “Validate kernel log level – Failed” problem during ODA patching

During an ODA X7-2S upgrade from 19.9 to 19.13 we ran into the following issue: the prepatch report showed that the check “Validate kernel log level” had failed with this message:

OS kernel log level is set to debug, this may result in a failure when patching Clusterware
If kernel OS log level is more than KERN_ERR(3) then GI patching may fail

This problem also seems to exist in versions 19.10 and later. It cannot be ignored: trying to update the server anyway will fail with an error.

Here is an example of what such a prepatch report might look like:

Patch pre-check report
------------------------------------------------------------------------
                 Job ID:  d94c910d-5ed5-4b02-9c65-9e525c176817
            Description:  Patch pre-checks for [OS, ILOM, GI, ORACHKSERVER]
                 Status:  FAILED
                Created:  March 14, 2023 11:52:10 AM CET
                 Result:  One or more pre-checks failed for [GI]


Node Name
---------------
ODA01

Pre-Check                      Status   Comments
------------------------------ -------- --------------------------------------
__OS__
Validate supported versions     Success   Validated minimum supported versions.
Validate patching tag           Success   Validated patching tag: 19.13.0.0.0.
Is patch location available     Success   Patch location is available.
Verify OS patch                 Success   Verified OS patch
Validate command execution      Success   Validated command execution

__ILOM__
Validate supported versions     Success   Validated minimum supported versions.
Validate patching tag           Success   Validated patching tag: 19.13.0.0.0.
Is patch location available     Success   Patch location is available.
Checking Ilom patch Version     Success   Successfully verified the versions
Patch location validation       Success   Successfully validated location
Validate command execution      Success   Validated command execution

__GI__
Validate GI metadata            Success   Successfully validated GI metadata
Validate supported GI versions  Success   Validated minimum supported versions.
Validate available space        Success   Validated free space under /u01
Is clusterware running          Success   Clusterware is running
Validate patching tag           Success   Validated patching tag: 19.13.0.0.0.
Is system provisioned           Success   Verified system is provisioned
Validate ASM in online          Success   ASM is online
Validate kernel log level       Failed    OS kernel log level is set to debug,
                                          this may result in a failure when
                                          patching Clusterware If kernel OS log
                                          level is more than KERN_ERR(3) then
                                          GI patching may fail

Validate minimum agent version  Success   GI patching enabled in current
                                          DCSAGENT version
Validate Central Inventory      Success   oraInventory validation passed
Validate patching locks         Success   Validated patching locks
Validate clones location exist  Success   Validated clones location
Validate DB start dependencies  Success   DBs START dependency check passed
Validate DB stop dependencies   Success   DBs STOP dependency check passed
Evaluate GI patching            Success   Successfully validated GI patching
Validate command execution      Success   Validated command execution

__ORACHK__
Running orachk                  Success   Successfully ran Orachk
Validate command execution      Success   Validated command execution

You can check the current kernel log level like this:

[root@ODA01 ~]# cat /proc/sys/kernel/printk

10      4       1       7

The first value, “10”, is the console log level, which means the kernel is logging at debug level; it should be set to “3” (KERN_ERR) or lower. (The four values are the console log level, the default message log level, the minimum console log level, and the boot-time default.)

However, changing /proc/sys/kernel/printk directly is not the correct way to solve the issue; the value would simply be reset at the next reboot.
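The high log level usually comes from the “debug” kernel boot parameter. Whether the running kernel was booted with it can be checked like this (if the parameter is set, the command prints the word debug, otherwise nothing):

[root@ODA01 ~]# grep -wo debug /proc/cmdline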

Instead, edit the file /etc/default/grub and remove the “debug” entry there.
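Just as an illustration (the other kernel parameters will look different on your system), the relevant GRUB_CMDLINE_LINUX line might change like this:

# before - hypothetical example, only the trailing "debug" matters here
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=VolGroupSys/LogVolRoot debug"

# after removing the debug keyword
GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=VolGroupSys/LogVolRoot"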

Then run

grub2-mkconfig -o /boot/efi/EFI/redhat/grub.cfg

GRUB, the “GRand Unified Bootloader”, is the boot loader that runs when the ODA (or a VM) starts. The grub2-mkconfig command above regenerates the boot loader configuration from the settings in /etc/default/grub, so the kernel is booted without the debug parameter from then on.

On some ODAs (I believe older models such as the ODA X6-2) the active GRUB configuration lives in a different file, and you have to regenerate that one instead. On our ODA X7-2S this file did not exist, so we did not change it:

grub2-mkconfig -o /boot/grub2/grub.cfg
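If you are unsure which of the two files your system actually uses, simply check which one exists (on our X7-2S only the one under /boot/efi/EFI/redhat was present):

[root@ODA01 ~]# ls -l /boot/efi/EFI/redhat/grub.cfg /boot/grub2/grub.cfg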

After this the server needs to be rebooted so that the new setting takes effect.
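Once the node is back up, you can verify that the debug parameter is gone and that the log level has dropped: the first command below should print nothing, and the first value shown by the second should no longer be 10. After that, re-running the prepatch report with odacli create-prepatchreport should show the “Validate kernel log level” check as Success.

[root@ODA01 ~]# grep -wo debug /proc/cmdline
[root@ODA01 ~]# cat /proc/sys/kernel/printk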

And here is a link to a MOSC thread that helped to solve the issue.
https://community.oracle.com/mosc/discussion/comment/16906486#Comment_16906486

I hope this saves you some time, in case you encounter the same problem.

ODA Quickie – How to solve ODABR Error: Dirty bit is set.

The problem

A little while ago, during an ODA X7-2S upgrade from 19.6 to 19.9, we encountered the following error:

SUCCESS: 2021-06-04 10:02:05: ...EFI device backup saved as '/opt/odabr/out/hbi/efi.img'
INFO: 2021-06-04 10:02:05: ...step3 - checking EFI device backup
ERROR: 2021-06-04 10:02:05: Error running fsck over /opt/odabr/out/hbi/efi.img
ERROR: 2021-06-04 10:02:05: Command: 'fsck -a /opt/odabr/out/hbi/efi.img' failed as fsck from util-linux 2.23.2 fsck.fat 3.0.20 (12 Jun 2013) 0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.  Automatically removing dirty bit. Performing changes. /opt/odabr/out/hbi/efi.img: 23 files, 1245/63965 clusters
INFO: 2021-06-04 10:02:05: Mounting EFI back
ERROR: 2021-06-04 10:02:06: Backup not completed, exiting...

This seems to be a known issue on bare metal ODAs, but the way to solve it is poorly documented.

The MOS notes

The Oracle ODABR support document mentions the problem twice and gives slightly different solutions.

Check the “ODABR – Use Case” and the “Known Issues” sections.

https://support.oracle.com/epmos/faces/DocumentDisplay?id=2466177.1

The document also mentions Internal Bug 31435951, “ODABR FAILS IN FSCK WITH ‘DIRTY BIT IS SET’”.

From the public ODABR document:

This is not an ODABR issue. ODABR is signalling a fsck error because your (in this case) efi partition is not in expected status… 
To fix this:

unmount efi
fsck.vfat -v -a -w <efidevice>
mount efi

Unfortunately the workaround is a bit vague and hard to follow. The EFI partition is mounted at /boot/efi; the “efi device” is not the same as the mount point, but it can be determined from it.

Here are the exact commands that helped me solve the issue.

The solution

First, check your filesystems to see which device is mounted at /boot/efi (the output below was taken after we had repaired the issue, so your mileage may vary).

[root@ODA01 odabr]# df -h
Filesystem                          Size  Used Avail Use% Mounted on
devtmpfs                             94G   24K   94G   1% /dev
tmpfs                                94G  1.4G   93G   2% /dev/shm
tmpfs                                94G  4.0G   90G   5% /run
tmpfs                                94G     0   94G   0% /sys/fs/cgroup
/dev/mapper/VolGroupSys-LogVolRoot   30G   11G   17G  40% /
/dev/mapper/VolGroupSys-LogVolU01   148G   92G   49G  66% /u01
/dev/mapper/VolGroupSys-LogVolOpt    59G   43G   14G  77% /opt
tmpfs                                19G     0   19G   0% /run/user/1001
tmpfs                                19G     0   19G   0% /run/user/0
/dev/asm/commonstore-13             5.0G  367M  4.7G   8% /opt/oracle/dcs/commonstore
/dev/asm/reco-215                   497G  260G  238G  53% /u03/app/oracle
/dev/asm/datredacted-13             100G   28G   73G  28% /u02/app/oracle/oradata/redacted
/dev/asm/datredacted2-13            100G   74G   27G  74% /u02/app/oracle/oradata/redacted2
/dev/md0                            477M  208M  244M  47% /boot
/dev/sda1                           500M  9.8M  490M   2% /boot/efi

This shows us that the “efi device” is /dev/sda1, the device mounted at /boot/efi.
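If you do not want to scan the full df output, findmnt (part of util-linux, so available on the ODA) reports the device behind a mount point directly; based on the df output above it should print the same /dev/sda1:

[root@ODA01 odabr]# findmnt -n -o SOURCE /boot/efi
/dev/sda1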

Then we did the steps as described in the documentation:

[root@ODA01 odabr]# umount /boot/efi

[root@ODA01 odabr]# fsck.vfat -v -a -w /dev/sda1
fsck.fat 3.0.20 (12 Jun 2013)
fsck.fat 3.0.20 (12 Jun 2013)
Checking we can access the last sector of the filesystem
0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
 Automatically removing dirty bit.
Boot sector contents:
System ID "mkdosfs"
Media byte 0xf8 (hard disk)
       512 bytes per logical sector
      8192 bytes per cluster
        16 reserved sectors
First FAT starts at byte 8192 (sector 16)
         2 FATs, 16 bit entries
    131072 bytes per FAT (= 256 sectors)
Root directory starts at byte 270336 (sector 528)
       512 root directory entries
Data area starts at byte 286720 (sector 560)
     63965 data clusters (524001280 bytes)
63 sectors/track, 255 heads
         0 hidden sectors
   1024000 sectors total
Reclaiming unconnected clusters.
Performing changes.
/dev/sda1: 23 files, 1245/63965 clusters

[root@ODA01 odabr]# mount /boot/efi
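Before continuing you can quickly confirm that the EFI partition is mounted again; the numbers match the full df output from above:

[root@ODA01 odabr]# df -h /boot/efi
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       500M  9.8M  490M   2% /boot/efi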

After this, we could successfully create an ODABR snapshot:

[root@ODA01 odabr]# ./odabr backup -snap -osize 50 -usize 80
INFO: 2021-06-04 12:14:49: Please check the logfile '/opt/odabr/out/log/odabr_87615.log' for more details


│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│
 odabr - ODA node Backup Restore - Version: 2.0.1-62
 Copyright Oracle, Inc. 2013, 2020
 --------------------------------------------------------
 Author: Ruggero Citton <ruggero.citton@oracle.com>
 RAC Pack, Cloud Innovation and Solution Engineering Team
│▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒│

INFO: 2021-06-04 12:14:49: Checking superuser
INFO: 2021-06-04 12:14:49: Checking Bare Metal
INFO: 2021-06-04 12:14:49: Removing existing LVM snapshots
WARNING: 2021-06-04 12:14:49: LVM snapshot for 'opt' does not exist
WARNING: 2021-06-04 12:14:49: LVM snapshot for 'u01' does not exist
WARNING: 2021-06-04 12:14:49: LVM snapshot for 'root' does not exist
INFO: 2021-06-04 12:14:49: Checking LVM size
INFO: 2021-06-04 12:14:49: Boot device backup
INFO: 2021-06-04 12:14:49: Getting EFI device
INFO: 2021-06-04 12:14:49: ...step1 - unmounting EFI
INFO: 2021-06-04 12:14:50: ...step2 - making efi device backup
SUCCESS: 2021-06-04 12:14:54: ...EFI device backup saved as '/opt/odabr/out/hbi/efi.img'
INFO: 2021-06-04 12:14:54: ...step3 - checking EFI device backup
INFO: 2021-06-04 12:14:54: Getting boot device
INFO: 2021-06-04 12:14:54: ...step1 - making boot device backup using tar
SUCCESS: 2021-06-04 12:15:05: ...boot content saved as '/opt/odabr/out/hbi/boot.tar.gz'
INFO: 2021-06-04 12:15:05: ...step2 - unmounting boot
INFO: 2021-06-04 12:15:05: ...step3 - making boot device backup using dd
SUCCESS: 2021-06-04 12:15:10: ...boot device backup saved as '/opt/odabr/out/hbi/boot.img'
INFO: 2021-06-04 12:15:10: ...step4 - mounting boot
INFO: 2021-06-04 12:15:10: ...step5 - mounting EFI
INFO: 2021-06-04 12:15:11: ...step6 - checking boot device backup
INFO: 2021-06-04 12:15:12: OCR backup
INFO: 2021-06-04 12:15:13: ...ocr backup saved as '/opt/odabr/out/hbi/ocrbackup_87615.bck'
INFO: 2021-06-04 12:15:13: Making LVM snapshot backup
SUCCESS: 2021-06-04 12:15:13: ...snapshot backup for 'opt' created successfully
SUCCESS: 2021-06-04 12:15:15: ...snapshot backup for 'u01' created successfully
SUCCESS: 2021-06-04 12:15:15: ...snapshot backup for 'root' created successfully
SUCCESS: 2021-06-04 12:15:15: LVM snapshots backup done successfully

Side note: we passed smaller backup sizes to odabr to work around complaints about not having enough space for the snapshots, even though there actually was enough space. That problem, however, is not connected to the “dirty bit” issue.
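In case you hit the same space complaint: odabr creates its backups as LVM snapshots, which need free (unallocated) space in the system volume group. You can check how much is really available with vgs; VolGroupSys is the volume group visible in the df output above, and the VFree column shows the unallocated space:

[root@ODA01 odabr]# vgs VolGroupSys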

I hope this helps others to troubleshoot their ODA.