Posted On: July 24, 2018
Topic: ERP Application Management
A recent MI and RCA brought this to light, and I thought it important enough to share with all of you. This blog explains not just the best practice of keeping disk sizes consistent within each +ASM diskgroup, but also why it is a best practice and what happens when it is not followed.
When Oracle +ASM writes data to a diskgroup, it stripes that data across all disks in the group, in proportion to each disk's size. Suppose the DATA diskgroup has four 100 GB disks, and you create a new tablespace with the SQL command CREATE TABLESPACE NEW_TS; (provided, of course, that the parameter DB_CREATE_FILE_DEST is set to +DATA in the init or spfile). The database creates a datafile of the default size, 100 MB, in the DATA diskgroup: it sends the call to +ASM to create the new datafile, and +ASM stripes that datafile proportionally across all disks in the diskgroup, ultimately placing 25 MB on each disk. Great!
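The proportional placement described above can be modeled with a little arithmetic. This is a minimal sketch, not ASM's actual allocator: it assumes each disk simply receives a share of the file proportional to its size, and it ignores allocation-unit rounding, mirroring, and ASM metadata. The function name `proportional_split` is hypothetical.

```python
from typing import List


def proportional_split(file_mb: float, disk_sizes_gb: List[float]) -> List[float]:
    """Split a file across disks in proportion to each disk's size.

    Simplified model (assumption): ignores allocation-unit rounding,
    redundancy (mirroring) and ASM metadata.
    """
    total_gb = sum(disk_sizes_gb)
    return [file_mb * size_gb / total_gb for size_gb in disk_sizes_gb]


# Four 100 GB disks in DATA and a 100 MB default-size datafile
shares = proportional_split(100, [100, 100, 100, 100])
print(shares)  # [25.0, 25.0, 25.0, 25.0]
```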
But consider a DATA diskgroup with disks of varying sizes, for instance four disks of 25 GB, 50 GB, 75 GB, and 250 GB (400 GB in total). When creating that same 100 MB tablespace, Oracle writes the datafile proportionally: about 6.25 MB to the first disk, 12.5 MB to the second, 18.75 MB to the third, and 62.5 MB to the fourth. Later, a decision is made to standardize on 100 GB LUNs, so in a single operation the DBA issues 'ALTER DISKGROUP DATA ADD DISK DISK4, DISK5, DISK6, DISK7 DROP DISK DATA000, DATA001, DATA002, DATA003 [REBALANCE POWER x];'. The database migrates the data from the old disks to the new disks without interrupting database operation.
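Applying the same kind of proportional arithmetic to the mixed 25/50/75/250 GB layout gives shares of 6.25, 12.5, 18.75 and 62.5 MB, and it shows the point of splitting this way: every disk ends up the same fraction full. As before, this is a simplified, hypothetical model (the name `split_and_fill` is illustrative) that ignores allocation-unit rounding, redundancy, and metadata.

```python
def split_and_fill(file_mb, disk_sizes_gb):
    """Return (share_mb, fraction_of_disk_used) per disk for a proportional split.

    Simplified model (assumption): ignores allocation-unit rounding,
    redundancy and ASM metadata.
    """
    total_gb = sum(disk_sizes_gb)
    result = []
    for size_gb in disk_sizes_gb:
        share_mb = file_mb * size_gb / total_gb
        # Fraction of this disk consumed by its share (1 GB = 1024 MB)
        fill = share_mb / (size_gb * 1024)
        result.append((share_mb, fill))
    return result


# Mixed-size DATA diskgroup: 25, 50, 75 and 250 GB disks
for share, fill in split_and_fill(100, [25, 50, 75, 250]):
    print(f"{share:6.2f} MB  ({fill:.6%} of the disk)")
```

Every line prints the same percentage. The larger disk simply takes a larger slice so that no disk fills up ahead of the others.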
Unfortunately, the diskgroup is now in an 'IMBALANCED' state, and that imbalance can cause misleading free-space figures to be reported. This recently occurred in a production environment for one of our customer's most important databases. Both SQL*Plus (connected to the +ASM instance) and asmcmd showed more than 140 GB of free space in a 4 TB DATA diskgroup. However, attempts to extend an existing table or create a new tablespace returned the following errors:
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup "DATA" space exhausted
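So how could this be? Both the +ASM instance and asmcmd clearly showed plenty of free space! The answer lies in the Oracle documentation for the FREE_MB column of V$ASM_DISKGROUP, which defines FREE_MB as unused capacity "without considering any data imbalance" and warns that when one ASM disk is full, database writes can fail because of the imbalance in the disk group, even though FREE_MB still shows unused capacity.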
How can we tell whether a +ASM diskgroup is imbalanced, given that this is not something we monitor for? The attached SQL file, asm_imbalance.sql, reports imbalance; simply supply the diskgroup name when prompted. Below is sample output from the script:
SYS@+ASM>select max(free_mb) biggest, min(free_mb) lowest, avg(free_mb) AVG,
  trunc(GREATEST ((avg(free_mb)*100/max(free_mb)),(min(free_mb)*100/avg(free_mb))),2)||'%' as balanced,
  trunc(100-(GREATEST ((avg(free_mb)*100/max(free_mb)),(min(free_mb)*100/avg(free_mb)))),2)||'%' as imbalanced
  from v$asm_disk
  where group_number in
  (select group_number from v$asm_diskgroup where name = upper('&DG'));
Enter value for dg: DATA
BIGGEST LOWEST AVG BALANC IMBALA
---------- ---------- --------- ------ ------
73280 54524 69863 95.33% 4.66%
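To check the arithmetic behind the BALANCED and IMBALANCED columns, the script's expressions can be reproduced outside the database. The sketch below mirrors the SQL: it takes the larger of AVG/BIGGEST and LOWEST/AVG (as percentages) as the "balanced" figure and truncates to two decimals the way Oracle's TRUNC(x, 2) does. The function names are hypothetical. Fed the sample aggregates, it reproduces 95.33% and 4.66%.

```python
import math


def trunc2(x: float) -> float:
    """Truncate toward zero to two decimals, like Oracle TRUNC(x, 2)."""
    return math.trunc(x * 100) / 100


def balance_pct(biggest: float, lowest: float, avg: float):
    """Mirror the BALANCED / IMBALANCED expressions from asm_imbalance.sql."""
    balanced = max(avg * 100 / biggest, lowest * 100 / avg)
    return trunc2(balanced), trunc2(100 - balanced)


# BIGGEST, LOWEST and AVG of FREE_MB from the sample output for DATA
print(balance_pct(73280, 54524, 69863))  # (95.33, 4.66)
```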
One would think 4.66% is a tolerable amount. However, in a 4 TB diskgroup, 4.66% is roughly 190 GB. The 140 GB of free space reported by the database and asmcmd was technically accurate, but recall that FREE_MB is calculated "without considering any data imbalance." Because of the imbalance, one of the disks was actually full, so database writes failed even though the diskgroup as a whole still had free space.
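A simplified model makes this failure mode concrete. Assuming (as a simplification of ASM's allocator) that every new file must be spread across all disks in proportion to their size, the largest file that can still be created is limited by the disk with the least room relative to its share, not by the sum of FREE_MB. The function name and the free-space figures below are hypothetical, chosen only to echo the incident: eight 512 GB disks, seven with 20 GB free and one full.

```python
def max_new_file_mb(total_mb, free_mb):
    """Largest file that fits if it must be striped proportionally to disk size.

    Simplified model (assumption): ignores allocation units, redundancy and
    metadata. Disk i must hold file_mb * total_mb[i] / sum(total_mb).
    """
    capacity = sum(total_mb)
    return min(free * capacity / total for total, free in zip(total_mb, free_mb))


# Hypothetical DATA diskgroup: 8 x 512 GB disks, one of them full
total_mb = [524288] * 8
free_mb = [20480] * 7 + [0]

print(sum(free_mb))                        # 143360 (about 140 GB reported free)
print(max_new_file_mb(total_mb, free_mb))  # 0.0 -> any new allocation fails
```

With the same 140 GB spread evenly (17,920 MB on each disk), the model allows a new file of 143,360 MB. The total free space is identical; only its distribution differs.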
Follow-up action for all DBAs
The imbalance experienced here was extremely rare. However, it's important to understand that this diskgroup WAS using consistent sizes at the time: it was made up of 8 x 512 GB disks. It was not, however, always that way; prior to January 2016, the diskgroup consisted of disks of multiple different sizes. It's also important to mention that the last sentence of the FREE_MB description is NOT entirely accurate: "Make sure that you initiate a manual rebalance to force even data distribution which results in an accurate presentation of the values in the FREE_MB column."
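Acting on the best practice itself can also be scripted. The sketch below assumes rows shaped like (diskgroup name, disk name, TOTAL_MB), such as one might fetch by joining V$ASM_DISK to V$ASM_DISKGROUP on GROUP_NUMBER, and reports diskgroups whose disks are not all the same size. The function name and the diskgroup and disk names are hypothetical. As this incident shows, a diskgroup that passes this check can still carry imbalance left over from an earlier mixed-size layout, so this check complements the imbalance query rather than replacing it.

```python
from collections import defaultdict


def mixed_size_diskgroups(rows):
    """Return {diskgroup: sorted distinct TOTAL_MB values} for groups with mixed disk sizes.

    rows: iterable of (diskgroup_name, disk_name, total_mb) tuples (assumed shape).
    """
    sizes = defaultdict(set)
    for group, _disk, total_mb in rows:
        sizes[group].add(total_mb)
    return {g: sorted(s) for g, s in sizes.items() if len(s) > 1}


# Hypothetical inventory: DATA is uniform, FRA mixes 100 GB and 250 GB LUNs
rows = [("DATA", f"DATA_{i:04d}", 524288) for i in range(8)] + [
    ("FRA", "FRA_0000", 102400),
    ("FRA", "FRA_0001", 256000),
]
print(mixed_size_diskgroups(rows))  # {'FRA': [102400, 256000]}
```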