A recent MI and RCA brought this to light, and I thought it important enough to share with all of you. This blog should provide detail, not just as to the best practice of having consistent disk sizes per +ASM diskgroup, but why it’s a best practice and the impact of not following such best practice.
When Oracle +ASM writes data to a diskgroup, it attempts to stripe that data across all disks in the group. It will do so proportionally on a per disk basis. So if you have 4 by 100Gb disks in DATA diskgroup, and you create a new tablespace with the SQL command CREATE TABLESPACE NEW_TS; the database will create a tablespace of default size of 100M in the DATA diskgroup (provided of course, the parameter DB_CREATE_FILE_DEST is set to +DATA in the init or spfile.) So the database will send the call to +ASM to create that new datafile, and +ASM will stripe that datafile proportionally, to all disks in the diskgroup, ultimately placing 25mb on each disk. Great!
But consider a DATA diskgroup with varying sizes – for instance 4 disks, with sizes 25Gb, 50Gb, 75Gb, and 250Gb. When creating that same 100M tablespace, Oracle will attempt to create that datafile proportionally, attempting to write around 6mb to the first disk, 12mb to the second disk, 24mb to the third disk, and 58mb to the fourth disk. Later, a decision is made to standardize on 100G lun sizes, so in a single operation the DBA issues an ‘Alter diskgroup DATA add disk DISK4, DISK5, DISK6, DISK7 drop disk DATA000, DATA001, DAT002, DATA003 [rebalance power x ] ;’ The database migrates the data from the old disks to the new disks, without interrupting the database operation.
Unfortunately, the diskgroup is now in an ‘IMBALANCED’ state. The imbalance, will may cause misinformation or at least implied misinformation to be presented. This recently occurred in a production environment for one of our customer’s most important databases. The +ASM diskgroup via sqlplus and asmcmd showed more than 140Gb of free space, in a 4Tb DATA diskgroup. However, when attempting to extend an existing table or create a new tablespace, the following errors were returned:
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15041: diskgroup “DATA” space exhausted
So how could this be? Both the +ASM instance and asmcmd clearly showed plenty of free space! Here’s an excerpt from the Oracle documentation, (I added formatting for emphasis):
- TOTAL_MB is the total usable capacity of a disk group in megabytes. The calculations for data in this column take the disk header overhead into consideration. The disk header overhead depends on the number of ASM disks and ASM files. This value is typically about 1% of the total raw storage capacity. For example, if the total LUN capacity provisioned for ASM is 100 GB, then the value in the TOTAL_MB column would be about 99 GB.
- FREE_MB is the unused capacity of the disk group in megabytes, without considering any data imbalance. Therefore, there may be situations where the value in the FREE_MB column shows unused capacity but because one of the ASM disks is full, database writes fail because of the imbalance in the disk group. Make sure that you initiate a manual rebalance to force even data distribution which results in an accurate presentation of the values in the FREE_MB column.
How can we tell if we have disk imbalance in a +ASM diskgroup, as this is not something we monitor for? The attached sql file asm_imbalance.sql will report imbalance. Simply supply the diskgroup name when prompted. Below is sample out from the script:
SYS@+ASM>select max(free_mb) biggest, min(free_mb) lowest, avg(free_mb) AVG,
2 trunc(GREATEST ((avg(free_mb)*100/max(free_mb)),(min(free_mb)*100/avg(free_mb))),2)||’%’ as balanced,
3 trunc(100-(GREATEST ((avg(free_mb)*100/max(free_mb)),(min(free_mb)*100/avg(free_mb)))),2)||’%’ as imbalanced
4 from v$asm_disk
5 where group_number in
6 (select group_number from v$asm_diskgroup where name = upper(‘&DG’));
Enter value for dg: DATA
BIGGEST LOWEST AVG BALANC IMBALA
———- ———- ——— —— ——
73280 54524 69863 95.33% 4.66%
One would think 4.66% is a tolerable amount. However, in a 4Tb database 4.66% is 190Gb. While the database and asmcmd reported 140Gb free space in the diskgroup, which was technically accurate, refer to the portion of FREE_MB description: , without considering any data imbalance. So because one of the disks was actually full, due to the data imbalance, database writes failed because of the imbalance in the disk group.
Follow up action for all DBA’s
The imbalance experienced here was extremely rare, however, it’s important to understand, this diskgroup WAS CURRENTLY using consistent sizes. The diskgroup was made of 8 x 512Gb disks. It was not, however, always that way. Prior to January 2016, the diskgroup consisted of multiple different sizes. It’s also important to mention the last sentence under FREE_MB is NOT entirely accurate: Make sure that you initiate a manual rebalance to force even data distribution which results in an accurate presentation of the values in the FREE_MB column.