As Leighton L. Nelson point me out in G+, there must other way to corrupt blocks directly in ASM making this task easier for testing purposes. My scenario to make this happen is the following:
* OEL5.7 (Red Hat Enterprise Linux Server release 5.7)
* 2 RAC nodes 11g (11.2.0.1.0)
* VirtualBox (4.1.10)
* VirtualBox shared storage disk /dev/sdh (1GB)
We will need a separate DiskGroup with only one ASM disk to accommodate the tablespace and the table with random data to corrupt.
[root@rac1 oracle]# /etc/init.d/oracleasm createdisk TEST_DSK /dev/sdh1
Marking disk "TEST" as an ASM disk: [ OK ]
[oracle@rac1 bin]$ asmcmd lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
....
MOUNTED EXTERN N 512 4096 1048576 1019 924 0 924 0 N TEST_DG/
SQL> create tablespace ts_small datafile '+TEST_DG/RAC/DATAFILE/tdg.df' size 100k autoextend on;
Tablespace created.
SQL> alter user user_dbc default tablespace ts_small;
User altered.
Now using user_dbc create a small table that fits easily on 100k tablespace:
SQL> create table t_small_corrupt tablespace ts_small as select 'rownum:' || rownum R1 from dual connect by level <= 1200;
Table created.
First thing now is to discover datafile inside ASM, we can use X$KFFXP that contains the mapping between files, extents and AUs. With this table we can track the position of datafile inside ASM (please note that external redundancy applies in this case):
SQL> column DSK_PATH format a25;
SQL> select distinct GROUP_KFFXP,DISK_KFFXP,NUMBER_KFFXP, (select path from v$asm_disk where group_number=GROUP_KFFXP and disk_number=DISK_KFFXP) DSK_PATH from x$kffxp where number_kffxp=(select file_number from v$asm_alias where name='tdg.df') and GROUP_KFFXP=(select group_number from v$asm_alias where name='tdg.df');
GROUP_KFFXP DISK_KFFXP NUMBER_KFFXP DSK_PATH
----------- ---------- ------------ -------------------------
3 0 256 /dev/oracleasm/disks/TEST_DSK
GROUP_KFFXP - ASM disk group number
DISK_KFFXP - Disk number where extent is allocated
NUMBER_KFFXP - ASM file number
DSK_PATH - Device path for Disk
Next, we need to get the extent map for the datafile tdg.df (group 3, file number 256):
SQL> select xnum_kffxp,lxn_kffxp,pxn_kffxp,au_kffxp from x$kffxp where group_kffxp=3 and number_kffxp=256;
XNUM_KFFXP LXN_KFFXP PXN_KFFXP AU_KFFXP
---------- ---------- ---------- ----------
0 0 0 102
XNUM_KFFP - File extent number
LXN_KFFXP - 0 means that is a primary extent and not a mirrored extent (1 or 2, depends of redundancy)
PXN_KFFXP - Progressive file extent number that in this case (no mirrored extents) is always the same as XNUM_KFFXP
AU_KFFXP - Is the relative position of AU from the beginning of the disk
Time to use dd to access directy to disk using information we got from extent map of datafile. Use bs=1024k as AU for ASM is 1MB and skip parameter = (some) AU_KFFXP.
[root@rac1 oracle]# dd if=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102|strings|more
}|{z
TS_SMALL
r d V H : ,
rownum:458,
rownum:457,
rownum:456,
rownum:455,
...
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.081027 seconds, 12.9 MB/s
So for sure, we hit the data now. No chance to miss!! :)
[root@rac1 oracle]# dd if=/dev/zero of=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.011318 seconds, 92.6 MB/s
Ok, all done, let's see if Oracle is aware of some block corruption:
SQL> select * from v$database_block_corruption;
no rows selected
To unsure we are not missing anything let's run Oracle Health Monitor manually:
Basic Run Information
Run Name : my_test_run
Run Id : 1521
Check Name : DB Structure Integrity Check
Mode : MANUAL
Status : COMPLETED
Start Time : 2012-03-22 17:55:19.578660 +00:00
End Time : 2012-03-22 17:55:19.593574 +00:00
Error Encountered : 0
Source Incident Id : 0
Number of Incidents Created : 0
Conclusion here is simple: no data corruption. My next ideia was to use strace to find anything that can give some hints:
mmap(NULL, 1060864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1537f06000
ioctl(0, MGSL_IOCSTXIDLE or MTIOCGET or SNDCTL_MIDI_MPUCMD, 0x7fffc53e6c60) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(0, 106954752, SEEK_CUR) = 0
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1048576) = 1048576
munmap(0x7f1537f06000, 1060864) = 0
close(0) = 0
close(1) = 0
...
write(2, "1+0 records in\n1+0 records out\n", 311+0 records in
1+0 records out
) = 31
write(2, "1048576 bytes (1.0 MB) copied", 291048576 bytes (1.0 MB) copied) = 29
write(2, ", 0.00314435 seconds, 333 MB/s\n", 31, 0.00314435 seconds, 333 MB/s
I'm no strace expert, but besides ENOTTY error (Inappropriate I/O control operation) it SEEMS that dd can seek and write 0's on the disk with success.
Conclusions so far:
0) I can backup datafile with RMAN with no errors
1) I can "full table scan" data (even after reset buffer cache and shared pool)
2) v$database_block_corruption returns no rows
3) Health Monitor reports nothing
4) Nothing on alert.log (on both instances)
5) Can read table data using dd (dd if=/dev/oracleasm/disks/TEST_DSK bs=1024k count=1 skip=102|strings|more)
6) ??
7) I was unable to corrupt anything!!
This is probably a VirtualBox issue or trick, i'm now short on time to make a few more tests (using VMWare for example), but i excepted that this method works on a non-virtualized environment. Next time i will blog about things that work :-)
PS: Thanks to Martin Berger for time spent discussing this issue.