Problem description:
Oracle 11g utilities like tnsping, lsnrctl or sqlplus on AIX 6.1 hanged for about 2,5 minutes (150 seconds) before connection with alias to any destination when DNS server was unreachable. In the same conditions (host and configuration) 10g utilities connected without delay. Oracle*Net was configured for local naming method only – NAMES.DIRECTORY_PATH=(tnsnames) in the both 10g and 11g Oracle Home. The tnsnames.ora files where identical in the 10g and 11g Oracle Home. The aliases in the tnsnames.ora used host names instead of IP addresses in the connection descriptions; if a host name was substituted by IP address the delay disappeared for this alias. Nevertheless all the host names from tnsnames where described in the /etc/hosts and resolution order was hosts = local, bind in the netsvc.conf (analog of the nsswitch.conf in other UNIXs). This configuration should guarantee local name resolution when DNS server was unreachable. What is the root reason of this delay in 11g Oracle*Net ?
Diagnostic:
First of all I checked configuration of the Oracle*Net – sqlnet.ora, tnsnames.ora and AIX configuration of network name resolution – /etc/resolv.conf, /etc/netsvc.conf. The configuration was typical.
The next steps were Oracle*Net and OS level tracing both 11g and 10g tnsping utility when DNS was reachable and unreachable. To emulate unreachable DNS server it’s IP address was changed in the /etc/resolv.conf
The Oracle*Net Support level tracing was enabled:
TNSPING.TRACE_LEVEL = 16
TRACE_LEVEL_CLIENT = 16
The comparison of 11g Oracle*Net traces gave no clue. Traces with and without reachable DNS server were almost identical. The comparison of 10g and 11g Oracle*Net traces did not help too. There were some differences but generally they were similar.
To trace tnsping at OS level were used AIX truss utility (truss -aefo tnsping.trc tnsping db_test). I checked the traces but din’t found any system call (like gethostbyname, gethostbyname2, gethostbyaddr or getaddrinfo) which could use DNS for the name resolution.
Comparing 11g with 10g trace files I found some new actions in the 11g trace. Here are some parts of tracing tnsping (11g and 10g) when DNS server was unreachable:
10g part:
kopen(“/etc/resolv.conf”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 38
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 0
statx(“/etc/resolv.conf”, 0x0FFFFFFFFFFFA2D8, 176, 0) = 0
close(4) = 0
socket(1, 1, 0) = 4
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
connext(4, 0x0FFFFFFFFFFF9DA8, 1025) Err#2 ENOENT
close(4) = 0
kopen(“/etc/netsvc.conf”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” # @ ( # ) 4 3 “.., 4096) = 4096
kread(4, ” o n a n d r e s o l”.., 4096) = 663
close(4) = 0
_thread_self() = 33816773
__libc_sbrk(0x0000000000010020) = 0x00000001109A2AA0
_thread_self() = 33816773
kopen(“/etc/hesiod.conf”, O_RDONLY) Err#2 ENOENT
kopen(“/etc/irs.conf”, O_RDONLY) Err#2 ENOENT
_thread_self() = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 33816773
_thread_self() = 33816773
_thread_self() = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 33816773
_thread_self() = 33816773
__libc_sbrk(0x0000000000010020) = 0x00000001109B2AC0
kopen(“/etc/hosts”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” # @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” # @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
close(4) = 0
socket(2, 1, 0) = 4
getsockopt(4, 65535, 4104, 0x0FFFFFFFFFFFC124, 0x0FFFFFFFFFFFC120) = 0
connext(4, 0x00000001109B9E30, 16) = 0
ngetsockname(4, 0x0FFFFFFFFFFFC1D4, 0x0FFFFFFFFFFFC1D0) = 0
getsockopt(4, 65535, 4097, 0x0FFFFFFFFFFFC324, 0x0FFFFFFFFFFFC320) = 0
getsockopt(4, 65535, 4098, 0x0FFFFFFFFFFFC324, 0x0FFFFFFFFFFFC320) = 0
setsockopt(4, 6, 1, 0x0FFFFFFFFFFFC32C, 4) = 0
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
sigprocmask(2, 0x09001000A01BAAC0, 0x0FFFFFFFFFFFC330) = 0
_sigaction(13, 0x0FFFFFFFFFFFC460, 0x0FFFFFFFFFFFC4B8) = 0
thread_setmystate(0x0FFFFFFFFFFFBF20, 0x0000000000000000) = 0
kwrite(4, ” W0101 901 ,”.., 87) = 87
kread(4, ” A04 ” 5″.., 2064) = 65
close(4) = 0
lseek(3, 27136, 0) = 27136
kread(3, “\r\r ® V\r ¯”.., 512) = 512
kwrite(1, ” O K ( 1 0 m s e c )”.., 13) = 13
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
close(3) = 0
kfcntl(1, F_GETFL, 0x00000000120400C5) = 67110914
kfcntl(2, F_GETFL, 0x00000000120400C5) = 67110914
_exit(0)
11g part:
kopen(“/etc/resolv.conf”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 38
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 0
statx(“/etc/resolv.conf”, 0x0FFFFFFFFFFF7B08, 176, 0) = 0
close(4) = 0
socket(1, 1, 0) = 4
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
connext(4, 0x0FFFFFFFFFFF75D8, 1025) Err#2 ENOENT
close(4) = 0
kopen(“/etc/netsvc.conf”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” # @ ( # ) 4 3 “.., 4096) = 4096
kread(4, ” o n a n d r e s o l”.., 4096) = 663
close(4) = 0
_thread_self() = 22020181
_thread_self() = 22020181
kopen(“/etc/hesiod.conf”, O_RDONLY) Err#2 ENOENT
kopen(“/etc/irs.conf”, O_RDONLY) Err#2 ENOENT
_thread_self() = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 22020181
_thread_self() = 22020181
_thread_self() = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self() = 22020181
_thread_self() = 22020181
__libc_sbrk(0x0000000000010020) = 0x0000000110A699A0
kopen(“/etc/hosts”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” # @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” # @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
close(4) = 0
kopen(“/etc/hosts”, O_RDONLY) = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” # @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” # @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) = 0
close(5) = 0
socket(2, 2, 0) = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16) = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) = 0
close(5) = 0
close(4) = 0
socket(2, 1, 0) = 4
kfcntl(4, F_GETFL, 0x0000000000000000) = 2
kfcntl(4, F_SETFL, 0x0000000000000006) = 0
kioctl(4, -2147195266, 0x0FFFFFFFFFFF89C8, 0x0000000000000000) = 0
getsockopt(4, 65535, 4104, 0x0FFFFFFFFFFF9594, 0x0FFFFFFFFFFF9590) = 0
connext(4, 0x0000000110A76590, 16) = 0
kfcntl(4, F_GETFL, 0x0000000000000000) = 6
kfcntl(4, F_SETFL, 0x0000000000000002) = 0
kioctl(4, -2147195266, 0x0FFFFFFFFFFF89C8, 0x0000000000000000) = 0
ngetsockname(4, 0x0FFFFFFFFFFF9FD8, 0x0FFFFFFFFFFF9680) = 0
getsockopt(4, 65535, 4097, 0x0FFFFFFFFFFFA634, 0x0FFFFFFFFFFFA630) = 0
getsockopt(4, 65535, 4098, 0x0FFFFFFFFFFFA634, 0x0FFFFFFFFFFFA630) = 0
setsockopt(4, 6, 1, 0x0FFFFFFFFFFFA63C, 4) = 0
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
sigprocmask(2, 0x09001000A01BAAC0, 0x0FFFFFFFFFFFB0C0) = 0
_sigaction(13, 0x0FFFFFFFFFFFB1F0, 0x0FFFFFFFFFFFB248) = 0
thread_setmystate(0x0FFFFFFFFFFFACB0, 0x0000000000000000) = 0
__libc_sbrk(0x0000000000010020) = 0x0000000110A799C0
kwrite(4, ” W0101 :01 ,”.., 87) = 87
kread(4, ” A04 ” 5″.., 8208) = 65
close(4) = 0
lseek(3, 27136, 0) = 27136
kread(3, “0E\t Ý \\t Þ”.., 512) = 512
kwrite(1, ” O K ( 1 5 0 0 2 0 m”.., 17) = 17
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
close(3) = 0
kfcntl(1, F_GETFL, 0x0000000011500055) = 67110914
kfcntl(2, F_GETFL, 0x0000000011500055) = 67110914
_exit(0)
We can see some new actions in the 11g part:
- /etc/hosts look up were done 2 times; after the second lookup the file wasn’t close;
- socket(2, 2, 0) was opened and closed 8 times (2 cycles by 4 times);
- every time through this socket was sent a message (_esend) and status of the message queue was checked (_poll);
- every time the checking of this message queue was waiting for response and finally timed out;
- the waiting time was doubled (5000, 10000, 20000, 40000 ms) in the every next try in the cycle;
- if we sum all the waiting time (5+10+20+40)*2 = 150 seconds !
The question is why all this new actions were done?
The reason
The answer is IPv6. It’s support was included in 11g Oracle*Net.
Even when the host name was resolved in IPv4 successfuly, the second look up in /etc/hosts is needed for IPv6 address host name resolution. If IPv6 address was not found in /etc/hosts then Oracle*Net requests DNS server for IPv6 name resolution because name resolution order is local, bind configured in the /etc/netsvc.conf.
Solution
To resolve the problem we should replace local, bind by local4, bind4 in the /etc/netsvc.conf. The meaning of the local4 and bind4 is explained in the comments in /etc/netsvc.conf:
# bind4 Uses BIND/DNS services for resolving only IPv4 addresses
# bind6 Uses BIND/DNS services for resolving only IPv6 addresses
# local4 Searches the local /etc/hosts file for resolving only IPv4 addresses
# local6 Searches the local /etc/hosts file for resolving only IPv6 addresses
https://odenysenko.wordpress.com/2012/02/11/investigation-of-the-11g-oraclenet-connection-delay-when-dns-server-is-unreachable/#more-482