728x90
 

Question

I found some indexes marked invalid after rollforward, or a HADR standby database become switched to primary.
Why the indexes are marked as invalid, and how I can recover the invalid indexes.

Cause

Db2 does not write index build or rebuild log records by default, rather Db2 marks the index as invalid so that recovery task can trigger to rebuild the index.

Answer

"Index rebuilding" means index creation, index reorg, classic table reorg, and rebuilding index in this article.
 
Indexes can be marked as invalid when:
  • Index rebuilding is not logged ( LOGINDEXBUILD=NO), then index rebuilding replayed.
  • Index rebuilding is logged ( LOGINDEXBUILD=YES), then index rebuilding replayed but INDEXREC is set as ACCESS_NO_REDO or RESTART_NO_REDO.
  • Failed offline reorg, or such failed offline reorg is replayed.
  • db2dart runs with /MI option.

Indexes can be rebuilt when:
  • There are one or more indexes marked invalid when a database restarts. i,e. Run RESTART DB with INDEXREC=RESTART
  • Someone access an index marked as invalid.
  • Run classic REORG TABLE command
  • Run REORG INDEXES or REORG INDEX command
  • Run CREATE INDEX statement
  • Run LOAD command, and it uses REBUILD option as INDEXING MODE
  • Run IMPORT or LOAD command with REPLACE option
  • Run TRAUNCATE TABLE statement

We can list all the indexes marked as invalid by the following query:
db2 "select tabschema, tabname, indname, datapartitionid, index_partitioning from table(sysproc.admin_get_index_info('','','')) as t where index_requires_rebuild='Y'"

We can recover the indexes marked as invalid by ether of the following ways.
 
a) Run RESTART DATABASE command (in case db cfg INDEXREC is set as RESTART)
The restart database command rebuild all the indexes set as INDEX_REQUIRES_REBUILD=Y
db2 restart database <database_name>
b) Run reorg indexes
db2 reorg indexes all on table <table_name> allow no access

Note: 
You can set LOGINDEXBUILD as YES to prevent indexes to be marked as invalid.
So, this setting is highly recommended in HADR databases.
Example:
db2 connect to <database_name>
db2 update db cfg using LOGINDEXBUILD YES

 

https://www.ibm.com/support/pages/db2-causes-mark-index-invalid-and-how-recover-indexes

 

[Db2] The causes to mark index invalid, and how to recover the indexes.

[Db2] The causes to mark index invalid, and how to recover the indexes.

www.ibm.com

 

728x90
728x90

Problem description:

Oracle 11g utilities like tnsping, lsnrctl or sqlplus on AIX 6.1 hanged for about 2,5 minutes (150 seconds) before connection with alias to any destination when DNS server was unreachable. In the same conditions (host and configuration) 10g utilities connected without delay. Oracle*Net was configured for local naming method only – NAMES.DIRECTORY_PATH=(tnsnames) in the both 10g and 11g Oracle Home. The tnsnames.ora files where identical in the 10g and 11g Oracle Home. The aliases in the tnsnames.ora used host names instead of IP addresses in the connection descriptions; if a host name was substituted by IP address the delay disappeared for this alias. Nevertheless all the host names from tnsnames where described in the /etc/hosts and resolution order was hosts = local, bind in the netsvc.conf (analog of  the nsswitch.conf in other UNIXs). This configuration should guarantee local name resolution when DNS server was unreachable. What is the root reason of this delay in 11g Oracle*Net ?

 

Diagnostic:

First of all I checked configuration of the Oracle*Net – sqlnet.ora, tnsnames.ora and AIX configuration of network name resolution  – /etc/resolv.conf, /etc/netsvc.conf. The configuration was typical.

 

The next steps were Oracle*Net and OS level tracing both 11g and 10g tnsping utility when DNS was reachable and unreachable. To emulate unreachable DNS server it’s IP address was changed in the /etc/resolv.conf

The Oracle*Net Support level tracing was enabled:

TNSPING.TRACE_LEVEL = 16
TRACE_LEVEL_CLIENT = 16

The comparison of 11g Oracle*Net traces gave no clue. Traces with and without reachable DNS server were almost identical. The comparison of 10g and 11g Oracle*Net traces did not help too. There were some differences but generally they were similar.

To trace tnsping at OS level were used AIX truss utility (truss -aefo tnsping.trc tnsping db_test). I checked the traces but din’t found any system call (like gethostbyname, gethostbyname2, gethostbyaddr or getaddrinfo) which could use DNS for the name resolution.

Comparing 11g with 10g trace files I found some new actions in the 11g trace. Here are some parts of tracing tnsping (11g and 10g) when DNS server was unreachable:

10g part:

kopen(“/etc/resolv.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 38
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 0
statx(“/etc/resolv.conf”, 0x0FFFFFFFFFFFA2D8, 176, 0) = 0
close(4)                = 0
socket(1, 1, 0)            = 4
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
connext(4, 0x0FFFFFFFFFFF9DA8, 1025)    Err#2  ENOENT
close(4)                = 0
kopen(“/etc/netsvc.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 3        “.., 4096) = 4096
kread(4, ” o n   a n d   r e s o l”.., 4096) = 663
close(4)                = 0
_thread_self()            = 33816773
__libc_sbrk(0x0000000000010020)    = 0x00000001109A2AA0
_thread_self()            = 33816773
kopen(“/etc/hesiod.conf”, O_RDONLY)    Err#2  ENOENT
kopen(“/etc/irs.conf”, O_RDONLY)    Err#2  ENOENT
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
_thread_self()            = 33816773
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 33816773
_thread_self()            = 33816773
__libc_sbrk(0x0000000000010020)    = 0x00000001109B2AC0
kopen(“/etc/hosts”, O_RDONLY)        = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
close(4)                = 0
socket(2, 1, 0)            = 4
getsockopt(4, 65535, 4104, 0x0FFFFFFFFFFFC124, 0x0FFFFFFFFFFFC120) = 0
connext(4, 0x00000001109B9E30, 16)    = 0
ngetsockname(4, 0x0FFFFFFFFFFFC1D4, 0x0FFFFFFFFFFFC1D0) = 0
getsockopt(4, 65535, 4097, 0x0FFFFFFFFFFFC324, 0x0FFFFFFFFFFFC320) = 0
getsockopt(4, 65535, 4098, 0x0FFFFFFFFFFFC324, 0x0FFFFFFFFFFFC320) = 0
setsockopt(4, 6, 1, 0x0FFFFFFFFFFFC32C, 4) = 0
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
sigprocmask(2, 0x09001000A01BAAC0, 0x0FFFFFFFFFFFC330) = 0
_sigaction(13, 0x0FFFFFFFFFFFC460, 0x0FFFFFFFFFFFC4B8) = 0
thread_setmystate(0x0FFFFFFFFFFFBF20, 0x0000000000000000) = 0
kwrite(4, ” W0101 901 ,”.., 87) = 87
kread(4, ” A04 ” 5″.., 2064) = 65
close(4)                = 0
lseek(3, 27136, 0)            = 27136
kread(3, “\r\r ® V\r ¯”.., 512) = 512
kwrite(1, ” O K   ( 1 0   m s e c )”.., 13) = 13
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
close(3)                = 0
kfcntl(1, F_GETFL, 0x00000000120400C5) = 67110914
kfcntl(2, F_GETFL, 0x00000000120400C5) = 67110914
_exit(0)

11g part:

kopen(“/etc/resolv.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 38
kread(4, ” n a m e s e r v e r\t 1″.., 4096) = 0
statx(“/etc/resolv.conf”, 0x0FFFFFFFFFFF7B08, 176, 0) = 0
close(4)                = 0
socket(1, 1, 0)            = 4
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
connext(4, 0x0FFFFFFFFFFF75D8, 1025)    Err#2  ENOENT
close(4)                = 0
kopen(“/etc/netsvc.conf”, O_RDONLY)    = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 3        “.., 4096) = 4096
kread(4, ” o n   a n d   r e s o l”.., 4096) = 663
close(4)                = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
kopen(“/etc/hesiod.conf”, O_RDONLY)    Err#2  ENOENT
kopen(“/etc/irs.conf”, O_RDONLY)    Err#2  ENOENT
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
_thread_self()            = 22020181
getdomainname(0x09001000A00DB290, 1024) = 0
_thread_self()            = 22020181
_thread_self()            = 22020181
__libc_sbrk(0x0000000000010020)    = 0x0000000110A699A0
kopen(“/etc/hosts”, O_RDONLY)        = 4
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
close(4)                = 0
 kopen(“/etc/hosts”, O_RDONLY)        = 4
 kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
 kfcntl(4, F_SETFD, 0x0000000000000001) = 0
 kioctl(4, 22528, 0x0000000000000000, 0x0000000000000000) Err#25 ENOTTY
 kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 2029
 kread(4, ” #   @ ( # ) 4 7\t 1 . 2″.., 4096) = 0
 socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 5000)    = 0
close(5)                = 0
 socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 10000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 20000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 35, 0, 0x0000000000000000) = 35
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 40000)    = 0
close(5)                = 0
 socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 5000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 5000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 10000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 10000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 20000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 20000)    = 0
close(5)                = 0
socket(2, 2, 0)            = 5
getsockopt(5, 65535, 4104, 0x0FFFFFFFFFFF65E4, 0x0FFFFFFFFFFF65E0) = 0
connext(5, 0x09001000A0024658, 16)    = 0
_esend(5, 0x0FFFFFFFFFFF74F0, 27, 0, 0x0000000000000000) = 27
_poll(0x0FFFFFFFFFFF66B0, 1, 40000) (sleeping…)
_poll(0x0FFFFFFFFFFF66B0, 1, 40000)    = 0
 close(5)                = 0
close(4)                = 0
socket(2, 1, 0)            = 4
kfcntl(4, F_GETFL, 0x0000000000000000) = 2
kfcntl(4, F_SETFL, 0x0000000000000006) = 0
kioctl(4, -2147195266, 0x0FFFFFFFFFFF89C8, 0x0000000000000000) = 0
getsockopt(4, 65535, 4104, 0x0FFFFFFFFFFF9594, 0x0FFFFFFFFFFF9590) = 0
connext(4, 0x0000000110A76590, 16)    = 0
kfcntl(4, F_GETFL, 0x0000000000000000) = 6
kfcntl(4, F_SETFL, 0x0000000000000002) = 0
kioctl(4, -2147195266, 0x0FFFFFFFFFFF89C8, 0x0000000000000000) = 0
ngetsockname(4, 0x0FFFFFFFFFFF9FD8, 0x0FFFFFFFFFFF9680) = 0
getsockopt(4, 65535, 4097, 0x0FFFFFFFFFFFA634, 0x0FFFFFFFFFFFA630) = 0
getsockopt(4, 65535, 4098, 0x0FFFFFFFFFFFA634, 0x0FFFFFFFFFFFA630) = 0
setsockopt(4, 6, 1, 0x0FFFFFFFFFFFA63C, 4) = 0
kfcntl(4, F_SETFD, 0x0000000000000001) = 0
sigprocmask(2, 0x09001000A01BAAC0, 0x0FFFFFFFFFFFB0C0) = 0
_sigaction(13, 0x0FFFFFFFFFFFB1F0, 0x0FFFFFFFFFFFB248) = 0
thread_setmystate(0x0FFFFFFFFFFFACB0, 0x0000000000000000) = 0
__libc_sbrk(0x0000000000010020)    = 0x0000000110A799C0
kwrite(4, ” W0101 :01 ,”.., 87) = 87
kread(4, ” A04 ” 5″.., 8208) = 65
close(4)                = 0
lseek(3, 27136, 0)            = 27136
kread(3, “0E\t Ý \\t Þ”.., 512) = 512
kwrite(1, ” O K   ( 1 5 0 0 2 0   m”.., 17) = 17
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
kfcntl(1, F_GETFL, 0x0000000000000008) = 67110914
close(3)                = 0
kfcntl(1, F_GETFL, 0x0000000011500055) = 67110914
kfcntl(2, F_GETFL, 0x0000000011500055) = 67110914
_exit(0)

We can see some new actions in the 11g part:

  • /etc/hosts look up were done 2 times; after the second lookup the file wasn’t close;
  • socket(2, 2, 0) was opened and closed 8 times (2 cycles by 4 times);
  • every time through this socket was sent a message (_esend) and status of the message queue was checked (_poll);
  • every time the checking of this message queue was waiting for response and finally timed out;
  • the waiting time was doubled (5000, 10000, 20000, 40000 ms) in the every next try in the cycle;
  • if we sum all the waiting time (5+10+20+40)*2 = 150 seconds !

The question is why all this new actions were done?

 

The reason

The answer is IPv6. It’s support was included in 11g Oracle*Net.

Even when the host name was resolved in IPv4 successfuly, the second look up in /etc/hosts is needed for IPv6 address host name resolution. If IPv6 address was not found in /etc/hosts then Oracle*Net requests DNS server for IPv6 name resolution because name resolution order is local, bind configured in the /etc/netsvc.conf.

 

Solution

To resolve the problem we should replace local, bind by local4, bind4  in the /etc/netsvc.conf. The meaning of the local4 and bind4 is explained in the comments in /etc/netsvc.conf:

# bind4       Uses BIND/DNS services for resolving only IPv4 addresses
# bind6       Uses BIND/DNS services for resolving only IPv6 addresses
# local4      Searches the local /etc/hosts file for resolving only IPv4 addresses
# local6      Searches the local /etc/hosts file for resolving only IPv6 addresses

 

https://odenysenko.wordpress.com/2012/02/11/investigation-of-the-11g-oraclenet-connection-delay-when-dns-server-is-unreachable/#more-482

 

728x90

'Oracle > oracle' 카테고리의 다른 글

오라클에서 access 파일 조회하기  (0) 2019.08.01
hex string 을 blob 에 입력하기  (0) 2016.11.22
Windows 7 64 bit에서 Oracle 11g Client 설치 문제  (0) 2013.02.15
hash_value VS sql_id  (0) 2012.07.17
TRANSLATE 함수 사용법  (0) 2012.01.04
728x90

Troubleshooting


Problem

Connections to Informix Dynamic Server (IDS) hang or take a long time to connect. Other users who are already connected do not see a performance problem. Once a user is connected there is no performance problem.

Symptom

Connections to Informix Dynamic Server (IDS) hang or take a long time to connect. Other users who are already connected do not see a performance problem. Once a user is connected there is no performance problem. Users connecting by way of shared memory do not experience a connection hang. Only users using network connections see this problem.

Cause


There are not enough files descriptors allocated for IDS to effectively handle the number of network connections. See the Related information section for a link to more information regarding file descriptors.

Note: This is only one possible cause.

Resolving The Problem


1. Increase the number of file descriptors available to a process. Ask your operating system administrator to increase the number of file descriptors .The suggested are values 2048, 4096, or unlimited. You can temporarily increase the number of file descriptors by running the ulimit -n command:


2. Restart IDS after the file descriptor increase is implemented.

Note: If this does not resolve the problem then look for the Collecting Data document that will help you collect the correct troubleshooting data for a hang. After you collect this data contact technical support.

Also look at this document; Slow connect time to Informix when using TCP/IP

 

https://www.ibm.com/support/pages/connections-ids-hang

 

Connections to IDS hang

 

www.ibm.com

 

728x90
728x90

Troubleshooting


Problem

Client applications using the TCP/IP protocol to connect to Informix experience poor performance during connection. Once the connection has been established the performance is improved.

Cause

  • During connection establishment, the IP address of the client is used by the IDS server to look up its host name. The method used to accomplish this is dependent on the configuration of the operating system. If the Domain Name System (DNS) is used, an operation known as a reverse query is used to map the IP address to host name. If the configuration of the DNS server is incomplete this type of query may take some time.

    The command line utility nslookup may be used to perform DNS queries. If a reverse lookup operation takes a long time to complete using nslookup then the IDS server will also take a long time to complete the same operation.

    The first example shows a reverse lookup operation for an IP address that is known to the DNS server and completes immediately. The second example uses an IP address that is unknown taking several seconds before returning an error.

    C:\>nslookup 1.2.3.4
    Server:  host1.domain.com
    Address:  9.8.7.6

    Name:    host222.domain.com
    Address:  1.2.3.4

    C:\>nslookup 9.69.1.126
    Server:  host1.domain.com
    Address:  9.8.7.6

    DNS request timed out.
    timeout was 2 seconds.
    *** Request to host1.domain.com timed-out


  • The lookup for the service name could be taking a while.
  • Another cause may by the use of IPv6 addresses with DNS servers that are not configured for IPv6.

Diagnosing The Problem

To isolate the problem to tcp/ip connections try to connect by way of a shared memory connection. If the connection is now completed quickly then the problem outlined above may be the cause of poor connection performance.

Follow this link if you need instructions for setting up shared memory connections:


A shared-memory connection (UNIX)

Resolving The Problem

  • Refer to your operating system documentation for details of how to configure DNS. If the DNS server runs on the Microsoft Windows® platform then it is possible that the zone used to implement reverse lookup is not implemented. Details regarding DNS on the Windows operating system can be found in the related URLs below. Ensure that the reverse-lookup zone is populated for all clients that connect to the IDS server.
  • Ensure that local hostname resolution is performed before other methods of hostname resolution (NIS, NIS+, DNS, and so on). Files like these are used on various operating systems to determine lookup order:

    /etc/netsrv.conf
    /etc/irs.conf
    /etc/nsswitch.conf
  • Reconfigure your operating system to not use DNS, for example by including the host names and IP addresses of all client machines in the local hosts file /etc/hosts or %WINDIR%\system32\drivers\etc\hosts on Windows.
  • Change the host name in the sqlhosts file to an IP address.
  • Change the service name in the sqlhosts file to the port number.
  • Connections to IDS hang

 

https://www.ibm.com/support/pages/slow-connect-time-informix-when-using-tcpip

  •  
 

Slow connect time to Informix when using TCP/IP

Slow connect time to Informix when using TCP/IP

www.ibm.com

 

728x90
728x90

APAR status

  • Closed as program error.

Error description

  • The listener thread may receive an OS error when trying to bind
    the port with an IPv6 address. This results in an AF similar to:
    
    Assert Warning: Unable to bind to the port (18105) on the host
    () for the server (server1).
    
    Stack for thread: 11 soctcplst
    
    (oninit) afstack
    (oninit) afhandler
    (oninit) afwarn_interface
    (oninit) listen_af_alarm
    (oninit) lisnsocket
    (oninit) tlListen
    (oninit) slSQIlisten
    (oninit) pfListen
    (oninit) cmListen
    (oninit) alListen
    (oninit) ASF_Call
    (oninit) sql_listener
    
    This may occur when using a specific IPv6 address other than the
    loopback address ::1.
    
    This problem may occur on releases of Linux where the option
    "myhostname" is permitted for the hosts entry in
    /etc/nsswitch.conf. The presence of this entry avoids the need
    to enter the host's specific IP address into /etc/hosts and will
    result in an IPv6 address being used by the listener thread if
    IPv6 has been configured for at least one network interface. Red
    Hat 7 is one such OS version where this option is available.
    

Local fix

  • The workaround is to allow the bind operation to use any of the
    available IPv6 addresses on the server by prefixing the hostname
    or address in column 3 of the sqlhosts file with an asterix "*".
    Alternatively if IPv6 is not required it may be disabled by
    setting the environment variable IFX_DISABLE_IPV6 or creating
    the empty file $INFORMIXDIR/etc/IFX_DISABLE_IPV6.
    

Problem summary

  • ****************************************************************
    * USERS AFFECTED:                                              *
    * informix user with IPv6 enabled                              *
    ****************************************************************
    * PROBLEM DESCRIPTION:                                         *
    * See Error Description                                        *
    ****************************************************************
    * RECOMMENDATION:                                              *
    * Update to IBM Informix Server 12.10.xC6                      *
    ****************************************************************
    

Problem conclusion

  • Problem Fixed In IBM Informix Server 12.10.xC6
    

Temporary fix

  •  

Comments

  •  

APAR Information

  • APAR number
  • IT11530
  • Reported component name
  • INFORMIX SERVER
  • Reported component ID
  • 5725A3900
  • Reported release
  • C10
  • Status
  • CLOSED PER
  • PE
  • NoPE
  • HIPER
  • NoHIPER
  • Special Attention
  • NoSpecatt
  • Submitted date
  • 2015-10-01
  • Closed date
  • 2015-12-30
  • Last modified date
  • 2015-12-30
728x90
728x90

Troubleshooting


Problem

A problem with DNS lookup can cause slow connection to database server. This article explains one scenario and a possible solution for that problem.

Symptom

Connection request to database server from dbaccess is trying to access DNS server for hostname resolution, after it found the same in the local host file.

On certain Operating System (for example: Linux) you may noticed dbaccess (and other client applications) always doing DNS lookup while connecting to a database, even after found out the hostname or IP address in the local host file. This behavior sometimes caused slow connection, if you have problem related to DNS. Following is an excerpt of strace output shows the sequence of file accessed by a dbaccess request:

Cause

Traditionally, hostname and service name resolution were performed by functions such as gethostbyname(), getservbyname() etc. These traditional lookup functions are still available, however those are not forward compatible to IPv6. Instead, the IPv6 socket API provides new lookup functions that consolidate the functionality of several traditional functions. These new lookup functions are also backward compatible with IPv4, so a programmer can use the same translation algorithm in an application for both the IPv4 and Ipv6. The getaddrinfo() is the new primary lookup function and a connection request from the dbaccess ultimately calls this socket API. You can pass several parameters to the getaddrinfo(), one of those parameter is addrinfo structure. By default, dbaccess passes value “AF_INET6” for addrinfo.ai_family. The ai_family field indicates the protocol family associated with the request, and will be PF_INET6 for IPv6 or PF_INET for IPv4.

If the ai_family set to AF_INET6 (IPv6) the getaddrinfo() will search the DNS every time. If the ai_family set to AF_INET, then it don't query the DNS server. You can consult the 'man' page for getaddrinfo() for detailed information.

Beginning with Informix 10.00.xC4 and Client SDK 2.90.xC4, the database server checks, on startup, whether IPv6 is supported in the underlying operating system. If IPv6 is supported it is used. If the underlying operating system does not support IPv6, the IPv4 address is used.

Diagnosing The Problem

You can use the attached 'C' program to reproduce and verify the problem outside Informix.

/* 
 * gt_adr_info.c - An example of using getaddrinfo() function to validate IPV6.
 *                 compile: cc -o gt_adr_info  gt_adr_info.c 
 *                 usage: gt_adr_info <HostName>
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int
lookup_host (const char *host)
{
  struct addrinfo hints, *res;
  int err;
  char addrstr[100];
  void *ptr;
  char *result;

  memset (&hints, 0, sizeof (hints));

  result = getenv("IFX_DISABLE_IPV6");

  if (result != NULL)
      hints.ai_family = AF_INET;
  else
      hints.ai_family = AF_INET6;

  hints.ai_flags = AI_PASSIVE;

  if ((err = getaddrinfo(host, NULL, &hints, &res)) != 0)
    {
      perror ("getaddrinfo");
      return -1;
    }

  printf ("HostName: %s\n", host);
  while (res)
    {
      inet_ntop (res->ai_family, res->ai_addr->sa_data, addrstr, 100);

      switch (res->ai_family)
        {
        case AF_INET:
          ptr = &((struct sockaddr_in *) res->ai_addr)->sin_addr;
          break;
        case AF_INET6:
          ptr = &((struct sockaddr_in6 *) res->ai_addr)->sin6_addr;
          break;
        }
      inet_ntop (res->ai_family, ptr, addrstr, 100);
      printf ("IPv%d address: %s (%s)\n", res->ai_family == PF_INET6 ? 6 : 4,
              addrstr, res->ai_canonname);
      res = res->ai_next;
    }

  return 0;
}

int
main (int argc, char *argv[])
{
  if (argc < 2)
    exit (1);
  return lookup_host (argv[1]);
}
gt_adr_info.c.txt

How to use the attached 'C' program?

1. Save the attached file as 'gt_adr_info.c'.

2, Compile the 'C' program using following command:


3. Run it as 'gt_adr_info <HostName>', where HostName is the machine name that you want to connect.

Resolving The Problem

In case of a problem with DNS lookup and encountering slow connection to database, you may use the environment variable IFX_DISABLE_IPV6 (IFX_DISABLE_IPV6=1) to disable Ipv6 and this will set the ai_family to AF_INET only and will not do subsequent query to the DNS server.

 

https://www.ibm.com/support/pages/possible-cause-slow-connection-database-server

 

A possible cause for slow connection to the database server

A possible cause for slow connection to the database server

www.ibm.com

 

728x90
728x90

Symptom

DML operations failing with ISAM Error -104

Cause

The 32K File Descriptors (open tables) per thread limit has been reached

Diagnosing The Problem

Use onstat -g opn and look at the ucount column

IBM Informix Dynamic Server Version 11.70.FC2 -- On-Line -- Up 14:23:05 -- 2964472 Kbytes
tid rstcb isfd op_mode op_flags partnum ucount ocount lockmode
671522 0x000000009c9e0a20 0 0x00000400 0x00000397 0x004000da 2 2 0
671522 0x000000009c9e0a20 1 0x00000002 0x00000003 0x004000da 2 2 0
671522 0x000000009c9e0a20 2 0x00000400 0x00000397 0x00100003 2 2 0
671522 0x000000009c9e0a20 3 0x00000002 0x00000003 0x00100003 2 2 0
671522 0x000000009c9e0a20 8 0x01000400 0x00000407 0x0090003c 32692 0 0
671522 0x000000009c9e0a20 9 0x01000440 0x00000797 0x0090003d 32692 0 0

(please note there are 2 partitions - 0x0090003c, 0x0090003d - each of which is opened 32692 times)

Resolving The Problem

*Check the application logic.
*Check that UDRs are not compiled with PDQ set


If you have a user defined routine (UDR) which is called as a
part of many update/delete triggers on different tables
and if the application logic in the update/delete triggers
mimics some kind of hierarchy (so a modification in one table
can cause many changes in many other tables), the UDR can get
called multiple (thousands) times in one session.

If the UDR
1) contains a query which needs to materialize some view
2) was created with PDQPRIORITY > 0 (or the last 'update
statistics low [for procedure]' command was run with PDQPRIORITY
> 0)

each of its invocations materializes the view, leaving the base
tables opened at the RSAM level.

 

 

728x90
728x90

Question

Your read cache rate (onstat -p) rises durning server use to ninety-nine percent. You suspect that you have overallocated buffers to the extent that you are wasting memory. How can you determine how much of your buffer cache memory is used?

Answer

Count the number of buffers in use at regular intervals and compare with the total allocated. You can use the following two methods to count the number of buffer pages in use at a given point in time:

  • Execute the onstat -B command. The number of data lines in the output is the number of buffers in use. The onstat -B output includes all the configured buffer pools, grouped by page size. You will have to count the lines in the individual groups if you have more than one buffer pool.


  • Open the sysmaster database and execute the following query:

    SELECT COUNT(*) AS pages_used
    FROM sysbufhdr
    WHERE bufsize = <buffer_page_size>
    AND bflags != 4;

    The buffer_page_size is the page size (in bytes) for the buffer pool you are examining. The buffer page size is listed in the onstat -b/B output. If you only have one buffer pool, you can leave out the buffer page size filter and the query becomes:

    SELECT COUNT(*) AS pages_used
    FROM sysbufhdr
    WHERE bflags != 4;

 

https://www.ibm.com/support/pages/how-do-i-know-if-my-buffers-are-overallocated

 

How do I know if my Buffers are Overallocated?

How do I know if my Buffers are Overallocated?

www.ibm.com

 

728x90
728x90

Question

Your IBM Informix server is processing queries more slowly than expected. You want to determine what is slowing the instance. Were do you start your analysis?

Answer

Preconditions

  • This performance problem is not limited to one query or a class of queries.
  • The database statistics (UPDATE STATISTICS) were collected with appropriate performance considerations and are current.
  • The performance is a problem for both network connections to the server and shared memory connections.

Recommended Approach

Informix servers provides a number of commands, the onstats, for reporting system status. You can generally start your analysis using seven of the commands. Examination of the data often leads to further analysis and resolution. The recommended commands are:

  • onstat -m

    Message log. The message log contains warning and error messages. The onstat -m only reports recent updates to the log file. You may want to look at earlier log entries.

  • onstat -u

    User thread status. Look for user threads waiting on a resource. You will see a wait indicator in the first position of the flags field. The address of the resource is in the wait field. The specific wait flag, the type of resource, and cross references follow:

    B - wait on buffer - match the wait address to a buffer in onstat -b

    C - wait on checkpoint - examine onstat -g ckp and onstat -g con

    G - wait on write to log buffer - match the wait address to a log buffer in onstat -l

    L - wait on lock - match the wait address to the address of a lock in onstat -k

    S - wait on latch - match the wait address to a latch (mutex) in onstat -lmx and onstat -wmx

    Y - wait on condition - listed in onstat -g con and do not typically reflect performance problems to the extent that they help with diagnosis

    There are several other flags but they are rarely observed.

    The first field of the onstat -u output, address, maps to the rstcb field of the onstat -g ath output for thread identification. The sessid field is the session id (SID). You can learn more about resources allocated to a session and its activity with onstat -g ses <SID>.

    Collect onstat -u several times in rapid succession. If the waiters persist over a measurable clock time, then the chances are very good that the waits reflect a processing problem that affects performance. Some waiters are normal but they should disappear quickly. Keep in mind that lock waits are programmed.

    The last two fields of onstat -u, nreads and nwrites, can be useful indicators of thread activity.

  • onstat -p

    Server statistics. The single most important performance statistic in this output is the read cache rate (the first %cached field). Your goal for an OLTP system is to have a read cache rate above 95 percent during normal processing. Try for a rate above 99 percent. Increase the cache rate by adding buffers, which are configured using the BUFFERPOOL configuration parameter. Make sure that you have plenty of system memory (onstat -g osi) when increasing the Informix server memory allocation. Increasing the server memory footprint can indirectly slow the instance by increasing paging/swapping at the OS side. You can use a sysmaster query (see Related Information below) to help determine if you have configured too many buffers.

    Other statistics, like bufwaits (waiting for buffer), seqscans (sequential scans), and the read aheads should be considered. In the case of read aheads, the sum of ixda-RA, idx-RA, and da-RA should be close to the value of RA-pgsused as an indicator of effective read-ahead configuration.

    Many of the same statistics are viewed on a partnum level with onstat -g ppf.

    Consider collecting and retaining onstat -p outputs at regular intervals for comparison. Note that cumulative statistics are zeroed with onstat -z if you want to observe statistics over a limited time interval.

  • onstat -g rea

    Ready queue. Reports threads ready to execute at one moment in time. These threads are prepared for execution but lack cpu resource. If the number remains above the number of cpu virtual processors (onstat -g glo) through repetitions of onstat -g rea, then your system is likely limited by processing power. Look for inefficient queries and non-server processing on the machine. See onstat -g glo output for cpu use integrated over time and onstat -g osi for system resources.

  • onstat -g act

    Active threads. You will usually see poll threads and listen threads in this queue. Look for threads doing query-related work, like sqlexec and I/O threads. If none show up or they are rare, then work is not getting to the processors.

  • onstat -g ioq

    I/O request queues. The statistic to monitor is the maxlen column. This is the maximum length of a queue after the engine was brought online or onstat -z was executed. If the number is too large, then at some point the I/O requests were not serviced fast enough. Try executing onstat -z and checking to see how long it takes for large maxlen values to return.

    For informix aio monitor the gfd (global file descriptor) queues. The maxlen should not be greater than 25. The system uses gfd 0, 1, and 2 (stdin, stdout, stderr), so the informix chunks start with gfd 3 (chunk 1).

    If you are using kaio monitor the kio queues. The maxlen values should not exceed 32. There will be one kio queue for each cpu virtual processor.

    Recommendations for maxlen with DIRECT_IO enabled have not been determined, but are not expected to be larger than 32.

  • onstat -g seg

    Shared memory allocations. The shared memory configuration parameters should be tuned so that the server dynamically allocates at most two virtual segments during normal processing.

 

 

https://www.ibm.com/support/pages/where-do-i-start-diagnosis-generally-slow-performance-informix-servers

 

Where do I start with Diagnosis of Generally Slow Performance on Informix Servers?

Where do I start with Diagnosis of Generally Slow Performance on Informix Servers?

www.ibm.com

 

728x90
728x90

Question

This document describes a few methods that can be used to find files opened by a process. This information can be useful when debugging processes that open too many files, or have a file leak.

Answer


Introduction

A system administrator might find it necessary to obtain information about all files that are currently opened by a process. A process might have a defect that causes it to continuously create files without closing them, or it might open files, read and write to those files, but fail to close the files afterwards. In such cases it is useful to obtain as much information as possible about the files that have been opened by a process to help pinpoint the cause of the problem. AIX has a virtual file system mounted at /proc that provides information about running processes, including the files opened by those processes. AIX also includes a number of commands that can be used to obtain information about files opened by processes. Open Source commands such as lsof can also be used.

/proc File System

The /proc file system is a virtual file system, meaning it does not contain actual files residing on a disk or in RAM. But the /proc file system contains virtual files that can be manipulated just like real files. These virtual files provide information about processes currently running on a system, using standard UNIX commands and methods for accessing files. Under /proc there are virtual directories named with the process IDs (PIDs) of all processes currently running on the system. Inside of each of these directories are more subdirectories. These subdirectories organize all of the available information about running processes. One of the subdirectories is named fd, an abbreviation for file descriptor. Inside fd is a list of virtual files with numbers for file names. These numbers are the file descriptor numbers assigned by the operating system to the real files that have been opened by the process. In the following example, we find that the process with PID 184422 has only one opened file with file descriptor 4.

# cd /proc/184422/fd
# ls -l total 16 -r--r--r--   1 root system   4811 Jul 12 2004  4

procfiles Command

The AIX procfiles command lists all files opened by a process. For each file the command also provides the inode number for the file, and additional information such as the file size, and uid and gid. Here is an example of procfiles output for the same process with PID 184422 that we found in the /proc file system above.

# procfiles 184422

184422 : /usr/sbin/hostmibd
  Current rlimit: 2147483647 file descriptors
   4: S_IFREG mode:0444 dev:10,5 ino:13407 uid:0 gid:0 rdev:0,0
      O_RDONLY size:4811

Again we see that process 184422 has one opened file with file descriptor 4. File descriptor 4 has major,minor numbers of 10,5 and an inode number of 13407. We can use the following procedure to find the device where the file is located.

# cd /dev
# ls -l | grep "10, *5"
brw-rw----   1 root     system       10,  5 Oct 10 2005  hd2
crw-rw----   1 root     system       10,  5 Oct 10 2005  rhd2

So the device or logical volume that contains the file system in this example is /dev/hd2.

# lsfs | grep hd2
/dev/hd2        --         /usr           jfs2 3801088 yes no

This filesystem is mounted at /usr.

We can use the following command to obtain information about the file with file descriptor 4 and inode 13407.

# istat 13407 /usr
Inode 13407 on device 10/5
File Protection: rw-r--r--
Owner: 2(bin)           Group: 2(bin)
Link count:   1         Length 4811 bytes
Last updated:   Tue Aug 24 16:14:48 CDT 2004
Last modified:  Mon Jul 12 11:33:31 CDT 2004
Last accessed:  Wed Aug  9 09:16:28 CDT 2006
Block pointers (hexadecimal): 1892c

We can use find command to find all file names in the filesystem /usr with an inode of 13407.

# cd /usr
# find . -inum 13407 -exec ls -l {} \;
-rw-r--r--   1 bin      bin            4811 Jul 12 2004
./lib/nls/msg/en_US/hostmibd.cat

Notice the 1 just before the first bin. This indicates that there is only 1 hard link, meaning the file name hostmibd.cat is the only file name associated with this inode.

pstat Command

The AIX pstat command can be used to list all files opened by a process. Here is an example that finds all files currently opened by the cron process.

# ps -ef | grep cron
    root 323762      1   0   Oct 06      -  0:07 /usr/sbin/cron

The PID for cron is 323762, which is 0x4F0B2 in hex.

# pstat -a | grep -i 4F0B2
SLT ST    PID   PPID   PGRP   UID  EUID  TCNT  NAME
 79 a   4f0b2      1  4f0b2     0     0     1  cron

We can use the slot number to display the file system state info and the file descriptor table. In this example we see that cron has 13 opened files, numbered from 0 to 12.

# pstat -u 79 | grep FILE
FILE SYSTEM STATE
FILE DESCRIPTOR TABLE

# pstat -u 79 | grep -p "FILE DESCRIPTOR TABLE"
FILE DESCRIPTOR TABLE
    *ufd: 0xf00000002ff49e20
    fd 0:  fp = 0xf1000714500080e0   flags = 0x0080  count = 0x0000
    fd 1:  fp = 0xf100071450007fa0   flags = 0x0080  count = 0x0000
    fd 2:  fp = 0xf100071450007fa0   flags = 0x0080  count = 0x0000
    fd 3:  fp = 0xf100071450007780   flags = 0x0080  count = 0x0000
    fd 4:  fp = 0xf100071450007af0   flags = 0x0080  count = 0x0000
    fd 5:  fp = 0xf1000714500079b0   flags = 0x0080  count = 0x0000
    fd 6:  fp = 0xf1000714500066a0   flags = 0x0080  count = 0x0000
    fd 7:  fp = 0xf100071450008270   flags = 0x0080  count = 0x0000
    fd 8:  fp = 0xf1000714500081d0   flags = 0x0080  count = 0x0000
    fd 9:  fp = 0xf100071450008220   flags = 0x0080  count = 0x0000
    fd 10:  fp = 0xf100071450008180   flags = 0x0080  count = 0x0000
    fd 11:  fp = 0xf1000714500082c0   flags = 0x0080  count = 0x0001
    fd 12:  fp = 0xf100071450008130   flags = 0x0081  count = 0x0000

lsof Command

The lsof command is an open source command available for free on the internet. lsof is a very powerful command with many options so we only list a few uses for lsof in this document.

# lsof -u account | wc -l
Displays the total number of open file handles in the specified account.

# lsof -u account | grep pid | wc -l
or
# lsof -p pid
Displays the total number of open files in the specified account name for the specified pid.

lsof indicates if the file descriptor is associated with an open socket or an open file.

Conclusion

The /proc virtual file system and the AIX commands procfiles and pstat can be used to list information about files that are currently opened by a process. This information can be used to investigate processes that are having certain types of problems with files. The open source lsof command is also useful for providing information about files opened by processes, and files opened under specific user accounts.

 

 

 

728x90

+ Recent posts