
Problem(Abstract)

This article describes the operating system kernel parameters that can be tuned to avoid the error "KAIO out of resource" on the HP-UX platform.

Resolving the problem

PROBLEM

When using Kernel Asynchronous I/O (KAIO) with IBM Informix Dynamic Server, you might see one of the following messages in the database server message log:

      KAIO out of resource errno=11
    or
      KAIO out of resource errno=35


CAUSE 

This error indicates a shortage of KAIO resources. 


SOLUTION 

Tune the KAIO subsystem: 

1. Configure the number of KAIO requests. 

    Note: Check Related information for more detail. 

2. Tune the Operating System Kernel parameters. 


Kernel parameter     Minimum  Maximum     Default  Description
aio_listio_max       2        65536       256      Specifies how many POSIX asynchronous I/O operations are allowed in a single lio_listio() call.
aio_max_ops          1        1048576     2048     Specifies the system-wide maximum number of POSIX asynchronous I/O operations that may be queued at any given time.
aio_physmem_pct      5        50          10       Percentage of physical memory that can be locked for use in POSIX asynchronous I/O operations.
aio_prio_delta_max   0        20          20       Maximum amount by which a process can decrease its asynchronous I/O priority level.
max_async_ports      1        2147483647  50       Maximum number of ports to the asynchronous disk-I/O driver that processes can have open at any given time.


Note:
  • All configurable kernel parameters must be specified using an integer value or a formula consisting of a valid integer expression.
  • The maximum and/or default values of certain parameters may change between releases or vary between 32-bit and 64-bit processors.

Warning: 

Changing kernel parameters to improper or inappropriate values or combinations of values can cause data loss, system panics, or other (possibly very obscure and/or difficult to diagnose) operating anomalies, depending on which parameters are set to what values. 

  • Before altering the value of any configurable kernel parameter, be sure you know the implications of making the change.
  • Never set any system parameter to a value outside the allowable range for that parameter (SAM refuses to store values outside of the allowable range).
  • Many parameters interact, and their values must be selected in a balanced way.

For more information about POSIX asynchronous I/O, see the HP-UX Reference entry aio(5). Please contact HP-UX OS support for more information about these parameters and their management.
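
The warning above about staying inside the allowable range can be turned into a small pre-check. The following is a minimal sketch in POSIX shell that uses only the minimum and maximum values from the table above; the function names aio_param_range and aio_value_ok are hypothetical, and the limits may differ on your HP-UX release.

```shell
# Print "min max" for a KAIO-related kernel parameter, using the
# values from the table in this article (they may vary by release).
aio_param_range() {
    case "$1" in
        aio_listio_max)     echo "2 65536" ;;
        aio_max_ops)        echo "1 1048576" ;;
        aio_physmem_pct)    echo "5 50" ;;
        aio_prio_delta_max) echo "0 20" ;;
        max_async_ports)    echo "1 2147483647" ;;
        *) return 1 ;;                      # parameter not in the table
    esac
}

# Exit status 0 if value $2 is inside the range for parameter $1,
# 1 if it is outside the range, 2 if the parameter is unknown.
aio_value_ok() {
    range=$(aio_param_range "$1") || return 2
    min=${range% *}                         # text before the space
    max=${range#* }                         # text after the space
    [ "$2" -ge "$min" ] && [ "$2" -le "$max" ]
}
```

For example, `aio_value_ok aio_max_ops 4096 && echo "within range"` checks a candidate value before you hand it to whichever kernel configuration tool your system provides. The check only covers the range; it says nothing about how the parameters interact.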



http://www-01.ibm.com/support/docview.wss?uid=swg21138073


Problem(Abstract)

This article describes the environment variable that can be used to change the default resources used by the engine for Kernel Asynchronous I/O (KAIO) on the HP-UX platform.

Resolving the problem

PROBLEM

Database performance is poor because of input/output (I/O) requests. The IBM® Informix® Dynamic Server™ message log might display one of these error messages:

      KAIO out of OS resources, errno = 11

    or   
      KAIO out of OS resources, errno = 35
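
To see how often the server reports this condition, you can count the matching lines in the message log. This is a minimal sketch: the function name count_kaio_errors is hypothetical, and the pattern is written to cover both message wordings quoted in this document.

```shell
# Count KAIO resource-shortage messages in a message log file.
# $1: path to the message log (supplied by the caller)
count_kaio_errors() {
    # Matches "KAIO out of resource errno=11" as well as
    # "KAIO out of OS resources, errno = 35".
    grep -E -c 'KAIO out of (OS )?resources?,? *errno *= *[0-9]+' "$1"
}
```

Call it with the path of your own message log, for example `count_kaio_errors /path/to/online.log` (the path here is a placeholder).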

CAUSE 

Each time the database server makes a kernel asynchronous input/output (KAIO) request, it uses a special operating system structure. A certain number of these structures are allocated when the database server starts and are reused as long as the instance is online. 

The problem will occur if there are not enough structures available to handle all of the I/O requests made by the database server. 


Note: This is only one possible cause for this problem. If this document does not solve your problem, see if other documents exist for the same problem. 


SOLUTION 

Increase the number of KAIO requests allowed simultaneously by the database server. Do this by increasing the value of the IFMX_HPKAIO_NUM_REQ environment variable. 

Set the variable to the desired number at the command line and then shut down and restart the database server. 

    Limits of the IFMX_HPKAIO_NUM_REQ environment variable:

    Minimum value   Maximum value   Default value
    10              5000            1000

    Note: These values may change in future versions of the operating system or database server.

    Example:
    This example shows how to increase the number of concurrent KAIO requests to 2300 from the default of 1000 using Korn shell:
      $ IFMX_HPKAIO_NUM_REQ=2300 
      $ export IFMX_HPKAIO_NUM_REQ
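
    The limits listed above can be checked before the variable is exported. The following is a minimal sketch in POSIX shell; the function name hpkaio_num_req_ok and the variable requested are hypothetical, and the bounds 10 and 5000 are the ones given in this article.

```shell
# Succeed only if $1 is a whole number within the documented
# limits for IFMX_HPKAIO_NUM_REQ (10 to 5000).
hpkaio_num_req_ok() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -ge 10 ] && [ "$1" -le 5000 ]
}

# Export the value only when it passes the check.
requested=2300
if hpkaio_num_req_ok "$requested"; then
    IFMX_HPKAIO_NUM_REQ=$requested
    export IFMX_HPKAIO_NUM_REQ
else
    echo "IFMX_HPKAIO_NUM_REQ=$requested is outside 10..5000" >&2
fi
```

    As described above, the database server must still be shut down and restarted from this environment for the new value to take effect.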

You can also change some operating system kernel parameters for KAIO. If you are still experiencing poor performance, contact your local technical support office.



http://www-01.ibm.com/support/docview.wss?rs=0&uid=swg21142285



Problem(Abstract)

You try to start the Informix server, but it fails to start with messages like these in the online.log:

    21:45:30 IBM Informix Dynamic Server Started.
    21:45:30 size of resident + virtual segments 10443875920 + 4831838208 > 9663676416
    total allowed by configuration parameter SHMTOTAL

Symptom

  • output of oninit -v:

    Checking group membership to determine server run mode...succeeded
    Reading configuration file '/opt/informix/etc/onconfig.test'...succeeded
    Creating /INFORMIXTMP/.infxdirs...succeeded
    Checking config parameters...succeeded
    Allocating and attaching to shared memory...FAILED
  • messages in online.log:

    21:45:30 IBM Informix Dynamic Server Started.
    21:45:30 size of resident + virtual segments 10443875920 + 4831838208 > 9663676416
    total allowed by configuration parameter SHMTOTAL

Cause

The combined size of the resident and virtual portions of shared memory exceeds the shared memory limit set by the ONCONFIG parameter SHMTOTAL.


Resident portion of shared memory 
The resident portion of the database server shared memory stores the following data structures that do not change in size while the database server is running:

  • Shared-memory header
  • Buffer pool
  • Logical-log buffer
  • Physical-log buffer
  • Lock table

Virtual portion of shared memory 
The size of the virtual portion of shared memory is determined by the value of the ONCONFIG parameter SHMVIRTSIZE. The virtual portion stores the following data:
  • Internal tables
  • Big buffers
  • Session data
  • Thread data (stacks and heaps)
  • Data-distribution cache
  • Dictionary cache
  • SPL routine cache
  • SQL statement cache
  • Sorting pool
  • Global pool

Diagnosing the problem

Resolving the problem

  • Increase the value of the ONCONFIG parameter SHMTOTAL if possible.
  • Decrease the value of one or more of these ONCONFIG parameters:

    - BUFFERPOOL ( the buffers value )
    - LOCKS
    - SHMVIRTSIZE
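
To judge how far SHMTOTAL would have to be raised, you can add the two sizes from the online.log message. This is a minimal sketch under two assumptions that you should verify for your version: the sizes in the message are in bytes, and SHMTOTAL is specified in kilobytes. The function name min_shmtotal_kb is hypothetical, and the shell must support 64-bit arithmetic.

```shell
# Smallest SHMTOTAL (in KB) that covers both segments.
# $1: resident size in bytes, $2: virtual size in bytes
# Assumption: the log message reports bytes, SHMTOTAL is in KB.
min_shmtotal_kb() {
    total_bytes=$(( $1 + $2 ))
    echo $(( (total_bytes + 1023) / 1024 ))   # round up to whole KB
}

# Values taken from the sample message in this article.
min_shmtotal_kb 10443875920 4831838208
```

With the sample values this prints 14917690, compared with the configured limit of 9663676416 bytes (9437184 KB). If the limit cannot be raised that far, the same arithmetic shows how much the buffer pool, lock table, or SHMVIRTSIZE would have to shrink.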




Problem(Abstract)

When there is a data transmission issue between the Primary and the HDR Secondary (usually caused by a network problem), client applications that are working with the Primary may become blocked and appear hung, even if data replication is configured to be asynchronous (DRINTERVAL > 0).

Symptom

Once the ping timeout is written to the online.log file of the Primary/Secondary instance (see the sample output below), user sessions return to normal work.

11:27:57  DR: ping timeout                                             
11:27:57  DR: Receive error                                            
11:27:57  ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr =
: Network connection is broken.                                        
11:27:57  DR_ERR set to -1                                             
11:27:59  DR: Turned off on primary server    

Cause

When data replication is established, the primary and secondary regularly exchange ping messages. If the ping acknowledgment is not received before DRTIMEOUT elapses, the server resends the ping message three more times, then reports a ping timeout and turns off the DR subsystem. As a result, the time span between the first ping and the "DR: ping timeout" message can be as large as (DRTIMEOUT x 4).


For example, if DRTIMEOUT is set to 180 seconds, it can take up to 12 minutes before DR is turned off.
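
The same arithmetic can be applied to any DRTIMEOUT value. This is a minimal sketch in POSIX shell; the function name dr_max_wait_seconds is hypothetical.

```shell
# Worst-case seconds between the first ping and "DR: ping timeout":
# the initial wait plus three resends, i.e. DRTIMEOUT x 4.
# $1: DRTIMEOUT in seconds
dr_max_wait_seconds() {
    echo $(( $1 * 4 ))
}

secs=$(dr_max_wait_seconds 180)
echo "$secs seconds ($(( secs / 60 )) minutes)"
```

For DRTIMEOUT=180 this prints "720 seconds (12 minutes)", which matches the example above.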

Scenario #1:
With asynchronous replication, transactions do not wait for an acknowledgment from the HDR secondary after the logical log record has been put in the DR buffer. However, when there is a transmission failure, the DR buffer may fill up quickly (how quickly depends on the DRTIMEOUT value, the LOGBUFF value, and the activity on the instance). Until DR is turned off, a user session has to wait until the DR buffer has enough space for the logical log record.

Scenario #2:
In addition to the above scenario, a checkpoint can be requested on the Primary between the first ping failure and the time when the "DR: ping timeout" message is reported. Checkpoints are synchronous between Primary and Secondary regardless of the DRINTERVAL value. Once a checkpoint is requested, it prevents any thread from entering a critical section. The instance remains blocked until the checkpoint acknowledgment is received from the Secondary or until DR is turned off.


Diagnosing the problem

For scenario #1 check if the corresponding user thread demonstrates a stack similar to the following:


Stack for thread: 73 sqlexec                  
base: 0x0700000011abc000                      
len: 69632                                    
pc: 0x00000001000370f4                        
tos: 0x0700000011acafe0                       
state: sleeping                               
vp: 8                                         
                                              
0x00000001000370f4 (oninit)yield_processor_mvp
0x0000000100041f30 (oninit)mt_yield           
0x000000010076a5ac (oninit)cdrTimerWait
0x0000000100716908 (oninit)dr_buf_deq_int
0x00000001001fe3c0 (oninit)dr_logcopy         
0x00000001001f2d0c (oninit)logwrite
0x000000010011b7c4 (oninit)log_put            
0x0000000100121384 (oninit)logm_write         
0x00000001001f3e68 (oninit)logputx            
0x000000010017137c (oninit)rscommit           
0x000000010022b70c (oninit)iscommit           
0x00000001002865e4 (oninit)sqiscommit         
0x0000000100533d38 (oninit)committx           
0x0000000100536480 (oninit)commitcmd          
0x000000010053b01c (oninit)excommand          
0x000000010042893c (oninit)sq_execute         
0x000000010026becc (oninit)sqmain             
0x00000001002d51a4 (oninit)listen_verify      
0x00000001002d33b8 (oninit)spawn_thread       
0x0000000100e0b59c (oninit)startup       

For scenario #2, check the 'onstat -g ath' output and see whether the user threads have the "cond wait cp" status.
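
The check for scenario #2 can be scripted by counting the threads in that state. This is a minimal sketch: the function name count_cp_waiters is hypothetical, and the pattern assumes that the status column shows the words "cond wait" followed by "cp" with one or more spaces between them.

```shell
# Count threads whose status is "cond wait cp" in 'onstat -g ath'
# output read from standard input. The number of spaces before
# "cp" is allowed to vary.
count_cp_waiters() {
    grep -E -c 'cond wait +cp( |$)'
}
```

A typical call would be `onstat -g ath | count_cp_waiters`. A count that stays above zero while the "DR: ping timeout" message has not yet been written points to the checkpoint scenario.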


Resolving the problem

To resolve the problem it may be required to:

1) Fix any problems that can cause data transmission issues between Primary and HDR Secondary (e.g. increase network reliability and throughput)

2) Decrease the value of DRTIMEOUT configuration parameter.

Note: Increasing LOGBUFF may also help to reduce the blockage time. However, a large logical log buffer may result in data loss if the Primary fails.



http://www-01.ibm.com/support/docview.wss?uid=swg21643957



Problem(Abstract)

Sometimes "ping timeout" and "send error" messages appear in the online.log even though a check of the network environment shows nothing wrong, and the HDR relationship ends up broken. Why does this occur?

Symptom

ping timeout, receive error, send error

Cause

A ping timeout occurs if the DR_MSG_PING message cannot flow between the primary and the secondary, or if the acknowledgment takes longer than 4 x DRTIMEOUT. The ping is a message type in the DR buffer queue, so it has to wait for DR buffer space. A ping timeout may therefore occur because the DR buffer is full, or because the ping has a lower priority than the logical log buffer.

If the logical log buffer cannot be transferred, this may also lead to a "ping timeout".


Environment

HDR environment

Diagnosing the problem

The DR buffer is the same size as the logical log buffer, and DR_MSG_PING messages are stored in the DR buffer, so you can adjust the DR buffer size through LOGBUFF.

The primary server sends logical log records to the secondary server to keep the data consistent, as follows:
1. Primary: logical log buffer -> DR buffer.
2. Primary: the dr_prsend thread sends these logical log records to the DR buffer on the secondary server across the network using TCP/IP.
3. Secondary: the dr_secrecv thread receives those logical log records.

The HDR primary and secondary servers ping each other and must receive an acknowledgment within a set period; otherwise the HDR relationship is broken with a "ping timeout" error. The acknowledgment period is 4 times the DRTIMEOUT value.


Resolving the problem

To avoid "ping timeout" errors, consider the following points:

1. Increase the LOGBUFF value to enlarge the DR buffer so that it can hold more signal messages.
2. A hung secondary server may lead to a "ping timeout", because the DR buffer cannot be received immediately or because DR_MSG_PING has a lower priority.
3. A long checkpoint duration on the secondary server may also lead to a "ping timeout".



http://www-01.ibm.com/support/docview.wss?uid=swg21413380



Problem(Abstract)

After a JDBC upgrade to 3.50.JC1 or later, customers who use a multibyte code set might receive an error such as: "FAILED: Fetch statement failed: Encoding or code set not supported." (error -79783)

Symptom

Error message: "FAILED: Fetch statement failed: Encoding or code set not supported. "

error -79783

Cause

JDBC version 3.50.JC1 introduced the following APAR:

IC49877 - 
JDBC DRIVER ALLOWS INSERTION OF INVALID CHARACTERS FOR CHARACTER SET. 
http://www-01.ibm.com/support/docview.wss?uid=swg1IC49877 

While the behavior for this APAR is correct, it might lead to problems for customers who use native multibyte data on the application side but store it in a database that uses the en_us locale on the server side. 

In some cases, customers who use the zh_tw.big5 locale on the server side but have stored illegal characters in a table will also be affected.


Diagnosing the problem

The error message appears right after the upgrade.

Resolving the problem

Starting with JDBC version 3.50.JC5, you can set the flag IFX_USE_STRENC to switch back to the old style of encoding.


Here's an example on how to use it: 
"jdbc:informix-sqli://inst:port:dbname:informixserver=XXX;user=informix;password=XX;DB_LOCALE=en_us.819;IFX_USE_STRENC=true;"



http://www-01.ibm.com/support/docview.wss?uid=swg21502902



Problem(Abstract)

Prior to Informix Cluster environments, locks were ignored on an HDR secondary. With updatable secondary servers in 11.50.xC8 and above, the lock request is sent to the Primary. This document helps you identify and track these locks from a primary server back to the secondary server they came from.

Symptom

Lock requests showing up on a primary server that do not map to sessions on the primary.


Resolving the problem

To associate an open transaction on the primary with a session on the SDS node, start with 'onstat -k' on the primary. Here we see some locks that are causing problems:

  Locks
address    wtlist  owner      lklist    type     tblsnum  rowid    key#/bsiz
443139e8   0       4b7d0bf0   0         HDR+S    100002   205         0    
44313b68   0       4b7d99c8   4451cae8  HDR+X    1001c8   100         0
4444c168   0       4b7d24f8   0             S    100002   204         0     
4444c7e8   0       4b7d99c8   4451c868  HDR+S    100002   206         0     
444b43e8   0       4b7d2d50   0         HDR+S    100002   204         0     
4451c668   0       4b7d35a8   0             S    100002   204         0     
4451c868   0       4b7d99c8   0         HDR+X    649      3           0     
4451cae8   0       4b7d99c8   4444c7e8  HDR+IX   1001c8   0           0    
 8 active, 20000 total, 16384 hash buckets, 0 lock table overflows

Suppose the owner '4b7d99c8' has been identified as holding a lock on a table we need access to, or, while running onstat -x, you see an open transaction that looks older than it should:

$ onstat -x
Transactions
                                                                     est.   
address  flags userthread locks begin_logpos   current logpos  isol  rb_time
4b811af0 A-B-- 4b7d99c8   4     2500:0x1014018 2578:0x1014080 
... 

To find the session, on the primary, use the owner from ‘onstat -k’ and run:

$ onstat -g ath|grep 4b7d99c8
 tid     tcb      rstcb    prty status                vp-class       name
 1770    4d0e9568 4b7d99c8 1    sleeping secs: 1       5cpu         proxyTh

...make note of the 'tid' value, then run:

$ onstat -g proxy all
IBM Informix Dynamic Server Version 11.50.F       -- on-Line
Secondary  Proxy      Reference Transaction  Hot Row   
Node       ID         Count     Count        Total     
sds2       1609       0         1            0         

TID      Flags      Proxy  Source   Proxy    Current  sqlerrno iserrno 
                    ID     SessID   TxnID    Seq                       
1770     0x00008224 1609   22       3        2        0        0       

...here we see the Proxy ID of 1609, the Source SessID of 22, and the Proxy TxnID of 3.

On the SDS node, using the 'Source SessID' from the 'onstat -g proxy all' output, run the following:

$ onstat -g sql 22
IBM Informix Dynamic Server Version 11.50.F       -- Updatable (SDS) -- Up 06:08:18 -- 443096 Kbytes

Sess       SQL          Current     Iso Lock       SQL  ISAM F.E.
Id         Stmt type    Database    Lvl Mode       ERR  ERR  Vers Explain
22         -            pwhite      DR  Not Wait   0    0    9.24  Off

Last parsed SQL statement :
  insert into tab1 values(0,"Howdy") { commit work; }

You can also find similar information on the SDS node by using the 'Proxy ID' and the 'Proxy TxnID' from the 'onstat -g proxy all' output taken on the primary:

$ onstat -g proxy 1609 3

IBM Informix Dynamic Server Version 11.50.F       -- Updatable (SDS) -- Up 06:07:33 -- 443096 Kbytes
Sequence Operation rowid    Table                          sqlerrno
Number   Type               Name                                   
1        *Insert   0        pwhite:informix.tab1           0  

Conclusion:
Using this method, you can track down user 'pwhite' and find out why that user's session is still causing problems.
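
The first step of this procedure, picking the owners of proxied locks out of the 'onstat -k' output, can be sketched as a small filter. The function name hdr_lock_owners is hypothetical, and the sketch assumes the column layout shown in the sample above, where the owner is the third column and the lock type is the fifth.

```shell
# List the distinct owner addresses of locks whose type starts
# with "HDR+", reading 'onstat -k' output from standard input.
# Assumes owner is column 3 and type is column 5, as in the sample.
hdr_lock_owners() {
    awk 'index($5, "HDR+") == 1 { print $3 }' | sort -u
}
```

Running `onstat -k | hdr_lock_owners` against the sample output above lists 4b7d0bf0, 4b7d2d50, and 4b7d99c8. Each address can then be passed to `onstat -g ath | grep <address>` as shown in the procedure.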



http://www-01.ibm.com/support/docview.wss?uid=swg21659774


Problem(Abstract)

A Java program gets an "OutOfMemory" error when handling BLOBs. The Informix JDBC Driver uses the Java method File.deleteOnExit when handling BLOBs, which causes the OutOfMemory problem in the JVM heap.

Resolving the problem

PROBLEM

A Java application that uses the Informix JDBC Driver to handle BLOBs runs out of memory. The Informix JDBC Driver uses the Java method File.deleteOnExit, which causes the OutOfMemory problem in the JVM heap.

Example of error java.lang.OutOfMemoryError

The heap will contain many blocks holding file names such as ifxb_123456 or ifxb_123456890. The shorter file names are produced by JDBC driver versions 2.21.JC6 and 3.00.JC1 and later; the longer file names are produced by earlier versions. 


CAUSE

The reason for this is a known bug in File.deleteOnExit() calls:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4813777
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513817


SOLUTION

To work around this problem, set the environment variable LOBCACHE to -1.


EXTERNAL REFERENCES

To read more about the environment variable LOBCACHE see the manual "IBM Informix JDBC Driver Programmer’s Guide".

http://www-306.ibm.com/software/data/informix/pubs/library/


http://www-01.ibm.com/support/docview.wss?uid=swg21260832



Problem(Abstract)

When migrating from an older server that supports only the ISO-8859-1 locale to a newer server that uses UTF-8 locale to configure a database, the new database might not accept all characters without first using a conversion process.

Symptom

When attempting to load data from a database created with ISO-8859-1 locale into a database using UTF-8 locale, reject files are created for rows containing unrecognizable characters even though CLIENT_LOCALE, DB_LOCALE and SERVER_LOCALE have all been set accordingly.


Cause

It is possible that setting CLIENT_LOCALE, DB_LOCALE and SERVER_LOCALE will still leave some characters unrecognizable to a database created with the UTF-8 locale.

Resolving the problem

Most of the time, ISO-8859-1 data will load into a UTF-8 database without error. However, if you have already attempted to load the data and failed, complete the following steps:

First, set the environment variables CLIENT_LOCALE, DB_LOCALE and SERVER_LOCALE so they have the appropriate settings. After migrating the old data onto the UTF-8 server, use a conversion process such as the iconv API to ensure all data will be accepted into the new database.

For example, once you have migrated the data file, you could run the command:

iconv -f ISO-8859-1 -t UTF-8 {filename} > {pipe, file}

Once the file has been converted, you can direct it to a new file or a pipe before loading it into the destination table.
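
The conversion step can be wrapped so that a failed conversion is noticed before the load starts. This is a minimal sketch: the function name latin1_to_utf8 is hypothetical, both file names are supplied by the caller, and the iconv utility is assumed to be installed with support for these two code set names.

```shell
# Convert an ISO-8859-1 data file to UTF-8 and report a failure
# instead of silently leaving a partial output file in use.
# $1: input file, $2: output file
latin1_to_utf8() {
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2" || {
        echo "conversion of $1 failed" >&2
        return 1
    }
}
```

For example, `latin1_to_utf8 customer.unl customer.utf8.unl && echo converted` (the file names are placeholders) writes the converted copy, which can then be loaded into the destination table.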


http://www-01.ibm.com/support/docview.wss?uid=swg21661529&myns=swgimgmt&mynp=OCSSGU8G&mync=E



Problem(Abstract)

A long-running session may allocate a huge memory pool if the DONTDRAINPOOLS environment variable was set when the server was started.

Symptom

You notice that long running sessions have large memory pools allocated (can be up to several gigabytes).

onstat -g ses


session                                      #RSAM    total      used       dynamic 
id       user     tty      pid      hostname threads  memory     memory     explain

<...>
229697969 user1 -        -1       192.168. 1        94208      87480      off 
229694677 user3 162      21785    host1    1        7140388864 946472     off 
229673199 user5 -        29050    host1    1        118784     92752      off 
229668414 user1 -        -1       192.168. 1        131072     107280     off 
229667792 user1 -        -1       192.168. 1        102400     69608      off 
229667782 user1 -        -1       192.168. 1        106496     85304      off 
229667776 user1 -        -1       192.168. 1        94208      87480      off 
229657782 user6 494      11416    host1    1        102400     71400      off 
229634560 user2   -        -1       192.168. 1        94208      67504      off 
229630635 user1 -        -1       192.168. 1        131072     107280     off 
229630075 user1 -        -1       192.168. 1        102400     69608      off 
229630064 user1 -        -1       192.168. 1        106496     85304      off 
229630055 user1 -        -1       192.168. 1        94208      87480      off 
229626281 user7 406      125      host1    1        139264     76040      off 
229623552 user5 -        25571    host1    1        471040     399904     off 
229617823 user8  -        6118     host1    1        798720     693920     off 
229612008 user4  -        -1       192.168. 1        90112      67472      off 
229611119 user4  -        -1       192.168. 1        90112      67472      off 
229605475 user9   951      7697     host1    1        348160     121784     off 
229588448 user10 1207     28006    host1    1        737280     521296     off 
229578372 user4  -        -1       192.168. 1        90112      67448      off 
229576124 informix -        29244    host1    2        131670016  129610816  off 
229565621 user1 -        -1       192.168. 1        131072     107280     off 
229565020 user1 -        -1       192.168. 1        102400     69608      off 
229565011 user1 -        -1       192.168. 1        106496     85664      off 
229565004 user1 -        -1       192.168. 1        94208      87480      off 
229546565 user2   -        -1       192.168. 1        94208      67504      off 
229531732 user2   -        -1       192.168. 1        98304      90568      off 
229531707 user2   -        -1       192.168. 1        94208      84976      off 
229519448 user3 154      19733    host1    1        753565696  30887704   off 
229512098 user3 133      16367    host1    1        851968     626672     off 
<...>

However, when you look at the session information, most of the allocated memory is shown as free.

onstat -g ses 229519448 

session           effective                            #RSAM    total      used       dynamic 
id       user     user      tty      pid      hostname threads  memory     memory     explain 
229519448 billproc -         154      19733    mobis    1        753565696  30887704   off 

Program :
-

tid      name     rstcb            flags    curstk   status
241358080 sqlexec  9a0ea1170        --BPR--  11263    ready-

Memory pools    count 3
name         class addr              totalsize  freesize   #allocfrag #freefrag 
229519448    V     80cb19040        753258496  722660672  604        31        
229519448*O  V     7e51bd040        12288      9000       1          3         
229519448_S  V     7eed67040        294912     8320       29         3         
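
The share of a pool that is free can be worked out from the totalsize and freesize columns. This is a minimal sketch in POSIX shell; the function name pool_free_pct is hypothetical, and the shell must support 64-bit arithmetic.

```shell
# Integer percentage of a memory pool that is free.
# $1: totalsize in bytes, $2: freesize in bytes
pool_free_pct() {
    echo $(( $2 * 100 / $1 ))
}

# First pool of session 229519448 in the output above.
pool_free_pct 753258496 722660672
```

For the first pool in the sample this prints 95, that is, about 95% of the memory held by the session pool is not in use.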

Over time, however, the amount of allocated memory keeps growing, which may result in additional segments being allocated for the virtual portion of shared memory.

online.log

<...>
13:35:08  Maximum server connections 2943 
13:35:08  Checkpoint Statistics - Avg. Txn Block Time 0.004, # Txns blocked 63, Plog used 1561232, Llog used 180616

13:35:14  Requested shared memory segment size rounded from 409600KB to 425984KB
13:35:15  Dynamically allocated new virtual shared memory segment (size 425984KB)
13:35:15  Memory sizes:resident:27820032 KB, virtual:9899008 KB, no SHMTOTAL limit
13:35:15  Segment locked: addr=987000000, size=436207616
13:35:50  Requested shared memory segment size rounded from 409600KB to 425984KB
13:35:50  Dynamically allocated new virtual shared memory segment (size 425984KB)
13:35:50  Memory sizes:resident:27820032 KB, virtual:10324992 KB, no SHMTOTAL limit
13:35:50  Segment locked: addr=9a1000000, size=436207616
<...>

Cause

When the DONTDRAINPOOLS environment variable is set at server startup, the Informix memory manager keeps freed memory in the local session pool and does not release it back to the main memory pool until the session terminates and frees its pool.

Diagnosing the problem

Run 'onstat -g env' and see if the DONTDRAINPOOLS environment variable was set on startup:


Server start-up environment:

Variable Value [values-list]
DBDATE dmy4
DBDELIMITER |
DBMONEY .
DBPATH .
DBPRINT lp -s
DBTEMP /tmp
DONTDRAINPOOLS 1
IGNORE_UNDERFLOW 1
INFORMIXCONRETRY 1
INFORMIXCONTIME 10
INFORMIXDIR /informix/mobserver/inf11
<...>

Also check the messages written to the online.log file during server startup for a message like "Server is disabling pools draining".
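
The 'onstat -g env' check can be scripted as a simple filter. This is a minimal sketch: the function name dontdrainpools_set is hypothetical, and it assumes the layout shown above, where the variable name is the first field of its line.

```shell
# Succeed (exit status 0) if 'onstat -g env' output read from
# standard input lists DONTDRAINPOOLS as a variable name.
dontdrainpools_set() {
    awk '$1 == "DONTDRAINPOOLS" { found = 1 } END { exit !found }'
}
```

A typical call would be `onstat -g env | dontdrainpools_set && echo "DONTDRAINPOOLS was set at startup"`.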

Resolving the problem

- You can run 'onmode -F' to free allocated memory.

- Unset DONTDRAINPOOLS environment variable and restart the Informix server.


http://www-01.ibm.com/support/docview.wss?uid=swg21627991
