

Problem(Abstract)

When configuring IDS to use an LPA (ssha1, ssha256, sblowfish, smd5, etc.), you get -951/-952 errors on AIX.

Cause

The underlying problem is that the default permissions on AIX for the LPA mapping configuration file are such that only root can read the file. The crypt() function in IDS runs on a CPU VP, which is non-root, and AIX requires it to run as root when an LPA is used. A workaround is to use PAM (which uses the MSC VP for authentication).


Resolving the problem

Use PAM to configure LPA on AIX


Follow these example instructions to set it up: 

1) Add to /etc/pam.conf: 
idslogin auth required pam_aix 
idslogin account required pam_aix 

2) Define a DBSERVERALIAS, for example ids_pam_srv. 

3) Add to SQLHOSTS: 
ids_pam_srv onsoctcp <host> <port> 
s=4,pam_serv=(idslogin),pamauth=(password) 

Don't forget to add an appropriate entry to /etc/services (if needed). 

4) Check /etc/security/login.cfg for: 
auth_type = STD_AUTH 
pwd_algorithm = sblowfish 
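The two settings above can be checked with a quick grep. A minimal sketch; the here-doc stanza below stands in for a real /etc/security/login.cfg:

```shell
# Sketch: extract auth_type and pwd_algorithm from a login.cfg-style stanza.
# The sample file is hypothetical - on a live system grep the real file.
cat > /tmp/login.cfg.sample <<'EOF'
usw:
        shells = /bin/sh,/bin/bsh,/bin/csh,/bin/ksh
        auth_type = STD_AUTH
        pwd_algorithm = sblowfish
EOF
grep -E 'auth_type|pwd_algorithm' /tmp/login.cfg.sample
```

On a live system, run the grep directly against /etc/security/login.cfg.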


5) And a final check from IDS: 

bash-3.2$ dbaccess - - 
> connect to "test@ids_pam_srv" user "tester"; 
ENTER PASSWORD: 

Connected.


http://www-01.ibm.com/support/docview.wss?uid=swg21624912



Problem(Abstract)

After configuring LPA (Loadable Password Algorithm) on AIX, you see -951 and -952 errors when trying to connect as user informix, and using PAM is not an immediate option.

Cause

AIX has made the /etc/security/pwdalg.cfg file unreadable to normal users (users other than root), so attempting to authenticate as user informix fails.

Resolving the problem

You can use PAM to get around this problem, but if that is not an immediate option and rewriting applications to handle PAM cannot be completed immediately, you can use the following work-around:

Warning: You need to ensure you are in a secure environment to use this work-around - i.e. your Informix Server machine is secure and you really trust user 'informix'. If this is true, then you can proceed.

Add user 'informix' to the 'security' group on AIX - this allows /etc/security/pwdalg.cfg to be read by user informix, and authentication should then work without throwing an error.
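A sketch of that group change (AIX-specific commands, to be run as root; guarded here so the snippet fails soft on other platforms):

```shell
# chgrpmem is AIX's command for editing group membership; lsgroup verifies it.
if command -v chgrpmem >/dev/null 2>&1; then
    chgrpmem -m + informix security       # add user informix to group 'security'
    lsgroup -a users security             # confirm the new membership
    ls -l /etc/security/pwdalg.cfg        # check the file's group and mode
else
    echo "chgrpmem not found - these commands are AIX-specific"
fi
```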

Note - using PAM is the preferred method for handling this situation, so use this as a temporary work-around only while taking steps to properly remedy the issue.


http://www-01.ibm.com/support/docview.wss?uid=swg21664590


Problem(Abstract)

An onbar backup can fail with XBSA errors, preceded by BAR_TIMEOUT warnings, due to a TCP/IP send problem in AIX.

Symptom

In the bar_act.log file you would see XBSA errors like the following:


    XBSA Error: (BSACreateObject) A system error occurred. Aborting XBSA session.


or 

    XBSA Error: (BSAEndTxn) The transaction was aborted.


Due to the nature of the underlying TCP problem - a network communication hang causing the storage manager to wait for a packet that the onbar client (XBSA library) has sent but the TCP layer never fully transmitted - the errors might be preceded by BAR_TIMEOUT warnings for the given onbar_d process: 
    (-43296) WARNING: BAR_TIMEOUT Storage Manager Progress may be stalled.


Other BSA* functions might be affected too, depending on storage manager used.

The current backup would be aborted after such error (and possibly retried depending on BAR_RETRY setting).


Cause

This has been identified as an AIX product defect under APAR IV22133.


Environment

IBM Informix onbar on AIX. The problem might be more likely on AIX LPARs using virtual network adapters.

Diagnosing the problem

When hitting such XBSA errors, determine whether the AIX OS is affected by APAR IV22133 or its siblings.
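On AIX, whether the fix for a given APAR is installed can be queried with instfix. A sketch, guarded so it is copy-paste safe off AIX:

```shell
# instfix -ik reports whether the fix for an APAR is installed;
# oslevel -s shows the current technology level / service pack.
if command -v instfix >/dev/null 2>&1; then
    instfix -ik IV22133
    oslevel -s
else
    echo "instfix not found - run this on the AIX host being diagnosed"
fi
```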

Resolving the problem

Apply the latest AIX fix packs.



http://www-01.ibm.com/support/docview.wss?uid=swg21668439&myns=swgimgmt&mynp=OCSSGU8G&mync=E



Problem(Abstract)

IBM® Informix® Server™ version 11.70.FC5W1 and higher will produce a warning message in the online.log and to STDERR when you have RESIDENT enabled in the ONCONFIG file and either KAIO or DIRECT_IO enabled. This warning is only displayed for Informix running on AIX® and is not displayed in earlier versions.

Symptom

An example of the warning as displayed in the online.log:

      11:17:09  WARNING: Shared-memory residency is enabled while direct I/O and KAIO (kernel asynchronous I/O) are active.

      This configuration could lead to runtime KAIO errors, which might shut down the instance or set chunks offline.

      Set the RESIDENT configuration parameter to 0 to turn off shared-memory residency, and then restart the database server. Refer to the machine notes and APAR IC76872 for more information.


Cause

APAR "IC82623: ON AIX, WE SHOULD CHECK FOR THE SETTING OF RESIDENT BEFORE BRINGING THE INSTANCE ONLINE"

On AIX, setting RESIDENT to -1 may lead to the instance running into an AIX-related APAR:

    APAR IC76872: AIX: HARD TO DIAGNOSE KAIO ERRORS 22 (EINVAL) WHEN SYSTEM RUNNING LOW ON PINNABLE MEMORY PAGES. 

    On AIX systems with a lot of allocated pinnable ("resident") memory and KAIO or DIRECT_IO being used by Informix, it is possible that KAIO read or write calls fail with errno 22 (EINVAL), potentially leading to down dbspaces or system aborts. 

    Sample message log error: 

    04:30:40  KAIO: error in kaio_WRITE, kaiocbp = 0x22b620d0, errno= 22
    04:30:40  fildes = 258 (gfd 3), buf = 0x700000122b64000, nbytes= 4096, offset = 130785280
     

    The reason for these EINVAL errors usually is the OS running low on 'pinnable' memory pages (by default 80% of the available RAM). This can be caused by Informix having a lot of shared memory segments allocated as "resident" plus pinned OS kernel memory plus KAIO resources. 
The solution to IC82623 is to warn of the potential problems of APAR IC76872 by displaying a suitable warning message when the instance comes on-line. 

This solution is included within Informix Server 11.70.FC6 and the Post Interim Drop 11.70.FC5W1.

Resolving the problem

Changes are required to the ONCONFIG parameter RESIDENT. For reference, here are the comments from an onconfig.std file.


    ###################################################################
    # Shared Memory Configuration Parameters
    ###################################################################
    # RESIDENT         - Controls whether shared memory is resident.
    #                    Acceptable values are:
    #                    0 off (default)
    #                    1 lock the resident segment only
    #                    n lock the resident segment and the next n-1
    #                       virtual segments, where n < 100
    #                    -1 lock all resident and virtual segments
     

Recommendation : 
  1. Turn RESIDENT off (i.e. 0). Optionally set the AIX parameters recommended by AIX Technical Support (below)
    or
  2. Set RESIDENT to a positive integer so that only the initial memory segment(s) are pinned and that amount of memory does not exceed approximately 80% of the physical memory of the machine.
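For option 2, a back-of-envelope check helps: pinned (resident) memory should stay well below the default pinnable ceiling of 80% of RAM. A sketch with hypothetical numbers - on a real system the RAM figure would come from something like lsattr -El sys0 -a realmem, and the resident figure from your configured segment sizes:

```shell
real_mem_kb=16777216                         # physical RAM in KB (hypothetical: 16 GB)
resident_kb=4194304                          # memory the server would pin (hypothetical: 4 GB)
pinnable_kb=$(( real_mem_kb * 80 / 100 ))    # AIX default: 80% of RAM is pinnable
if [ "$resident_kb" -lt "$pinnable_kb" ]; then
    echo "OK: ${resident_kb} KB pinned is below the ${pinnable_kb} KB ceiling"
else
    echo "RISK: pinned memory would exceed the 80% pinnable ceiling"
fi
```

Remember that OS kernel memory and KAIO resources also draw from the pinnable pool, so leave generous headroom rather than running right up to the ceiling.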

Q: Will there be a performance penalty?
A: AIX Technical Support assures us (the Informix team) that the performance penalty for not using RESIDENT in a machine dedicated to the database server is negligible. There are some AIX parameters that can be set to simulate the behavior of RESIDENT while using DIRECT_IO or Kernel AIO without incurring the problems that RESIDENT can cause.

Q: Tell me more about the AIX parameters that can be set to simulate the behavior of RESIDENT while using DIRECT_IO or Kernel AIO without incurring the problems that RESIDENT can cause.
A: Use larger memory page sizes. For example, a 64K page size can be enabled for the stack, data and text pages of oninit processes by starting the Informix Server with this command:

LDR_CNTRL=DATAPSIZE=64K@STACKPSIZE=64K@TEXTPSIZE=64K@SHMPSIZE=64K oninit

Q: How does LDR_CNTRL compare to IFX_LARGE_PAGES?
A: First, they are two separate settings and therefore can be used independently. The current view is that LDR_CNTRL will give a small to negligible performance gain in addition to IFX_LARGE_PAGES. For some customers, since IFX_LARGE_PAGES is used with 16MB pages it may be that a LDR_CNTRL of 64K yields no discernible performance improvement.
The OS should always have sufficient pinned memory available for its own operations. For example, if there is insufficient memory, errno 22 (EINVAL) may occur with KAIO operations (APAR IC76872 / 228003).
It is not safe to set RESIDENT to -1 no matter how much memory a system has as it can eventually be consumed and Informix corruption may result.


The amount of pinned memory currently in use may be displayed with the command svmon -G.

Example of setting pinned memory via the operating system.

  • Run svmon -G and look at "L" row within the "PageSize / PoolSize" section of the output.
  • export IFX_LARGE_PAGES=1
  • vmo -p -o lgpg_regions=3200 -o lgpg_size=16777216
  • vmo -p -o v_pinshm=1
  • Start the engine.

You may wish to consult with AIX Technical Support for further information regarding use of the vmo command on your system.


http://www-01.ibm.com/support/docview.wss?uid=swg21608334


Problem(Abstract)

This article goes over the operating system kernel parameters that can be configured to avoid the error "KAIO out of resource" on the HP-UX platform.

Resolving the problem

PROBLEM

When using Kernel Asynchronous I/O (KAIO) with IBM Informix Dynamic Servers, you might experience the following message in the database server message log:

      KAIO out of resource errno=11
    or
      KAIO out of resource errno=35


CAUSE 

This error indicates a shortage of KAIO resources. 


SOLUTION 

Tune the KAIO subsystem: 

1. Configure the number of KAIO requests. 

    Note: Check Related information for more detail. 

2. Tune the Operating System Kernel parameters. 


aio_listio_max
    Specifies how many POSIX asynchronous I/O operations are allowed in a single listio() call.
    Minimum: 2    Maximum: 65536    Default: 256

aio_max_ops
    Specifies the system-wide maximum number of POSIX asynchronous I/O operations that may be queued at any given time.
    Minimum: 1    Maximum: 1048576    Default: 2048

aio_physmem_pct
    Percentage of physical memory that can be locked for use in POSIX asynchronous I/O operations.
    Minimum: 5    Maximum: 50    Default: 10

aio_prio_delta_max
    Maximum delta by which a process can decrease its asynchronous I/O priority level.
    Minimum: 0    Maximum: 20    Default: 20

max_async_ports
    Maximum number of ports to the asynchronous disk-I/O driver that processes can have open at any given time.
    Minimum: 1    Maximum: 2147483647    Default: 50
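On HP-UX 11i v2/v3 the current values of these tunables can be inspected with kctune (older releases used kmtune). A guarded sketch, so it fails soft on other systems:

```shell
# Query each KAIO-related kernel tunable; kctune prints name, current
# value and whether a reboot is pending for a change.
for p in aio_listio_max aio_max_ops aio_physmem_pct aio_prio_delta_max max_async_ports; do
    if command -v kctune >/dev/null 2>&1; then
        kctune "$p"
    else
        echo "$p: kctune not available (HP-UX only)"
    fi
done
```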


Note:
  • All configurable kernel parameters must be specified using an integer value or a formula consisting of a valid integer expression.
  • The maximum and/or default values of certain parameters may change between releases or vary between 32-bit and 64-bit processors.

Warning: 

Changing kernel parameters to improper or inappropriate values or combinations of values can cause data loss, system panics, or other (possibly very obscure and/or difficult to diagnose) operating anomalies, depending on which parameters are set to what values. 

  • Before altering the value of any configurable kernel parameter, be sure you know the implications of making the change.
  • Never set any system parameter to a value outside the allowable range for that parameter (SAM refuses to store values outside of the allowable range).
  • Many parameters interact, and their values must be selected in a balanced way.

For more information about POSIX asynchronous I/O, see the HP-UX Reference entry aio(5). Please contact HP-UX OS support for more information about these parameters and their management.



http://www-01.ibm.com/support/docview.wss?uid=swg21138073


Problem(Abstract)

This article goes over the environment variable that can be used to change the default resources used by the engine for Kernel Asynchronous I/O (KAIO) on the HP-UX platform.

Resolving the problem

PROBLEM

Database performance is poor because of input/output (I/O) requests. The IBM® Informix® Dynamic Server™ message log might display one of these error messages:

      KAIO out of OS resources, errno = 11

    or   
      KAIO out of OS resources, errno = 35

CAUSE 

Each time the database server makes a kernel asynchronous I/O (KAIO) request, it uses a special operating system structure. A certain number of these structures are allocated when the database server starts and are reused as long as the instance is online. 

The problem will occur if there are not enough structures available to handle all of the I/O requests made by the database server. 


Note: This is only one possible cause for this problem. If this document does not solve your problem, see if other documents exist for the same problem. 


SOLUTION 

Increase the number of KAIO requests allowed simultaneously by the database server. Do this by increasing the value of the IFMX_HPKAIO_NUM_REQ environment variable. 

Set the variable to the desired number at the command line and then shut down and restart the database server. 

    Limits of the IFMX_HPKAIO_NUM_REQ environment variable:
        Minimum value: 10
        Maximum value: 5000
        Default value: 1000

    Note: These values may change in future versions of the operating system or database server.

    Example:
    This example shows how to increase the number of concurrent KAIO requests to 2300 from the default of 1000 using Korn shell:
      $ IFMX_HPKAIO_NUM_REQ=2300 
      $ export IFMX_HPKAIO_NUM_REQ

You can also change some operating system kernel parameters for KAIO. If you are still experiencing poor performance, contact your local technical support office.



http://www-01.ibm.com/support/docview.wss?rs=0&uid=swg21142285



Problem(Abstract)

You try to start the Informix server but it fails to start with messages like these in the online.log: 21:45:30 IBM Informix Dynamic Server Started. 21:45:30 size of resident + virtual segments 10443875920 + 4831838208 > 9663676416 total allowed by configuration parameter SHMTOTAL

Symptom

  • output of oninit -v:

    Checking group membership to determine server run mode...succeeded
    Reading configuration file '/opt/informix/etc/onconfig.test'...succeeded
    Creating /INFORMIXTMP/.infxdirs...succeeded
    Checking config parameters...succeeded
    Allocating and attaching to shared memory...FAILED
  • messages in online.log:

    21:45:30 IBM Informix Dynamic Server Started.
    21:45:30 size of resident + virtual segments 10443875920 + 4831838208 > 9663676416
    total allowed by configuration parameter SHMTOTAL

Cause

The combined size of the resident and virtual portions of shared memory exceeds the shared memory limit set by the ONCONFIG parameter SHMTOTAL.


Resident portion of shared memory 
The resident portion of the database server shared memory stores the following data structures that do not change in size while the database server is running:

  • Shared-memory header
  • Buffer pool
  • Logical-log buffer
  • Physical-log buffer
  • Lock table

Virtual portion of shared memory 
The size of the virtual portion of shared memory is determined by the ONCONFIG parameter SHMVIRTSIZE. This portion stores the following data:
  • Internal tables
  • Big buffers
  • Session data
  • Thread data (stacks and heaps)
  • Data-distribution cache
  • Dictionary cache
  • SPL routine cache
  • SQL statement cache
  • Sorting pool
  • Global pool

Resolving the problem

  • Increase the value of the ONCONFIG parameter SHMTOTAL if possible.
  • Decrease the value of one or more of these ONCONFIG parameters:

    - BUFFERPOOL (the buffers value)
    - LOCKS
    - SHMVIRTSIZE
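The comparison the server makes can be reproduced with shell arithmetic, using the totals from the log message quoted above (a sketch; note that the log message appears to print byte totals, while the SHMTOTAL parameter itself is specified in KB in the ONCONFIG file):

```shell
resident=10443875920        # resident segment size from the log
virtual=4831838208          # virtual segment size from the log
shmtotal=9663676416         # limit printed in the log
needed=$(( resident + virtual ))
if [ "$needed" -gt "$shmtotal" ]; then
    echo "startup fails: $needed > $shmtotal"
else
    echo "startup fits within the configured limit"
fi
```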




Problem(Abstract)

When there is a data transmission issue between Primary and HDR Secondary (usually caused by a network problem) the client applications which are working with the Primary may become blocked and may look hung even if the data replication is configured to be asynchronous (DRINTERVAL > 0).

Symptom

Once the ping timeout is written to the online.log file of the Primary/Secondary instance (see the sample output below), user sessions return to normal work.

11:27:57  DR: ping timeout                                             
11:27:57  DR: Receive error                                            
11:27:57  ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr =
: Network connection is broken.                                        
11:27:57  DR_ERR set to -1                                             
11:27:59  DR: Turned off on primary server    

Cause

When data replication is established, the Primary and Secondary regularly exchange ping messages. If the ping acknowledgment is not received before DRTIMEOUT elapses, the server re-sends the ping message three more times and then reports a ping timeout and turns off the DR subsystem. Consequently, the time span between the first ping and the "DR: ping timeout" message can be as large as (DRTIMEOUT x 4).


For example, if DRTIMEOUT is set to 180 seconds, it will take up to 12 minutes before DR is turned off.

Scenario #1:
Although with asynchronous replication transactions do not wait for an acknowledgment from the HDR Secondary after the logical log record is put in the DR buffer, when there is a transmission failure the DR buffer may fill up quickly (the time required depends on the DRTIMEOUT value, the LOGBUFF value and the activity level of the instance). Until DR is turned off, a user session has to wait until the DR buffer has enough space for its logical log record.

Scenario #2:
In addition to the above scenario, a checkpoint can be requested on the Primary between the first ping failure and the time when the "DR: ping timeout" message is reported. Checkpoints are synchronous between Primary and Secondary regardless of the DRINTERVAL value. Once a checkpoint is requested, it prevents any threads from entering a critical section. The instance remains blocked until the checkpoint acknowledgment is received from the Secondary or until DR is turned off.


Diagnosing the problem

For scenario #1 check if the corresponding user thread demonstrates a stack similar to the following:


Stack for thread: 73 sqlexec                  
base: 0x0700000011abc000                      
len: 69632                                    
pc: 0x00000001000370f4                        
tos: 0x0700000011acafe0                       
state: sleeping                               
vp: 8                                         
                                              
0x00000001000370f4 (oninit)yield_processor_mvp
0x0000000100041f30 (oninit)mt_yield           
0x000000010076a5ac (oninit)cdrTimerWait
0x0000000100716908 (oninit)dr_buf_deq_int
0x00000001001fe3c0 (oninit)dr_logcopy         
0x00000001001f2d0c (oninit)logwrite
0x000000010011b7c4 (oninit)log_put            
0x0000000100121384 (oninit)logm_write         
0x00000001001f3e68 (oninit)logputx            
0x000000010017137c (oninit)rscommit           
0x000000010022b70c (oninit)iscommit           
0x00000001002865e4 (oninit)sqiscommit         
0x0000000100533d38 (oninit)committx           
0x0000000100536480 (oninit)commitcmd          
0x000000010053b01c (oninit)excommand          
0x000000010042893c (oninit)sq_execute         
0x000000010026becc (oninit)sqmain             
0x00000001002d51a4 (oninit)listen_verify      
0x00000001002d33b8 (oninit)spawn_thread       
0x0000000100e0b59c (oninit)startup       

For scenario #2, check the 'onstat -g ath' output and see whether user threads have the "cond wait cp" status.
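A sketch of that check. The sample lines below stand in for live 'onstat -g ath' output (the exact column layout varies by server version):

```shell
# Hypothetical onstat -g ath output; thread 73 is blocked behind a checkpoint.
cat > /tmp/onstat_ath.sample <<'EOF'
 tid tcb              rstcb            prty status            vp-class name
 73  0x11abc000       0x10abc000       2    cond wait cp      8cpu     sqlexec
 74  0x11abd000       0x10abd000       2    running           1cpu     sqlexec
EOF
# Count the threads waiting on the checkpoint condition:
grep -c 'cond wait cp' /tmp/onstat_ath.sample
```

On a live server the equivalent check is: onstat -g ath | grep 'cond wait cp'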


Resolving the problem

To resolve the problem it may be required to:

1) Fix any problems that can cause data transmission issues between Primary and HDR Secondary (e.g. increase network reliability and throughput)

2) Decrease the value of the DRTIMEOUT configuration parameter.

Note: increasing LOGBUFF may also help reduce the blockage time; however, a large logical log buffer may result in data loss in case of a Primary failure.



http://www-01.ibm.com/support/docview.wss?uid=swg21643957



Problem(Abstract)

Sometimes you can get "ping timeout" and "send error" messages in the online.log even though the network environment checks out as normal, and the HDR relationship gets broken. Why does this occur?

Symptom

ping timeout, receive error, send error

Cause

A ping timeout occurs if the DR_MSG_PING message cannot flow between the Primary and Secondary, or if the acknowledgment takes longer than 4 x DRTIMEOUT. The ping is a message type in the DR buffer queue, so it has to wait for DR buffer space; therefore a ping timeout may also occur because the DR buffer is full or because the ping's priority is lower than that of the logical log buffer.

A logical log buffer that cannot be transferred may thus lead to the ping timeout.


Environment

HDR environment

Diagnosing the problem

The DR buffer is the same size as the logical log buffer, and DR_MSG_PING messages are stored in the DR buffer, so you can configure LOGBUFF to adjust the DR buffer size.

The Primary server sends logical logs to the Secondary server to keep the data consistent, as follows:
1. Primary: logical log buffer -> DR buffer.
2. Primary: the dr_prsend thread sends these logical logs across the network via TCP/IP to the DR buffer on the Secondary server.
3. Secondary: the dr_secrecv thread receives the logical logs.

The HDR Primary and Secondary servers ping each other and must wait for an acknowledgment within an appointed time, otherwise the HDR relationship is broken with a "ping timeout" error. The acknowledgment window is 4 times the DRTIMEOUT value.


Resolving the problem

To avoid "ping timeout" errors, consider the following:

1. Increase the LOGBUFF value to enlarge the DR buffer and leave more room for signal messages.
2. A hang on the Secondary server may lead to a "ping timeout" because the DR buffer contents cannot be received immediately, or because DR_MSG_PING has lower priority.
3. Long checkpoint durations on the Secondary server can have the same effect.



http://www-01.ibm.com/support/docview.wss?uid=swg21413380



Problem(Abstract)

After a JDBC upgrade to 3.50.JC1 or later, customers who use a multibyte code set might receive an error such as: "FAILED: Fetch statement failed: Encoding or code set not supported." (error -79783)

Symptom

Error message: "FAILED: Fetch statement failed: Encoding or code set not supported. "

error -79783

Cause

JDBC version 3.50.JC1 introduced the following APAR:

IC49877 - 
JDBC DRIVER ALLOWS INSERTION OF INVALID CHARACTERS FOR CHARACTER SET. 
http://www-01.ibm.com/support/docview.wss?uid=swg1IC49877 

While the behavior introduced by this APAR is correct, it might lead to problems for customers who use native multibyte data on the application side but store it in an en_us locale on the server side. 

In some cases, customers who use the zh_tw.big5 locale on the server side but have stored illegal characters in the table will also be affected.


Diagnosing the problem

The error message appears right after the upgrade.

Resolving the problem

Since JDBC version 3.50.JC5, users can set the flag IFX_USE_STRENC to switch to the old style of encoding.


Here's an example of how to use it: 
"jdbc:informix-sqli://inst:port:dbname:informixserver=XXX;user=informix;password=XX;DB_LOCALE=en_us.819;IFX_USE_STRENC=true;"



http://www-01.ibm.com/support/docview.wss?uid=swg21502902
