'Informix/informix troubleshooting' 카테고리의 글 목록 (8 Page)

Global Transaction이 정상적으로 종료되지 않았을 때 해결방법에 관한 포스트입니다. 사례가 상세해 소개해 드립니다.

(출처: comp.databases.informix newsgroup)

On my Informix instance I have two global transactions holding some locks. See onstat -G and onstat -x output below.
How can I get rid of them?
Also after a server reboot they are still there.
onmode -Z <address> or onmode -H <address> do not work (see below).

IBM Informix Dynamic Server Version 11.50.UC7IE on Solaris SPARC 10.

Global Transaction Identifiers
address flags isol timeout fID gtl bql data
56656bd8 -LX-G LC 0 131075 30 28 312D2D35336530656466663A613139383.....
56656e28 -LX-G LC 0 131075 30 28 312D2D35336530656466663A613139383.....
2 active, 128 total

Transactions

est.

address flags userthread locks begin_logpos current logpos isol rb_time retrys coord

56655018 A---- 56623018 0 - - COMMIT - 0

56655268 A---- 56623638 0 - - COMMIT - 0

566554b8 A---- 56623c58 0 - - COMMIT - 0

56655708 A---- 56624278 0 - - COMMIT - 0

56655958 A---- 56624898 0 - - COMMIT - 0

56655ba8 A---- 56624eb8 0 - - COMMIT - 0

56655df8 A---- 566254d8 0 - - COMMIT - 0

56656048 A---- 56625af8 0 - - COMMIT - 0

56656298 A---- 56626118 0 - - COMMIT - 0

566564e8 A---- 56626738 0 - - COMMIT - 0

56656738 A---- 56626d58 0 - - COMMIT - 0

56656988 A---- 56627378 0 - - COMMIT - 0

56656bd8 -LX-G 0 2 logid 303 303:0x64b2622c COMMIT 2:58 0

56656e28 -LX-G 0 2 logid 303 303:0x64b2922c COMMIT 2:58 0

56657078 A---- 56627fb8 0 - - COMMIT - 0

566572c8 A---- 56627998 0 - - COMMIT - 0

56657e58 A---- 56629218 0 - - COMMIT - 0

위의 예와 같이 flags 값이 '-LX-G'인 트랜잭션에는 userthread 정보가 없습니다. 이러한 트랜잭션을 orphaned transaction이라고 하는데 userthread가 트랜잭션에서 떨어져나가 멈춰있는 상태입니다. 만약에 이러한 트랜잭션이 Logical Log를 사용중인 채로 멈춰버리면 Long Transaction 상태에 빠질 수도 있습니다.

> onmode -Z 1449487912
onmode: Cannot kill transaction 0x56656e28.
Only I-STAR subordinates that are PREPARE'd or HEURISTICally ABORT'd
may be heuristically completed.

> onmode -H 1449487912
onmode -H may only be used to kill heuristically completed transactions.

여기서는 address를 십진수로 명시해서 onmode를 실행했습니다. onmode -Z (또는 -H) 0x56656e28 같은 형식으로 실행해도 됩니다.

포스트 스레드를 읽다보면 다음과 같은 내용이 있습니다.

Try onmode -l until you have forced a long transaction, this usually
help us. We have requested a way to get rid of these, but it seems
that the view of IBM that the TM should handle this i all cases.
Problem is that the TM fails sometimes, and then you are stuck.

Regards

Ulf

처음 질문했던 Frank의 경우에는 로그 스위칭이 자연스럽게 발생하면서 Long Transaction이 유발되었습니다. 이는 LTXHWM 구성변수 설정값에 따라 달라지는데 기본값은 70입니다. 만약 Logical Log가 10개라면, 한 트랜잭션이 7번의 로그 스위칭이 일어나는 동안 완료되지 않으면 일반적으로 Long Transaction이 유발됩니다.

Today, the transactions were rolled back by the server:

23:42:09 Aborting Long Transaction: tx: 0x56656bd8 no user info due to XA or distributed (2-phase commit) transaction
23:42:09 Aborting Long Transaction: tx: 0x56656e28 no user info due to XA or distributed (2-phase commit) transaction
23:42:13 Checkpoint Completed: duration was 4 seconds.
23:42:13 Thu Jul 1 - loguniq 306, logpos 0x3fff33b4, timestamp: 0xdedfaa19 Interval: 5616

23:42:13 Maximum server connections 15
23:42:13 Checkpoint Statistics - Avg. Txn Block Time 0.000, # Txns blocked 1, Plog used 83432, Llog used 182271

23:42:13 Session completed abnormally. Rolling back tx id 24, flags 0x108463b
23:42:13 Session completed abnormally. Rolling back tx id 32, flags 0x108463b
23:42:13 Long Transaction 0x56656e28 Aborted. Rollback Duration: 0 Seconds
23:42:14 Long Transaction 0x56656bd8 Aborted. Rollback Duration: 0 Seconds

onmode -l로 수동으로 로그 스위칭을 발생시키거나 로그가 쌓이면서 LTXHWM 한계치를 넘을 경우 위와 같이 처리됩니다. 일반적으로는 위와 같이 처리되었을 때 인스턴스나 다른 트랜잭션에는 영향이 없습니다.

onmode -H에 대해서는 공식적인 문서를 찾기는 어렵습니다. 다만 에러메시지로 유추해보건데 (onmode -H may only be used to kill heuristically completed transactions.) 휴리스틱 트랜잭션에만 적용되는 것으로 보입니다.

제 경험상으로는 onstat -G에서 flags 값이 다음과 같은 경우에 사용합니다.

700000379440490 -TH-G 0 0 108002:0xaf7a018 108072:0xd068 DIRTY 26:56 0

70000038ad89b48 -TH-G 0 0 107898:0x4524018 107968:0x6a018 COMMIT 0:00 0

70000038c7ccfa0 -TH-G 0 0 107993:0x14781050 108063:0x9404 DIRTY 32:58 0

70000038c7e1160 -TH-G 0 0 108002:0x2dea018 108072:0x27678 DIRTY 27:10 0

70000038d69e178 -TH-G 0 0 108002:0x39e018 108072:0xe118 DIRTY 27:14 0

flags 값이 '-TH-G' 상태인 경우 LTXHWM 한계치를 넘어도 트랜잭션이 제거되지 않습니다. 이러한 경우에는 부득이 -H 옵션으로 제거할 수는 있으나 Infocenter나 onmode 명령에 -H 옵션에 대한 설명이 없으므로 사용에 주의해야 합니다.

http://www.iiug.org/forums/ids/index.cgi/noframes/read/2975

https://groups.google.com/forum/?hl=en&fromgroups=#!topic/comp.databases.informix/egoEcVkTmbY

https://www.ibm.com/support/knowledgecenter/SSGU8G_11.50.0/com.ibm.admin.doc/ids_admin_1066.htm

'Informix > informix troubleshooting' 카테고리의 다른 글

Unloading around table corruption (0)	2013.03.08
Is there an alternate way to install IBM Informix Dynamic Server? (0)	2013.03.02
Error "log_put( OLDRSAM:66, 35052): log record too long" (0)	2013.02.13
slow ontape backup or archive of Informix database server instance (0)	2013.01.16
Corrupt Page During an Archive (0)	2013.01.03

Problem(Abstract)

IBM® Informix Server™ may Assert Failed showing messages such as "log_put( OLDRSAM:66, 35052): log record too long" in the with Assert Failed File and/or "Checkpoint log record may not fit into the logical log buffer. Recommended minimum value for LOGBUFF is 36" in the online.log file.

Resolving the problem

PROBLEM

There is an undocumented limitation in IBM® Informix Server™ that sets the maximum number of transactions opened at checkpoint time. If this limit is exceeded, the instance will stop

If the number of open transactions open at checkpoint time exceeds certain number (This number is dependant of the Logical Log Buffer Size), the instance will stop. You can check that the Assert Failure File will show this message:

14:51:48 log_put( OLDRSAM:66, 35052): log record too long

logrec

Bc87a2044: 000088ec 00000042 00000010 00000000 .......B ........

Bc87a2054: 00000001 01c72018 09bae247 00000000 ...... . ...G....

Also you will notice some warnings in the online.log before the instance stops with the following messages:

15:24:55 Checkpoint log record may not fit into the logical log buffer.

Recommended minimum value for LOGBUFF is 36.

CAUSE

Informix Server uses the Logical Log Buffer (LLB) to write logical log records (LLR) to the Logical Log File. A logical Log record cannot be bigger than the LLB, if such a situation is found, the engine will abort to prevent an incomplete write to the Logical Log File.

Usually, LLR are very small compared with a 32KB buffer size (The default LLB size), however the size of a checkpoint record depends on the number of transactions opened at checkpoint time. In version 9.4 the space required by a Checkpoint Logical Log Record will be :

36 + 24xN bytes

where N is the number of transactions open at that moment.

A 32KB Logical Log Buffer size will allow for only 1363 open transactions at checkpoint time. If that number is exceeded, the engine will crash.

SOLUTION

Increase the Logical Log Buffer to a higher value. The formula given above is for version 9.4 but can be used for any version as the sizes of the structures do not change that much.

http://www-01.ibm.com/support/docview.wss?uid=swg21217145

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Is there an alternate way to install IBM Informix Dynamic Server? (0)	2013.03.02
how to terminate global transactions? (0)	2013.02.13
slow ontape backup or archive of Informix database server instance (0)	2013.01.16
Corrupt Page During an Archive (0)	2013.01.03
Database Server Blocked in Checkpoint (Blocked:CKPT) due to disabled dbspace (0)	2012.07.05

Problem(Abstract)

You run an ontape backup but it runs too slowly.

Cause

TAPEBLK is set too low in the onCONFIG to take full advantage of the device's I/O throughput potential.
The actual I/O speed is lower than the expected I/O speed of the devices used.
Not enough resources available on computer like CPU, memory, and so on. The system might be busy.

Diagnosing the problem

Run a timed test to determine approximate speed of I/O from a chunk to the backup device . Run the dd command with a block size (bs) equal to the block size value for TAPEBLK in the onCONFIG. Try to use a chunk that is at least 2 GB.

Note: This test might not work on a running instance because of an OS lock on the chunk that will not allow the dd command to run. If you cannot run the dd command on an actual chunk then you must use or create a file which resides in the same location as the chunks ( the same directory, same disk, same NFS mount, same SAN, and so on ). This is critical for this test to yield accurate timings.

Here is the command:

timex dd if=/full/path/to/informix/chunk of=/full/path/to/backup_directory/timetest.out bs=128k

Here is an actual example:

timex dd if=/informix/chunks/rootdbs of=/informix/backups/timetest.out bs=256k
586+1 records in
586+1 records out

real 7.76
user 0.00
sys 0.48

This dd test shows the speed for this backup would be about 18.9 MB/second or 66.4 GB/hour. This would be the approximate speed of ontape I/O with some consideration for ontape overhead.

Capture and review the ontape function stacks to find out exactly what ontape is doing. Soon after you start the ontape backup you should see these 3 threads listed in the onstat -g ath output:

ontape
arcbackup1
arcbackup2

The output will be similar to this:

...
213 41dac918 40347be0 1 cond wait netnorm 1cpu ontape 214 41980a28 40348c50 1 IO Wait 1cpu arcbackup1 215 41980c88 40344a90 2 sleeping secs: 1 1cpu arcbackup2

To get the stack trace from each thread you will need to 3 separate commands. The first column of the onstat -g ath output is the thread id. Using the example output from above you would run these commands to continuously capture stack information for each of the 3 threads every 5 seconds:

onstat -g stk 213 -r > ontape.out
onstat -g stk 214 -r > arcbackup1.out
onstat -g stk 215 -r > arcbackup2.out

You can run these in the background or in 3 separate windows. Stop them when the backup is complete.

Resolving the problem

Increase the value of the onCONFIG configuration parameter TAPEBLK. If using a tape device, then use the device's suggested maximum block size. If writing the backup to disk, then continue increasing TAPEBLK on subsequent backups until you reach the point where increasing TAPEBLK does not decrease time to complete the backup ( you could start at 128 then go to 256, 512, 1024, and so on ).

Use faster backup device or find ways to increase I/O of that device. Some examples:
- use a faster tape drive or a tape that can handle higher I/O speeds
- backup to local disk then ftp the backup to a SAN

Collect ontape thread stack traces and analyze or send to technical support for analysis.

http://www-01.ibm.com/support/docview.wss?uid=swg21503185

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

how to terminate global transactions? (0)	2013.02.13
Error "log_put( OLDRSAM:66, 35052): log record too long" (0)	2013.02.13
Corrupt Page During an Archive (0)	2013.01.03
Database Server Blocked in Checkpoint (Blocked:CKPT) due to disabled dbspace (0)	2012.07.05
Tuning Kernel Asynchronous IO (KAIO) for IBM Informix on AIX (0)	2012.06.16

The message Archive detects that page is corrupt indicates that page validation failed.

During an archive, the database server validates every page before writing it to the archive device. This validation checks that the elements on the page are consistent with the expected values. When a page fails this validation, a message similar to the following is written to the online.log file:

15:06:37  Assert Failed: Archive detects that page 0xc00021 is corrupt.
15:06:37  IBM Informix Dynamic Server Version 9.40.UC1
15:06:37   Who: Session(25, informix@cronus, 67612, 1085259772)
                Thread(50, arcbackup1, 40acf758, 4)
                File: rsarcbu.c Line: 2549
15:06:37  stack trace for pid 67367 written to /tmp/af.41ad7b9
15:06:37   See Also: /tmp/af.41ad7b9

below 9.40 version (Expanded chunk capacity mode: disabled)

16:27:49 Assert Warning: Archive detects that page 1:10164 is corrupt. 
16:27:49 IBM Informix Dynamic Server Version 11.50.FC7
16:27:49 Who: Session(5, informix@cronus, 23467, 10a921048)
Thread(40, arcbackup1, 10a8e8ae8, 1)
File: rsarcbu.c Line: 2915
16:27:49 stack trace for pid 23358 written to /tmp/af.41043f4
16:27:49 See Also: /tmp/af.41043f4
16:27:49 Archive detects that page 1:10164 is corrupt.
16:27:50 Archive on rootdbs Completed with 1 corrupted pages detected.

above 10.00 version (Expanded chunk capacity mode: enabled, always)

The page number is printed in hexadecimal. The format for page number is 0xCCCPPPPP where CCC represents the chunk number, and PPPPP represents the page number. For this example, the corrupted page is in chunk 0xc (12 decimal) and page 0x21 (33 decimal). The archive aborts after detecting 10 corrupt pages. The online.log file displays the full error message, including the page address, for the first 10 errors. Subsequently, only the count of the number of corrupt pages is put in to the online.log.

When you receive this message, identify which table the corrupt page belongs to by examining the output of the oncheck –pe command. (해당 정보는 sysmaster:sysextents 테이블에서도 확인할 수 있습니다.) To determine the extent of the corruption, execute the oncheck –cID command for that table.

A corrupt page is saved onto the backup media. During a restore, the corrupt page is returned in its corrupt form. No errors messages are written to the online.log when corrupt pages are restored, only when they are archived.

http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.bar.doc/barmst369.htm

http://publib.boulder.ibm.com/infocenter/idshelp/v115/index.jsp?topic=%2Fcom.ibm.bar.doc%2Fids_bar_055.htm

http://www-01.ibm.com/support/docview.wss?uid=swg21209842

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Error "log_put( OLDRSAM:66, 35052): log record too long" (0)	2013.02.13
slow ontape backup or archive of Informix database server instance (0)	2013.01.16
Database Server Blocked in Checkpoint (Blocked:CKPT) due to disabled dbspace (0)	2012.07.05
Tuning Kernel Asynchronous IO (KAIO) for IBM Informix on AIX (0)	2012.06.16
Prevent and prepare for assertion failures (0)	2012.06.06

Problem(Abstract)

How to unblock the database server after a dbspace has been disabled and the onDBSPACEDOWN configuration parameter is set to 2.

Symptom

The database server is blocked with a checkpoint request that never finishes. User sessions are only able to perform read operations and any update, delete or insert operations will hang.

When you run the onstat - command, the output shows that the database server is Online, but blocked on a checkpoint request:

There is a message in the online.log file that says:

The output of the onstat -d command shows one or more dbspaces marked as down (indicated by a D in the flags column). Any dbspace marked as down will have all its chunks marked as down as well.

Example

The following output shows that the dbspace test has been marked as down and its associated chunks (numbers 5 and 6) have also been marked as down. The database server is waiting on a checkpoint request - on-Line (CKPT REQ) :

(right click the image for a larger view using your browser)

Cause

When one or more dbspaces become unavailable and the configuration parameter ONDBSPACEDOWNis set to 2, the database server will prevent any further updates to the database and waits for intervention from the database administrator.

There are two reasons why the database server will mark a dbspace as down:

1. The database server has been unable to access the primary chunk of a dbspace and its mirror chunk either does not exist or is not accessible.

2. The database server has detected corruption within the chunk. The database server will mark the chunk as down to help preserve the integrity of the database.

Resolving the problem

There are two possible solutions available.

Run onmode -O (Override)

The database server will issue a checkpoint and will then be available for use. The down dbspaces will continue to be disabled and will only become available if they are restored from archive.

Important:

This action can not be reversed as information about down dbspaces is written to the reserved pages of the database server.

1. There will be a message on the screen after issuing the

onmode -O

command:

2. Press the keys

y+Enter

to confirm or press the keys

n+Enter

to cancel.

Run onmode -ky (Shutdown)

The database server will stop without issuing a checkpoint and information about down dbspaces will not be written to the reserved pages of the database server.

There will be a message on the screen after issuing the onmode -ky command (except for versions 9.21x and 7.30x):

There will be also a message in the online.log (except for versions 9.21x and 7.30x):

The onmode -ky option should be used if the database server has been unable to access the chunks (marked as down) due to an operating system error or hardware problem. You must take corrective action to ensure that the database server can access all of the allocated chunks . You must ensure that there has been no modification or corruption of the data pages within these chunks. When the database server is restarted, any chunks that were previously marked as down will be online again.

http://www-01.ibm.com/support/docview.wss?uid=swg21153098

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

slow ontape backup or archive of Informix database server instance (0)	2013.01.16
Corrupt Page During an Archive (0)	2013.01.03
Tuning Kernel Asynchronous IO (KAIO) for IBM Informix on AIX (0)	2012.06.16
Prevent and prepare for assertion failures (0)	2012.06.06
Correcting assertion failures (AF) (0)	2012.06.06

Question

This article goes over the environment variable that can be used to change the default resources used by the engine for Kernel Asynchronous I/O (KAIO) on the IBM AIX platform.

Answer

If you seem to have sluggish performance during peak I/O usage times or see a message in the online.log that says:

KAIO out of OS resources errno = 11

or

*** _aiowrite() returning EAGAIN ***

Then the Kernel Asynchronous I/O (KAIO) subsystem may need to be tuned. The IBM Informix engines do this thru an Operating System (OS) structure called an Asynchronous IO Control Block (AIOCB). A certain number of AIOCB's are allocated at initialization time and reused as long the instance is online. If the number of AIOCB's is low, performance will be negatively impacted because the engine will be forced to wait until an AIOCB becomes available.

To change the number of AIOCB's defined, set the environment variable IFMX_AIXKAIO_NUM_REQ to the desired number, then shutdown and restart the database engine. If the value of IFMX_AIXKAIO_NUM_REQ is greater than 4096, then 4096 AIOCB's will be allocated. If the value of IFMX_AIXKAIO_NUM_REQ is less than 128, then 128 AIOCBs will be allocated. If IFMX_AIXKAIO_NUM_REQ is not set, then the default value is 1024. These values may change in future versions of the OS or engines.

Example:

To lower the number of concurrent KAIO requests to 512 from the default of 1024 :

$ export IFMX_AIXKAIO_NUM_R

EQ=512

Tuning Kernel Asynchronous IO (KAIO) for IBM Informix on AIX

http://www-304.ibm.com/support/docview.wss?uid=swg21083472

IC50164: KAIO: OUT OF RESOURCES, ERRNO 11 REQUIRES INCREASE IN IFMX_AIXKAIO_NUM_REQ on AIX FROM 4K TO 8K

http://www-304.ibm.com/support/docview.wss?uid=swg1IC50164

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Corrupt Page During an Archive (0)	2013.01.03
Database Server Blocked in Checkpoint (Blocked:CKPT) due to disabled dbspace (0)	2012.07.05
Prevent and prepare for assertion failures (0)	2012.06.06
Correcting assertion failures (AF) (0)	2012.06.06
Using PDQ excessively impacts the performance of database server (0)	2012.04.27

You can perform routine administrative tasks that can prevent assertion failures or help prepare for recovering from assertion failures.

To prevent assertion failures on Informix®, perform the following tasks regularly:

Table 1. Preventing assertion failures.
Prevention task	Purpose
Test your applications thoroughly in a realistic environment before releasing your applications into production.	Applications can trigger assertion failures. Make sure that you have allocated enough resources: for example, physical and logical logs and shared memory.
Check and repair the consistency of your data. Run the following oncheck commands to check your system: oncheck -cD: Data pages oncheck -cI: Index pages oncheck -cr: Reserved pages oncheck -ce: Chunk-free list oncheck -cc: System catalogs oncheck -pe: Physical information about chunks	Diagnosing and repairing consistency problems quickly can prevent major problems like data loss and assertion failures.
Upgrade to the latest major and fix pack release.	Assertion failures can be caused by product defects. You can avoid many product defects by upgrading to the most current release.
Update statistics. When you change a table or indexing schema, update statistics to update the dictionary cache. By default, statistics are updated automatically by the Scheduler. You can customize automatic statistics updating for your system.	Queries that use stale statistics can use too many resources and cause assertion failures. Stale statistics can also have a negative affect on query performance. Always update statistics after upgrading.
Monitor and complete in-place alter operations. You can view pending in-place alter operations in the sysadmin:ph_alerttable or the Health Center > Alerts screen in the IBM® OpenAdmin Tool (OAT) for Informix. You can force the completion of in-place alter operations for a specific table by running the following dummy UPDATE statement: `UPDATE table_name SET column_one = column_one WHERE 1=1;`	Pending in-place alter operations can use too many resources and cause assertion failures. Complete in-place alters before upgrading.

To prepare for assertion failures on Informix, perform the following tasks regularly:

Table 2. Preparing for assertion failures.
Preparation task	Purpose
Back up your data and logical logs. You can use the on-Bar utility or the ontape utility to back up your data and logical logs. The on-Bar utility requires a storage manager.	If data loss occurs, you can recover your data and recent transactions.
Back up your database and storage space schema information. Run the following commands to back up your schema information: dbschema -ss -d `database_name` > dbschema_ss_`database_name`.out: Saves the schema information about the specified database in a file. dbschema -c > dbschema_c.out: Saves the commands to reproduce storage spaces, chunks, physical logs, and logical logs in a file.	In a catastrophic failure, you can recreate your system.

http://publib.boulder.ibm.com/infocenter/idshelp/v117/index.jsp?topic=%2Fcom.ibm.support.core.doc%2Fifx_ts_afprevent.htm

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Database Server Blocked in Checkpoint (Blocked:CKPT) due to disabled dbspace (0)	2012.07.05
Tuning Kernel Asynchronous IO (KAIO) for IBM Informix on AIX (0)	2012.06.16
Correcting assertion failures (AF) (0)	2012.06.06
Using PDQ excessively impacts the performance of database server (0)	2012.04.27
Insert into a table is slow (0)	2012.04.27

An assertion failure occurs when the database server cannot continue normal processing and must shut down. You can correct some of the problems that cause assertion failures, such as disk issues. For other problems that cause assertion failures, you must contact IBM® Software Support.

If your production server is currently down, contact IBM Software Support immediately.

To correct an assertion failure:

Look at the online log for the assertion failure message.
The location of the online log is specified by the MSGPATH configuration parameter. Default:
- UNIX: $INFORMIXDIR/tmp/online.log
- Windows: %INFORMIXDIR%\online.log

Perform any action mentioned in the assertion failure message. For example, you might need to run an oncheck command or increase the size of the physical or logical log buffer.

The following table lists the common types of assertion failures and what you can do to correct the underlying problems.

Table 1. Common types of assertion failures.
Error	Cause	Solutions
bfcheck	Corrupted data page, index, or partition page	Run the oncheck command in the assertion failure message. If an index is corrupted, drop and recreate the index. If a partition page is corrupted, you might need to restore the data.
I/O write chunk	Not enough shared memory or a hardware problem	Look at your operating system logs for hardware problems or problems with other software. Sometimes other software can trigger an Informix® assertion failure. Fix any hardware or other software problems. Free up or increase shared memory for the database server.
No exception handler	Memory corruption	Look at your operating system logs for hardware problems or problems with other software. Fix any hardware or other software problems.
sigkill	The oninit daemon was stopped.	Restart the database server.
segv sigbus	Memory corruption An ill-behaved C user-defined routine A defect in the database server	Look at your operating system logs for hardware problems or problems with other software. Fix any hardware or other software problems. If you use C user-defined routines, check your code for problems. Make sure that your user-defined routines follow the guidelines for well-behaved routines. If you are not running the latest Informix fix pack, check the latest Fixed and known defects file to see whether this issue is fixed. Upgrade to the latest Informix fix pack.

Restart the database server. Some assertion failures can be solved by the database server during startup.

If the database server does not start or the problem recurs, collect the following files for IBM Software Support in addition to the standard list of information to gather.

Table 2. Files for support.
File	Location
assertion failure files: af.xxx, where xxx is the hexadecimal number associated with the failure	Specified by the DUMPDIR configuration parameter. Default: UNIX: $INFORMIXDIR/tmp Windows: %INFORMIXDIR%\tmp
sqlhosts file	Specified by the INFORMIXSQLHOSTS environment variable. Default: UNIX: $INFORMIXDIR/etc Windows: registry
onconfig file	Specified by the ONCONFIG environment variable. Default: UNIX: $INFORMIXDIR/etc Windows: %INFORMIXDIR%\etc
online log file	Specified by the MSGPATH configuration parameter. Default: UNIX: $INFORMIXDIR/tmp/online.log Windows: %INFORMIXDIR%\online.log
shared memory dump files:shmem.xxx, where xxx is the hexadecimal number associated with the failure	Specified by the DUMPDIR configuration parameter. Default: UNIX: $INFORMIXDIR/tmp Windows: %INFORMIXDIR%\tmp Generated only if the DUMPSHMEM configuration parameter is set to 1 or 2.

Contact IBM Software Support.

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Tuning Kernel Asynchronous IO (KAIO) for IBM Informix on AIX (0)	2012.06.16
Prevent and prepare for assertion failures (0)	2012.06.06
Using PDQ excessively impacts the performance of database server (0)	2012.04.27
Insert into a table is slow (0)	2012.04.27
Locked record errors -244 and -107 or -245 and -144 when updating a record (0)	2012.03.30

Problem(Abstract)

The database server's performance becomes bad when there are many sessions accessing the server simultaneously.

Symptom

Query response time becomes longer and longer for all queries.

Cause

Overuse of PDQ feature.

Diagnosing the problem

Find there are many sessions were accumulated in PDQ queue.

onstat -g mgm

IBM Informix Dynamic Server Version 11.50.FC4 -- on-Line -- Up 18:04:23 -- 16821136 Kbytes

Memory Grant Manager (MGM)
--------------------------

MAX_PDQPRIORITY: 50
DS_MAX_QUERIES: 32000
DS_MAX_SCANS: 1048576
DS_NONPDQ_QUERY_MEM: 1024 KB
DS_TOTAL_MEMORY: 4096000 KB

Queries: Active Ready Maximum
205 42 32000

Memory: Total Free Quantum
(KB) 4096000 0 128

Scans: Total Free Quantum
1048576 1048374 1

Load Control: (Memory) (Scans) (Priority) (Max Queries) (Reinit)
Gate 1 Gate 2 Gate 3 Gate 4 Gate 5
(Queue Length) 1 0 41 0 0

Active Queries:
---------------
Session Query Priority Thread Memory Scans Gate
22 700000228709438 25 700000227c24028 0/0 1/1 -
42 70000022804fbf8 25 700000227ff5970 0/0 1/1 -
43 700000228a9fbf8 25 7000002280a7a30 0/0 1/1 -
44 700000228b53bf8 25 700000228b09028 0/0 1/1 -
45 700000228c08bf8 25 700000228bbd028 0/0 1/1 -
46 700000228cbcbf8 25 700000228c60310 0/0 1/1 -
47 700000228d70bf8 25 700000228d145b0 0/0 1/1 -
48 700000228e24bf8 25 700000228dc8850 0/0 1/1 -
49 700000228ed8bf8 25 700000228e7cb38 0/0 1/1 -
50 700000228f8cbf8 25 700000228eebd70 0/0 1/1 -
51 700000229041bf8 25 700000228ff6028 0/0 1/1 -
52 7000002290f5bf8 25 700000229099280 0/0 1/1 -
53 7000002291a9bf8 25 700000229151610 0/0 1/1 -
54 700000229205bf8 25 7000002291bb028 0/0 1/1 -
55 70000022930dbf8 25 70000022926f028 0/0 1/1 -
57 700000229422bf8 25 7000002293c6280 0/0 1/1 -
58 7000002294d6bf8 25 70000022947a460 0/0 1/1 -
59 70000022958abf8 25 70000022952e658 0/0 1/1 -
60 70000022963ebf8 25 7000002295e2850 0/0 1/1 -
......

Resolving the problem

1. Set PDQPRIORITY lower in application, especially in OLTP environment.

2. If no time to change the application, we can turn off the PDQ feature by setting MAX_PDQPRIORITY to 0 in onconfig or execute onmode -D 0 without need to restart the server.

Related information

PDQPRIORITY
MAX_PDQPRIORITY

http://www-01.ibm.com/support/docview.wss?uid=swg21409944

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Prevent and prepare for assertion failures (0)	2012.06.06
Correcting assertion failures (AF) (0)	2012.06.06
Insert into a table is slow (0)	2012.04.27
Locked record errors -244 and -107 or -245 and -144 when updating a record (0)	2012.03.30
IC56169: REQUEST JDBC DRIVER TO PARSE SERVICENAME WHEN USING SQLHOSTS LOOKUP IMPLEMENTATION (0)	2012.03.21

Problem(Abstract)

You are inserting data into a table but the insert speed is slower than expected.

Cause

These are the common causes, there could be others:

Slow I/O ( Informix or disk )
Too many indexes
Indexes need to be reorganized

Here are the basic steps involved in an insert into a table with an index:

Find free space in the table for the row.
Insert row into table.
Read through index keys to find the slot for the new index key.
Insert new key into index. This leads to index page splits which

Steps 3 and 4 are repeated for each index. This process is slowed when:

indexes grow larger
index has many extents
duplicate values exist in the index

Diagnosing the problem

Slow I/O

Slow I/O can typically be seen by examining onstat -g ioq output or by using the dd command on UNIX/Linux.

onstat -g ioq:

Look at the maxlen column of onstat -g ioq. Generically speaking, if you use aio and you see maxlen at 25 or greater that indicates you need more aio vps.
dd ( UNIX/Linux ):

Determine the potential speed of the disk. Copy a file that is at least 1 GB in size to the disk using dd. If the dd copy is significantly slower than what is expected from the disk specifications then that could easily cause inserts ( and any other operations ) to run slowly. Run man dd for the syntax. Here is an example:

dd if=/path/to/1GB/file.test of=/path/to/slow/disk/file.test

Too many indexes

Are all the indexes used?
Could indexes be consolidated and queries rewritten in a way to use 1 index instead of 3?

Indexes need to be reorganized

If the index has many extents which are physically sitting all over on a disk there will be more time associated with I/O as the disk controller moves all over the disk. Use this query to return the number of extents a specific named index has:

SELECT a.dbsname,a.tabname,count(*) num_of_extents,sum(b.pe_size)

total_size

FROM systabnames a,sysptnext b

WHERE a.partnum = b.pe_partnum

AND a.tabname[1,3]!='sys'

AND a.tabname!='TBLSpace'

AND a.dbsname[1,3]!='sys'

and a.tabname ="index_name"

GROUP BY 1, 2

ORDER BY 3 desc,4 desc

More than 10 extents could indicate performance gains would be seen after a rebuild of an index.

Resolving the problem

Slow I/O

Add more aio vps ( see onmode -p, VPCLASS, NUMAIOVPS, AUTO_AIOVPS ) or troubleshoot slow disk I/O problems with the help of your system administrator.

Too many indexes

Drop and consolidate as many indexes as possible. Make sure you rewrite queries to take advantage of the new indexes. Here is an example:

Some of these indexes should be consolidated to reduce the number of indexes. Look at these 4 indexes on table slowio:

create index index1 on slowio (col4) using btree;

create index index2 on slowio (col1,col2,col4) using btree;

create index index3 on slowio (col1,col4) using btree;

create index index4 on slowio (col4,col2,col3) using btree;

Considerations for data types and uniqueness must be made. Assuming those were made the 4 indexes could be recreated as 1 index like this:

create index consolidate_idx on slowio (col4, col1, col2, col3) using btree;

You would likely have to rewrite some existing queries changing the order in the where clause to align with this index so it would be used. Here are the scenarios that would allow a query to use this index:

select ... where col4 ...;

select ... where col4 ... and/or col1 ...;

select ... where col4 ... and/or col1 ... and/or col2 ...;

select ... where col4 ... and/or col1 ... and/or col2 ... and/or col3 ...;

Not all of the columns in an index have to be queried to utilize an index but they do have to queried in the order of the index and you cannot skip a column to use a following column. Here are some scenarios
where the index would not be used:

select ... where col1 ...;

select ... where col3 ...;

select ... where col4 ... and/or col2 ...;

select ... where col4 ... and/or col3 ... and/or col2 ... and/or col1 ...;

Indexes need to be reorganized

Drop and recreate the index.

http://www-01.ibm.com/support/docview.wss?uid=swg21578714

저작자표시 비영리 변경금지 (새창열림)

'Informix > informix troubleshooting' 카테고리의 다른 글

Correcting assertion failures (AF) (0)	2012.06.06
Using PDQ excessively impacts the performance of database server (0)	2012.04.27
Locked record errors -244 and -107 or -245 and -144 when updating a record (0)	2012.03.30
IC56169: REQUEST JDBC DRIVER TO PARSE SERVICENAME WHEN USING SQLHOSTS LOOKUP IMPLEMENTATION (0)	2012.03.21
Cannot install 32 bit application on a 64 bit Windows machine (0)	2012.03.18