

Problem(Abstract)

When there is a data transmission issue between the Primary and the HDR Secondary (usually caused by a network problem), client applications working with the Primary may become blocked and appear to hang, even if data replication is configured to be asynchronous (DRINTERVAL > 0).

Symptom

Once the ping timeout is written to the online.log file of the Primary/Secondary instance (see the sample output below), user sessions return to normal operation.

11:27:57  DR: ping timeout                                             
11:27:57  DR: Receive error                                            
11:27:57  ASF Echo-Thread Server: asfcode = -25582: oserr = 4: errstr =
: Network connection is broken.                                        
11:27:57  DR_ERR set to -1                                             
11:27:59  DR: Turned off on primary server    

Cause

When data replication is established, the Primary and Secondary regularly exchange ping messages. If the ping acknowledgement is not received before DRTIMEOUT elapses, the server re-sends the ping message three more times, then reports a ping timeout and turns off the DR subsystem. Consequently, the time span between the first ping and the "DR: ping timeout" message can be as large as (DRTIMEOUT x 4).


For example, if DRTIMEOUT is set to 180 seconds, it can take up to 12 minutes before DR is turned off.
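The (DRTIMEOUT x 4) window above can be sketched as a one-line calculation (an illustration of the arithmetic, not Informix code):

```python
# Sketch: worst-case delay before "DR: ping timeout" is reported.
# The server waits DRTIMEOUT for the first acknowledgement, then
# retries the ping three more times, so the window is 4 x DRTIMEOUT.

def ping_timeout_window(drtimeout_seconds: int) -> int:
    """Return the worst-case seconds between the first unanswered
    ping and the "DR: ping timeout" message (initial wait + 3 retries)."""
    return 4 * drtimeout_seconds

print(ping_timeout_window(180))   # 720 seconds = 12 minutes
```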

Scenario #1:
Although with asynchronous replication transactions do not wait for an acknowledgement from the HDR Secondary once the logical log record has been placed in the DR buffer, when there is a transmission failure the DR buffer may fill up quickly (the time required for this depends on the DRTIMEOUT value, the LOGBUFF value, and the activity on the instance). Until DR is turned off, a user session has to wait until the DR buffer has enough space for its logical log record.
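A back-of-the-envelope sketch of why the buffer fills so fast (assumptions: the DR buffer is LOGBUFF-sized, as the second article below notes, and the log generation rate is a hypothetical sustained figure):

```python
# Rough estimate (not Informix internals): if nothing is being drained
# to the Secondary, a DR buffer of roughly LOGBUFF size fills at the
# rate the instance generates logical log records.

def seconds_until_dr_buffer_full(logbuff_kb: float, log_rate_kb_per_s: float) -> float:
    """Estimated seconds for a ~LOGBUFF-sized DR buffer to fill
    when transmission to the Secondary has stalled."""
    return logbuff_kb / log_rate_kb_per_s

# e.g. a 64 KB buffer filling at 32 KB/s is full in ~2 seconds,
# long before the (DRTIMEOUT x 4) window expires.
print(seconds_until_dr_buffer_full(64, 32))
```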

Scenario #2:
In addition to the above scenario, a checkpoint can be requested on the Primary between the first ping failure and the time when the "DR: ping timeout" message is reported. Checkpoints are synchronous between the Primary and the Secondary regardless of the DRINTERVAL value. Once a checkpoint is requested, it prevents any threads from entering a critical section. The instance will remain blocked until the checkpoint acknowledgement is received from the Secondary or until DR is turned off.


Diagnosing the problem

For scenario #1, check whether the corresponding user thread shows a stack similar to the following:


Stack for thread: 73 sqlexec                  
base: 0x0700000011abc000                      
len: 69632                                    
pc: 0x00000001000370f4                        
tos: 0x0700000011acafe0                       
state: sleeping                               
vp: 8                                         
                                              
0x00000001000370f4 (oninit)yield_processor_mvp
0x0000000100041f30 (oninit)mt_yield           
0x000000010076a5ac (oninit)cdrTimerWait
0x0000000100716908 (oninit)dr_buf_deq_int
0x00000001001fe3c0 (oninit)dr_logcopy         
0x00000001001f2d0c (oninit)logwrite
0x000000010011b7c4 (oninit)log_put            
0x0000000100121384 (oninit)logm_write         
0x00000001001f3e68 (oninit)logputx            
0x000000010017137c (oninit)rscommit           
0x000000010022b70c (oninit)iscommit           
0x00000001002865e4 (oninit)sqiscommit         
0x0000000100533d38 (oninit)committx           
0x0000000100536480 (oninit)commitcmd          
0x000000010053b01c (oninit)excommand          
0x000000010042893c (oninit)sq_execute         
0x000000010026becc (oninit)sqmain             
0x00000001002d51a4 (oninit)listen_verify      
0x00000001002d33b8 (oninit)spawn_thread       
0x0000000100e0b59c (oninit)startup       

For scenario #2, check the 'onstat -g ath' output and see whether the user threads have "cond wait cp" status.
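A small helper for that check could look like the sketch below. It scans saved `onstat -g ath` output for threads blocked on the checkpoint condition; the sample text is illustrative, not real onstat output.

```python
# Sketch: count threads whose status column in `onstat -g ath` output
# shows "cond wait cp" (blocked waiting for the checkpoint).
# SAMPLE is made-up illustrative output, not a real capture.

SAMPLE = """\
 tid tcb              rstcb            prty status       vp-class name
 45  700000011aa1000  700000011bb1000  2    cond wait cp 1cpu     sqlexec
 46  700000011aa2000  700000011bb2000  2    running      1cpu     sqlexec
 47  700000011aa3000  700000011bb3000  2    cond wait cp 1cpu     sqlexec
"""

def count_cp_waiters(onstat_ath_output: str) -> int:
    """Count lines whose status shows 'cond wait cp'."""
    return sum(1 for line in onstat_ath_output.splitlines()
               if "cond wait cp" in line)

print(count_cp_waiters(SAMPLE))   # 2
```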


Resolving the problem

To resolve the problem it may be necessary to:

1) Fix any problems that can cause data transmission issues between the Primary and the HDR Secondary (e.g. improve network reliability and throughput).

2) Decrease the value of the DRTIMEOUT configuration parameter.

Note: increasing LOGBUFF may also help to reduce the blockage time; however, a large logical log buffer may result in data loss in case of a Primary failure.



http://www-01.ibm.com/support/docview.wss?uid=swg21643957



Problem(Abstract)

Sometimes you can get "ping timeout" and "send error" messages in the online.log even though the network environment checks out as normal, and the HDR relationship ends up broken. Why does this occur?

Symptom

ping timeout, receive error, send error

Cause

A ping timeout occurs if the "DR_MSG_PING" message cannot flow between the Primary and the Secondary, or if the acknowledgement takes longer than 4 x DRTIMEOUT. "DR_MSG_PING" is a message type placed in the DR buffer queue, so it must wait for DR buffer space; therefore a ping timeout may occur because the DR buffer is full or because the ping message has lower priority than the logical log buffer.

A logical log buffer that cannot be transferred may likewise lead to the ping timeout.


Environment

HDR environment

Diagnosing the problem

The DR buffer is the same size as the logical log buffer, and "DR_MSG_PING" messages are stored in the DR buffer, so the DR buffer can be adjusted by configuring LOGBUFF.

The Primary server sends logical logs to the Secondary server to keep the data consistent, as follows:
1. Primary: logical log buffer -> DR buffer.
2. Primary: the dr_prsend thread sends these logical logs across the network, using TCP/IP, to the DR buffer on the Secondary server.
3. Secondary: the dr_secrecv thread receives the logical logs on the Secondary.

The HDR Primary and Secondary servers ping each other and must receive an acknowledgement within an appointed time; otherwise the HDR relationship is broken with a "ping timeout" error. The acknowledgement window is 4 times the DRTIMEOUT value.
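The ping/acknowledgement exchange described above can be sketched as follows (an illustration of the retry behaviour, not server code; the function name and parameters are hypothetical):

```python
# Sketch: after a ping goes unanswered for DRTIMEOUT seconds, the server
# retries three more times before declaring "DR: ping timeout" and
# breaking the HDR relationship.

def ping_exchange(ack_arrives_after: float, drtimeout: float) -> str:
    """Simulate the exchange. `ack_arrives_after` is when the
    acknowledgement would arrive, in seconds after the first ping."""
    attempts = 4                          # initial ping + 3 retries
    for attempt in range(1, attempts + 1):
        deadline = attempt * drtimeout    # each retry extends the wait
        if ack_arrives_after <= deadline:
            return "ack received"
    return "DR: ping timeout"             # HDR relationship is broken

print(ping_exchange(ack_arrives_after=500, drtimeout=180))   # ack received
print(ping_exchange(ack_arrives_after=900, drtimeout=180))   # DR: ping timeout
```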


Resolving the problem

To avoid the "ping timeout" error, consider the following points:

1. Increase the LOGBUFF value to enlarge the DR buffer so that it can hold more signal messages.
2. A hang on the Secondary server may lead to the "ping timeout" because the DR buffer content cannot be received immediately, or because DR_MSG_PING has lower priority.
3. A long checkpoint duration on the Secondary server can also cause the timeout.



http://www-01.ibm.com/support/docview.wss?uid=swg21413380



Informix Enterprise Replication (CDR) vs. Informix MACH11 (HDR, SDS and RSS)

ER:     Replication granularity is at the table and column level.
MACH11: Replication granularity is at the instance level.

ER:     Supports hierarchical routing (root, non-root and leaf servers).
MACH11: All secondary servers have to be directly connected to the primary server.

ER:     Supports update-anywhere, data consolidation and data dissemination models.
MACH11: Secondary servers are read-only and can be used only for reporting activity.

ER:     Supports heterogeneous replication, i.e. ER can replicate data between 11.10 and 7.31 servers.
MACH11: Primary and secondary server versions have to be the same.

ER:     ER servers don't have to be running on the same operating system platform.
MACH11: The hardware platform has to be the same between the primary and all secondary servers.

ER:     ER needs a primary key for all replicated tables.
MACH11: Doesn't need a primary key for replication.

ER:     Can co-exist in a MACH11 environment.
MACH11: Can co-exist with ER.

ER:     Database must be a logging database.
MACH11: Database must be a logging database.

ER:     Supports blobspace blobs along with smartblobs and partition blobs.
MACH11: Doesn't support blobspace blobs; supports smartblobs and partition blobs.

ER:     Source and target must use the same code set for replicated tables.
MACH11: Database code set must be the same between primary and secondary servers.

ER:     Supports network encryption.
MACH11: Supports network encryption.

ER:     Supports compression before transmitting data through the network.
MACH11: Doesn't support data compression.


http://www.inturi.net/coranto/viewnews.cgi?id=EkpuAFpkkyOKeKEaaY

