728x90

안녕하세요. 인포믹스 12.10.FC12, 12.10.FC12W1, 14.10.FC1 버전에서 발생하는 문제를 소개드립니다.


문제가 발생한 이후부터는 인포믹스 프로세스가 시작되지 않으므로 매우 치명적인 결함입니다.


문제가 발생할 수 있는 조건은 아래와 같습니다.

1. 해당 테이블에 VARCHAR 컬럼을 추가한다.

2. 추가한 VARCHAR 컬럼의 사이즈를 축소한다.


위와 같은 상황에서 UPDATE 문장을 실행하면 오류가 발생하면서 인포믹스 프로세스가 다운됩니다.

그러나 항상 다운되는 것은 아니고 대상 테이블에 저장된 데이터 길이에 따라 다운될 수도, 다운되지 않을 수도 있습니다.

아래는 인포믹스 12.10.FC12와 14.10.FC1 버전의 환경에서 다운되는 현상을 재현했을 때의 로그 메시지입니다.


## 12.10.FC12W1

11:20:01  Assert Failed: Buffer modified in inconsistent chunk.

11:20:01  IBM Informix Dynamic Server Version 12.10.FC12W1WE

11:20:01   Who: Session(87, informix@pilma01, 39452766, 700000020593858)

                Thread(222, sqlexec, 700000020553068, 10)

                File: rsdebug.c Line: 1047

11:20:01   Results: Chunk 13 is being taken OFFLINE.

11:20:01   Action: Restore space containing this chunk from the archive.

11:20:01  stack trace for pid 36175890 written to /work2/ifx1210fc12w1we/tmp/af.4c6a350

11:20:02   See Also: /work2/ifx1210fc12w1we/tmp/af.4c6a350, shmem.4c6a350.0

11:20:10  Buffer modified in inconsistent chunk.

11:20:11  Assert Failed: INFORMIX-OnLine Must ABORT

        Critical media failure.

11:20:11  IBM Informix Dynamic Server Version 12.10.FC12W1WE

11:20:11   Who: Session(87, informix@pilma01, 39452766, 700000020593858)

                Thread(222, sqlexec, 700000020553068, 10)

                File: rsmirror.c Line: 2080

11:20:11  stack trace for pid 36175890 written to /work2/ifx1210fc12w1we/tmp/af.4c6a350

11:20:12   See Also: /work2/ifx1210fc12w1we/tmp/af.4c6a350

11:20:19  Thread ID 222 will NOT be suspended because

          it is in a critical section.

11:20:19   See Also: /work2/ifx1210fc12w1we/tmp/af.4c6a350

11:20:19  rsmirror.c, line 2080, thread 222, proc id 36175890, INFORMIX-OnLine Must ABORT

        Critical media failure..

11:20:19  Fatal error in ADM VP at mt_fn.c:14593

11:20:19  Unexpected virtual processor termination: pid = 36175890, exit status = 0x1.

11:20:19  PANIC: Attempting to bring system down


## 14.10.FC1DE

12:46:54  Assert Failed: Buffer modified in inconsistent chunk.

12:46:54  IBM Informix Dynamic Server Version 14.10.FC1DE

12:46:54   Who: Session(46, informix@ifxdb1, 62804, 0x4526fbc8)

                Thread(55, sqlexec, 4522d8c8, 1)

                File: rsdebug.c Line: 908

12:46:54   Results: Chunk 1 is being taken OFFLINE.

12:46:54   Action: Restore space containing this chunk from the archive.

12:46:54  stack trace for pid 62571 written to /opt/IBM/Informix_Software_Bundle/tmp/af.41fb7ad

12:46:54   See Also: /opt/IBM/Informix_Software_Bundle/tmp/af.41fb7ad, shmem.41fb7ad.0

12:46:57  Buffer modified in inconsistent chunk.

12:46:58  Assert Failed: INFORMIX-OnLine Must ABORT

        Critical media failure.

12:46:58  IBM Informix Dynamic Server Version 14.10.FC1DE

12:46:58   Who: Session(46, informix@ifxdb1, 62804, 0x4526fbc8)

                Thread(55, sqlexec, 4522d8c8, 1)

                File: rsmirror.c Line: 2062

12:46:58  stack trace for pid 62571 written to /opt/IBM/Informix_Software_Bundle/tmp/af.41fb7ad

12:46:58   See Also: /opt/IBM/Informix_Software_Bundle/tmp/af.41fb7ad

12:47:02  Thread ID 55 will NOT be suspended because

          it is in a critical section.

12:47:02   See Also: /opt/IBM/Informix_Software_Bundle/tmp/af.41fb7ad

12:47:02  Starting crash time check of:

12:47:02  1. memory block headers

12:47:02  2. stacks

12:47:02  Crash time checking found no problems

12:47:02  rsmirror.c, line 2062, thread 55, proc id 62571, INFORMIX-OnLine Must ABORT

        Critical media failure..

12:47:02  The Master Daemon Died

12:47:02  PANIC: Attempting to bring system down



관련 내용으로 IBM에 기술지원을 요청했었습니다.

해당 문제는 해외에서 발생한 사례가 이미 있었다고 하고 재현 시나리오도 구체적이진 않지만 존재했습니다.

관련 문제를 서술한 문서의 링크는 아래입니다.

http://www-01.ibm.com/support/docview.wss?crawler=1&uid=swg1IT27997


장애 재현시의 메시지 로그에서 보이는 assert failure file 내용을 살펴보니 해당 사용자 스레드의 stack trace를 확인할 수 있었습니다.


0x00000001000af9cc (oninit)afstack

0x00000001000aeb5c (oninit)afhandler

0x00000001000af038 (oninit)affail_interface

0x00000001001b8844 (oninit)buffcheck

0x00000001002371a0 (oninit)buffput

0x0000000100b88640 (oninit)ckpgversion

0x0000000100b87af4 (oninit)rewrecord

0x0000000100b870ec (oninit)rsrewrec

0x000000010071ab00 (oninit)fmrewrec

0x00000001008382a0 (oninit)aud_sqisrewrec

0x0000000100d40a90 (oninit)doupdate

0x0000000100d3ff2c (oninit)chkrowcons

0x000000010114ea04 (oninit)dodmlrow

0x0000000101150eac (oninit)dodelupd

0x000000010083ee30 (oninit)aud_dodelupd

0x0000000100d1ec24 (oninit)excommand

0x00000001008c8590 (oninit)sq_execute

0x00000001008103ac (oninit)sqmain

0x00000001014d6898 (oninit)listen_verify

0x00000001014d530c (oninit)spawn_thread

0x0000000101482ae0 (oninit)th_init_initgls

0x00000001018f86e0 (oninit)startup


defect IT27997에 대해 서술한 내용과 상당히 유사한 stack trace 임을 알 수 있었습니다.

stack trace 중에서 ckpgversion function 부분이 문제인 듯 합니다.

그리고 12.10.FC12 버전부터 In-place alter 기능이 개선되었다고 알려져있는데, 이 개선으로 인해 defect가 발생한 것이 아닌가 생각도 듭니다. 물론 제 추측이기 때문에 정확한 원인으로 보기는 어렵습니다.

IIUG2018의 발표자료중 Jeff McMahon and Nick Geib가 발표한 What’s New in Informix 장표를 살표보면 VARCHAR 에서 VARCHAR 간 (smaller or larger) 사이즈를 변경하는 유형도 In-place alter 방식으로 이루어진다고 나와 있습니다.


아래는 위의 IT27997의 내용을 토대로 작성한 장애재현 스크립트 내용입니다.

테스트용 데이터는 60자 고정길이 필드 한개로 만드시면 재현이 쉽습니다.


drop table test;

create table test (a varchar(60));


load from test.unl insert into test;


alter table test add b int;

alter table test add c varchar(5);

alter table test add d varchar(5);

alter table test add e varchar(5);

alter table test modify c varchar(1);

alter table test modify d varchar(1);

alter table test modify e varchar(1);


update test set a=' qui officia deserunt mollit anim id est laborum.Lorem ip';


해당 문제는 12.10.FC12 및 12.10.FC12W1 에서 발생하는 문제이므로 해당버전을 사용중이라면 IBM에 패치버전을 요청하시거나 12.10.FC11 이하 버전으로 다운그레이드하면 문제가 발생하지 않습니다.


참고가 되시길 바랍니다.

728x90
728x90

Technote (FAQ)


Question

How to determine which tables require dummy updates in order to eliminate outstanding in-place alters before reversion to an earlier version?

Cause

In IBM Informix 11.70 there is a requirement for migration reversion: "In-place ALTER TABLE statements performed in the new version of the server must not be outstanding against any table."

Answer

1) Run 'oncheck -pt' command to identify which tables were altered in the current version. Qualifying tables must have tblspace flag 0x800000 set. For example:


    # oncheck -pt tdb:tab1 
    TBLspace Report for tdb:dmitriyr.tab1 

     Table fragment partition par1 in DBspace dbs1 

        Physical Address               2:5 
        Creation date                  06/25/2012 11:58:16 
        TBLspace Flags                 800802     Row Locking 
                                                  TBLspace use 4 bit bit-maps 
    <...>

As an alternative to 'oncheck -pt' you may run the following query against the Informix sysmaster database to find all qualifying tables: 
    SELECT ta.dbsname, ta.tabname, pt.partnum, HEX(pt.flags) flag 
    FROM sysmaster:systabnames ta, sysmaster:sysptnhdr pt 
    WHERE ta.partnum = pt.partnum 
    AND BIT_AND(HEX(pt.flags),'00800000'::BINARYVAR) = '00800000';

Sample result: 
    dbsname       tdb 
    tabname       tab1 
    partnum       2097154 
    flag          0x00800802

In above examples tblspace flag value is 0x800802. It is clear that value 0x800000 is set here. 

2) Run 'oncheck -pT <database>:<table>' against those tables identified in p.1 to find which tables have outstanding in-place alters. 

You must look at "Home Data Page Version Summary" section. For example: 
    <...> 
        Home Data Page Version Summary 

                     Version                                 Count 

                           0 (oldest)                           25 
                           1 (current)                          15 

    <...>

In the above example there are 25 pages to be altered to a new version. That means that a dummy update is required to be executed against that table. The steps to perform a dummy update is explained in detail within the migration guide:


http://www-01.ibm.com/support/docview.wss?uid=swg21599602

728x90
728x90

Technote (FAQ)


Question

How do you identify outstanding in-place alters in order to resolve them?

These are sometimes call incomplete, or "unrealised" in-place alters.

This document is specifically for IDS version 11 and later. For IDS versions 7 to 10,
see http://www-01.ibm.com/support/docview.wss?uid=swg21226410

Cause

Since version 7.24, some ALTER TABLE statements have been done as "fast alters", where rows are only altered to fit the new schema when the data page they are on is next written to. If there are many alters done, this can lead to the data pages for a table being a variety of versions. When migrating to a later version (or just for the performance benefit) it is a good idea to "complete" the outstanding in-place alters.

Answer

The first step is to work out which tables have data pages at old versions:

There are several documented ways to determine altered tables, including:

  • The migration guide has SQL to identify all tables with in place alters (chapter 3, page 52). However, some versions of the manual have SQL like this:

      -- first set OPTCOMPIND to 0 in onconfig and restart instance 
        set isolation to dirty read; 

      select pg_partnum + pg_pagenum - 1 partn 
          from sysmaster:syspaghdr, sysmaster:sysdbspaces a 
         where pg_partnum = 1048576 * a.dbsnum + 1 
           and pg_next != 0 
          into temp pp with no log ; 

      select b.dbsname database, b.tabname table 
          from sysmaster:systabnames b, pp 
         where partn = partnum ; 
    which shows all tables that ever had an in-place or fast alter, and does not exclude those even where every data page has been rewritten (and therefore updated) since the ALTER TABLE was run.
  • The oncheck -pT utility prints a table showing all current versions on data pages, and page counts for each version, but it runs serially, locking the whole table, and takes a long while to run because it reports on many other things as well.


As these methods have drawbacks, the following steps can be used to obtain a more specific list of outstanding in-place alters: 

1. Set OPTCOMPIND to 0 in the onCONFIG configuration file and restart the instance. 

2. Set isolation to dirty read. 

3. Find every partnum in the tblspace tbspace of each non-temp dbspace where pg_next (pointer to extended partition page) is set to something other than zero. These tablespaces have been altered, and pg_next points to the pnvers (partition version) page. 

select (p.pg_partnum + p.pg_pagenum - 1) as partn, a.name as dbspace 
 from sysmaster:syspaghdr p, sysmaster:sysdbspaces a 
 where p.pg_partnum = 1048576 * a.dbsnum + 1 
  and a.is_temp = 0 and a.is_blobspace = 0 and a.is_sbspace = 0 
  and p.pg_next != 0 
into temp altpts with no log ; 


4. Get the database and table names, minimum and maximum possible page versions, plus row and data page numbers for all those partnums. 

  select b.dbsname database, b.tabname table, a.dbspace dbspace, 
          hex(a.partn) partnum, i.ti_nrows nrows, i.ti_npdata datapages, 
          p.pta_oldvers as oldvers, p.pta_newvers as newvers, 
          -1 as mindver, 999 as maxdver 
  from altpts a, outer sysmaster:sysactptnhdr p, 
       outer sysmaster:systabnames b, outer sysmaster:systabinfo i 
  where a.partn = b.partnum 
    and a.partn = i.ti_partnum 
    and a.partn = p.partnum 
   into temp tabvers with no log ; 


5. Scan all non-empty home data pages for each altered tablespace and get the 
minimum and maximum versions: 

  update tabvers set mindver = 
        ( select (min(pg_next)) / 16777216 
            from sysmaster:syspaghdr p, altpts a 
           where p.pg_partnum = a.partn 
             and tabvers.partnum = a.partn 
             and sysmaster:bitval(pg_flags, '0x1') = 1   
             and sysmaster:bitval(pg_flags, '0x8') <> 1 
             and pg_frcnt < (2020 - 4*pg_nslots) 
        ) 
   where 1 = 1 ; 

  update tabvers set maxdver =   
        ( select (max(pg_next)) / 16777216 
            from sysmaster:syspaghdr p, altpts a 
           where p.pg_partnum = a.partn 
             and tabvers.partnum = a.partn 
             and sysmaster:bitval(pg_flags, '0x1') = 1 
             and sysmaster:bitval(pg_flags, '0x8') <> 1 
             and pg_frcnt < (2020 - 4*pg_nslots) 
        )   
    where 1 = 1 ; 


6. Display the tablespaces which require updating to complete the in-place alters: 

   select database, table, dbspace, nrows, datapages, 
          oldvers as oldestv, mindver, maxdver, newvers as newestv 
     from tabvers 
    where newvers > mindver 
    order by 1, 2; 


The second step is to do a dummy update on at least one row on each of those pages not updated to the current version. 

For all tables which have been altered, but are not yet "complete" , you can run a "dummy update" statement, setting the primary key column to itself (or other "change" that changes nothing). This will bring all of the pages to the most current version: 

update TABNAME set PKEY = PKEY where 1 = 1 ; 


http://www-01.ibm.com/support/docview.wss?uid=swg21389693

728x90
728x90

Technote (FAQ)


Question

How do you identify outstanding in-place alters in order to resolve them?

This document is specifically for IDS versions 7 to 10. For IDS version 11 and later,
see http://www-01.ibm.com/support/docview.wss?uid=swg21389693

Cause

Since version 7.24, some ALTER TABLE statements have been done as "fast alters", where rows are only altered to fit the new schema when the data page they are on is next written to. If there are many alters done, this can lead to the data pages for a table being a variety of versions. When migrating to a later version (or just for the performance benefit) it is a good idea to "complete" the outstanding in-place alters.

Answer

The first step is to work out which tables have data pages at old versions:

There are several documented ways to determine altered tables, including:

  • The 9.4 migration guide has SQL to identify all tables with in place alters (chapter 5, page 5):
    first set OPTCOMPIND to 0 in onconfig and restart instance
     set isolation to dirty read;

    select pg_partnum + pg_pagenum - 1 partn
       from sysmaster:syspaghdr, sysmaster:sysdbspaces a
      where pg_partnum = 1048576 * a.dbsnum + 1
        and pg_next != 0
       into temp pp with no log ;

    select b.dbsname database, b.tabname table
       from sysmaster:systabnames b, pp
      where partn = partnum ;

    Unfortunately, this shows all tables that ever had an in-place or fast alter, and does not exclude those even where every data page has been rewritten (and therefore updated) since the ALTER TABLE was run.
  • The oncheck -pT utility prints a table showing all current versions on data pages, and page counts for each version, but it runs serially, locks the whole table, takes a long while to run, and reports on many other things as well.

    This SQL will report the minimum and maximum version page version for a table (or by default, all tables) with incomplete alters. Dirty reads mean no locks, and it reports which dbspaces are involved too. It runs faster than oncheck -pT.

As these methods have drawbacks, the following steps can be used to obtain a more specific list of outstanding in-place alters: 

1. Set OPTCOMPIND to 0 in the onCONFIG configuration file and restart the instance. 

2. Set isolation to dirty read. 

3. Find every partnum in the tblspace tbspace of each non-temp dbspace where pg_next (pointer to extended partition page) is set to something other than zero. These tablespaces have been altered, and pg_next points to the pnvers (partition version) page. 

select (p.pg_partnum + p.pg_pagenum - 1) as partn, a.name as dbspace 
 from sysmaster:syspaghdr p, sysmaster:sysdbspaces a 
 where p.pg_partnum = 1048576 * a.dbsnum + 1   
           -- Tblspace tblspace partnums have format 0xnnn00001 
  and a.is_temp = 0 and a.is_blobspace = 0 and a.is_sbspace = 0 
           -- exclude tempspaces, blobspaces and smart blob spaces 
  and p.pg_next != 0 
           -- non zero = altered 
into temp altpts with no log ; 

4. Get the database and table names, plus row and data page numbers for all those partnums. 

select b.dbsname database, b.tabname table, 
       a.dbspace dbspace, hex(a.partn) partnum, 
       i.ti_nrows nrows, i.ti_npdata datapages, 
       -1 as minver, 999 as maxver 
  from altpts a, outer sysmaster:systabnames b, 
    outer sysmaster:systabinfo i 
  where a.partn = b.partnum and a.partn = i.ti_partnum 
into temp tabvers with no log ; 

5. Scan all non-empty home data pages for each altered tablespace and get 
the minimum and maximum versions: 

update tabvers set minver = ( 
  select (min(pg_next)) / 16777216 
  from sysmaster:syspaghdr p, altpts a 
  where p.pg_partnum = a.partn 
  and tabvers.partnum = a.partn 
  and sysmaster:bitval(pg_flags, '0x1') = 1  -- data page 
  and sysmaster:bitval(pg_flags, '0x8') <> 1 -- not remainder page 
  and pg_frcnt < (2020 - 4*pg_nslots)        -- with some data on it. 
) where 1 = 1 ; 

update tabvers set maxver = ( select (max(pg_next)) / 16777216 
  from sysmaster:syspaghdr p, altpts a 
  where p.pg_partnum = a.partn 
  and tabvers.partnum = a.partn 
  and sysmaster:bitval(pg_flags, '0x1') = 1 
  and sysmaster:bitval(pg_flags, '0x8') <> 1 
  and pg_frcnt < (2020 - 4*pg_nslots) 
)  where 1 = 1 ; 

Note the use of 2020 above is based on a dbspace with a 2KB pagesize, less the 24 byte header and 4 byte timestamp. For a 4KB pagesize use 4096 - 28 = 4078, for 8KB pagesize use 8192 - 28 = 8164, and so on. 

6. Display the tablespaces which require updating to complete the in-place alters: 

select database, table, dbspace, nrows, datapages, minver, maxver, 
 decode(maxver - minver, 0, "Maybe Needs Update", "Definitely Needs Update") 
 from tabvers 
order by 1, 2; 

The output of this SQL provides a list of which tablespaces (tables or fragments of tables) where in-place alters exist. If there are some data pages at a lower version from other data pages, then a dummy update is definitely required. If minver = maxver, then if there was an alter table statement run and no data pages have been written to since then (i.e. no inserts, updates, or deletes), then a dummy update may still be required. Prior to IDS version 11, it is not possible to determine from the sysmaster database what the potential maximum version of a data page is for a table. In version 11 there is a new sysmaster table called sysactptnhdr that does give this information. 


The second step is to do a dummy update on at least one row on each of those pages not updated to the current version. 

For all tables which have been altered, but are not yet "complete" , you can run a "dummy update" statement, setting the primary key column to itself (or other "change" that changes nothing). This will bring all of the pages to the most current version: 

update TABNAME set PKEY = PKEY where 1 = 1 ; 


http://www-01.ibm.com/support/docview.wss?uid=swg21226410

728x90
728x90

Question

How do you find outstanding in-place alters in a database

Answer

Download and run the attached script. It will find and display the tables with outstanding in-place alters in the database. Here is the command line syntax:

    inplacealter.ksh -d database_name -t [ALL| table_name ] [-o [ver|sum]]

    -d database      the name of the database (required) 
    -t [ALL|table]   the name of the table (default is all tables) 
    -o ver|sum       print either verbose or summary (default) report


The verbose report is similar to the oncheck -pT output. The summary output prints out whether the table has any outstanding in-place alters. The utility generates two output files: 
  • inplacealter.out -- in-place alter information
  • inplacealter.update -- SQL dummy update statements


The advantages of this script are: 
  • It places no locks on any tables, so it can be run anytime by the DBA
  • It doesn't run the expensive oncheck -pT to gather the statistics, so it is extremely fast even in the busiest of times on the server.
  • It creates a seperate script for the DBA with update statements for the table to remove the outstanding in-place alters. The update statement chooses a non-index column to be updated so that the engine does a sequential scan to select all pages used by the table.
  • It is written is Korn Shell and awk which are widely available on UNIX systems.

Warning: This script is not supported by Informix Technical Support. If it does not work on your system ask your system administrator for help debugging it. It is written in Korn Shell and may not work in other shells. The generated script containing the update statements can be run as-is; however, this script/updates have a direct impact on database logging and performance (time for the update(s) to complete). Serious consideration needs to be given to both these issues if any of tables referenced in the update statetments are logged and/or are very large.



http://www-01.ibm.com/support/docview.wss?rs=0&uid=swg21160923

728x90

+ Recent posts