728x90
 

Question

I found some indexes marked invalid after rollforward, or a HADR standby database become switched to primary.
Why the indexes are marked as invalid, and how I can recover the invalid indexes.

Cause

Db2 does not write index build or rebuild log records by default, rather Db2 marks the index as invalid so that recovery task can trigger to rebuild the index.

Answer

"Index rebuilding" means index creation, index reorg, classic table reorg, and rebuilding index in this article.
 
Indexes can be marked as invalid when:
  • Index rebuilding is not logged ( LOGINDEXBUILD=NO), then index rebuilding replayed.
  • Index rebuilding is logged ( LOGINDEXBUILD=YES), then index rebuilding replayed but INDEXREC is set as ACCESS_NO_REDO or RESTART_NO_REDO.
  • Failed offline reorg, or such failed offline reorg is replayed.
  • db2dart runs with /MI option.

Indexes can be rebuilt when:
  • There are one or more indexes marked invalid when a database restarts. i,e. Run RESTART DB with INDEXREC=RESTART
  • Someone access an index marked as invalid.
  • Run classic REORG TABLE command
  • Run REORG INDEXES or REORG INDEX command
  • Run CREATE INDEX statement
  • Run LOAD command, and it uses REBUILD option as INDEXING MODE
  • Run IMPORT or LOAD command with REPLACE option
  • Run TRAUNCATE TABLE statement

We can list all the indexes marked as invalid by the following query:
db2 "select tabschema, tabname, indname, datapartitionid, index_partitioning from table(sysproc.admin_get_index_info('','','')) as t where index_requires_rebuild='Y'"

We can recover the indexes marked as invalid by ether of the following ways.
 
a) Run RESTART DATABASE command (in case db cfg INDEXREC is set as RESTART)
The restart database command rebuild all the indexes set as INDEX_REQUIRES_REBUILD=Y
db2 restart database <database_name>
b) Run reorg indexes
db2 reorg indexes all on table <table_name> allow no access

Note: 
You can set LOGINDEXBUILD as YES to prevent indexes to be marked as invalid.
So, this setting is highly recommended in HADR databases.
Example:
db2 connect to <database_name>
db2 update db cfg using LOGINDEXBUILD YES

 

https://www.ibm.com/support/pages/db2-causes-mark-index-invalid-and-how-recover-indexes

 

[Db2] The causes to mark index invalid, and how to recover the indexes.

[Db2] The causes to mark index invalid, and how to recover the indexes.

www.ibm.com

 

728x90
728x90

인포믹스에서 함수 인덱스를 찾는 방법에 대한 질문이 있어서 찾아보았습니다.

우선 카탈로그 테이블인 sysindexes 부터 살펴보니 함수 인덱스임을 표시하는 컬럼은 없었습니다.

그래서 sysindexes 테이블의 함수 인덱스 정보가 다른 인덱스와 차이가 있는지 확인해봤습니다.

 

먼저 테이블과 함수 인덱스를 만들었습니다. 내용은 IBM 문서를 참고했습니다.

www.ibm.com/developerworks/data/library/techarticle/dm-0712wilcox/index.html

$ dbaccess stores_demo -
Database selected.
> CREATE TABLE circles ( radius FLOAT );
Table created.
> CREATE FUNCTION circleArea( radius FLOAT ) RETURNS float
>     WITH (NOT VARIANT);
>
>     RETURN 3.14159 * radius * radius;
> END FUNCTION;
Routine created.
> CREATE INDEX areaOfCircleIndex on circles( circleArea( radius ) );
Index created.

 

해당 인덱스에 대한 sysindexes 테이블의 정보를 확인해보았습니다.

> select * from sysindexes where idxname='areaofcircleindex';
idxname    areaofcircleindex
owner      informix
tabid      162
idxtype    D
clustered
part1      0
part2      0
part3      0
part4      0
part5      0
part6      0
part7      0
part8      0
part9      0
part10     0
part11     0
part12     0
part13     0
part14     0
part15     0
part16     0
levels     1
leaves     1.000000000000
nunique    0.00
clust      0.00
1 row(s) retrieved.

특이하게도 일반 인덱스와 달리 part1 값이 0으로 표시됩니다. 인덱스를 만들 때 컬럼 순서를 바꾸고 일반 컬럼과 혼용해도 동일한 결과를 보여줍니다. 그럼 part1 값이 0이면 함수 인덱스인가? 라는 생각이 들어서 정말 그런지 실행해보았습니다.

> select owner, idxname from sysindexes where tabid > 99 and idxname[1,1] <> ' ' and part1 = 0;
owner    informix
idxname  ucnameindex
owner    informix
idxname  ts_data_location_spix
owner    informix
idxname  areaofcircleindex
3 row(s) retrieved.

 

데모용 데이터베이스인 stores_demo에서 수행해보니 opclass에 대한 인덱스와 함수 인덱스 두개가 보입니다. 정확히 함수 인덱스만 나오는 것은 아니고 일반적인 타입의 인덱스는 제외되는 것 같습니다. 어설프지만 확인은 가능하네요.

그래서 좀 더 정확한 방법에 대한 조언을 구하기위해 IBM Community에 질문을 올렸습니다. 바로 Art Kagel씨와 Paul Watson씨가 명쾌한 답을 주시는군요. 답은 sysindices 카탈로그 테이블에 있었습니다.

> select idxname from sysindices where indexkeys::lvarchar matches '*<[0-9]*>*';
idxname  ucnameindex
idxname  areaofcircleindex
2 row(s) retrieved.
> select idxname, indexkeys from sysindices where indexkeys::lvarchar matches '*<[0-9]*>*';
idxname    ucnameindex
indexkeys  2 [1], <1757>(3) [1], 1 [1]
idxname    areaofcircleindex
indexkeys  <1758>(1) [1]
2 row(s) retrieved.

sysindices 테이블의 indexkeys 컬럼 값에 함수 인덱스에 대한 정보가 있더군요. opclass에 대한 정보도 있는데 이부분은 저도 잘은 모르고 이 내용과는 벗어나는 내용이라 넘어가겠습니다. indexkeys 컬럼에도 딱 이 인덱스가 함수 인덱스다! 라고 알려주는 플래그 값이 있는건 아니지만, 함수 인덱스인 경우 <> 사이에 procid 값 정보가 있습니다. 이 정보로 인덱스가 함수 인덱스임을 알 수 있는 것입니다.

아래는 IBM Knowledge Center의 sysindices 테이블에 대한 설명입니다.


The fields within the indexkeys columns have the following significance:

  • The procid (as in sysprocedures) exists only for a functional index on return values of a function defined on columns of the table.
  • The list of columns (col1, col2, ... , coln) in the second field identifies the columns on which the index is defined. The maximum is language-dependent: up to 341 for an SPL or Java™ UDR; up to 102 for a C UDR.
  • The opclassid identifies the secondary access method that the database server used to build and to search the index. This is the same as the sysopclasses.opclassid value for the access method.
  •  

IBM Community에서 좋은 것을 배웠네요.

728x90
728x90

예전에 Ben Thompson의 블로그에서 Index fragment의 page limit에 대한 글을 읽었습니다.

2014년의 글인데 Index의 페이지 수가 12.1버전부터 2^31 (2,147,483,647)개로 증가했다는 것이지요.

IBM Knowledge Center에도 문서화되어 있습니다.

https://www.ibm.com/support/knowledgecenter/SSGU8G_12.1.0/com.ibm.adref.doc/ids_adr_0718.htm#ids_adr_0718__table

 

블로그의 저자인 Ben은 11.7 버전에서도 테스트해보았다는 내용이 있습니다. 그러나 IBM 문서에는 해당 내용이 없어서, IBM Community에 자문을 구하고자 질문을 올렸고 Art와 Ben이 몇가지 조언을 주어서 직접 테스트를 하게 되었습니다.

 

테스트 환경은 인포믹스 11.50.FC6과 11.70.FC5이고, 테스트 시나리오를 위해 고객사의 테이블 스키마를 사용했습니다.

고객은 인포믹스 11.7 버전을 사용하고 있고, 매달 생성하는 테이블이 있는데 최근 몇달간의 데이터를 보니 약 4억4천만건의 레코드가 저장되고 있더군요.

문제는 테이블은 라운드 로빈(round robin) 형태인데 primary key에 해당하는 unique index는 분할되지 않아 기존에 알고 있던 한계치인 2^24 (16,775,134) 개에 육박하고 있다는 점입니다.

그런데 막상 IBM Knowledge Center의 11.7 버전의 문서에는 인덱스 페이지 수의 제한은 나와있지 않고..

 

이런 상황을 가정하고 아래의 테이블을 만들었습니다.

create table log_12
  (
    col1 varchar (50),
    col2 varchar (32),
    col3 varchar (5),
    col4 varchar (5),
    col5 varchar (10)
   .....
  ) fragment by round robin in dbs1, dbs2, dbs3
extent size 16000000 next size 10000000;​

데이터 입력 시간을 절약하기 위해, 더미 데이터를 약 5억건 입력하고 아래 인덱스를 만들었습니다.

create unique index log_12_pk on log_12 (col1, col2, col3, col4, col5) in idx1;​

11.5에서는 인덱스의 페이지 수가 2^24 (16,775,134)개에 도달하자 인덱스를 생성하는 트랜잭션은 롤백되면서 online.log에는 아래와 같은 메시지가 출력되었습니다.

15:22:42 partition 'demo: informix.log_12_pk': no more pages

11.7에서는 인덱스 생성이 완료되었고, 인덱스에 35,999,944개의 페이지가 할당되었습니다.

아래는 oncheck -pt의 결과물의 일부입니다.

                  Index log_12_pk fragment partition idx1 in DBspace idx1
    Physical Address               5:5
    Creation date                  02/14/2020 16:47:53
    TBLspace Flags                 801        Page Locking
                                              TBLspace use 4 bit bit-maps
    Maximum row size               107
    Number of special columns      0
    Number of keys                 1
    Number of extents              2
    Current serial value           1
    Current SERIAL8 value          1
    Current BIGSERIAL value        1
    Current REFID value            1
    Pagesize (k)                   2
    First extent size              8672897
    Next extent size               5420560
    Number of pages allocated      35999944
    Number of pages used           35705525
    Number of data pages           0
    Number of rows                 0
    Partition partnum              5242882
    Partition lockid               2097217
    Extents
         Logical Page     Physical Page        Size Physical Pages
                    0              5:53    23999947   23999947
             23999947               9:3    11999997   11999997

인포믹스 11.7 이상이라면 인덱스 페이지 수 제한을 걱정하지 않아도 되겠군요.

 

추가로 IBM에 공식적으로 문의한 결과, 11.7 문서화에 명시되지 않았을 뿐 인덱스 페이지 수 제한은 12.1 버전과 마찬가지로 2^31 (2,147,483,647)개임을 확인했습니다.

728x90
728x90

Question

If you think that the DB2® index manager might be the origin of a problem, collect diagnostic data that you or IBM® Software Support can use to diagnose and resolve the problem.

Cause

Background

There are many flavours of index-related corruption problems, for instance:

  • Unique index contains duplicates with different RID(s) or same RID(s)
  • Multiple index entries pointing to the same RID
  • Index key is out of place (wrong index key order)
  • Row exists, but index keys do not exist in any or some of the indexes
  • Index entry pointing to an empty data slot or unused data slot or RID is invalid
  • Incorrect previous or next index page pointers, incorrect high key or other corruptions on index pages

Even if you see the same return code from the same function or probe point, problems could vary from one to another. For instance, when SQLI_NOKEY error is received from index function "procLeaf2Del" or "sqlischd" during a delete or update, you could be hitting problems 3, 4, or 6. 

If "row not found" or "bad page" type of error is received when an index is used to fetch the RID for a delete, update or select, then you could be hitting problems 3 or 5. 

In short, for each problem, a DB2 support analyst will need to look at the data collected, and analyze it to determine root cause. To have data collected as much as possible at the first failure point as well as preserving necessary log files are often critical to the root cause determination. 

Since index corruption or index data inconsistency type of errors could be caused by DB2 defects or by other issues such as hardware problems, it is necessary to collect the proper diagnostic data in order to determine the potential cause of the error. 


DB2 tools to detect these problems 

DB2 products offer a variety of tools to detect index or data corruption problems and index or data mismatch problems. 

  1. An off-line db2dart /t (table inspection) would be able to detect corruption symptoms 1, 2, 3, 6 of the above list. For example:

    Table resides in table space id 4, and object id of table is 2, the database name is T30, the output is "inspTsi4oi2.rpt":
    db2dart T30 /t /tsi 4 /oi 2 /scr n /rptn inspTsi4oi2.rpt

  2. The RUNSTATS command with a special registry variable set will collect statistics as well as detect corruption symptoms 4, 5 of the above list. For example:

    db2stop
    db2set DB2_KEYDATACHECK=IXSTATDATADEBUG
    db2start

    db2 connect to <database>
    db2 -v "runstats on table <tableschema>.<tabname> and detailed indexes all allow read access"

    NOTE: After each RUNSTATS, you need to save the contents of the sqllib/db2dump directory. You can do so by moving the entire content to another location. As long as the RUNSTATS does not return with DB2 completed successfully, you will need to report this to DB2 support.

Some enhancements are currently being investigated. The goal is to improve the existing tools for serviceability and usability in future releases in the above areas.

Answer

You often need an advanced knowledge of DB2 in order to determine whether the problem is an index corruption or an index or data inconsistency problem. Therefore you should always contact IBM support organization to help you recover from these problems.



For index-related corruption or index or data inconsistency problems, the following information is typically required:

  • the contents of the DB2DUMP directory
  • the on-disk version of the related index pages (collected via the db2dart tool)
  • related data page(s) for the index or data object (collected via the db2dart tool)
  • database transaction log files, which are used to understand the past history of the index or table object that encountered the error.

The following steps are the recommended best practice for data collection (and database recovery) when an error related to index corruption occurs. NOTE: DO NOT perform these steps unless you have been asked to do so by an IBM DB2 Support analyst. 

  1. Collect the standard diagnostic information using the db2support command. This should include all the files in DB2DUMP directory (or the path specified by DIAGPATH in the dbm configuration).

    There are cases where the corruptions are caused by a disk problem, and you do not care about root cause analysis of the problem. In these situations, skip step 2, and go directly to step 3.

  2. Run db2dart as indicated below (note: if this is a multi-partition DPF environment, then you will need to run db2dart on the partition where failure occurred):

    db2dart <dbname> /t /tsi <pool id> /oi <object id> /rptn inspTsi<poolid>oi<objectid>_bcrashrec.rpt

    In general, this inspection should be fairly quick. If the database is mission critical database and needs to be brought up online immediately, then you may skip this step at this time.

  3. Restart your database and verify crash recovery completes successfully.


    If crash recovery completes successfully and you cannot take the time needed to collect this data for root cause analysis of the problem, skip steps 4, 5 and 6 and go directly to step 7.


    If the restart fails, and it is determined to be a problem redoing or undoing an index record, then you can mark an index invalid (e.g. db2dart <dbname> /mi /oi <objectID> /tsi <tablespaceID>). INDEXREC will control when the invalid index object will be recreated.
  4. To prevent the previous failing application from accessing the table again before the problem on the related index or table is resolved, you can lock the table in exclusive mode. Use the following command:


    db2 +c "LOCK TABLE <SCHEMA NAME>.<TABLENAME> IN EXCLUSIVE MODE"


    The "+c" turns off the auto commit, since LOCK TABLE is only effective within the unit of work.


    If you do care about the availability of the rest of the tables in the same table space as the problem table, instead you may also want to consider quiescing the table spaces for the table object. For example:


    db2 "QUIESCE TABLESPACES FOR TABLE <SCHEMA NAME>.<TABLE NAME> EXCLUSIVE"


    NOTE: QUIESCE command will also quiesce the index table space if the index object is in a separate table space. In a partitioned database environment, this command will only quiesce the portion on current node.

  5. Run db2dart against the index and table object.


    If the problem is an index NOKEY error, then a number of db2dart commands will be printed in the db2diag.log file. Search for these commands (using eithergrep on UNIX® or find on Windows®) and save them in a file. Edit the file, replace "DBNAME" with the database name. If the problem has been hit multiple times, then there could be some duplicate entries. You only need to keep the latest set of db2dart commands.

    If it is the data manager that reports row not found or bad page, then NO db2dart commands are printed out. You can determine the pool id, object id, data page number from the db2diag.log entries. For example:


    2006-06-17-11.32.59.941406+000 I235565A460 LEVEL: Severe
    PID : 9224396 TID : 1 PROC : db2agent (SAMPLE) 0
    INSTANCE: db2inst NODE : 000 DB : OB8
    APPHDL : 0-78 APPID: ....
    FUNCTION: DB2 UDB, data management, sqldDeleteRow, probe:1329
    RETCODE : ZRC=0x87040001=-2029780991=SQLD_BADPAGE "Bad Data Page"
    DIA8500C A data file error has occurred, record id is "".


    2006-06-17-11.32.59.964241+000 I236403A380 LEVEL: Severe
    PID : 9224396 TID : 1 PROC : db2agent (SAMPLE) 0
    INSTANCE: db2inst NODE : 000 DB : OB8
    APPHDL : 0-78 APPID: ...
    MESSAGE : PAGE OBJECT IDENTIFIERS:
    DATA #1 : String, 54 bytes
    Tablespace ID = 15, Object ID = 8737, Object Type = 0


    2006-06-17-11.32.59.964448+000 I236784A350 LEVEL: Severe
    PID : 9224396 TID : 1 PROC : db2agent (SAMPLE) 0
    INSTANCE: db2inst NODE : 000 DB : OB8
    APPHDL : 0-78 APPID: ...
    MESSAGE : PAGE NUMBERS:
    DATA #1 : String, 35 bytes
    Obj Page = 297, Pool Page = 787527


    In this case, the data page is 787527, pool id is 15 and object id is 8737.

    The commands that you need are:

    1. db2dart <dbname> /dd /tsi <pool id> /oi <object id> /ps <data page number>p /np 1 /v y /scr n /rptn dTsi<poolid>oi<objectid>.rpt

      ...where <data page number> is 787527p using the example above, "p" is used to indicate this is pool relative page which is important for DMS table space.

    2. db2dart <dbname> /di /tsi <pool id> /oi <object id> /ps 0 /np <large enough value to dump all index pages> /v y /scr n /rptn iTsi<poolid>oi<objectid>_0-<pagenum>.rpt.rpt


      Note: The command might take longer to run if a large /np value is specified. Ideally, you should collect as much data as possible. At minimum, it should be no less than 100 pages.

      If table is not too big, also dump the whole table. If the table is too big to dump, select a reasonably large number so that at least a part of the table can be dumped, for example:

      db2dart <dbname> /dd /tsi <pool id> /oi <object id> /ps <data page to start> /np <large enough number> /v y /scr n /rptn dTsi<pool id>oi<object id>_0-<page num>.rpt

      where <data page to start> would be "0" if you want to dump the whole table, or <data page number - 100>p if the table is too big to dump (using the example, that would be "787427p"). Replace the <pool id> and <object id> with the actual number.

  6. Run RUNSTATS to check for any index or data inconsistency. For example:


    db2stop
    db2set DB2_KEYDATACHECK=IXSTATDATADEBUG
    db2start
    db2 connect to <db>
    db2 -v "SELECT INDSCHEMA, INDNAME FROM SYSCAT.INDEXES WHERE TABSCHEMA='<TABLE SCHENAME>' AND TABNAME='<TABLENAME>' "


    Then use the index schema name and index name from the above SELECT output for the following RUNSTATS command:

    db2 -v "runstats on table <tableschema>.<tabname> and index <index schema>.<index name> allow read access"


    NOTE: You need to perform the RUNSTATS on all of the indexes defined on the offending table; after each RUNSTATS command, you need to save the contents of the sqllib/db2dump directory (or DIAGPATH, if you're not using the default path).


    RUNSTATS on a large table could take a while and the table will not be available for updates while RUNSTATS is running. In a multi-partition DPF environment, RUNSTATS is done on a per partition basis.


    If you had quiesced the table space at step 4, then unquiesce the table spaces before you terminate the database connection. You will need to use the same user id for the QUIESCE command in order to reset:


    db2 -v "QUIESCE TABLESPACES FOR TABLE <SCHEMA NAME>.<TABLE NAME> RESET" 


    Once all the RUNSTATS commands are completed, you need to collect the db2diag.log along with the content of sqllib/db2dump (or content of DIAGPATH).

  7. If it is determined that the index is corrupted (that is, if either RUNSTATS or db2dart /t inspection returns with an error), then proceed with rebuilding the index as follows:


    db2 "REORG INDEXES ALL FOR TABLE <SCHEMA NAME>.<TABLE NAME> ALLOW NO ACCESS"


    Applications cannot access your table while the above REORG is running. If you need the table to be available immediately, you can choose to run the above REORG INDEXES command with ALLOW READ ACCESS or ALLOW WRITE ACCESS options. Be aware, however, that the database is at risk of being brought down again if the previous failing application is run against the table before the REORG completes.


    If you are not concerned with root cause analysis of the problem, you do not need to collect any additional information and can skip the subsequent steps.

  8. Collect table or index DDLs. You can use the db2look utility to do so. For example: 

    db2look -d <database> -l -e -a -t <table name> -o table.ddl


  9. Collect transaction log files

    You will only know which log files to collect after a DB2 Support analyst examines the data collected thus far. In general, they will ask you to send a raw log file. In some special cases, they may ask you to format the file and only send the relevant log files. The number of log files required will depend on when the problem was introduced, so it might be necessary to retrieve older log files from archive. 

  10. Additional information that may be important to root cause analysis or known APAR identification. For example:
    • What type of workload is typically running against the table?
    • Did you perform any actions that you don't normally do on this table, for instance any REORG INDEXES/TABLE commands, load or import operations, a high utilities workload, or restore or rollforward operations?
    • Were there any other unusual occurrences? For example, if there were recent hardware problems, provide the operating system error log. This would involve collecting the errpt -a output on AIX® or /var/adm/messages file on Solaris.


http://www-01.ibm.com/support/docview.wss?uid=swg21244563

728x90

+ Recent posts