I have a DB2 v8.2 Fix Pack 7 (8.1 FP 14) database on a single, IBM xSeries
server that has been in service for a couple of years. The database running
on the single DB2 instance is approximately 23Gb in size.
We have incorporated a cluster for high availability. This is a two node
cluster consisting of two blades in an IBM Blade Center connected to an IBM
SAN. Each blade is configured with a 16Gb flash drive (C:) and has 50% free.
They each have 8Gb RAM. Because we are running 32 bit Windows, we have the
/PAE and /3GB switches set on boot.ini.
The cluster is Active-Active. It runs MSSQL 2000 Enterprise on one node and
DB2 (v8.1 Fix Pack 15) on the other (both sets of software are installed on
each node). It is neccessary for us to run both database engines. In the
event of a node failure, both engines run on the remaining node and we will
probably take a small performance hit, but that is acceptable. Failovers
work without a hitch.
SQL Server is running extremely well. It's configured to use no more than
3.5Gb. I need DB2 to use no more than 3.5Gb since if we have a failover I
need to have some RAM left over for the OS.
On the first attempt, I applied the same DBM CFG to the cluster instance as
we have on the xSeries server. Then, I restored a backup of the DB2 database
on to the cluster without altering the DB CFG. All seemed fine after I
connected the application servers to the DB2 instance on the cluster. At
idle there are approximately 30 applications connected to the single
database in the instance (roughly 10 applications per application server).
Potentially, once users start connecting, there may be as many as 250
applications connected to the database. In any case, the first user
application caused DB2 to crash. Literally, db2diag.log states "DB2 has
crashed." After that, DB2 goes up and down like the Assyrian Empire until I
stop the application servers.
In my second attempt some days later (after some testing), I dropped and
restored the database from a new backup and rolled forward to end of logs.
This left the database consistent and backup pending == 'NO'. I then RESET
DB CFG and RESET DBM CFG and set the buffer pools to automatic. I then ran
AUTOCONFIGURE. These were the parameters I used:
autoconfigure using
mem_percent 50
workload_type mixed
num_stmts 10
tpm 100
admin_priority recovery
is_populated yes
num_local_apps 2
num_remote_apps 350
isolation UR
bp_resizeable yes
apply db and dbm;
This process set the DBM and DB as follows:
Former and Applied Values for Database Manager Configuration
Description Parameter Former Value
Applied Value
----------------------------------------------------------------------------
---------------------
Agent stack size (AGENT_STACK_SZ) = 16
16
Application support layer heap size (4KB) (ASLHEAPSZ) = 15
15
No. of int. communication buffers(4KB)(FCM_NUM_BUFFERS) = 1024
1024
Enable intra-partition parallelism (INTRA_PARALLEL) = NO
NO
Maximum query degree of parallelism (MAX_QUERYDEGREE) = 2
2
Max number of existing agents (MAXAGENTS) = 399
400
Agent pool size (NUM_POOLAGENTS) = 399
400
Initial number of agents in pool (NUM_INITAGENTS) = 0
0
Private memory threshold (4KB) (PRIV_MEM_THRESH) = 20000
20000
Max requester I/O block size (bytes) (RQRIOBLK) = 32767
32767
Sort heap threshold (4KB) (SHEAPTHRES) = 34900
34877
Former and Applied Values for Database Configuration
Description Parameter Former Value
Applied Value
----------------------------------------------------------------------------
---------------------
Max appl. control heap size (4KB) (APP_CTL_HEAP_SZ) = 900
128
Max size of appl. group mem set (4KB) (APPGROUP_MEM_SZ) = 30000
13480
Default application heap (4KB) (APPLHEAPSZ) = 2500
256
Catalog cache size (4KB) (CATALOGCACHE_SZ) = 5558
950
Changed pages threshold (CHNGPGS_THRESH) = 30
60
Database heap (4KB) (DBHEAP) = 25600
6264
Degree of parallelism (DFT_DEGREE) = 1
1
Default tablespace extentsize (pages) (DFT_EXTENT_SZ) = 32
32
Default prefetch size (pages) (DFT_PREFETCH_SZ) = 128
32
Default query optimization class (DFT_QUERYOPT) = 3
5
Max storage for lock list (4KB) (LOCKLIST) = 30000
4887
Log buffer size (4KB) (LOGBUFSZ) = 512
70
Log file size (4KB) (LOGFILSIZ) = 40000
1024
Number of primary log files (LOGPRIMARY) = 40
3
Number of secondary log files (LOGSECOND) = 50
0
Max number of active applications (MAXAPPLS) = 1000
400
Percent. of lock lists per application (MAXLOCKS) = 20
49
Group commit count (MINCOMMIT) = 1
1
Number of asynchronous page cleaners (NUM_IOCLEANERS) = 2
3
Number of I/O servers (NUM_IOSERVERS) = 3
55
Package cache size (4KB) (PCKCACHESZ) = 40000
1133
Percent log file reclaimed before soft chckpt (SOFTMAX) = 80
120
Sort list heap (4KB) (SORTHEAP) = 5000
192
SQL statement heap (4KB) (STMTHEAP) = 40960
2048
Statistics heap size (4KB) (STAT_HEAP_SZ) = 14000
4384
Utilities heap size (4KB) (UTIL_HEAP_SZ) = 10240
134719
Former and Applied Values for Bufferpool(s)
Description Parameter Former Value
Applied Value
----------------------------------------------------------------------------
---------------------
BUF16K Bufferpool size = -1
20110
BUF32K Bufferpool size = -1
10055
BUF4K Bufferpool size = -1
81091
BUFIDX Bufferpool size = -1
81091
IBMDEFAULTBP Bufferpool size = -1
81091
I then used db2cc to set the archive log file path. Because the database
logging was set for recovery, the instance was stopped and restarted and a
forced backup started by db2cc. This and subsequent repeated backup attempts
failed at the 50% mark, thus leaving the database inaccessable (connects
fail because of the backup pending status so I can't adjust and buffer pools
or cfg parameters). The memory consumption on the node running the instance
was 2.5Gb at the time. The db2diag.log starts logging Severe events like
this and continues like this:
2008-01-24-00.28.37.097000+540 I10150H442 LEVEL: Severe
PID : 2688 TID : 5372 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SIEBEL
APPHDL : 0-35 APPID: *LOCAL.DB2.080123152837
FUNCTION: DB2 UDB, base sys utilities, sqleFirstConnect, probe:100
MESSAGE : DiagData
DATA #1 : Hexdump, 4 bytes
0x019AD194 : A4FB FFFF ....
2008-01-24-00.28.51.284000+540 I10594H379 LEVEL: Warning
PID : 2688 TID : 5372 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000 DB : SIEBEL
APPHDL : 0-36 APPID: *LOCAL.DB2.080123152850
FUNCTION: DB2 UDB, database utilities, sqlubSetupJobControl, probe:2028
MESSAGE : Starting an offline db backup.
2008-01-24-00.33.18.595000+540 I10975H521 LEVEL: Severe
PID : 2688 TID : 3024 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbDMSDirectRead, probe:790
MESSAGE : DiagData
DATA #1 : Hexdump, 48 bytes
0x4A62F948 : C4AF 494A 0200 7F02 D8E2 7F02 2000 0000 ..IJ........ ...
0x4A62F958 : A03F 0400 DCE2 7F02 0000 0000 0100 0000 .?..............
0x4A62F968 : 0000 0000 F83F 4E71 008D 5F03 0000 0000 .....?Nq.._.....
2008-01-24-00.33.18.595000+540 I11498H521 LEVEL: Severe
PID : 2688 TID : 4980 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbDMSDirectRead, probe:790
MESSAGE : DiagData
DATA #1 : Hexdump, 48 bytes
0x4A4EF948 : CCB1 494A 0200 7F02 D8E2 7F02 2000 0000 ..IJ........ ...
0x4A4EF958 : E03F 0400 DCE2 7F02 0000 0000 0100 0000 .?..............
0x4A4EF968 : 0000 0000 F83F 5E71 000D 7F03 0000 0000 .....?^q........
2008-01-24-00.33.18.611000+540 I12021H352 LEVEL: Error
PID : 2688 TID : 3024 PROC : db2syscs.exe
INSTANCE: DB2 NODE : 000
FUNCTION: DB2 UDB, buffer pool services, sqlbDMSDirectRead, probe:860
RETCODE : ZRC=0x870F00F2=-2029059854=SQLO_NORES
"no resources to create process or thread"
I am going to try the process again with another backup and different buffer
pool sizes and so on. I think I shall try different parameters in the
AUTOCONFIGURE (particularly mempercent = 100) and see how that goes. In the
meantime, any leads for investigation would be much appreciated. It almost
seems as though there is something wrong with the cluster install of DB2 UDB
WSE, but that all seemed to go quite well.
I have plenty of time to resolve this as the existing DB server is
unaffected.
Gary
DB2 & UDB email list listserv db2-l LazyDBA home page