RE: Finding New or Modified Rows

RE: Finding New or Modified Rows

 

  

Thanks everybody. Lots of good advice.

Blair

-----Original Message-----
From: ken farmer
[mailto:db2udbdba-ezmlmshield-x14704979.[Email address protected]
Sent: Monday, February 28, 2005 12:23 PM
To: LazyDBA Discussion
Subject: Re: Finding New or Modified Rows


Blair Jensen wrote:
> Is there an easy way to find the new or modified rows in DB2? I'm loading
a
> data warehouse and want to select only the new or modified rows from over
> 500 tables. The only thing I've been able to see so far is a "last
> modified" timestamp column that is updated by insert and update triggers.
> Is there an easier way? Please tell me there is.

Blair,

I usually prefer a image-delta process to one that uses logs or
timestamps on tables: the problem with log-based deltas is that logging
is optional: you can bypass it with loads, restores, and not logged
initially. The same is true of table timestamps, plus you've got more
change of error. Of course, errors are also possible with log-readers,
as are version restrictions. Replication is also an option - and it's
great for real-time ETL, but stinks for transformations, backup
strategies, performance, etc.

The biggest problem with image deltas is that they won't catch multiple
transactions that step on one another between images. But this is
typically ok in warehousing - not too many transactional fact tables.

The easiest way to do an image-delta is to just export a table every n
hours, sort current & previous exported files by unique key, then both
through a utility that compares each column:
- Anything in new and not old is an insert
- anything in old and not new is a delete
- anything with matching keys and non-matching non-keys is an update
- everything else is the same
Since both files can be read in sequence the comparison process is very
fast - typically much faster than the sorting. You can easily drive it
through a config file or command-line arguments - so you don't need to
rewrite it for each table. There may be a commercial product that will
do this: easytrieve? file-aid? syncsort? I've had to write this
several times now. Used to write it in c, these days in python.

You could also use sql to do the same thing, but the syntax of the merge
command is pretty clumsey (IMHO) compared to the use of a command-line
utility.

If I had to do this for 500 tables, and assuming that they could live
with snapshots rather than transactions, then I'd be inclined to use
this method. Then I'd generate the config files from db2 system tables.
Then I'd generate the exports and loads from the system tables as well.

good luck,

ken



---------------------------------------------------------------------
PLEASE CLICK REPLY-ALL TO SEND A REPLY TO EVERYONE
website: http://www.LazyDBA.com
To unsubscribe: http://www.lazydba.com/unsubscribe.html


"Attention: This message is intended only for the individual to whom it is
addressed and may contain information that is confidential or privileged. If
you are not the intended recipient, or the employee or person responsible for
delivering it to the intended recipient, you are hereby notified that any
dissemination, distribution, copying or use is strictly prohibited. If you
have received this communication in error, please notify the sender and
destroy or delete this communication immediately."
==============================================================================

DB2 & UDB email list listserv db2-l LazyDBA home page