Re: 2 Million Records, Duplicate and SubString Removal

Re: 2 Million Records, Duplicate and SubString Removal

 

  

Sorry, the formatting got removed. So resending the question.

Hello,
I have 2,000,000 records in an RDBMS (Oracle 9i/10g or SQL Server 2000)

Let me define the question in the following points:

1. I have 2 Million records like the following


TABLE NAME: NamesTable
Columns: NameId, NameString, length,
Example Data:
Row1: "R2ZRWRTY",
"RRRARAZZZRAAZRZARWZWZRWRZWZAZRRRZWRZRRRRZRWRRRZAARAZZRWRZWZZWZRRRRARZWRZRRR
ZAWRRZZRWRZRRZAZAARWRRRRANZRRRA", 106


Now in the above structure of table assume you have 2 million records.
NameString length is never more than 200.

2. In the "NamesTable" Table Remove all the DUPLICATE "NameString".

3. In the table left after removing duplicates, REMOVE ALL THE SUB -
NameString.
For Example, If NameString1="WORLDISGREAT" and NameString2 = "GREAT",
then you can clearly see that NameString2 exists in NameString1. So REMOVE
NameString2 because it is contained in NameString1.
So remove all Sub-NameStrings from the table.

How do you think we should do this? We can use Oracle 9i/10g or Sql Server
2000. Considering that there is so much of data and there are so many
possible combinations.

What is the best possible and the fastest solution to solve this kind of
DUPLICATE AND SUB-STRING REMOVAL PROBLEM.

Please Help

Thanks and Regards
Ranjit


DB2 & UDB email list listserv db2-l LazyDBA home page