Re: needle in a hay stack

Re: needle in a hay stack

 

  

I used the method of looking for column names like'%ssn%' and addr and name. It is a manual, slow process. You can also look for data that matches, len(snum) = 9. Again manual, but you can use system tables to pull the table name and column name to automate it some. But there is not a sure fire way to find this type of data without eyes on, very time consuming search (or the wonderfully documentation we all have of all our tables and processes....HA HA :->).


Stephen Dyckes



-----Original Message-----
From: Gary Nease <mssqldba-ezmlmshield-x20210501.[Email address protected]
To: LazyDBA Discussion <[Email address protected]
Sent: Wed Nov 28 12:48:23 2007
Subject: RE: needle in a hay stack

At Microsoft this is called the "PII Scrub", with PII standing for
Personally Identifiable Information. It was supposedly a required
deliverable in the process of designing and deploying any new data store
(table), and was vaguely considered a requirement for compliance with
"SOX" (Sarbanes-Oxley Act
http://en.wikipedia.org/wiki/Sarbanes-Oxley_Act). All copies of
production data being used in Dev, Test, and QA were supposed to go
through the scrub. The required scrub process was usually neglected
from planning for all new products, and only nominally addressed at the
last minute after testers noticed PII in their data, too late for proper
attention to design and far to late to develop effective test criteria.


Professional database publications have written extensivley on the
due-diligence implications of that legislation and the need to weigh
cost/benefit for efforts to comply. You may need to do some negotiation
after reading up on the subject. I have heard of no silver bullet.

Don't forget to look for xml tags containing strings such as "%addr%",
"%mail%", "%phone%", etc. I suggest you start managing expectations and
seek some attainable and measurable success criteria. This might
include a semi-finite list of strings found in column-names and xml
tags, and/or manual examination of samples of an agreed size and
randomization.

-----Original Message-----
From: Ligda John
[mailto:mssqldba-ezmlmshield-x40535838.[Email address protected]
Sent: Wednesday, November 28, 2007 9:31 AM
To: LazyDBA Discussion
Subject: needle in a hay stack

I have been asked to identify all tables and columns with sensitive data
such as social security numbers and the like. Our business analyst
quickly provided a report but I fear it represents the expected use of
the systems not actual use. For example, I already discovered some very
sensitive data in a description field which the analyst didn't expect,
although he wasn't surprised.



I have about nine hundred tables from about nine gigabytes of data
files.



I thought about a bulk copy batch to character format where I could run
some Perl regular expression searches against everything, but it would
be a nightmare to map the findings back to the table's columns. I
can't imagine I'm the first one to need to do this yet my internet
search reveals only tools for documents and not SQL Server tables.
Does anyone have experience or ideas?



Thanks!

John




--
DISCLAIMER

Confidentiality Note: This e-mail is intended only for the person or
entity to which it is addressed and may contain information that is
privileged, confidential or otherwise protected from disclosure.
Dissemination, distribution or copying of this e-mail or the information
herein by anyone other than the intended recipient, or an employee or
agent responsible for delivering the message to the intended recipient,
is prohibited. If you have received this e-mail in error, please notify
us immediately (telephone
415-288-0544 or e-mail [Email address protected] and destroy the
original message and all copies.

Hall Capital Partners LLC reserves the right to monitor and review the
content of all e-mail communications sent and/or received by its
employees.



---------------------------------------------------------------------
TO REPLY TO EVERYBODY , PLEASE CLICK REPLY-ALL, NOT JUST REPLY To post a
dba job: http://jobs.lazydba.com To subscribe : http://www.LazyDBA.com
To unsubscribe: http://www.lazydba.com/unsubscribe.html



---------------------------------------------------------------------
TO REPLY TO EVERYBODY , PLEASE CLICK REPLY-ALL, NOT JUST REPLY
To post a dba job: http://jobs.lazydba.com
To subscribe : http://www.LazyDBA.com
To unsubscribe: http://www.lazydba.com/unsubscribe.html


-----------------------------------------
The information contained in this e-mail message is intended only
for the personal and confidential use of the recipient(s) named
above. This message may be an attorney-client communication and/or
work product and as such is privileged and confidential. If the
reader of this message is not the intended recipient or an agent
responsible for delivering it to the intended recipient, you are
hereby notified that you have received this document in error and
that any review, dissemination, distribution, or copying of this
message is strictly prohibited. If you have received this
communication in error, please notify us immediately by e-mail, and
delete the original message.

MS Sql Server LazyDBA home page