12.07.2015 Views

TEST DATA MANAGEMENT Huw Price Lead Technical ... - Isaca


TEST DATA MANAGEMENT
Huw Price, Lead Technical Architect, Grid-Tools Ltd


Building a Short to Long Term Test Data Strategy
[Diagram: capabilities placed along a Short Term, Medium Term and Long Term timeline]
• Data Profiling
• Data Validation
• Requirements Management
• Database Subsetting
• Database Masking
• Database Data Creation
• Flat File Subsetting
• Flat File Masking
• Flat File Creation
• Data Coverage
• Cause and Effect
• Test Data Warehouse
• SOA Test harness
• SOA Stubbing
• SOA Test case management
• UI Test case management
• Data Synchronization
• Data Congruency
• ETL
• Work Flow
• TDM Portal
• Data Encryption
• Data Access/Transparency


With an increasing trend to punish and hold individuals accountable for any data breach, PII data is one of the top concerns of executives today.
Survey of over 3K professionals across 95 countries (www.isaca.org): Companies Rank Top Seven Tech-related Business Issues for Next 12-18 Months
At number 1: Regulatory compliance, specifically protecting PII
"The cost of losing or compromising the integrity of PII is also leading to a renewed focus on information security," Greg Grocholski (ISACA 2008)


DPA Compliance
The DPA has 8 governing principles. The 4 most relevant for application testing and development are:
1. Fair and lawful processing
2. Excessive data
3. Security
4. Foreign transfers
Addressing these areas will ensure both compliance and best practice.


Copying Production Data is extremely Common
[Diagram labels: Production, Development, Prod Data]


Copying production data is expensive
• Copied data is usually out of date by the time it is used for testing, making time specific tests irrelevant.
• New functionality will not have any pertinent data.
• Multiple users will set up specific test scenarios which will be destroyed every time production is re-copied to testing.
• Large copies of production data on less powerful testing hardware make queries and searches run slowly and take up lots of expensive disk.
• Disk is expensive


The myth of production data


Minimum Data Maximum Coverage


© Grid Tools Ltd 2012
• Data Masking
• Data Obfuscating
• Data Scrambling
• Data Anonymization
• Data De-Personalization
• Data Sanitization


Test Data Management
Company confidential - July 2011


Build Your Own Masking?
• The scripts to scramble the data tend to get forgotten, are not kept updated and tend to be built by a single DBA who may move on.
• Scripts tend to fall outside normal programming control and are written in SQL scripts and non-standard languages such as Perl. These scripts may well be perfect but tend not to be documented, are not incorporated in source control systems and are not subject to testing by the test department.
• Database structures tend to change regularly and the scrambling functionality needs to be upgraded with each release. After a while the scrambling routines tend to be forgotten.


Example Masking

Original data
Field            Record 1             Record 2           Record 3
Title            Mrs                  Mr                 Mrs
First Name       Mary                 George             Julia
Last Name        Jones                Jones              Roberts
Date of Birth    23/05/1958           23/07/1954         26/06/1961
NI Number        WL135645D            RS45789D
Account Number   22461542             3467819            78909749
Account Total    23.24                20,001,234.12      1,001,050.56
Address 1        11 Brooklyn Gardens  12 Hayward Road    11 Brooklyn Gardens
City             Swansea              Oxford             Swansea
Post Code        SA1 4HR              OX2 6XY            SA1 4HR

Masked data (version 1)
Field            Record 1             Record 2             Record 3
Title            Mrs                  Mr                   Mr
First Name       Simon                Frederic             Joanne
Last Name        Price                Price                Kowalski
Date of Birth    27/05/1958           29/07/1954           28/06/1961
NI Number        AR7454637E           SE563434R
Account Number   45362718             34563566             86939494
Account Total    23.24                20,001,234.12        1,001,050.56
Address 1        12 Hayward Road      11 Brooklyn Gardens  12 Hayward Road
City             Oxford               Swansea              Oxford
Post Code        OX2 6XY              SA1 4HR              OX2 6XY

Masked data (version 2)
Field            Record 1             Record 2             Record 3
Title            Mrs                  Mr                   Mr
First Name       First Name           First Name           First Name
Last Name        Last Name X X1       Last Name X X1       Last Name X X2
Date of Birth    27/05/1958           27/05/1958           27/05/1958
NI Number        NINUMBER             NINUMBER
Account Number   11111111             22222222             1111111
Account Total    100                  100.00               100.00
Address 1        ADDRESS 1            ADDRESS 1            ADDRESS 1
City             CITY XX1             CITY XX2             CITY XX1
Post Code        POSTX1               CITY XX2             POSTX1


Masking Levels
[Diagram labels, data: Personally Identifiable Information, RDBMS Meta-data, Transactional Level Data, Trends & Relationships]
[Diagram labels, levels: 1st Order, 2nd Order, 3rd Order, 4th Order]


Masking Functions

Deterministic Functions
• DECRYPT
• ENCRYPT
• ADD
• CHARHASH
• FIXED
• HASH
• REPLACE
• TRANSLATE
• TRANSPOSE
• SQLFUNCTION

Random Functions & Seed
• ADDRANDOM
• ADDRANDOMDAYS
• INTRANGE
• NUMERICRANGE
• RANDOM
• RANDOMTXT
• RANDSSN
• SHUFFLE
• VARIANCE
• RANDLOV
• SEQLOV
• AMEXCARD
• DOB
• DOD
• EMAIL
• GENCARD
• MASTERCARD
• SEQCHAR
• SEQNUMBER
• USPHONE
• USZIP
• VISACARD
• SQL FUNCTION
• NEXTVAL
• NEXTCHARVAL
• WHERE
• IF LOGIC
• DRAW DOWN
• FORMATRANDOM

Maintain Cross Reference
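The difference between the two families of functions can be illustrated with a short Python sketch. The names `mask_hash` and `mask_random_text`, the key and the choice of SHA-256 are assumptions made for this illustration; they are not the functions listed on the slide.

```python
import hashlib
import random
import string


def mask_hash(value: str, key: str, length: int = 8) -> str:
    """Deterministic: the same value and key always give the same result."""
    digest = hashlib.sha256((key + ":" + value).encode("utf-8")).hexdigest()
    return digest[:length].upper()


def mask_random_text(value: str, rng: random.Random) -> str:
    """Random: replace each letter with a random one, keep everything else."""
    return "".join(
        rng.choice(string.ascii_uppercase) if ch.isalpha() else ch
        for ch in value
    )


if __name__ == "__main__":
    # Hypothetical key; in practice it would need protecting like the data.
    print(mask_hash("Jones", key="demo-key"))
    rng = random.Random(42)  # a fixed seed makes a "random" run repeatable
    print(mask_random_text("Jones", rng))
```

A deterministic function lets the same source value be masked identically in every table that holds it, while a seeded random function gives varied output that can still be reproduced when the run is repeated with the same seed.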


Know your Data
Data Sampling Features
• DATA RELATIONSHIPS
  - Discovery
  - Import
  - Validation
• DATA SAMPLING
  - % Scans
• DATA FILTERING
  - High Distinct
  - Mixed Alpha
  - Number patterns
  - Etc.
  - In Seed Tables
• SIMILAR DATA PATTERNS
• BUILD DATA CUBES
• DATA COVERAGE %


Enterprise Data Masking


© Grid Tools Ltd 2011
The Devil is in the detail


Handling Nulls – Text Fields - Aggregation


Know your data
• Foreign Keys. How are tables related in the database?
• Documentation. This is usually held in a variety of formats and applications; however, it is rarely current.
• User knowledge. What is the users' understanding of how and where key data is held and displayed?
• Naming standards. A surprisingly good source of information: column names in tables can give a strong hint to their use and relationship to other columns.
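As an illustration of the naming standards point, a first pass over a schema can simply look for tell-tale fragments in column names. The hint list and the table and column names below are hypothetical, and a match is only a prompt for review, not proof that a column is sensitive.

```python
# Hypothetical hint fragments; extend them to match your own naming standards.
SENSITIVE_HINTS = (
    "NAME", "DOB", "BIRTH", "SSN", "NI_NUM", "ACCOUNT",
    "ADDRESS", "POST", "PHONE", "EMAIL", "CARD",
)


def flag_sensitive_columns(columns):
    """Return {(table, column): hint} for columns whose name contains a hint."""
    flagged = {}
    for table, column in columns:
        upper = column.upper()
        for hint in SENSITIVE_HINTS:
            if hint in upper:
                flagged[(table, column)] = hint
                break  # the first matching hint is enough to flag the column
    return flagged


if __name__ == "__main__":
    schema = [
        ("CUSTOMERS", "FIRST_NAME"),
        ("CUSTOMERS", "DATE_OF_BIRTH"),
        ("ORDERS", "ORDER_DATE"),
        ("CREDIT_CARDS", "CARD_NUMBER"),
    ]
    for (table, column), hint in flag_sensitive_columns(schema).items():
        print(f"{table}.{column} matched hint {hint}")
```

Substring matching will produce false positives (a FILENAME column contains NAME), so the output is best treated as a candidate list to confirm against foreign keys, documentation and user knowledge.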


Common Masking Architecture
[Diagram labels: Prod, Secure, Scramble, Cross Ref, Seed, Subset, Masked, Validation, Profiling, Restartability, Error Handling, Web portal, Work Flow, Data Subset, ETL, Development, Testing, Subset 1, Subset 2, Subset 3]


Creation Process - Overview
(1) Sample
- Understand the data constellations
(2) Analyse
- What does the current source data cover?
(3) Create
- Build the data according to the sampling information and the specifications


Creation Process - Sample
(1) Understanding the data sampling
- Frequency/Distinct Values?
- Min/max?
- Nulls?
- 2-Tuples
(2) Categorize Attributes
- BK: Business Key [spec]
- UV: Unique Value [sampling]
- FA: Feature Attribute [sampling]
(3) Client Data Sensitivity Levels
- A, B, C, D
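The sampling questions in step (1) can be sketched in a few lines of Python. This is a minimal illustration that assumes the sample has already been loaded as a list of dictionaries; the function names and the sample rows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def profile_column(rows, column):
    """Summarise one column: nulls, distinct count, min, max, top values."""
    values = [row[column] for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(3),
    }


def two_tuples(rows, columns):
    """Count the value pairs that occur together for every pair of columns."""
    pairs = {}
    for col_a, col_b in combinations(columns, 2):
        pairs[(col_a, col_b)] = Counter(
            (row[col_a], row[col_b]) for row in rows
        )
    return pairs


if __name__ == "__main__":
    sample = [
        {"title": "Mrs", "city": "Swansea", "balance": 23.24},
        {"title": "Mr", "city": "Oxford", "balance": None},
        {"title": "Mrs", "city": "Swansea", "balance": 1001050.56},
    ]
    print(profile_column(sample, "balance"))
    print(two_tuples(sample, ["title", "city"]))
```

The 2-tuple counts show which combinations of values actually occur together, which is the information needed to create data that covers the same combinations.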


Covered Data in Development
[Diagram labels: Scenario Subset, Production Attributes, Generated Coverage, Covered Subset, Production & Enhanced Attributes]


Test Data Management
• Most users still use production data: stop it now
• Data Masking must become fundamental to the DNA of an organization
• Use the work you do on Data Masking to improve your testing and development
• Good Test Data can have a dramatic effect on project success


Test Data Management
Huw.price@grid-tools.com


Extra Slides


Enterprise Test Data Management
Enterprise Data Masking
• Data Discovery
• Quality
• Sampling
• Pattern Algorithms
• Masking Functions
• Standard
• Extensible
• Seed Tables: Powerful Data Generation
• Dependencies: ETL or Tool or Process
• Synchronization
• Deterministic / Non Deterministic
• Reversible / Non Reversible
• It's up to you


Enterprise Test Data Management
• Technology / Architecture / Framework
• Code Generation: COBOL, z/OS, LUW
• SDM: Single executable, smaller/flexible
• Script: DB Functions
• Extract Mask & Load
• In Place masking
• RDBMS, Flat Files, Adabas, IMS, Teradata etc.
• Auditing
• Analysis: Sampling, Checked, Validated, Approved
• Masking Code
• Masking: code, cross reference, trigger


Scrambling Methods
Dynamic Tables


Banking Migration Chain
[Diagram labels: stages Basic Input, Dedup & Propagate, Book In, Prepare; Recon between stages; 20% Inconsistent, 10% Bad, Carried across, Not in Prop data, Not in Stage 2; Excep. per stage; Total Migration Exceptions]
• Data must be consistent across a reconciliation (e.g. Basic to Book in)
• 60% records should be correct
• 10% bad records that will fail load
• 20% records that are not consistent across recon, so will fail reconciliation
• Data volumes should be low (100 records max) so manual verification of results can be made


Banking Migration Chain
[Diagram labels: 20% Inconsistent and 10% Bad repeated for each stage; TDM Team]


Offsite Masking and Return
[Diagram labels: Prod, Data Return, Full Audit, Secure, Validation, Old, New, Scramble, Cross Ref, Gold Copy, Profiling, Restartability, Error Handling, Web portal, View Old/New, Encrypt, Decrypt, Seed, Subset 1, Work Flow, Data Subset, ETL, TS Tablespace, Flat File, Data Repair, Data Analysis, Offsite]


Scrambling Methods
• A simple lookup to a value in the seed table, for example:
    select seed_value from (select seed_value, rownum rn
    from seed_data
    order by seed_value)
    where rn = mod((in_rownum - 1), wk_count) + 1;
  This will bring back an effectively random value from the seed table, selected by in_rownum.
• A simple substitution, for example SELECT DECODE(BANK_TYPE,'C','S','L','M','P') will reassign the code values C to S and L to M, otherwise P.
• Top and bottom coding. Setting a maximum and minimum value for a column, for example:
    SELECT GREATEST(LEAST(holiday_days, 10), 4) holiday_days
    FROM People
  The above will set the minimum holiday days to 4 and the maximum to 10.
• The above techniques are sometimes known as Swapping or Multidimensional Transformations.
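The three techniques above can also be sketched outside the database. The following Python is a minimal illustration, not the presenter's implementation; the function names and the seed values are assumptions.

```python
def seed_lookup(seed_values, row_number):
    """Pick a seed value by row number, wrapping round like MOD in the SQL."""
    ordered = sorted(seed_values)
    return ordered[(row_number - 1) % len(ordered)]


def substitute_code(bank_type):
    """Map C to S and L to M; anything else becomes P (the DECODE default)."""
    return {"C": "S", "L": "M"}.get(bank_type, "P")


def top_bottom_code(value, minimum=4, maximum=10):
    """Clamp a value into the range minimum..maximum."""
    return max(minimum, min(value, maximum))


if __name__ == "__main__":
    seeds = ["Smith", "Jones", "Brown"]  # hypothetical seed table contents
    print([seed_lookup(seeds, n) for n in range(1, 5)])
    print([substitute_code(c) for c in "CLX"])
    print([top_bottom_code(d) for d in (2, 7, 15)])
```

With the three seed values shown, row 4 wraps back to the same value as row 1, so the seed table needs enough entries that the repetition does not itself become a recognisable pattern.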


Scrambling Methods
• Simple independent functions to put in random text, dates and numbers.
• Multi table column values, for example, an account number is used in lots of tables and as an identifier in other applications.
• Offset values, for example, if a date is adjusted then other related dates must shift in line with the original date; if a post code is changed then corresponding address lines must also shift.
• Database functions: every RDBMS comes with a vast library of built-in functions, many of which can be built up to scramble data quite easily.
• Toolsets: tools such as Datamaker come with many pre-built functions.
• Your own code: some of the scrambling you need will be very specific to your systems, for example, customer numbers can be built up of combinations of locations, dates of birth and partial names. There will be code in your system already that builds these numbers, so use the same function as part of your scrambling strategy.
• The internet: provides a vast array of free code snippets which can be easily used.


Scrambling Methods
Seed Tables


Scrambling Methods
• Adding a small decimal increment to transaction values can mask individual transactions, for example, SELECT TRANSACTION_AMOUNT + TO_NUMBER(TO_CHAR(SYSDATE,'DD')) / 100 will add from 0.01 to 0.31 to a number, dependent on the date.
• Adding a number of days to all dates. A very simple transformation to implement, assuming all your dates are stored in a date data type. This also has the obvious advantage of allowing time-dependent process testing to be more accurate. An example would be:
    SELECT ORDER_DATE + 7 FROM ORDERS
  Bear in mind end of month processing can be affected by this. You may be better off using a cross reference table to match up periods, for example:


Scrambling Methods
Multi Table Cross Reference


Scrambling Methods
• Use a hash function using date, time and rownum as input to create random text or number values, for example:
    translate(to_char(ora_hash(in_rownum + 1, 4294967295, in_rownum)), '0123456789', 'ABCDEFGH')
• A simple text replace for a phone number is a perfectly simple way of cleaning data, for example:
    SELECT '212-555-2121' PHONE_NUMBER, …. FROM


Scrambling Methods
Multi Table Cross Reference
A simple character by character replacement is an effective technique: basically, shift characters 5 and 6 in a string identifier to one less and one more respectively. An example would be:
    SELECT substr(card_number,1,6) ||
           translate(substr(card_number,7,1),'0123456789','1234567890') ||
           substr(card_number,8) CARD_NUMBER
    FROM CREDIT_CARDS
In this example the 7th character is being shifted up by one. As long as you apply the same function to all of the occurrences of this CARD_NUMBER then the system will retain integrity. Bear in mind that sometimes the column may be used for other purposes. For example, the column could contain 'NO CARD NO YET'; the scrambling function would then fail.
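The same character shift can be written as a small Python function. This is a sketch under the assumption that a valid card number is a string of digits at least seven characters long; the function name is hypothetical. It rejects non-numeric values explicitly so that placeholder text is reported rather than passed through unnoticed.

```python
# Each digit maps to the next one up; 9 wraps round to 0.
SHIFT_UP = str.maketrans("0123456789", "1234567890")


def shift_seventh_digit(card_number: str) -> str:
    """Shift the 7th character up by one and leave the rest unchanged."""
    if len(card_number) < 7 or not card_number.isdigit():
        raise ValueError(f"not a numeric card number: {card_number!r}")
    return (
        card_number[:6]
        + card_number[6].translate(SHIFT_UP)
        + card_number[7:]
    )


if __name__ == "__main__":
    print(shift_seventh_digit("4111111111111111"))
    try:
        shift_seventh_digit("NO CARD NO YET")
    except ValueError as error:
        print(f"skipped: {error}")
```

Because the shift is a fixed, one-to-one mapping, applying it to every table that stores the card number keeps the joins intact. It is also trivially reversible, so it hides values from casual inspection rather than providing strong protection.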


Scrambling Methods
Multi Table Cross Reference
Hashing is a key component in multi column replacement. It allows values to be transformed to the same value every time, dependent on a hash key. For example, 1 would be transformed to 7, 2 to 6, 3 to 1 etc. Each value has a unique hashed value and is repeatable based on the hash key. An example of this would be:
    substr(translate(to_char(ora_hash(in_value,4294967295,in_parm1)),'0123456789','1234567890'),1,9)
This would build a hashed Social Security number.
Using dynamic seed tables will build an exact replacement value for each identifier. You need to protect the seed table as it contains the "key to crack the code" and you must also protect the offset algorithm, as this can be used to identify data.
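A keyed hash gives the same repeatable behaviour in Python. The sketch below uses HMAC-SHA256 from the standard library as a stand-in for ora_hash; that substitution, the function name and the key are assumptions for illustration only.

```python
import hashlib
import hmac


def hashed_identifier(value: str, hash_key: bytes, digits: int = 9) -> str:
    """Map a value to a repeatable string of digits that depends on hash_key."""
    digest = hmac.new(hash_key, value.encode("utf-8"), hashlib.sha256).digest()
    number = int.from_bytes(digest, "big") % (10 ** digits)
    return str(number).zfill(digits)


if __name__ == "__main__":
    key = b"demo-hash-key"  # hypothetical key; protect the real one
    print(hashed_identifier("123-45-6789", key))
    print(hashed_identifier("123-45-6789", key))  # same input, same output
```

Reducing the hash to nine digits means two different inputs can occasionally produce the same output, so where uniqueness matters (for example on a primary key) the results should be checked for collisions or a cross reference table used instead.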


Scrambling Methods
Offset Values
Micro aggregation: the values of prior rows in a set of transactions refer to each other. For example, a TRANSACTION_BALANCE is dependent on the TRANSACTION_AMOUNT and the TRANSACTION_BALANCE of the prior transaction.
Application process driven aggregations: many systems have application components that calculate balances based on transaction throughput. These tend to be separate processes which can be run stand-alone. For example, if you are adjusting transaction amounts the customer balances may not match. Running the balance adjustment process may be required to reset these values.
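Where no stand-alone balance process exists, the running balance can be rebuilt from the masked amounts. The Python below is a minimal sketch that assumes the transactions are already in posting order; the function name and figures are hypothetical.

```python
from decimal import Decimal


def rebuild_balances(opening_balance, amounts):
    """Return the running balance after each (masked) transaction amount."""
    balances = []
    balance = Decimal(opening_balance)
    for amount in amounts:
        balance += Decimal(amount)
        balances.append(balance)
    return balances


if __name__ == "__main__":
    masked_amounts = ["10.05", "-20.00", "5.01"]  # amounts after masking
    for amount, balance in zip(
        masked_amounts, rebuild_balances("100.00", masked_amounts)
    ):
        print(f"amount {amount:>8}  balance {balance}")
```

Amounts are passed as strings and held as Decimal so that the rebuilt balances do not pick up binary floating point rounding errors.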


Scrambling Methods
Offset Values
Dates of birth: adjusting this by a few days will alter the age and also the age bracket, so a person may move into a different insurance premium.
An Order Date: the Ship Date is after the Order Date, thus adjusting one date means the other must move by a similar amount.
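Keeping related dates in step amounts to applying one offset to every date in the record. A minimal Python sketch, with hypothetical field names:

```python
from datetime import date, timedelta


def shift_related_dates(record, date_fields, days):
    """Return a copy of record with every listed date moved by the same days."""
    shifted = dict(record)
    for field in date_fields:
        if shifted.get(field) is not None:
            shifted[field] = shifted[field] + timedelta(days=days)
    return shifted


if __name__ == "__main__":
    order = {"order_date": date(2011, 7, 1), "ship_date": date(2011, 7, 4)}
    masked = shift_related_dates(order, ["order_date", "ship_date"], days=7)
    print(masked["order_date"], masked["ship_date"])
    # The gap between the two dates is unchanged by the shift.
    print(masked["ship_date"] - masked["order_date"])
```

Because both dates move by the same amount, the ship date still falls after the order date and the interval between them is preserved. Empty (None) dates are left alone.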


Scrambling Methods
Library of Functions


Documentation and Traceability


Documentation and traceability
• Which columns are sensitive and need scrambling?
• Who has access to any scrambling functions? The code that scrambles should be protected as well.
• A before and after report of what the data has been changed from and to. You can use database compare tools such as Datamaker for this or generate triggers to update audit tables.
• Who has access to any working schemas or files used in the scrambling process?
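A before and after report can be as simple as comparing the two versions of each row on a key that masking leaves unchanged. The Python below is a sketch under that assumption; the function name, the `id` key and the sample rows are hypothetical, and it is not a substitute for a database compare tool or audit triggers.

```python
def before_after_report(before_rows, after_rows, key):
    """List (key value, column, old, new) for every value that changed."""
    after_by_key = {row[key]: row for row in after_rows}
    changes = []
    for old in before_rows:
        new = after_by_key.get(old[key])
        if new is None:
            continue  # row missing after masking; report separately if needed
        for column, old_value in old.items():
            if new.get(column) != old_value:
                changes.append((old[key], column, old_value, new.get(column)))
    return changes


if __name__ == "__main__":
    before = [{"id": 1, "last_name": "Jones", "city": "Swansea"}]
    after = [{"id": 1, "last_name": "Price", "city": "Swansea"}]
    for row_id, column, old, new in before_after_report(before, after, "id"):
        print(f"row {row_id}: {column} changed from {old!r} to {new!r}")
```

The report holds the original values alongside their replacements, so it needs the same access controls as the seed tables and working schemas.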


Subsetting/Data Slicing
[Diagram labels: Subset, Production, Data Sampling, Data Obfuscation, Virtualized Services, Generated Data, Masked, Created, Application, Coverage Metrics, Data Design, DATA EXPLOSION, DATA OBJECTS, SOA Test Harness, Test Data Warehouse, VERSION CONTROL, DATA INHERITANCE, Web / Application Playback, Load Testing]
