Manchester - National Genetics Reference Laboratories
Manchester - National Genetics Reference Laboratories
Manchester - National Genetics Reference Laboratories
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
NATIONAL GENETICS REFERENCE LABORATORY<br />
(<strong>Manchester</strong>)<br />
MLPA analysis spreadsheets – User Guide (updated October 2006)<br />
INTRODUCTION<br />
These spreadsheets are designed to assist with MLPA analysis using the kits available from<br />
MRC-Holland (see website at http://www.mrc-holland.com/). The spreadsheets have been<br />
created in Microsoft Excel 2003. They are intended to simplify and streamline the process of<br />
analysing complex MLPA data.<br />
Input data for the spreadsheets may be either peak heights or peak areas. We do however<br />
recommend using peak heights. Comparisons between the peak heights and peak areas as<br />
measures of peak intensity has shown that the variance of peak area measurements are<br />
consistently higher than those for peak heights. This may be due to peak smoothing or the<br />
arbitrary cut-off of peaks that occurs in fragment analysis programs. Peak heights appear to<br />
be a simpler and therefore more consistent measure than peak area<br />
If you have any suggestions for improvements or modifications to the spreadsheets I would<br />
be grateful for any feedback. I can be contacted on andrew.wallace@cmmc.nhs.uk.<br />
DESCRIPTION OF SPREADSHEETS<br />
The spreadsheets have been split into five “worksheets” or pages<br />
RAW DATA – This page is used for data entry. The cells on the page have been laid out in<br />
order to ensure minimal user intervention in transferring data from the fragment analysis<br />
package. For instance, output from a Genotyper or GeneMapper table, if correctly configured,<br />
can be pasted directly onto the cells indicated on the RAW DATA sheet.<br />
Fig 1 shows a RAW DATA sheet with data from the BRCA1 MLPA analysis spreadsheet.<br />
FIG1:<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
RESULTS – as the title suggests this page displays the results of the analysis. The results<br />
from the test samples are analysed in comparison with a group of 5 normal controls (see the<br />
analysis section for a more detailed description of the method of analysis). The results are<br />
displayed in four principal ways (i) as dosage quotients (DQs) gridded for each ligation<br />
product versus each control ligation product (ii) graphically as mean dosage quotient for each<br />
ligation product (iii) as a likelihood probability and odds for each ligation product calculated<br />
for one of three hypotheses – that the dosage is normal (2n copies), that the dosage is<br />
deleted (n copies) or that the dosage is duplicated (3n copies).<br />
Fig 2 shows the RESULTS sheet for some typical data entered into the BRCA1 MLPA<br />
spreadsheet.<br />
FIG2:<br />
CALC1 – this sheet is simply used for calculation. The dosage data is first normalised on this<br />
sheet depending on the signal strength of the control amplimers. The deviations of each test<br />
sample ligation product are also calculated on this sheet relative to the mean and standard<br />
deviation of the 5 normal controls.<br />
CALC2 – this sheet is also used for calculation. On this sheet the peak heights of each<br />
ligation product are divided by every other peak height within a sample to yield a ratio. This<br />
is then divided by the equivalent figure derived from the average of the five normal controls<br />
to yield the dosage quotients displayed on the RESULTS worksheet.<br />
REGRESSIONS – this sheet is used to correct for any data that slopes relative to increasing<br />
molecular weight of the product. We have found artefacts in data causing slope due to<br />
differences in the electrokinetic injection sample loading process used in capillary<br />
electrophoresis. Data is normalised on this page using a linear regression model based on the<br />
degree of sloping of the control ligation products. In kits where there are no clear control<br />
ligation products e.g. Human Telomere MLPA kits P069/P070, then all ligation products are<br />
used in the calculation of slope.<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
METHOD OF ANALYSIS<br />
Background<br />
Analysis of dosage data can be problematic. Dosage data is quantitative yet in diagnostics we<br />
require a binary (Yes or No) answer. Analysis of dosage data by the use of dosage quotients<br />
(DQs) has been generally accepted as the standard method of analysis for several years.<br />
These worksheets analyse data to produce DQs in the standard way; however, they also<br />
incorporate two novel features of analysis to aid with interpretation.<br />
The first generates a likelihood probability of concordance with one of three hypotheses.<br />
Namely that a ligation product is present at either one two or three copies within the test<br />
sample. This figure is generated by comparing the test sample to a series of five normal<br />
controls. The controls are used to give a measure of the variability for each ligation product<br />
and allows the probability of deviation from expectation of the test sample to be estimated<br />
using the t-statistic.<br />
The second acts as a control for the overall quality of an individual test by measuring the<br />
standard deviation of the DQs obtained for all the control ligation products. If the standard<br />
deviation exceeds 0.1 then the sample is deemed to be of poor quality. Studies carried out by<br />
Dr Ruth Charlton, Regional <strong>Genetics</strong> Service, Leeds have shown that there is no overlap<br />
between DQs of deleted, normal and duplicated DQs of samples where the standard deviation<br />
of the control ligation products do not exceed 0.1. Thus excluding samples with higher<br />
degrees of variability substantially reduces the possibility of making an incorrect diagnosis.<br />
The analysis process<br />
Firstly the data is input on the RAW DATA worksheet. The format of this sheet has been to<br />
designed to facilitate the input of data directly from fragment analysis applications with the<br />
minimum user intervention e.g. Genotyper/Genemapper. This data is then presented in a<br />
more amenable format either at the top or bottom of the RESULTS worksheet.<br />
Each test and control sample’s data is normalised by summing the total control peak height<br />
and dividing each ligation product’s peak height by this figure. Carrying out this step is<br />
necessary in order for meaningful measurements of the variability between control samples to<br />
be measured. The control and test data is then ‘equalised’ by dividing the normalised peak<br />
height by the mean peak height of all five controls. Both these stages are carried out on the<br />
CALC1 sheet.<br />
The next step that is carried out is to correct for ‘sloping’. This is achieved by carrying out a<br />
linear regression of all the equalised control products (or all equalised products if there are no<br />
control products) against the mean of the five control samples and correcting the equalised<br />
peak heights for the slope of any regression calculated according to the molecular weight of<br />
each peak. This stage is carried out on the REGRESSIONS sheet.<br />
Dosage quotients (DQs) are next calculated firstly by dividing each slope corrected ligation<br />
product peak height by each slope corrected control ligation product peak height for the<br />
average of all five control samples to create a matrix or grid of values. The same set of<br />
calculations are carried out for each of the test samples. These matrices are displayed on the<br />
CALC2 sheet. The dosage quotients are then calculated by dividing the test sample matrix by<br />
the control mean matrix. Dosage quotients (DQs) are displayed on the RESULTS sheet.<br />
The mean and standard deviation of each ligation product is then calculated for the five<br />
normal controls.<br />
The fit to each of the three hypotheses (deleted, normal or duplicated) is then calculated as<br />
follows. For the normal (2n copies) hypothesis, the difference of each test sample’s ligation<br />
product normalised peak height from the mean of the control samples is calculated as a<br />
number of standard deviations. For the deleted hypothesis (n copies) the assumption is made<br />
that if the test sample is heterozygously deleted for a ligation product that the peak should<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
e half-height and thus doubling the normalised peak height should therefore yield good fit<br />
with the control data. Thus doubled normalised peak heights are compared with the<br />
corresponding mean control amplimers to yield a difference as numbers of standard<br />
deviations for the deleted hypothesis. Finally, to test fit to the duplicated (3n) hypothesis the<br />
assumption is that if duplicated the test sample should be 1.5 times normal height thus<br />
multiplying the test sample normalised peak height by 2/3 should yield good fit with the<br />
control data.<br />
The three differences representing the three competing hypotheses are then converted into<br />
probabilities of deviation using the t-statistic. The precise probability for each amplimer is<br />
thus determined by two factors (i) the underlying variability in the batch of five normal<br />
controls for that particular ligation product and (ii) the size of the difference between the test<br />
sample for that ligation product and the control samples.<br />
Finally the relative likelihood of each of the three competing hypotheses is calculated for each<br />
ligation product as an odds ratio to indicate which hypothesis is more likely. For instance if<br />
the observed deviation from the normal hypothesis of the test sample is predicted to occur in<br />
10% cases and the deviation from the deleted hypothesis is predicted to occur in 0.1% of<br />
cases then the relative odds of the normal to deleted hypotheses is 100:1 in favour of the<br />
normal hypothesis.<br />
Fig 3 illustrates the method used for calculating relative likelihoods. The three curves<br />
represent the relative probability distribution of dosage quotient for a given ligation product<br />
for each of the hypotheses, n – deleted, 2n – normal, 3n – duplicated. The probability<br />
distribution is calculated in practice by the t-statistic. In the illustrated example the measured<br />
DQ of 0.9 equates to a probability of this being a normal result of 0.40, a probability of being<br />
a deleted result of 0.0009 and a probability of being a duplicated result of 0.0006. Dividing<br />
the Normal probability by the Deleted probability yields an odds ratio of 444:1 and dividing<br />
the Normal by the Duplicated probability yields an odds ratio of 667:1.<br />
Fig 3:<br />
p<br />
n 2n 3n<br />
DQ<br />
0.5 1.0 1.5<br />
DQ = 0.9; p(2n) = 0.40<br />
p(n) = 0.0009; p(3n) = 0.0006<br />
Odds Norm:Del = 444:1<br />
Odds Norm:Dup = 667:1<br />
USE OF AND INTERPRETATION OF DATA ON THE WORKSHEETS<br />
The spreadsheet may be started by simply opening the file in Excel or double clicking the file<br />
icon in Windows Explorer. Depending on the levels of Macro Security set on your workstation<br />
you may or may not be informed that the spreadsheet contains macros and given the<br />
opportunity to enable or disable them but if macros are enabled a small dialog box is<br />
presented to the user (see Fig 4). The user must then enter both their name and<br />
corresponding laboratory worksheet number in order to proceed with analysis. The user name<br />
and Worksheet number are then entered on the worksheet within locked cells for audit<br />
purposes. This feature can be bypassed if preferred as some spreadsheets still function<br />
normally with macros disabled, however it is important to note that some do not e.g. P069<br />
and P070 Human Telomeres. Any spreadsheets where macros need to be enabled in order for<br />
the spreadsheet to function normally will state this in the Release Notes. Should you need to<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
change the Macro security setting to allow macros to run, this can be done as follows in MS<br />
Excel 2003<br />
FIG 4:<br />
• On the Tools menu, click Options.<br />
• Click the Security tab.<br />
• Under Macro Security, click Macro Security.<br />
• Click the Security Level tab, and then select the security level you want to use –<br />
Medium is recommended,<br />
Data may now be copied and pasted directly from a Genotyper or equivalent output table in<br />
Excel format onto the RAW DATA sheet. The yellow cells indicate the cell(s) to select when<br />
pasting data.<br />
Fig 5 illustrates the appearance of the RAW DATA sheet prior to entering data.<br />
FIG 5:<br />
The spreadsheets must have five normal controls in order to function correctly. Once data<br />
has been pasted into position the “Cross Ref” column can be used to ensure that the data has<br />
been pasted into the sheet in the correct order provided the pasted data also includes data<br />
labels.<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
Fig 6 illustrates some data showing the concordance between the LABEL 1 field from the<br />
pasted data and the “CROSS REF” field.<br />
FIG 6:<br />
Space is allocated for a deletion or positive mutation control. Although this is not essential for<br />
the spreadsheet to function properly we strongly recommend that at least one positive<br />
control is run with each batch of samples. The data entry form is configured to accept up to<br />
10 test samples.<br />
Once the test samples have been entered onto the RAW DATA sheet the results may be<br />
visualised by clicking on the RESULTS tab.<br />
On the RESULTS sheet, the control and test sample raw data are represented at the top of<br />
the sheet. Further down the sheet the analysed data are presented with the deletion control<br />
displayed first followed by up to 10 test samples. To the left of each sample’s analysed data<br />
is a set of cells in which the sample name/lab no (from the raw data sheet), user and<br />
worksheet number (as entered in the dialog box when the worksheet was opened) is<br />
displayed for record keeping purposes. Beneath the worksheet information is a cell labelled<br />
“Int QC Stand Dev”. The cell directly below will either be coloured green if the sample quality<br />
is judged as good or red if it is judged poor. Sample quality is estimated by measuring the<br />
standard deviation of all the test ligation products measured against each other. As outlined<br />
previously retrospective analysis of MLPA data has shown that samples with control standard<br />
deviations less than 0.1 show no overlap between normal, duplicated and deleted ranges (Dr<br />
Ruth Charlton).<br />
Fig 7a illustrates the appearance of a typical sample where the data quality has been judged<br />
to be poor.<br />
FIG 7a:<br />
In worksheets where there are no control ligation products e.g. P069 and P070 Human<br />
telomeres, any cells that have been excluded from the quality control calculation due to them<br />
being possibly deleted or duplicated are listed in the cell below the data quality cell. If no<br />
cells have been excluded i.e. none have been judged as potentially deleted/duplicated the<br />
text ‘Omitted;None’ appears.<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
FIG 7b:<br />
To the right of this sample information, the results are presented in a tabular format. For<br />
each sample, the upper rows are a series of dosage quotients gridded out for each ligation<br />
product (control and test) horizontally versus each control ligation product vertically. These<br />
cells are conditionally formatted to highlight deleted/duplicated and aberrant results. The<br />
actual settings that have been set for the conditional formats are given to the right of the raw<br />
data and may vary depending on the spreadsheet but typical ranges are as follows:<br />
Normal DQ 0.85 – 1.15<br />
Deleted DQ 0.35 – 0.65<br />
Duplicated DQ 1.35 – 1.65<br />
Equivocal DQ 0.65-0.85 & 1.15-1.35<br />
Fig 8 illustrates a test sample showing a section of DQ results from a sample. The majority of<br />
DQs fall into the normal range and have a white background. A patch of three ligation<br />
products (BRCA1 Exons 1A, 1B and 2) all have reduced DQs within the deleted range of 0.35-<br />
0.65 and are shaded aqua (ringed). A single DQ measurement, that between the control<br />
ligation product C11p13 and C12p12 lies in the “equivocal” range and is shaded a cream<br />
colour.<br />
Fig 8:<br />
Beneath the gridded DQ data lie (typically) two rows labelled in blue type above three rows<br />
labelled in green type. These rows hold the probabilistic analysis of the sample’s mean DQ<br />
result for each ligation product. The green rows contain absolute probabilities measured by<br />
the t-statistic of the difference between the observed mean DQ for that ligation product and<br />
the expected DQ as estimated from the panel of 5 normal controls for each of the possible<br />
dosages (normal, deleted and duplicated). A 60% probability in the row for the normal<br />
hypothesis indicates that a random normal sample would be expected to deviate by at least<br />
this amount in 60% of tests. These cells are also conditionally formatted to highlight<br />
abnormal or equivocal results. The precise limits may vary from spreadsheet to spreadsheet<br />
but a key is given to the right of the control data at the top of the RESULTS sheet.<br />
The blue rows contain pairwise comparisons between the relative probabilities for the<br />
alternative hypotheses given as an odds ratio. With good quality data the odds ratios<br />
although varying should clearly favour one hypothesis over another. If the “normal” i.e. 2n<br />
hypothesis is clearly favoured (i.e. an odds ratio of >=20:1 for normal) the cell is<br />
conditionally formatted to have a green background. A clear odds ratio in favour of an<br />
abnormal hypothesis (i.e. an odds ration of >=1:20 in favour of a deleted or duplicated<br />
hypothesis) gives a cell with a red background. Any odds ratio giving equivocal results is<br />
highlighted with a cream background.<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
Figure 9 illustrates some typical results. Most of the odds ratios for the shown ligation<br />
products strongly favour the normal hypothesis and are consequently shaded in with a green<br />
backround. However the ligation product for BRCA1 Ex 13 is showing high DQs, consequently<br />
the fit is better to the duplicated hypothesis than either the normal or deleted hypotheses.<br />
Comparing the relative probabilities of fit to the duplicated and normal hypotheses the<br />
relative likelihood is calculated as 146:1 in favour of the sample being duplicated than the<br />
result being a normal outlier experienced by chance.<br />
Fig 9:<br />
The spreadsheets also incorporate a graphical representation of the results in the form of a<br />
histogram. This is located to the right of each test sample on the RESULTS sheet.<br />
Fig 10 illustrates a histogram from the HNPCC MLPA spreadsheet summarising the mean DQ<br />
data. This particular sample gave normal odds ratios for all ligation products except that for<br />
hMLH1 exon 2 which appeared to be deleted. This was subsequently confirmed by further<br />
analysis.<br />
FIG 10:<br />
INTERNAL QUALITY CONTROL<br />
Internal quality control is an important consideration for all diagnostic laboratories. Although<br />
the extra tiers of analysis given in the spreadsheets assist in the analysis of MLPA dosage<br />
data, the meaning and significance of some results will still remain a matter of professional<br />
judgement. The precise limits of what is an acceptable result must remain the responsibility<br />
of each laboratory to determine, however what follows are guidelines that have been found<br />
to be useful locally.<br />
1) In order to be an acceptable result the “Quality” value (standard deviation of the control<br />
ligation products) should have a value of = 20:1 odds then the sample can<br />
be judged to be normal.<br />
3) If two or more contiguous ligation products favour a deleted or duplicated hypothesis with<br />
>= 1:20 odds then the sample can be judged to be abnormal provided the remaining ligation<br />
products give odds ratios for the normal hypothesis of >= 20:1 odds. It still remains good<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06
practice to confirm any mutations by repeating the analysis on a replicate MLPA analysis or<br />
on an affected first degree relative<br />
4) If a single ligation product gives a >= 1:20 odds ratio in favour of deletion/duplication<br />
then further evidence must be obtained to confirm this result. Firstly the sample should be<br />
sequenced to establish that there are no mutations present beneath the ligation product<br />
hybridisation sites (this has been a frequently cause of a ligation product which appears<br />
deleted). Secondly the deletion/duplication should either (i) be confirmed using a separate<br />
assay e.g. long PCR, dosage PCR (ii) be confirmed on a separate DNA sample (not just a new<br />
replicate MLPA assay) or (iii) be confirmed on an affected first degree relative.<br />
5) Where a sample otherwise appears normal yet the odds ratios for normality drop below<br />
>= 20:1 for some test ligation products this can be accepted to be a normal result provided<br />
(i) no one normal odds ratio drops below 5:1 in favour of normality (ii) that contiguous exons<br />
are not involved (iii) that no more than 10% of test ligation products are involved (iv) that<br />
none of the mean DQs falls into the transitional range (0.7-0.8 or 1.2-1.3). Please note that in<br />
the presence of a deletion the normal:duplication comparison is meaningless and will yield<br />
equivocal odds the same applies for the normal:deletion comparison for a duplicated sample.<br />
SPREADSHEETS FOR NEW MLPA ASSAYS<br />
If there is an MLPA assay in use in your laboratory for which you would like us to design a<br />
spreadsheet please contact me by email at andrew.wallace@cmmc.nhs.uk<br />
CONFIGURATION OF EXISTING SPREADSHEETS<br />
If there is an existing spreadsheet for which the data entry page is not compatible with your<br />
fragment analysis output, I can usually quite simply alter these to suit your requirements. In<br />
order to do this please email to me at andrew.wallace@cmmc.nhs.uk a sample of your data<br />
which you would like to import directly into the spreadsheets<br />
MODIFICATIONS TO EXISTING PROBE SETS<br />
MRC-Holland review their kits and often change the probe combinations in response to their<br />
customer demand. If you notice that this has occurred and the currently available<br />
spreadsheet will no longer analyse the data please let me know by email at<br />
andrew.wallace@cmmc.nhs.uk as I can usually make minor modifications quite easily.<br />
ACKNOWLEDGEMENTS<br />
The quality control measure using the standard deviation of the control ligation products was<br />
developed by Dr Ruth Charlton, Regional <strong>Genetics</strong> Service, Leeds<br />
NGRL(<strong>Manchester</strong>) MLPA analysis spreadsheet instructions v10-06