Information - Structural Biology Labs

xray.bmc.uu.se

Validation & Critical Thinking

Gerard J. Kleywegt

Uppsala University

Critical thinking

What is wrong here?

(1) The tacR gene regulates the human nervous system

(2) The tacQ gene is similar to tacR but is found in E. coli

==> So, the tacQ gene regulates the E. coli nervous system!

And what is wrong here?

Critical thinking

Of course there is a fine line between critical thinking and silliness …

Knowledge pyramid

Levels, from top to bottom: Nobel Prize, Wisdom, Knowledge, Information, Data

Steps listed alongside the pyramid:

• Swedish friends

• Luck

• Longevity

• Insight

• Experience

• Analysis

• Interpretation

• Validation

• Processing

• Visualisation

Data versus information

Karl Popper - falsifiability

Data: facts, observations

Information: context, meaning, interpretation

ATOM 2567 N PHE B 175 7.821 -25.530 -22.848 1.00 8.71

ATOM 2568 CA PHE B 175 8.845 -25.172 -21.877 1.00 9.41

ATOM 2569 C PHE B 175 9.449 -23.798 -22.169 1.00 10.02

ATOM 2570 O PHE B 175 10.664 -23.613 -22.103 1.00 10.37

ATOM 2571 CB PHE B 175 9.928 -26.251 -21.848 1.00 9.53

ATOM 2572 CG PHE B 175 10.969 -26.137 -22.982 1.00 10.03

ATOM 2573 CD1 PHE B 175 12.356 -25.819 -22.988 1.00 10.51

ATOM 2574 CD2 PHE B 175 11.725 -27.211 -23.402 1.00 10.25

ATOM 2575 CE1 PHE B 175 11.821 -27.095 -22.869 1.00 11.17

ATOM 2576 CE2 PHE B 175 12.282 -26.086 -24.008 1.00 10.95

ATOM 2577 CZ PHE B 175 10.953 -26.335 -23.622 1.00 11.38
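To make the step from data to information concrete, the records above can be split into named fields. The sketch below is only an illustration: `Atom`, `parse_atom` and `distance` are hypothetical names, not part of any library, and splitting on whitespace is an assumption that suits the records as printed here (real PDB files use fixed column positions). The field meanings follow the usual ATOM record layout: serial number, atom name, residue name, chain, residue number, x, y, z, occupancy and B-factor.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    """One ATOM record with its numbers given a meaning (hypothetical type)."""
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom(line: str) -> Atom:
    """Parse a whitespace-separated ATOM line as printed in the slide.

    Assumption: exactly 11 whitespace-separated fields. Real PDB files
    are fixed-column, so this is not a general PDB parser.
    """
    f = line.split()
    if len(f) != 11 or f[0] != "ATOM":
        raise ValueError(f"not a simple ATOM record: {line!r}")
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))


def distance(a: Atom, b: Atom) -> float:
    """Distance between two atoms, in the units of the coordinates."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Two of the records shown above.
n = parse_atom("ATOM 2567 N PHE B 175 7.821 -25.530 -22.848 1.00 8.71")
ca = parse_atom("ATOM 2568 CA PHE B 175 8.845 -25.172 -21.877 1.00 9.41")

print(f"{ca.name} of {ca.res_name} {ca.res_seq}, chain {ca.chain}")
print(f"N-CA distance: {distance(n, ca):.3f}")
```

With named fields in place, questions such as "is this bond length plausible?" can be asked of the numbers, which is where data starts to become information.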

A theory that is not falsifiable is not scientific

Example

Theory: all swans are white

New observation: black swan (Australia)

New theory 1: Australian ornithologists are incompetent

New theory 2: all swans except Cygnus atratus are white; C. atratus is black

Astrology versus astronomy

Occam’s razor

Do not make more assumptions than strictly needed

When you hear hoof beats, think horses, not zebras (unless you are in Africa!)

KISS principle - Keep It Simple, Stupid

Of two equivalent theories or explanations, all other things being equal, the simpler one is to be preferred

Maximum parsimony

Bioinformatics basics

Don’t always believe what databases / programs / lecturers tell you!

They (almost) always give you some answer, but … this can be misleading and is sometimes wrong

Don’t be a naïve user

Garbage in, garbage out

Statistical versus biological significance

Use common sense!

What is the right question to ask?

Understand limitations of data, databases, search algorithms, alignment methods, prediction methods, etc.

Evaluate result: does it answer your question? Does it make sense?

Validation

Validation = establishing or checking the truth or accuracy of (something)

Theory

Hypothesis

Model

Assertion, claim, statement, observation

Integral part of scientific activity!

Science, errors & validation

[Diagram labels: Prior knowledge; Experiment; Observations; Hypothesis or Model; Predictions]

Precision versus accuracy

Precise, but not very accurate. Ex: π~4.0053±0.0001

Fairly accurate, but not very precise. Ex: π~3.1±0.1

Accurate and precise. Ex: π~3.1416±0.0001
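The three π examples can be checked numerically. The sketch below is an illustration only: `assess` is a hypothetical helper, and it assumes that the quoted ± value can be compared directly with the actual error against `math.pi`. Precision is read off the quoted uncertainty; accuracy is the distance from the true value.

```python
import math


def assess(value, uncertainty, reference=math.pi):
    """Return (absolute error, error in units of the quoted uncertainty)."""
    error = abs(value - reference)
    return error, error / uncertainty


# The three examples from the slide: (label, value, quoted uncertainty).
estimates = [
    ("precise, but not very accurate", 4.0053, 0.0001),
    ("fairly accurate, but not very precise", 3.1, 0.1),
    ("accurate and precise", 3.1416, 0.0001),
]

for label, value, uncertainty in estimates:
    error, ratio = assess(value, uncertainty)
    print(f"{label}: uncertainty {uncertainty}, "
          f"error {error:.4f}, error/uncertainty {ratio:.1f}")
```

A ratio far above 1 means the quoted uncertainty does not cover the real error: the estimate looks precise but is not accurate.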

Errors affect measurements

Random errors (noise)

Affect precision

Usually normally distributed

Reduce by increasing number of observations

Systematic errors (bias)

Affect accuracy

Incomplete knowledge or inadequate design

Reproducible

Gross errors (bloopers)

Incorrect assumptions, undetected mistakes or malfunctions

Sometimes detectable as outliers
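Why more observations help against random errors but not against systematic ones can be written down briefly. This is a standard textbook result rather than something stated on the slides, and the symbols are assumptions introduced here: N independent observations with the same standard deviation σ, mean x̄, true value x_true and bias b.

```latex
% Random error of the mean shrinks with the number of observations N:
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}}

% A systematic error (bias b) is untouched by averaging:
\lim_{N \to \infty} \bar{x} = x_{\text{true}} + b
```

Halving the random error of the mean therefore takes four times as many observations, while the bias only goes away by fixing the experiment or the knowledge behind it.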

How tall is Gerard?

Measurements: 200 203 202 203 202 201 203 80

[Figure labels: Random error; Systematic error; Gross error; Bias (accuracy); Precision (uncertainty; random error)]
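The height measurements can be used to separate the three kinds of error in a small sketch. Assumptions, none of which come from the slides: the function name `split_outliers` is hypothetical, gross errors are flagged with a modified z-score based on the median absolute deviation (MAD), and 3.5 is used as a commonly quoted cutoff. The units of the measurements are not given on the slide and are left unstated here.

```python
from statistics import mean, median, stdev


def split_outliers(values, threshold=3.5):
    """Split values into (kept, flagged) using a MAD-based modified z-score.

    Assumption: threshold 3.5 is a conventional cutoff, not a rule.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All values (nearly) identical: nothing can be flagged this way.
        return list(values), []
    kept, flagged = [], []
    for v in values:
        score = 0.6745 * abs(v - med) / mad
        (flagged if score > threshold else kept).append(v)
    return kept, flagged


heights = [200, 203, 202, 203, 202, 201, 203, 80]
kept, flagged = split_outliers(heights)

print("possible gross errors:", flagged)
print("mean of remaining values:", mean(kept))
print("spread (sample standard deviation):", round(stdev(kept), 2))
```

The spread of the remaining values estimates the random error (precision). The systematic error (bias) cannot be computed from these numbers alone: it needs an independent reference value for the true height, which is exactly why it is harder to detect.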

Science, errors & validation

[Diagram with ✔ marks on several steps. Labels: Prior knowledge; Other prior knowledge; Experiment; Experiments; Reliable; Correct; Observations; Independent observations; Quality; Quantity; Inf. content; Parameterisation; Optimised values; Hypothesis or Model; Fit; Predictions; Explain; Predict]

Random errors (precision) ✔

Systematic errors (accuracy) ✔

Gross errors (both) ✔

Science not immune to Murphy’s Law!

Structure validation

What type of residue is this?

What is wrong with it?

How did it end up in the PDB?

Should we trust the PDB?

Structures are based on experimental data

Amount of data differs

Structures are interpretations of data

PDB must accept all depositions

Resolution

Low resolution: little detail

High resolution: much detail

Resolution

1ISR 4.0 Å

1EA7 0.9 Å

Interpretation

Structure validation

Users of structures must make sure that these are reliable for their purposes

Ramachandran plot

Fit of model and electron density (http://eds.bmc.uu.se/)

Validation tutorial: http://xray.bmc.uu.se/embo2001/modval/

Torsion angles

Validation alert: The arrow points the wrong way!!!

Dihedral or torsion angle - given 4 sequential, bonded atoms A-B-C-D

Dihedral = angle between the planes ABC and BCD

Torsion = looking at the projection along bond B-C, the angle over which one has to rotate A to bring it on top of D (clockwise = positive)

note: torsion (ABCD) = torsion (DCBA)

phi = torsion (C[i-1]-N[i]-Cα[i]-C[i])

psi = torsion (N[i]-Cα[i]-C[i]-N[i+1])
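The torsion definition above can be turned into a few lines of code. This is a minimal sketch using the common atan2 formulation on plain coordinate tuples; the helper names (`sub`, `dot`, `cross`, `torsion`) and the test coordinates are made up for illustration and do not come from any structure or library.

```python
import math


def sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, in the range (-180, 180].

    Positive means: looking along the bond B-C, A has to be rotated
    clockwise to end up on top of D.
    """
    b1, b2, b3 = sub(b, a), sub(c, b), sub(d, c)
    n1 = cross(b1, b2)   # normal of plane ABC
    n2 = cross(b2, b3)   # normal of plane BCD
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen so that the answer is easy to see.
A = (1.0, 0.0, 0.0)
B = (0.0, 0.0, 0.0)
C = (0.0, 0.0, 1.0)
D = (0.0, 1.0, 1.0)

print(torsion(A, B, C, D))
print(torsion(D, C, B, A))  # same value: torsion(ABCD) = torsion(DCBA)
```

With backbone coordinates in hand, phi for residue i would be `torsion(C[i-1], N[i], CA[i], C[i])` and psi would be `torsion(N[i], CA[i], C[i], N[i+1])`, following the definitions above.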

Ramachandran plot

Steric clashes (pink dashed lines) develop during rotation around phi (left) and psi (right)

Only certain phi, psi combinations are sterically favourable/allowed: Ramachandran plot

Favourable regions in the Ramachandran plot

Good models have very few residues outside these regions

If there are any, there is usually a good reason

Ramachandran plot

PDBsum

Good model:

Few outliers

Strong concentration in core regions

Same structure, different data

Electron density fit

Global quality important

Local quality also:

• Active site

• Ligand

• Substrate analogue

• Metal-binding site

• Important loop

Good fit of model and density

Poor fit of model and density

PDBreport

Oops!

http://swift.cmbi.ru.nl/gv/pdbreport/

Playing the Blame Game …

Why do errors make it into the literature and the PDB? Who is to blame?

Suggestions from students

Cold Spring Harbor course, 2005

Copenhagen University course, 2006

Crystallographer (ignorance, lack of experience, incompetence, incorrect preconceptions/bias, cheating, laziness, “science by mouse-click”, stress, can’t be bothered to fix minor problems, no validation)

PI (pressure to publish/graduate fast, career interest, competition, grant writing, insufficient supervision)

Referees/Editors (lazy, inadequate reviewing routines, no access to raw data, “validation by senior author name”, lack of experience)

Software (misses or causes errors)

PDB (doesn’t check)

External (competition/danger of being scooped)

Nature (limitations of the technique/resolution, errors hard to detect, poor data)
