
Multifile Patent Sequence Searching on STN®

Robert Austin – FIZ Karlsruhe


Agenda

• Sequence searchable databases on STN®
• Step-by-step through a multifile BLAST search
• Multifile post-processing using STN Express
• Overview of the search results
• Summary and resources

See also: Sequence Basics e-Seminar (June 2010):
http://www.stn-international.com/Sequence_Basics_Seminar.html


STN sequence searchable databases

• DGENE
  – Thomson Reuters GENESEQ™
  – Value-added patent sequence data from around the globe
• USGENE
  – The USPTO Genetic Sequence Database
  – All available sequence data from the USPTO
• PCTGEN
  – WIPO/PCT Patent Application Biosequences
  – All available e-published sequence data from WIPO
• CAS REGISTRY
  – Chemical Abstracts Service (CAS) REGISTRY
  – Worldwide value-added patent and non-patent sequences


DGENE, USGENE and PCTGEN offer three sequence search modes

• Sequence Code Match (motif) searching
  – Using the RUN GETSEQ command
• BLAST similarity
  – Using the RUN BLAST command
• FASTA similarity
  – Using the RUN GETSIM command

Note: this e-Seminar covers BLAST.


CAS REGISTRY/CAplus offers two sequence search modes

• Sequence Code Match (motif) searching
  – Using the Search (=> S) command
• BLAST similarity
  – Using a separate Graphical User Interface

Note: this e-Seminar covers BLAST.


Multifile patent sequence searching

Search Question:

Find all patents that disclose Homo sapiens D-amino-acid oxidase (NCBI NP_001908), or similar sequences (≥ 80%):

MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLS
DPNNPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGF
RKLTPRELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVA
REGADVIVNCTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPY
IIPGTQTVTLGGIFQLGNWSELNNIQDHNTIWEGCCRLEPTLKNARIIGERTGFRPV
RPQIRLEREQLRTGPSNTEVIHNYGHGGYGLTIHWGCALEAAKLFGRILEEKKLSRM
PPSHL

(Search conducted on 7th July 2010)


Multifile search strategy

1) RUN BLAST in DGENE, USGENE and PCTGEN using offline BATCH mode
2) Merge, organize by patent family, and display DGENE, USGENE and PCTGEN results
3) Repeat the search using CAS REGISTRY BLAST
4) Retrieve, identify, and display unique CAS REGISTRY BLAST CAplus records
5) Post-process DGENE, USGENE and PCTGEN results using the STN Express Table Tool
6) Post-process unique REGISTRY BLAST results using the BLAST Report Tool


SAVE, UPLOAD and VERIFY the query

• Prepare and save the query as a plain text file in a suitable text editor, e.g. Windows Notepad
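For readers who prefer to prepare the query file with a script rather than a text editor, the following is a minimal sketch only. The function names and the 60-letter line wrapping are assumptions made for illustration; the slides only require a plain text file. The length check uses the 347 letters reported for this query in the BLAST displays.

```python
from pathlib import Path

# Query from the Search Question slide (NCBI NP_001908), pasted with its line breaks.
RAW_QUERY = """
MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLS
DPNNPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGF
RKLTPRELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVA
REGADVIVNCTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPY
IIPGTQTVTLGGIFQLGNWSELNNIQDHNTIWEGCCRLEPTLKNARIIGERTGFRPV
RPQIRLEREQLRTGPSNTEVIHNYGHGGYGLTIHWGCALEAAKLFGRILEEKKLSRM
PPSHL
"""

# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def join_protein_query(raw: str) -> str:
    """Join the pasted lines into one upper-case string and reject unexpected letters."""
    seq = "".join(raw.split()).upper()
    unexpected = set(seq) - AMINO_ACIDS
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


def save_query(seq: str, path: Path, width: int = 60) -> None:
    """Write the sequence as plain ASCII text (the line width is an assumption)."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    path.write_text("\n".join(lines) + "\n", encoding="ascii")


query = join_protein_query(RAW_QUERY)
print(len(query))  # 347, matching "Query = 347 letters" in the BLAST displays
```

Calling save_query(query, Path("NP_001908 Homo sapiens DAO.txt")) would then produce a file ready for the upload step; the file name here simply mirrors the one shown in the upload transcript.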


SAVE, UPLOAD and VERIFY the query (cont.)

From the Discover! button menu:

(a) Click Upload Sequence
(b) Choose the query file
(c) Select the STN database

The sequence becomes a Query L-number in the database of choice for use with RUN BLAST.


SAVE, UPLOAD and VERIFY the query (cont.)

=> FILE USGENE

=> UPL R BLAST
Uploading C:\. . . .\NP_001908 Homo sapiens DAO.txt
UPLOAD SUCCESSFULLY COMPLETED
L1 GENERATED

=> D L1 LQUE
L1 ANSWER 1 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
LQUE MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSD
PNNPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRK
LTPRELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREG
ADVIVNCTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPG
TQTVTLGGIFQLGNWSELNNIQDHNTIWEGCCRLEPTLKNARIIGERTGFRPVRPQIR
LEREQLRTGPSNTEVIHNYGHGGYGLTIHWGCALEAAKLFGRILEEKKLSRMPPSHL

Commands shown in red on the slide are run automatically by the STN Express Sequence Query Upload wizard.

Verify that the sequence was uploaded successfully with D LQUE.

The sequence query is now ready for searching directly in DGENE, USGENE, or PCTGEN using the L-number (L1).


RUN the DGENE, USGENE and PCTGEN BLAST searches in BATCH mode

=> FILE DGENE
FILE 'DGENE' ENTERED AT 17:05:31 ON 07 JUL 2010
COPYRIGHT (C) 2010 THOMSON REUTERS

=> RUN BLAST L1 /SQP -F F BATCH
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):DAOP
TO BE NOTIFIED WHEN THIS BATCH SEARCH IS COMPLETE,
PLEASE ENTER YOUR EMAIL ADDRESS (MAX. 50 CHARS) OR "NONE"
INPUT: OR (END):ROBERT.AUSTIN@FIZ-KARLSRUHE.DE
BLAST Version 2.2
The BLAST software is used herein with permission of the
National Center for Biotechnology Information (NCBI) of
the National Library of Medicine (NLM). . . .
BATCH PROCESSING STARTED FOR DAOP

Add BATCH to the end of a RUN BLAST command to search in offline batch search mode.

New! Enter a valid email address to be notified when the BATCH search is completed.


RUN the DGENE, USGENE and PCTGEN BLAST searches in BATCH mode (cont.)

=> FILE USGENE

=> RUN BLAST L1 /SQP -F F BATCH
. . . .
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):DAOP
. . . .

=> FILE PCTGEN

=> RUN BLAST L1 /SQP -F F BATCH
. . . .
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS):DAOP
. . . .

=> LOG H
SESSION WILL BE HELD FOR 120 MINUTES
STN INTERNATIONAL SESSION SUSPENDED AT 17:07:14 ON 07 JUL 2010

Note: DGENE, USGENE and PCTGEN BLAST searches can be run in parallel using BATCH mode.

Turn the Low Complexity Filter off with the syntax: /SQP -F F

Tip: use LOGOFF HOLD (LOG H) to be able to return to the same STN session within two hours.
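The command sequence on the last two slides follows one pattern per database. As a rough illustration only, the sketch below assembles that list of command lines as plain text; it does not connect to STN, and the function name is hypothetical. The 8-character limit comes from the BATCH IDENTIFIER prompt shown above, and the batch identifier itself is still typed at that prompt, not on the command line.

```python
# Databases searched with RUN BLAST in this seminar.
FILES = ["DGENE", "USGENE", "PCTGEN"]


def batch_blast_commands(l_number: str, batch_id: str, files=FILES) -> list[str]:
    """Return the command lines typed at the => prompt, in order.

    The batch identifier is validated here only because the STN prompt
    limits it to 8 characters; it is entered interactively, so it does
    not appear in the returned commands.
    """
    if not 1 <= len(batch_id) <= 8:
        raise ValueError("batch identifier must be 1-8 characters")
    commands = []
    for name in files:
        commands.append(f"FILE {name}")
        # /SQP -F F turns the Low Complexity Filter off; BATCH runs offline.
        commands.append(f"RUN BLAST {l_number} /SQP -F F BATCH")
    commands.append("LOG H")  # LOGOFF HOLD keeps the session for two hours
    return commands


for line in batch_blast_commands("L1", "DAOP"):
    print("=>", line)
```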


Retrieve the BATCH search results

=> FILE DGENE
FILE 'DGENE' ENTERED AT 17:11:25 ON 07 JUL 2010
COPYRIGHT (C) 2010 THOMSON REUTERS

=> RUN GETBATCH DAOP
Please enter your batch identifier
or enter # for batch id list
or enter * for batch id at top of list
or enter - before batch id to delete
or enter . for (end)
Database DGENE AA
Posted date: Jun 25, 2010 11:33 PM
. . . .
ENTER EITHER THE NUMBER OF ANSWERS YOU WISH TO KEEP
OR ENTER MINIMUM PERCENT OF SELF SCORE FOLLOWED BY %
(BEST ANSWER PERCENTAGE OF SELF SCORE IS 100%)
ENTER (ALL) OR ? :80%
L2 RUN STATEMENT CREATED
L2 19 MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYAD. . . MPPSHL/SQP.-F F
Answer set arranged by accession number; to sort by descending
similarity score, enter at an arrow prompt (=>) "sor score d".

Use RUN GETBATCH to retrieve completed BATCH search results.

In this example, 80% of the Query Self Score is used to select just the most relevant results (L2).
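The 80% cut-off can be checked by hand: the query self score in this search is 731, so the threshold is 0.8 × 731 = 584.8. The sketch below illustrates that arithmetic; it is not the STN implementation. The three high scores are taken from the answer displays in this seminar, while the fourth hit is hypothetical and added only to show a rejected answer. The displays show 99% for a score of 728 (728/731 = 99.59%), which suggests the percentage is truncated rather than rounded; that is an inference from the displays.

```python
def percent_of_self_score(score: int, self_score: int) -> int:
    """Whole-number percentage, truncated (an inference from the displays)."""
    return score * 100 // self_score


def keep_hits(hits, self_score: int, min_percent: int):
    """Keep hits whose score is at least min_percent of the query self score."""
    return [(name, score) for name, score in hits
            if score * 100 >= min_percent * self_score]


SELF_SCORE = 731  # "100% of query self score 731"

hits = [
    ("AAO23074", 731),                # DGENE answer shown in this seminar
    ("PCTGEN answer 28", 728),        # PCTGEN answer shown in this seminar
    ("AEL25470", 726),                # DGENE answer shown in this seminar
    ("hypothetical low scorer", 500), # invented for illustration only
]

for name, score in keep_hits(hits, SELF_SCORE, 80):
    print(name, score, f"{percent_of_self_score(score, SELF_SCORE)}%")
```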


Retrieve the BATCH search results (cont.)

=> FILE USGENE

=> RUN GETBATCH DAOP
. . . .
ENTER EITHER THE NUMBER OF ANSWERS YOU WISH TO KEEP
OR ENTER MINIMUM PERCENT OF SELF SCORE FOLLOWED BY %
(BEST ANSWER PERCENTAGE OF SELF SCORE IS 100%)
ENTER (ALL) OR ? :80%
L3 RUN STATEMENT CREATED
L3 14 MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYAD. . . MPPSHL/SQP.-F F

=> FILE PCTGEN

=> RUN GETBATCH DAOP
. . . .
ENTER EITHER THE NUMBER OF ANSWERS YOU WISH TO KEEP
OR ENTER MINIMUM PERCENT OF SELF SCORE FOLLOWED BY %
(BEST ANSWER PERCENTAGE OF SELF SCORE IS 100%)
ENTER (ALL) OR ? :80%
L4 RUN STATEMENT CREATED
L4 3 MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYAD. . . MPPSHL/SQP.-F F

Use RUN GETBATCH to retrieve completed BATCH search results.




Merge the results into a single L-number

=> SET DUPORDER FILE
SET COMMAND COMPLETED

=> DUP IDE L2 L3 L4
FILE 'DGENE' ENTERED AT 17:16:56 ON 07 JUL 2010
COPYRIGHT (C) 2010 THOMSON REUTERS
FILE 'USGENE' ENTERED AT 17:16:56 ON 07 JUL 2010
COPYRIGHT (C) 2010 SEQUENCEBASE CORP
FILE 'PCTGEN' ENTERED AT 17:16:56 ON 07 JUL 2010
COPYRIGHT (C) 2010 WIPO
PROCESSING COMPLETED FOR L2
PROCESSING COMPLETED FOR L3
PROCESSING COMPLETED FOR L4
L5 36 DUP IDE L2 L3 L4 (INCLUDES 0 SETS OF DUPLICATES)
ANSWERS '1-19' FROM FILE DGENE
ANSWERS '20-33' FROM FILE USGENE
ANSWERS '34-36' FROM FILE PCTGEN

=> SOR IDENT D
PROCESSING COMPLETED FOR L5
L6 36 SOR L5 IDENT D

New! SET DUPORDER FILE ensures that multifile records merged using DUP IDE are organized by database (file).

DUPLICATE IDENTIFY (DUP IDE) is used here to create a single multifile L-number (L5).

The multifile L-number (L5) can be sorted by BLAST SCORE, or Percent Identity (IDENT).
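Conceptually, the two steps above concatenate the three answer sets in file order (19 + 14 + 3 = 36 answers) and then reorder the merged set by descending percent identity. The sketch below illustrates only that idea; the Hit type, the function names, the accession labels and the identity values are all hypothetical placeholders, and duplicate detection is not modelled.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Hit:
    file: str       # database the answer came from
    accession: str  # placeholder label, not a real accession number
    ident: int      # BLAST percent identity (placeholder values below)


def merge_by_file(*answer_sets):
    """Concatenate answer sets in the order given (file order)."""
    return list(chain.from_iterable(answer_sets))


def sort_by_identity(hits):
    """Sort by descending percent identity; ties keep their merged order."""
    return sorted(hits, key=lambda h: h.ident, reverse=True)


def placeholder_hits(file: str, count: int):
    """Invent `count` hits with identities between 96 and 100 percent."""
    return [Hit(file, f"{file}-{n}", 100 - (n % 5)) for n in range(1, count + 1)]


l2 = placeholder_hits("DGENE", 19)
l3 = placeholder_hits("USGENE", 14)
l4 = placeholder_hits("PCTGEN", 3)

l5 = merge_by_file(l2, l3, l4)  # analogous to DUP IDE L2 L3 L4
l6 = sort_by_identity(l5)       # analogous to SOR IDENT D

print(len(l5), l5[0].file, l5[19].file, l5[33].file)
```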


Review multifile answers with a free-of-charge format including alignment

=> D L6 TRIAL SCORE ALIGN 1-36; FILE STNGUIDE

L6 ANSWER 1 OF 36 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN
AN AAO23074 Protein DGENE
TI Determining a genotype of an individual for preparing a composition
for treating schizophrenia by determining the identity of a
nucleotide at a biallelic marker of the D-amino acid oxidase gene of
the polynucleotide in a sample -
DESC Human D-amino acid oxidase wild-type protein.
KW Biallelic marker; D-amino acid oxidase; DAO; neuroleptic; CNS
disorder; movement; Parkinson's disease; Huntington's; motor
neurone; Alzheimer's; mood; unipolar depression; bipolar; . . . .
SQL 347
SCORE 731 100% of query self score 731
BLASTALIGN
Query = 347 letters
Length = 347
Score = 731 bits (1886), Expect = 0.0
Identities = 347/347 (100%), Positives = 347/347 (100%)

Query: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQP . . .
           MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQP
Sbjct: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQP . . .

The SCORE line shows the Query Self Score and percentage.


Review answers with a free-of-charge format including alignment (cont.)

L6 ANSWER 4 OF 36 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
TI Collections of matched biological reagents and methods for
identifying matched reagents (PublishedApplication)
MTY Protein
SQL 347
SCORE 731 100% of query self score 731
BLASTALIGN
Query = 347 letters
Length = 347
Score = 731 bits (1886), Expect = 0.0
Identities = 347/347 (100%), Positives = 347/347 (100%)

Query: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN
           MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN
Sbjct: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN

Query: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
           NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
Sbjct: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR

Query: 121 ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
           ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
Sbjct: 121 ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN

Query: 181 CTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPGTQ . . .
           CTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPGTQ
Sbjct: 181 CTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPGTQ . . .

The Identities line shows the BLAST Percent Identity (IDENT).


Review answers with a free-of-charge format including alignment (cont.)

L6 ANSWER 28 OF 36 PCTGEN COPYRIGHT 2010 WIPO on STN
TI ORGAN-SPECIFIC PROTEINS AND METHODS OF THEIR USE
MTY PRT
SQL 347
SCORE 728 99% of query self score 731
BLASTALIGN
Query = 347 letters
Length = 347
Score = 728 bits (1879), Expect = 0.0
Identities = 346/347 (99%), Positives = 346/347 (99%)

Query: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN
           MRVVVIGAGVIGLSTALCIHERYHSVLQPL IKVYADRFTPLTTTDVAAGLWQPYLSDPN
Sbjct: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLHIKVYADRFTPLTTTDVAAGLWQPYLSDPN

Query: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
           NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
Sbjct: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR

Query: 121 ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
           ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
Sbjct: 121 ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN

Query: 181 CTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPGTQ . . .
           CTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPGTQ
Sbjct: 181 CTGVWAGALQRDPLLQPGRGQIMKVDAPWMKHFILTHDPERGIYNSPYIIPGTQ . . .
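The Identities figure in these displays is a count of aligned positions where Query and Sbjct carry the same letter. The sketch below recomputes it for the first 60-letter row of the PCTGEN alignment above, where position 31 differs (D in the query, H in the subject). It is a simplified illustration with hypothetical function names: it ignores gaps and does not reproduce the "+" that BLAST prints for conservative substitutions.

```python
# First alignment row of L6 answer 28 (PCTGEN), copied from the display above.
QUERY = "MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN"
SBJCT = "MRVVVIGAGVIGLSTALCIHERYHSVLQPLHIKVYADRFTPLTTTDVAAGLWQPYLSDPN"


def midline(query: str, sbjct: str) -> str:
    """Letter where both rows agree, blank where they differ."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return "".join(q if q == s else " " for q, s in zip(query, sbjct))


def identities(query: str, sbjct: str):
    """Return (matches, length, truncated whole-number percentage)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    matches = sum(q == s for q, s in zip(query, sbjct))
    return matches, len(query), matches * 100 // len(query)


print(midline(QUERY, SBJCT))
print(identities(QUERY, SBJCT))  # (59, 60, 98) for this 60-letter row
```

Over the full alignment the same arithmetic gives 346 * 100 // 347 = 99, in line with the "Identities = 346/347 (99%)" shown in the record.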


Ensure Capture Session is on to record a transcript for use in post-processing

Note: Check the Capture Retrospectively box to capture the session so far, as well as the session from this point forwards.


Use the STN Express 8.4 Patent Family Manager wizard to display the results

Access the Patent Family Manager wizard from the Discover! menu.

Choose a bibliographic display format with alignment for the first (best) hit, and a free-of-charge format with alignment for the rest of the sequences in each patent family group.


The Patent Family Manager begins by organising the results using FSORT...

=> FSORT L6
. . . .
L7 36 FSO L6

11 Multi-record Families Answers 1-33
Family 1 Answers 1-5
Family 2 Answers 6-8
Family 3 Answers 9-10
Family 4 Answers 11-12
Family 5 Answers 13-14
Family 6 Answers 15-16
Family 7 Answers 17-18
Family 8 Answers 19-25
Family 9 Answers 26-27
Family 10 Answers 28-31
Family 11 Answers 32-33
3 Individual Records Answers 34-36
0 Non-patent Records

Commands in red on the slide are those issued automatically by the STN Express Patent Family Manager.

FSORT organizes the patent sequence records by Publication, Application, Related, and Priority numbers.

In this example, 14 patent family groups (i.e. 11 + 3) are retrieved.
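Grouping records by shared publication, application, related or priority numbers can be pictured as linking any two records that have a number in common. The sketch below shows that idea with a small union-find; it is not how FSORT is implemented, and transitive linking is an assumption. The numbers for AEL25470 and 20060275794.63099 are taken from the records displayed in this seminar; the second family member and its numbers are hypothetical, since the USGENE record's own numbers are not shown.

```python
from collections import defaultdict


def group_families(records: dict) -> list:
    """Group record ids that share at least one patent number.

    `records` maps a record id to the set of its publication, application
    and priority numbers. Multi-record groups are returned first, then
    individual records, mirroring the layout of the FSORT summary.
    """
    parent = {rid: rid for rid in records}

    def find(x):
        # Follow parent links to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}
    for rid, numbers in records.items():
        for num in numbers:
            if num in first_seen:
                parent[find(rid)] = find(first_seen[num])  # link the groups
            else:
                first_seen[num] = rid

    groups = defaultdict(list)
    for rid in records:
        groups[find(rid)].append(rid)
    multi = [g for g in groups.values() if len(g) > 1]
    single = [g for g in groups.values() if len(g) == 1]
    return multi + single


records = {
    # DGENE record from the FAMILY 7 display (PI, AI and PRAI fields).
    "AEL25470": {"WO 2006102720", "WO 2006-AU435", "AU 2005-901574"},
    # Hypothetical record sharing the priority number above.
    "hypothetical-USGENE-record": {"US 0000000000", "AU 2005-901574"},
    # USGENE individual record from the answers 34-36 display (PI and AI).
    "20060275794.63099": {"US 20060275794", "US 2006-371354"},
}

for group in group_families(records):
    print(group)
```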


...and then continues by displaying the family groups in the specified formats

=> DIS L7 PFAM=7 1 BIB,SQL,SCORE,IDENT,ALIGN

L7 ANSWER 17 OF 36 DGENE COPYRIGHT 2010 THOMSON REUTERS on STN FAMILY 7
AN AEL25470 protein DGENE
TI Identifying compound that reduce/inhibit internal ribosome . . . .
IN Fear M
PA (TELE-N) TELETHON INST CHILD HEALTH RES.
PI WO 2006102720 A1 20061005 197
AI WO 2006-AU435 20060331
PRAI AU 2005-901574 20050331
PSL Disclosure; SEQ ID NO 18
LA English
OS 2006-747347 [76]
CR N-PSDB: AEL25469
PC-NCBI: gi30446
PC-SWISSPROT: P14920
DESC Reporter protein SEQ ID NO:18.
SQL 347
SCORE 726 99% of query self score 731
IDENT 99%
BLASTALIGN
Query = 347 letters
Length = 347
Score = 726 bits (1873), Expect = 0.0
Identities = 345/347 (99%), Positives = 345/347 (99%)
. . . .
. . . .


...and then continues by displaying the family groups in the specified formats (cont.)

=> DIS L7 PFAM=7 2-TOT TRIAL,SCORE,IDENT,ALIGN

L7 ANSWER 18 OF 36 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN FAMILY 7
TI Isolation of Inhibitors of IRES-Mediated Translation
(PublishedApplication)
DESC Homo Sapiens Protein; sequence 18 of 148
MTY Protein
SQL 347
SCORE 726 99% of query self score 731
IDENT 99%
BLASTALIGN
Query = 347 letters
Length = 347
Score = 726 bits (1873), Expect = 0.0
Identities = 345/347 (99%), Positives = 345/347 (99%)

Query: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN
           MRVVVIGAGVIGLSTALCIHERYHSVLQPL IKVYADRFTPLTTTDVAAGLWQPYLSDPN
Sbjct: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLHIKVYADRFTPLTTTDVAAGLWQPYLSDPN

Query: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
           NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
Sbjct: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR

Query: 121 ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
           ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
Sbjct: 121 ELDMFPDYGYGWFHTSLILEGKNYLQWLTERLTERGVKFFQRKVESFEEVAREGADVIVN
. . . .

This USGENE hit is in the same family as the DGENE record on the previous slide (FAMILY 7).


...and then continues by displaying the family groups in the specified formats (cont.)

=> DIS L7 34-36 BIB,SQL,SCORE,IDENT,ALIGN

L7 ANSWER 34 OF 36 USGENE COPYRIGHT 2010 SEQUENCEBASE CORP on STN
AN 20060275794.63099 Protein USGENE
TI Collections of matched biological reagents and methods for
identifying matched reagents (PublishedApplication)
IN Carrino John (San Diego, CA); Liang Feng (San Diego, CA)
PA Invitrogen Corporation (Carlsbad CA)
PI US 20060275794 A1 20061207
AI US 2006-371354 20060307
DT Patent
SQL 347
SCORE 731 100% of query self score 731
IDENT 100%
BLASTALIGN
Query = 347 letters
Length = 347
Score = 731 bits (1886), Expect = 0.0
Identities = 347/347 (100%), Positives = 347/347 (100%)

Query: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN
           MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN
Sbjct: 1   MRVVVIGAGVIGLSTALCIHERYHSVLQPLDIKVYADRFTPLTTTDVAAGLWQPYLSDPN

Query: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
           NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
Sbjct: 61  NPQEADWSQQTFDYLLSHVHSPNAENLGLFLISGYNLFHEAIPDPSWKDTVLGFRKLTPR
. . . .

This USGENE record is the first of the 3 “individual records” in the FSORT answer set (L7).




Typical steps of CAS REGISTRY BLAST

1. Launch BLAST
2. Search the sequence
3. Examine and evaluate alignment/relevance of sequence answers
4. Display STN data on sequences – REGISTRY
5. Display STN data on sequences – CAplus℠
  – Limit CAplus results, if necessary
  – Display CAplus data (references and HITRN)
6. Post-process BLAST alignment data


Launch CAS REGISTRY BLAST

• The Result Set Manager is the starting point:
  – To begin a new sequence search
  – To review results of previous sequence searches


Input the search query

• Sequences can be input in three ways:
  – Copy/paste
  – Read from a file
  – Recall a previously searched sequence within the same session
• Sequence line numbers do not interfere with the search.
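Because line numbers are tolerated, a sequence copied from a numbered flat-file display can be pasted as it is. For readers who want to see what the bare sequence looks like before pasting, the sketch below strips everything except letters. It is an illustration only and says nothing about how CAS REGISTRY BLAST itself parses its input; the numbered sample is the start of this seminar's query, re-typed in a GenBank-like layout as an assumption.

```python
import re


def strip_numbering(pasted: str) -> str:
    """Drop digits, whitespace and any other non-letters; return upper case."""
    return re.sub(r"[^A-Za-z]", "", pasted).upper()


# First 70 residues of the query, laid out with position numbers and
# 10-letter blocks (layout assumed for illustration).
pasted = """
        1 mrvvvigagv iglstalcih eryhsvlqpl dikvyadrft pltttdvaag lwqpylsdpn
       61 npqeadwsqq
"""

print(strip_numbering(pasted))
```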


Select the BLAST program

The following programs are most typically run:

• BLASTn for nucleotides
• BLASTp for proteins/peptides


Verify BLAST settings

Default values have been set to optimize sequence searches for researchers.

Recommended settings for patent searches:

• Low Complexity Filtering – unchecked
• Max No. of Answers – 1000


View results

Highlight the result set to be viewed, and click on View Results.


Evaluate the alignment report

A minus sign indicates that the alignment details are shown.

Details such as the sequence length, score, and percent identity are available.


Select sequences of interest

Sequences can be selected:

• In groups, using the color bar in the Alignment Scores
• Individually, by selecting the check box

To transfer the sequence data to STN, click the Get STN Data button.


Get STN Data and Save alignments (.xss)

Alignment data needs to be transferred for post-processing.

The alignment data is saved in STN Express Saved Sequences (.xss) format.


Transfer sequences to STN

• Log on to STN; a REGISTRY search of the sequences then runs automatically.
• Results can be displayed using either the Discover! wizards or command line input.
• Note: Type END or click Cancel to exit the “Display Wizard”. You can turn off the “Display Wizard” in Preferences.

Display sequences if desired.




Display additional CAplus answers including the HITRN for alignment post-processing

=> FILE HCAPLUS
FILE 'HCAPLUS' ENTERED AT 17:25:10 ON 07 JUL 2010
COPYRIGHT (C) 2010 AMERICAN CHEMICAL SOCIETY (ACS)

=> S L12 AND PATENT/DT
L13 12 L12 AND PATENT/DT

=> TRANSFER L6 PN 1-
L14 TRANSFER L6 1- PN : 20 TERMS
L15 29 L14
ALL TERMS IN L14 RETRIEVED.

=> S L13 NOT L15
L16 2 L13 NOT L15

=> D BIB HITRN 1-2

The 44 REGISTRY records (L12) correspond to 12 HCAplus patent records (L13).

Transfer Publication Numbers (PN) from DGENE/USGENE/PCTGEN (L6) to find corresponding HCAplus records (L15).

In this example, 2 additional, highly relevant references have been found by including the REGISTRY/HCAplus search (L16).
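The S L13 NOT L15 step is a set difference: HCAplus patent records reached through REGISTRY, minus those already reachable through the publication numbers of the DGENE/USGENE/PCTGEN answers. The sketch below reproduces only the counts from this example (12, 29 and 2). Apart from 2002:391912, which is the first unique answer shown in this seminar, every identifier is a hypothetical placeholder.

```python
# Ten records assumed to be common to both answer sets (placeholders).
shared = {f"hypothetical-shared-{n}" for n in range(10)}

# L13: HCAplus patent records found via the REGISTRY BLAST answers (12 records).
l13 = shared | {"2002:391912", "hypothetical-unique-2"}

# L15: HCAplus records found via transferred publication numbers (29 records).
l15 = shared | {f"hypothetical-other-{n}" for n in range(19)}

# L16: records found only through the REGISTRY/HCAplus route.
l16 = l13 - l15

print(len(l13), len(l15), sorted(l16))
```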


Example: Unique REGISTRY/CAplus result

L16 ANSWER 1 OF 2 HCAPLUS COPYRIGHT 2010 ACS on STN
AN 2002:391912 HCAPLUS
DN 137:1836
TI Measurement of DNA methylation for analysis of the toxicology . . . .
IN Olek, Alexander; Piepenbrock, Christian; Berlin, Kurt
PA Epigenomics Ag, Germany
SO PCT Int. Appl., 113 pp.
CODEN: PIXXD2
LA German
FAN.CNT 1
     PATENT NO.        KIND   DATE       APPLICATION NO.        DATE
     ---------------   ----   --------   --------------------   --------
PI   WO 2002040710     A2     20020523   WO 2001-EP12951        20011108
. . . .
PRAI DE 2000-10056802  A      20001114
     WO 2001-EP12951   W      20011108
IT 391975-30-7, Protein (human 347-amino acid)
RL: BSU (Biological study, unclassified); PRP (Properties); BIOL
(Biological study)
(amino acid sequence; measurement of DNA methylation for anal. of
the toxicol. of substances)

Note: HITRN must be included, so that the CAS REGISTRY BLAST alignments can be merged into the BLAST Report.




Access the Table Tool and select the multifile search Transcript file

The most recent STN session Transcript is usually listed here.


Choose a template and select content

Option: choose a predefined custom template from a previous project.

L7 is the DGENE, USGENE and PCTGEN FSORTed answer set.


Select fields, column order, headings, fonts and spacing for the table

The predefined custom template included a list of fields. These can be further customized and the template re-saved.


Review, adjust, and export the table


Explore the results further in Microsoft Excel

Some tips for Microsoft Excel:

• Resize columns and rows as desired – especially the BLAST alignment column, to a width of approx. 77
• View, Freeze Panes – holds the top row fixed when scrolling down
• Add Filters – provides a great way to navigate results, for example by BLAST percent identity




Post-process REGISTRY BLAST alignments

Download the post-processing template (.PRF) files used in this seminar:
http://www.stn-international.com/stn_biosequence_searching_mfs.html


Select BLAST alignment report

• The first step is to select the XSS file to include in the BLAST report.
• Important: If your BLAST query is fairly long, or is a nucleic acid, or the answers may exceed 1000 characters, make sure you change the value in the “Do not include alignments longer than” box.

Post-processing then continues via standard STN Express Custom Report Tool steps.


Select the session Transcript and template

Option: choose a predefined custom template from a previous project.

The most recent STN session Transcript is usually listed here.


Select the records to be processed

L16 is REGISTRY/CAplus additional unique answers.


Select fields, fonts and spacing for the report

The predefined custom template included a list of fields. These can be further customized and the template re-saved.


Review, adjust, and export the report


Overview of search results for Homo sapiens D-amino-acid oxidase (unique results in parentheses, shown in red on the slide)

Database       SEQs ≥ 80%   PNs   Patent Families*
DGENE          19           10    8 (1)
USGENE         14           10    7 (2)
PCTGEN         3            3     3 (1)
REGISTRY       18           12    9 (2)
NCBI           6            4     4 (0)
Total Unique   -            -     14

(* Patent families = INPADOC Patent Families. Specifically, family records in INPAFAMDB.)


Summary

• RUN BLAST is available for searching DGENE, USGENE and PCTGEN directly on STN
• CAS REGISTRY BLAST provides BLAST searching options for the REGISTRY database
• DGENE, USGENE and PCTGEN multifile search results can be post-processed into tables, and exported to Microsoft Excel, using STN Express
• CAS REGISTRY BLAST alignment data can be merged with CAplus records, and exported to RTF format, to form a single unified report
• All four STN sequence databases are required for a comprehensive patent sequence search


Resources for sequence searching on STN

• Sequence Searching on STN modular workshop
  http://www.stn-international.com/sequence_searching.html
• CAS REGISTRY sequence searching resources
  http://www.cas.org/support/stngen/stndoc/sequences.html
• DGENE Workshop Manual
  http://www.stn-international.com/dgene_wm.html
• USGENE Workshop Manual
  http://www.stn-international.com/usgene_wm.html
• USGENE Workshop Manual Multifile Supplement
  http://www.stn-international.com/usgene_wm_mfs.html


For more information …

CAS
E-mail: help@cas.org
Support and Training: www.cas.org

FIZ Karlsruhe
E-mail: helpdesk@fiz-karlsruhe.de
Support and Training: www.stn-international.de
