
Contents Current Trends in Corpus Processing <strong>Sketch</strong> <strong>Engine</strong> Finding Collocations Using CQL<br />

<strong>Exploiting</strong> <strong>Corpora</strong> <strong>with</strong> <strong>Sketch</strong> <strong>Engine</strong><br />

Miloš Jakubíček<br />

Lexical Computing Ltd., Brighton, United Kingdom<br />

milos.jakubicek@sketchengine.co.uk<br />

Natural Language Processing <strong>Centre</strong><br />

Faculty of Informatics, <strong>Masaryk</strong> University, Brno, Czech Republic<br />

jak@fi.muni.cz<br />

CLARA course on Multilingual Lexical Resources and Tools,<br />

20-23 June 2011, Bergen<br />

Miloš Jakubíček LCL UK, <strong>NLP</strong>C FI MU CZ<br />

<strong>Exploiting</strong> <strong>Corpora</strong> <strong>with</strong> <strong>Sketch</strong> <strong>Engine</strong>



Contents<br />

1 Current Trends in Corpus Processing<br />

2 <strong>Sketch</strong> <strong>Engine</strong><br />

3 Finding Collocations<br />

4 Using CQL<br />


Organisation<br />

Schedule<br />

Building & Searching <strong>Corpora</strong><br />

Exercises<br />

Word Sense & Word <strong>Sketch</strong><br />

Exercises<br />

If you don’t mind, we will mix lectures and exercises<br />

We will be using the <strong>Sketch</strong> <strong>Engine</strong> installation at<br />

http://the.sketchengine.co.uk – you need to register for<br />

a trial account<br />

Ask questions immediately<br />

Do get back to me if you’d like to focus on anything in particular<br />

Download slides from<br />

http://nlp.fi.muni.cz/~xjakub/clara.pdf<br />


Today’s <strong>Corpora</strong> I<br />

billions of tokens<br />

complex multi-level multi-value annotation<br />

wide range of languages<br />

growing demand on complex searching – moving from<br />

morphology to syntax and semantics<br />

search API for automatic information retrieval and<br />

post-processing in particular applications needed<br />

parallel distributed processing needed<br />


Today’s <strong>Corpora</strong> II<br />

What is the key property of a modern corpus?<br />


Today’s <strong>Corpora</strong> III<br />

Yes, it’s the size.<br />


Today’s <strong>Corpora</strong> IV<br />

Why are quantitative aspects so important?<br />

Law of large numbers<br />

Most language phenomena follow the Zipfian distribution<br />


Zipf’s law I<br />

[Figure: plot of zipf(x) for x from 0 to 1000, with the vertical axis running from 0 to 100]<br />


Zipf’s law II<br />

may be simplified to inductive definition:<br />

Zipf’s law (simplified)<br />

frequency of the n-th element: $f_n \approx \frac{1}{n} \cdot f_1$<br />

⇒ frequency is inversely proportional to the rank according to<br />

frequency<br />

⇒ one needs really large corpora to capture all the variety of<br />

many language phenomena<br />
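The simplified law can be tried out numerically. The Python sketch below is only an illustration: the function names `zipf_expected` and `ranks_seen_at_least` are made up for this example, and it assumes that the simplified form (frequency at rank n equals f1 divided by n) holds exactly.

```python
def zipf_expected(f1: float, n: int) -> float:
    """Expected frequency of the rank-n item under the simplified Zipf's law."""
    if n < 1:
        raise ValueError("rank must be >= 1")
    return f1 / n


def ranks_seen_at_least(f1: float, min_freq: float) -> int:
    """Number of ranks whose expected frequency is at least min_freq.

    f1 / n >= min_freq holds exactly for n <= f1 / min_freq.
    """
    if min_freq <= 0:
        raise ValueError("min_freq must be positive")
    return int(f1 // min_freq)


# If the most frequent item occurs 1,000,000 times, only the first
# 100,000 ranks are expected to occur 10 times or more.
covered = ranks_seen_at_least(1_000_000, 10)
```

Under these assumptions, seeing the item at rank n at least k times requires f1 to be at least k times n, so the required corpus size grows linearly with the rank one wants to cover.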


Zipf’s law III<br />

Substantives + Verb tags on the Brown corpus<br />


Size is not everything . . .<br />

Why are qualitative aspects so important? Well, this can’t really be<br />

a question, right?<br />

the web is the most widely used data source for obtaining enough<br />

source texts – "web as corpus"<br />

web is garbage (by definition)<br />

garbage as corpus?<br />

building corpora from web requires extensive post-processing<br />


Building corpora from web<br />

crawling the data<br />

detecting language, detecting encoding<br />

cleaning boilerplate (metadata, ads, navigation etc.): jusText<br />

(web demo)<br />

deduplicating: onion<br />

see Pomikálek (2011) PhD thesis<br />


Deduplicating<br />

quite straightforward for full duplicates, but that’s not enough<br />

people do not only copy full documents<br />

they copy just parts of the document: orig vs. copy<br />

they copy and modify: orig vs. mod<br />

they copy and extend: orig vs. ext<br />
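One common way to catch partial, modified and extended copies is to compare overlapping word n-grams ("shingles") instead of whole documents. The Python sketch below shows that generic idea only; it is not a description of how onion works internally, and the names `shingles` and `containment` as well as the n-gram length of 3 are assumptions made for this example.

```python
from typing import List, Set, Tuple


def shingles(tokens: List[str], n: int = 3) -> Set[Tuple[str, ...]]:
    """All contiguous n-grams (shingles) of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(candidate: List[str], seen: List[str], n: int = 3) -> float:
    """Share of the candidate's n-grams that already occur in `seen`."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & shingles(seen, n)) / len(cand)


orig = "a b c d e f".split()
copy = "b c d e".split()          # part of the document copied
mod = "a b c x e f".split()       # copied and modified
ext = "a b c d e f g h".split()   # copied and extended

scores = {
    "copy": containment(copy, orig),
    "mod": containment(mod, orig),
    "ext": containment(ext, orig),
}
```

A document whose containment score exceeds a chosen threshold would be treated as a duplicate. Note how a single changed token in `mod` removes three of its four shingles, which is why the threshold and the n-gram length need tuning on real data.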


Building and searching corpora<br />

the standard IT solution for storing data is a relational database,<br />

i.e. a table (rows are elements of the relation)<br />

not suitable for storing corpora at all (neither text nor most<br />

annotation types have a relational backbone)<br />

this calls for a specialised database management system<br />

with an appropriate user interface on top of it<br />


Introducing <strong>Sketch</strong> <strong>Engine</strong> (SkE)<br />

since 2003 as a commercial variant of the Manatee/Bonito<br />

corpus management system (Rychlý 1999)<br />

named after one of the key features – word sketches (to be<br />

introduced)<br />

most components released as open-source NoSketch Engine,<br />

following pay-per-service principle<br />


SkE architecture I<br />


SkE architecture II<br />

Components:<br />

Finlib (C++) – Fast Indexing Library (core indexing and data<br />

stream processing module)<br />

Manatee (C++) – Module for corpus encoding, searching and<br />

managing<br />

Bonito (Python) – Web user interface for Manatee<br />

Corpus Architect (Python) – Web user interface for managing<br />

corpora<br />


<strong>Sketch</strong> <strong>Engine</strong> infrastructure<br />

= servers and their equipment<br />

what do we need?<br />

most companies either store lots of data but don’t need fast<br />

access (e.g. backups, logs), or store a fairly small amount of<br />

data that must be accessed quickly (information systems, databases)<br />

we need both + lots of memory and fair number of CPU cores<br />

we need to manage concurrent access<br />


<strong>Sketch</strong> <strong>Engine</strong> in numbers<br />

by 2011:<br />

> 8500 registered users<br />

191 preloaded corpora, 38G tokens, 47 languages<br />

4,335 user corpora, 2.5G tokens<br />

ca. 30,000 requests per day<br />


SkE storage size<br />


SkE 2007<br />


SkE 2008<br />


SkE 2010<br />


SkE 2011<br />


Searching corpora – selected topics<br />

corpus querying using CQL – Corpus Query Language<br />

finding collocates<br />


Finding collocates<br />

many association scores available<br />

most of them lack a linguistically plausible<br />

interpretation<br />

most of them do not scale well with corpus size ⇒<br />

cannot be used for cross-corpus comparisons<br />

see e. g. Rychlý (2008), Evert (2004)<br />


logDice I<br />

logDice<br />

$\mathrm{logDice} = 14 + \log_2 D = 14 + \log_2 \frac{2 \cdot f_{xy}}{f_x + f_y}$<br />

yet another association score for collocations<br />

simple, scales well for corpus size<br />

has linguistically plausible interpretation<br />

based on the well known Dice association score (D above)<br />

see Rychlý (2008) for a sample comparison with T-score, MI-score<br />

and others<br />
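The definition translates directly into code. The Python sketch below assumes that fxy, fx and fy are plain frequency counts of the pair and of each word; the function name `log_dice` is chosen for this example only.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y))."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# X and Y always occur together: the Dice score is 1, so logDice is 14.
maximum = log_dice(10, 10, 10)

# Scaling all three counts by the same factor (a corpus ten times larger
# with the same proportions) leaves the score unchanged.
small = log_dice(5, 100, 200)
large = log_dice(50, 1000, 2000)
```

Because only the ratio of the counts enters the formula, the score does not depend on the total number of tokens in the corpus.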


logDice II<br />

Formal properties of logDice:<br />

Theoretical maximum is 14 – when X always co-occurs <strong>with</strong> Y<br />

and vice versa. Usually the value is less than 10.<br />

Value of 0 means there is less than 1 co-occurrence of X and<br />

Y per 16000 X or 16000 Y. We can say that negative values<br />

mean there is no statistical significance of XY co-occurrence.<br />

Comparing two scores, plus 1 point means 2 times more<br />

co-occurrences, plus 7 points means roughly 100 times more<br />

co-occurrences.<br />

The score is independent of the corpus size. The score<br />

combines relative frequencies of XY in relation to X and Y.<br />
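These properties follow from the definition $\mathrm{logDice} = 14 + \log_2 D$ with $D = \frac{2 f_{xy}}{f_x + f_y}$. A short worked derivation, where the reading of score differences as "times more co-occurrences" assumes that $f_x + f_y$ is held fixed:

```latex
\begin{align*}
f_{xy} = f_x = f_y
  &\;\Rightarrow\; D = 1
  \;\Rightarrow\; \mathrm{logDice} = 14 + \log_2 1 = 14 \\
\mathrm{logDice} = 0
  &\;\Leftrightarrow\; D = 2^{-14} = \tfrac{1}{16384} \approx \tfrac{1}{16000} \\
\mathrm{logDice}' - \mathrm{logDice} = 1
  &\;\Rightarrow\; \frac{D'}{D} = 2^{1} = 2 \\
\mathrm{logDice}' - \mathrm{logDice} = 7
  &\;\Rightarrow\; \frac{D'}{D} = 2^{7} = 128 \approx 100
\end{align*}
```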


CQL<br />

= Corpus Query Language (Christ and Schulze, 1994)<br />

positions and positional attributes: [attr="value"]<br />

structures and structural attributes: <br />

example:<br />

[word=".*ing" & tag="V.*"]<br />

[word=".*ing"] | [tag="V.*"]<br />

<br />

established a within query:<br />

[tag="N.*"]+ within <br />

and alternative meet/union query:<br />

(meet [lemma="take"] [tag="N.*"] -5 +5)<br />

(union (meet ...) (meet ...))<br />
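To make the position syntax concrete, the Python sketch below emulates how a single position such as [word=".*ing" & tag="V.*"] could be evaluated against annotated tokens. It is an illustration, not Manatee code: the token dictionaries, the sample tags and the helper name `matches` are assumptions, and it assumes that a value must match the regular expression as a whole.

```python
import re
from typing import Dict

Token = Dict[str, str]


def matches(token: Token, **conds: str) -> bool:
    """True if every listed attribute value fully matches its regex."""
    return all(
        re.fullmatch(regex, token.get(attr, "")) is not None
        for attr, regex in conds.items()
    )


tokens = [
    {"word": "running", "tag": "VVG"},
    {"word": "morning", "tag": "NN1"},
    {"word": "ran", "tag": "VVD"},
]

# [word=".*ing" & tag="V.*"]: both conditions on the same position
both = [t["word"] for t in tokens if matches(t, word=".*ing", tag="V.*")]

# [word=".*ing"] | [tag="V.*"]: either position description is enough
either = [
    t["word"]
    for t in tokens
    if matches(t, word=".*ing") or matches(t, tag="V.*")
]
```

Here `both` keeps only "running", while `either` also keeps "morning" (it ends in -ing) and "ran" (it carries a verb tag).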


CQL in Manatee/Bonito<br />

enhancements and differences compared to the original CQL syntax<br />

within and containing <br />

meet/union (sub)query<br />

inequality comparisons<br />

frequency function<br />


within/containing queries<br />

searching for particles:<br />

[tag="PR.*"] within [tag="V.*"] [tag="AT0"]?<br />

[tag="AJ0"]* [tag="(PR.?|N.*)"] [tag="PR.*"]<br />

within <br />

searching for a Czech idiom “hnout někomu žlučí” (“to get<br />

somebody’s goat”):<br />

word-by-word translated as:<br />

hnout “move” [V, infinitive]<br />

někomu “somebody” [N, dative]<br />

žlučí “bile” [N, instrumental].<br />

containing [lemma="hnout"] containing<br />

[tag=".*c3.*"] containing [word="žlučí"]<br />


within/containing queries<br />

structure boundaries: begin: , whole structure: ,<br />

end: <br />

changes: <strong>with</strong>in not allowed anymore, use <strong>with</strong>in<br />

<br />


meet/union queries<br />

combined <strong>with</strong> regular query: <br />

containing (meet [lemma="have"] [tag="P.*"] -5 5)<br />

containing (meet [tag="N.*"] [lemma="blue"])<br />

changes: meet/union queries can be used in any position,<br />

they can contain labels, and the MU keyword is no longer required<br />

(it is deprecated):<br />

(meet 1:[] 2:[]) & 1.tag = 2.tag<br />
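A meet query can be read as a windowed co-occurrence filter: keep the positions matching the first description that have a position matching the second description within the given context. The Python sketch below follows that reading only. The helper names `attr` and `meet`, the sample tags, returning the positions of the first description, and excluding the matched position itself from its own context are all assumptions of this example rather than documented Manatee behaviour.

```python
import re
from typing import Callable, Dict, List

Token = Dict[str, str]
Test = Callable[[Token], bool]


def attr(name: str, regex: str) -> Test:
    """Build a position test: the attribute value must fully match the regex."""
    pattern = re.compile(regex)
    return lambda tok: pattern.fullmatch(tok.get(name, "")) is not None


def meet(tokens: List[Token], first: Test, second: Test,
         left: int, right: int) -> List[int]:
    """Positions i matching `first` with a `second` match at some j,
    where left <= j - i <= right."""
    hits = []
    for i, tok in enumerate(tokens):
        if not first(tok):
            continue
        lo = max(0, i + left)
        hi = min(len(tokens) - 1, i + right)
        # Assumption: the matched position does not count as its own context.
        if any(second(tokens[j]) for j in range(lo, hi + 1) if j != i):
            hits.append(i)
    return hits


sentence = [
    {"lemma": "take", "tag": "VVD"},
    {"lemma": "a", "tag": "AT0"},
    {"lemma": "long", "tag": "AJ0"},
    {"lemma": "nap", "tag": "NN1"},
]

# (meet [lemma="take"] [tag="N.*"] -5 +5)
wide = meet(sentence, attr("lemma", "take"), attr("tag", "N.*"), -5, 5)
# the same query with a context of one token on each side
narrow = meet(sentence, attr("lemma", "take"), attr("tag", "N.*"), -1, 1)
```

With the wide context the noun three tokens to the right is found, so position 0 is returned; with the narrow context nothing matches.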


Inequality comparisons<br />

former comparisons allowed only equality and its negation:<br />

[attr="value"] [attr!="value"]<br />

inequality comparisons implemented: [attr<="value"] [attr>="value"] [attr!<="value"] [attr!>="value"]<br />

intended usage:<br />

[tag="AJ.*"] [tag="NN.*"] <strong>with</strong>in ="2009"><br />

sophisticated comparison performed on the attribute value:<br />



Fixed string comparisons<br />

normally the CQL values are regular expressions<br />

sometimes this is not desirable (batch processing needs<br />

escaping of metacharacters)<br />

new == and !== operators introduced for fixed-string<br />

comparison<br />

no escaping needed except for ’"’ and ’\’<br />

examples: ".", "$", "~" match a single dot, dollar sign and<br />

tilde, respectively; "\n" matches a backslash followed by the<br />

character n<br />
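The difference between the two kinds of comparison can be illustrated outside CQL. The Python sketch below contrasts regular-expression matching with plain string equality; the helper names `regex_match` and `fixed_match` are made up for this example, and Python's `re` module only stands in for whatever regular-expression engine the corpus manager uses.

```python
import re


def regex_match(pattern: str, value: str) -> bool:
    """Regex comparison: the whole value must match the pattern."""
    return re.fullmatch(pattern, value) is not None


def fixed_match(literal: str, value: str) -> bool:
    """Fixed-string comparison: plain equality, no metacharacters."""
    return literal == value


# As a regex, "." matches any single character ...
dot_as_regex = regex_match(".", "a")
# ... as a fixed string it matches only a literal dot.
dot_as_fixed = fixed_match(".", "a")

# Without a fixed-string operator, batch jobs have to escape every
# value before building the query.
escaped = regex_match(re.escape("$"), "$")
```

A fixed-string operator removes the escaping step from batch processing, which is exactly the case named on the slide.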


Frequency function<br />

a frequency constraint allowed in the global conditions part of<br />

CQL:<br />

1:[tag="PP.*"] 2:[tag="NN.*"] & f(1.word) > 10<br />
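The effect of such a constraint can be sketched in Python. The example assumes that f(1.word) denotes the frequency of the word form at the labelled position, counted over the whole corpus; the function name `frequent_pairs`, the toy corpus and its tags, and the threshold of 1 (instead of 10) are choices made for this illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Token = Dict[str, str]


def frequent_pairs(tokens: List[Token], threshold: int) -> List[Tuple[str, str]]:
    """Adjacent pairs 1:[tag="PP.*"] 2:[tag="NN.*"] with f(1.word) > threshold."""
    freq = Counter(tok["word"] for tok in tokens)  # corpus-wide word counts
    hits = []
    for first, second in zip(tokens, tokens[1:]):
        if (re.fullmatch("PP.*", first["tag"])
                and re.fullmatch("NN.*", second["tag"])
                and freq[first["word"]] > threshold):
            hits.append((first["word"], second["word"]))
    return hits


corpus = [
    {"word": "my", "tag": "PP$"}, {"word": "dog", "tag": "NN"},
    {"word": "my", "tag": "PP$"}, {"word": "cat", "tag": "NN"},
    {"word": "her", "tag": "PP$"}, {"word": "hat", "tag": "NN"},
]

pairs = frequent_pairs(corpus, 1)
```

In the toy corpus "my" occurs twice and "her" once, so only the pairs starting with "my" pass the constraint.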


Performance evaluation<br />

Table: Query performance evaluation – corpora legend: ◦ BNC (110M<br />

tokens), • BiWeC (version <strong>with</strong> 9.5G tokens), ∗ Czes (1.2G tokens)<br />

query # of results time (m:s)<br />

◦ [lemma="time"] 179,321 0.07<br />

◦ [lemma="t.*"] 14,660,881 3.12<br />

◦ Ex: particles 1,219,973 33.36<br />

• Ex: particles 97,671,485 32:26.48<br />

∗ Ex: idioms 66 1:6.86<br />

◦ Ex: meet/union 3 8.47<br />

• Ex: meet/union 1457 7:13.12<br />


Thank you!<br />

Thank you for your attention!<br />
