Exploiting Corpora with Sketch Engine - NLP Centre - Masaryk ...
Contents Current Trends in Corpus Processing <strong>Sketch</strong> <strong>Engine</strong> Finding Collocations Using CQL<br />
<strong>Exploiting</strong> <strong>Corpora</strong> <strong>with</strong> <strong>Sketch</strong> <strong>Engine</strong><br />
Miloš Jakubíček<br />
Lexical Computing Ltd., Brighton, United Kingdom<br />
milos.jakubicek@sketchengine.co.uk<br />
Natural Language Processing <strong>Centre</strong><br />
Faculty of Informatics, <strong>Masaryk</strong> University, Brno, Czech Republic<br />
jak@fi.muni.cz<br />
CLARA course on Multilingual Lexical Resources and Tools,<br />
20-23 June 2011, Bergen<br />
Miloš Jakubíček LCL UK, <strong>NLP</strong>C FI MU CZ<br />
<strong>Exploiting</strong> <strong>Corpora</strong> <strong>with</strong> <strong>Sketch</strong> <strong>Engine</strong>
Contents<br />
1 Current Trends in Corpus Processing<br />
2 <strong>Sketch</strong> <strong>Engine</strong><br />
3 Finding Collocations<br />
4 Using CQL<br />
Organisation<br />
Schedule<br />
Building & Searching <strong>Corpora</strong><br />
Exercises<br />
Word Sense & Word <strong>Sketch</strong><br />
Exercises<br />
If you don’t mind, we will mix lectures and exercises<br />
We will be using the <strong>Sketch</strong> <strong>Engine</strong> installation at<br />
http://the.sketchengine.co.uk – you need to register for<br />
a trial account<br />
Ask questions immediately<br />
Do get back to me if you would like to focus on anything in particular<br />
Download slides from<br />
http://nlp.fi.muni.cz/~xjakub/clara.pdf<br />
Today’s <strong>Corpora</strong> I<br />
billions of tokens<br />
complex multi-level multi-value annotation<br />
wide range of languages<br />
growing demand on complex searching – moving from<br />
morphology to syntax and semantics<br />
search API for automatic information retrieval and<br />
post-processing in particular applications needed<br />
parallel distributed processing needed<br />
Today’s <strong>Corpora</strong> II<br />
What is the key property of a modern corpus?<br />
Today’s <strong>Corpora</strong> III<br />
Yes, it’s the size.<br />
Today’s <strong>Corpora</strong> IV<br />
Why are quantitative aspects so important?<br />
Law of large numbers<br />
Most language phenomena follow the Zipfian distribution<br />
Zipf’s law I<br />
[Figure: plot of zipf(x); x-axis from 0 to 1000, y-axis from 0 to 100]<br />
Zipf’s law II<br />
may be simplified to inductive definition:<br />
Zipf’s law (simplified)<br />
frequency of the n-th element: $f_n \approx \frac{1}{n} \cdot f_1$<br />
⇒ frequency is inversely proportional to the rank according to<br />
frequency<br />
⇒ one needs really large corpora to capture all the variety of<br />
many language phenomena<br />
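A minimal sketch of what the simplified law implies for corpus size, assuming the relation f_n ≈ f_1 / n holds exactly. The function names and the example counts are hypothetical and chosen only for illustration.

```python
def zipf_frequency(f1: float, rank: int) -> float:
    """Expected frequency of the item at `rank` under f_n ≈ f_1 / n."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    return f1 / rank


def first_rank_below(f1: float, threshold: float) -> int:
    """Smallest rank whose expected frequency falls below `threshold`.

    f1 / n < threshold  <=>  n > f1 / threshold
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return int(f1 // threshold) + 1


# Hypothetical corpus: the most frequent item occurs 6,000,000 times.
f1 = 6_000_000
for rank in (1, 10, 1_000, 100_000):
    print(rank, zipf_frequency(f1, rank))

# From this rank on, items are expected fewer than 5 times.
print(first_rank_below(f1, 5))
```

Under this idealised model the top frequency has to double for any given rank to be observed twice as often, which is the quantitative reason behind the demand for very large corpora.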
Zipf’s law III<br />
Substantives + Verb tags on the Brown corpus<br />
Size is not everything . . .<br />
Why are qualitative aspects so important? Well, this can’t really be<br />
a question, right?<br />
the web is the most used data source for obtaining enough source<br />
texts – “web as corpus”<br />
web is garbage (by definition)<br />
garbage as corpus?<br />
building corpora from web requires extensive post-processing<br />
Building corpora from web<br />
crawling the data<br />
detecting language, detecting encoding<br />
cleaning boilerplate (metadata, ads, navigation etc.): jusText<br />
(web demo)<br />
deduplicating: onion<br />
see Pomikálek (2011) PhD thesis<br />
Deduplicating<br />
quite straightforward for full duplicates, but that’s not enough<br />
people do not only copy full documents<br />
they copy just parts of the document: orig vs. copy<br />
they copy and modify: orig vs. mod<br />
they copy and extend: orig vs. ext<br />
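The three cases above (partial copy, modified copy, extended copy) can be illustrated with a simple token n-gram overlap measure. This is only a sketch of the general idea, not the actual onion implementation; the function names, the n-gram length of 3 and the sample texts are assumptions made for illustration.

```python
def shingles(text: str, n: int = 3) -> set:
    """Set of token n-grams (whitespace tokenisation) of a text."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def seen_ratio(candidate: str, seen: set, n: int = 3) -> float:
    """Fraction of the candidate's n-grams that were already seen."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0
    return len(cand & seen) / len(cand)


orig = "the quick brown fox jumps over the lazy dog"
copy = "brown fox jumps over the"                      # part of the document
mod = "the quick brown cat jumps over the lazy dog"    # copied and modified
ext = orig + " and runs away"                          # copied and extended

seen = shingles(orig)
for name, text in (("copy", copy), ("mod", mod), ("ext", ext)):
    print(name, round(seen_ratio(text, seen), 2))
```

A document (or paragraph) whose ratio exceeds some chosen threshold would be treated as a near-duplicate; the threshold itself is a tuning decision that this sketch leaves open.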
Building and searching corpora<br />
standard IT solution for storing data is a relational database,<br />
i. e. a table (rows are elements of the relation)<br />
not suitable for storing corpora at all (text doesn’t have<br />
relational backbone, nor most annotation types)<br />
requirements for specific database management system<br />
and appropriate user interface on top of it<br />
Introducing <strong>Sketch</strong> <strong>Engine</strong> (SkE)<br />
since 2003 as a commercial variant of the Manatee/Bonito<br />
corpus management system (Rychlý 1999)<br />
named after one of the key features – word sketches (to be<br />
introduced)<br />
most components released as open-source No<strong>Sketch</strong> <strong>Engine</strong>,<br />
following pay-per-service principle<br />
SkE architecture I<br />
SkE architecture II<br />
Components:<br />
Finlib (C++) – Fast Indexing Library (core indexing and data<br />
stream processing module)<br />
Manatee (C++) – Module for corpus encoding, searching and<br />
managing<br />
Bonito (Python) – Web user interface for Manatee<br />
Corpus Architect (Python) – Web user interface for managing<br />
corpora<br />
<strong>Sketch</strong> <strong>Engine</strong> infrastructure<br />
= servers and their equipment<br />
what do we need?<br />
most companies either store lots of data but don’t need fast<br />
access (e.g. backups, logs), or store a fairly small amount of<br />
data that must be accessed quickly (information systems, databases)<br />
we need both, plus lots of memory and a fair number of CPU cores<br />
we need to manage concurrent access<br />
<strong>Sketch</strong> <strong>Engine</strong> in numbers<br />
by 2011:<br />
> 8500 registered users<br />
191 preloaded corpora, 38G tokens, 47 languages<br />
4,335 user corpora, 2.5G tokens<br />
ca. 30,000 requests per day<br />
SkE storage size<br />
SkE 2007<br />
SkE 2008<br />
SkE 2010<br />
SkE 2011<br />
SkE 2011<br />
Searching corpora – selected topics<br />
corpus querying using CQL – Corpus Query Language<br />
finding collocates<br />
Finding collocates<br />
many association scores available<br />
most of which do not have linguistically plausible<br />
interpretation<br />
most of which do not scale well according to corpus size ⇒<br />
cannot be used for cross-corpora comparisons<br />
see e. g. Rychlý (2008), Evert (2004)<br />
logDice I<br />
logDice<br />
$\mathrm{logDice} = 14 + \log_2 D = 14 + \log_2 \frac{2 \cdot f_{xy}}{f_x + f_y}$<br />
yet another association score for collocations<br />
simple, scales well for corpus size<br />
has linguistically plausible interpretation<br />
based on the well known Dice association score (D above)<br />
see Rychlý (2008) for a sample comparison to T-score, MI-score<br />
and others<br />
logDice II<br />
Formal properties of logDice:<br />
Theoretical maximum is 14 – when X always co-occurs <strong>with</strong> Y<br />
and vice versa. Usually the value is less than 10.<br />
Value of 0 means there is less than 1 co-occurrence of X and<br />
Y per 16000 X or 16000 Y. We can say that negative values<br />
mean there is no statistical significance of XY co-occurrence.<br />
Comparing two scores, plus 1 point means 2 times more<br />
co-occurrences, plus 7 points means roughly 100 times more<br />
co-occurrences.<br />
The score is independent of the corpus size. The score<br />
combines relative frequencies of XY in relation to X and Y.<br />
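The properties listed above can be checked numerically with a direct implementation of the formula 14 + log2(2·f_xy / (f_x + f_y)). This is a minimal sketch; the function name and the sample counts are hypothetical.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice score: 14 + log2 of the Dice coefficient 2*f_xy / (f_x + f_y)."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Theoretical maximum: X and Y always co-occur with each other.
print(log_dice(10, 10, 10))

# Doubling the number of co-occurrences adds one point.
print(log_dice(20, 1000, 1000) - log_dice(10, 1000, 1000))

# Score 0: one co-occurrence per 2**14 = 16384 occurrences of X and of Y.
print(log_dice(1, 16384, 16384))
```

The corpus size does not appear among the arguments at all, which is why scores from corpora of different sizes can be compared.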
CQL<br />
= Corpus Query Language (Christ and Schulze, 1994)<br />
positions and positional attributes: [attr="value"]<br />
structures and structural attributes: <br />
example:<br />
[word=".*ing" & tag="V.*"]<br />
[word=".*ing"] | [tag="V.*"]<br />
<br />
established a <strong>with</strong>in query:<br />
[tag="N.*"]+ <strong>with</strong>in <br />
and alternative meet/union query:<br />
(meet [lemma="take"] [tag="N.*"] -5 +5)<br />
(union (meet ...) (meet ...))<br />
CQL in Manatee/Bonito<br />
enhancements and differences from the original CQL syntax<br />
<strong>with</strong>in and containing <br />
meet/union (sub)query<br />
inequality comparisons<br />
frequency function<br />
<strong>with</strong>in/containing queries<br />
searching for particles:<br />
[tag="PR.*"] <strong>with</strong>in [tag="V.*"] [tag="AT0"]?<br />
[tag="AJ0"]* [tag="(PR.?|N.*)"] [tag="PR.*"]<br />
<strong>with</strong>in <br />
searching for a Czech idiom “hnout někomu žlučí” (“to get<br />
somebody’s goat”):<br />
word-by-word translated as:<br />
hnout “move” [V, infinitive]<br />
někomu “somebody” [N, dative]<br />
žlučí “bile” [N, instrumental].<br />
containing [lemma="hnout"] containing<br />
[tag=".*c3.*"] containing [word="žlučí"]<br />
<strong>with</strong>in/containing queries<br />
structure boundaries: begin: &lt;s&gt;, whole structure: &lt;s/&gt;,<br />
end: &lt;/s&gt;<br />
changes: <strong>with</strong>in &lt;s&gt; not allowed anymore, use <strong>with</strong>in<br />
&lt;s/&gt;<br />
meet/union queries<br />
combined <strong>with</strong> regular query: <br />
containing (meet [lemma="have"] [tag="P.*"] -5 5)<br />
containing (meet [tag="N.*"] [lemma="blue"])<br />
changes: meet/union queries can be used in any position,<br />
they can contain labels, and the MU keyword is no longer required (it is<br />
deprecated):<br />
(meet 1:[] 2:[]) & 1.tag = 2.tag<br />
Inequality comparisons<br />
former comparisons allowed only equality and its negation:<br />
[attr="value"] [attr!="value"]<br />
inequality comparisons implemented: [attr&lt;="value"] [attr&gt;="value"] [attr!&lt;="value"] [attr!&gt;="value"]<br />
intended usage:<br />
[tag="AJ.*"] [tag="NN.*"] <strong>with</strong>in ="2009"><br />
sophisticated comparison performed on the attribute value:<br />
Fixed string comparisons<br />
normally the CQL values are regular expressions<br />
sometimes this is not desirable (batch processing needs<br />
escaping of metacharacters)<br />
new == and !== operators introduced for fixed-string<br />
comparison<br />
no escaping needed except for ’"’ and ’\’<br />
examples: ".", "$", "~" match a single dot, dollar sign and<br />
tilde, respectively; "\n" matches a backslash followed by the<br />
character n<br />
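For batch processing, the escaping rule above can be wrapped in a small helper that builds a fixed-string condition from an arbitrary value. This is a sketch only: the helper name is hypothetical, and it assumes that writing a backslash as `\\` and a double quote as `\"` is the complete escaping needed inside a `==` value, as the slide states.

```python
def cql_fixed(attr: str, value: str, negate: bool = False) -> str:
    """Build a single-position CQL condition using fixed-string comparison.

    Only the backslash and the double quote are escaped; regular
    expression metacharacters such as '.', '$' or '*' are left untouched
    because == / !== do not interpret the value as a regular expression.
    """
    # Escape backslashes first so that the backslashes added for quotes
    # are not escaped a second time.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    operator = "!==" if negate else "=="
    return f'[{attr}{operator}"{escaped}"]'


# Words taken from a (hypothetical) word list, including metacharacters.
for word in (".", "$", "U.S.", 'say "hi"'):
    print(cql_fixed("word", word))
```

The point of the design is that the caller never needs to know the regular expression dialect of the query engine; the two characters named on the slide are the only ones handled.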
Frequency function<br />
a frequency constraint allowed in the global conditions part of<br />
CQL:<br />
1:[tag="PP.*"] 2:[tag="NN.*"] & f(1.word) > 10<br />
Performance evaluation<br />
Table: Query performance evaluation – corpora legend: ◦ BNC (110M<br />
tokens), • BiWeC (version <strong>with</strong> 9.5G tokens), ∗ Czes (1.2G tokens)<br />
corpus | query | # of results | time (m:s)<br />
◦ | [lemma="time"] | 179,321 | 0.07<br />
◦ | [lemma="t.*"] | 14,660,881 | 3.12<br />
◦ | Ex: particles | 1,219,973 | 33.36<br />
• | Ex: particles | 97,671,485 | 32:26.48<br />
∗ | Ex: idioms | 66 | 1:06.86<br />
◦ | Ex: meet/union | 3 | 8.47<br />
• | Ex: meet/union | 1457 | 7:13.12<br />
Thank you!<br />
Thank you for your attention!<br />