15.01.2014 Views

Research design and methods


Basic concepts<br />

Grammars<br />

Research design and methods<br />

Ivona Kučerová<br />

COGSCIL 750<br />

September 21, 2009<br />

Ivona Kučerová: Research design and methods, COGSCIL 750


◮ what does it mean when we say that a sentence is grammatical?<br />

◮ in a technical sense, a sentence is grammatical only with respect to a grammar<br />

◮ a sentence is grammatical if it can be generated by the grammar<br />


◮ a natural language as a set of strings<br />

◮ the task: an explicit device that can decide whether a string belongs to the set or not<br />

◮ we will call such a device a grammar<br />


Strings<br />

◮ Given a finite set A, a string on (or over) A is a finite sequence of occurrences of elements of A<br />


◮ A = {a,b,c}<br />

◮ examples of strings on A:<br />

◮ acbaab<br />

◮ aacbbac<br />

◮ bbaccaab<br />

◮ . . .<br />


◮ strings are finite<br />

◮ A is assumed to be finite<br />


◮ the set from which strings are formed, A, is called the vocabulary or alphabet<br />


◮ the length of a string is the number of occurrences of symbols in it (number of tokens!)<br />

◮ for example, the length of aabbcab is 7<br />
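The token/type distinction behind this definition can be made concrete with a minimal Python sketch. It assumes that each symbol is a single character; the variable names are illustrative only.

```python
# Length counts occurrences (tokens), not distinct symbols (types).
s = "aabbcab"

token_count = len(s)       # every occurrence of a symbol counts
type_count = len(set(s))   # distinct symbols only

print(token_count)  # 7
print(type_count)   # 3
```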


◮ careful: a string is a linearly ordered sequence<br />

◮ but not a linearly ordered set: the same symbol may occur at more than one position<br />


◮ we can define a string of length n over the alphabet A to be a function mapping the first n positive integers into A<br />


◮ An example<br />

◮ acbaab can be defined as the function {⟨1,a⟩, ⟨2,c⟩, ⟨3,b⟩, ⟨4,a⟩, ⟨5,a⟩, ⟨6,b⟩}<br />

◮ How do you define bbcabca?<br />
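The function view of a string can be sketched in Python with a dictionary from positions to symbols. The helper names `as_function` and `as_string` are hypothetical, and symbols are assumed to be single characters.

```python
def as_function(s):
    """Return the string s as a mapping from positions 1..n to symbols."""
    return {i: symbol for i, symbol in enumerate(s, start=1)}

def as_string(f):
    """Recover the string from such a mapping by reading positions in order."""
    return "".join(f[i] for i in range(1, len(f) + 1))

f = as_function("acbaab")
print(f)             # {1: 'a', 2: 'c', 3: 'b', 4: 'a', 5: 'a', 6: 'b'}
print(as_string(f))  # acbaab
```

The same helper can be used to check an answer to the exercise on bbcabca.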


◮ a special string is the unique string of length 0: the empty string, written e<br />


Operations on strings<br />

◮ concatenation is associative:<br />

◮ for any strings α, β, γ, (α ⌢ β) ⌢ γ = α ⌢ (β ⌢ γ)<br />


◮ concatenation is not commutative, since in general α ⌢ β ≠ β ⌢ α<br />
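Both properties of concatenation can be checked on a small case. The sketch below uses Python's `+` on strings as a stand-in for ⌢; the three sample strings are arbitrary choices, not taken from the slides.

```python
alpha, beta, gamma = "ab", "c", "ba"

# Associativity: the grouping does not matter.
left = (alpha + beta) + gamma
right = alpha + (beta + gamma)
print(left == right)   # True

# Non-commutativity: the order matters in general.
print(alpha + beta)    # abc
print(beta + alpha)    # cab
```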


◮ A* =def the set of all strings over A<br />
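One way to picture A* is to list its members by increasing length. The following is a minimal sketch, assuming a nonempty alphabet of single-character symbols; `kleene_star` is a hypothetical name, and the generator never terminates on its own, so only a finite initial segment is taken.

```python
from itertools import count, islice, product

def kleene_star(alphabet):
    """Lazily yield every string over `alphabet`, shortest first.

    Assumes a nonempty alphabet; the empty string e is represented by "".
    """
    symbols = sorted(alphabet)
    for n in count(0):                          # lengths 0, 1, 2, ...
        for letters in product(symbols, repeat=n):
            yield "".join(letters)

first = list(islice(kleene_star({"a", "b", "c"}), 13))
print(first)
# ['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']
```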


◮ reversal<br />

◮ x^R =def the reversal of a string x<br />

◮ (acbab)^R = babca<br />

◮ e^R = e<br />


(1) Given an alphabet A<br />

a. If x is a string of length 0, then x^R = x (i.e., e^R = e)<br />

b. If x is a string of length k + 1, then it is of the form wa, where a ∈ A and w ∈ A*, and x^R = (wa)^R = aw^R.<br />
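The two clauses of the recursive definition translate almost directly into code. This is a sketch under the assumption that symbols are single characters and that e is the Python empty string; `reverse` is a hypothetical name.

```python
def reverse(x):
    """Reversal of x, following clauses (1a) and (1b)."""
    if len(x) == 0:          # (1a): the empty string is its own reversal
        return x
    w, a = x[:-1], x[-1]     # (1b): x = wa, where a is a single symbol
    return a + reverse(w)    # (wa)^R = a w^R

print(reverse("acbab"))  # babca
```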


◮ the relation between concatenation and reversal:<br />

(2) For all strings x and y, (x ⌢ y)^R = y^R ⌢ x^R<br />

(3) (bca ⌢ ca)^R = (ca)^R ⌢ (bca)^R = ac ⌢ acb = acacb<br />
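Statement (2) can be spot-checked by brute force over all short strings. The sketch below is a finite check, not a proof; it assumes the alphabet {a, b, c} and uses Python's slice reversal in place of the recursive definition. The helper names are hypothetical.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len."""
    return ["".join(p) for n in range(max_len + 1)
            for p in product(alphabet, repeat=n)]

def rev(x):
    return x[::-1]           # built-in slice reversal

candidates = strings_up_to("abc", 3)
holds = all(rev(x + y) == rev(y) + rev(x)
            for x in candidates for y in candidates)
print(holds)                 # True
print(rev("bca" + "ca"))     # acacb
```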


◮ given a string x, a substring of x is any string formed from contiguous occurrences of symbols in x taken in the same order in which they occur in x<br />

◮ formally, y is a substring of x iff there exist strings z and w such that x = z ⌢ y ⌢ w<br />

◮ notice that both z and w may be empty<br />


◮ prefix =def an initial substring<br />

◮ suffix =def a final substring<br />
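Substrings, prefixes, and suffixes can all be read off the decompositions x = z ⌢ y ⌢ w. The sketch below lists those decompositions explicitly; the function names are hypothetical, and a prefix is taken to be the case where z is empty and a suffix the case where w is empty.

```python
def decompositions(x, y):
    """All pairs (z, w) such that x == z + y + w."""
    return [(x[:i], x[i + len(y):])
            for i in range(len(x) - len(y) + 1)
            if x[i:i + len(y)] == y]

def is_substring(y, x):
    return len(decompositions(x, y)) > 0

def is_prefix(y, x):
    # initial substring: some decomposition has an empty z
    return any(z == "" for z, _ in decompositions(x, y))

def is_suffix(y, x):
    # final substring: some decomposition has an empty w
    return any(w == "" for _, w in decompositions(x, y))

print(decompositions("acbaab", "a"))
# [('', 'cbaab'), ('acb', 'ab'), ('acba', 'b')]
```

Under these definitions the empty string and x itself both count as substrings of x.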


Language<br />

◮ Language (over a vocabulary A) =def any subset of A*<br />


◮ A* is a denumerably infinite set<br />

◮ the cardinality of A*, |A*|, is ℵ₀<br />

◮ its power set, i.e., the set of all languages over A, has cardinality 2^ℵ₀<br />

◮ the set of all languages over A is nondenumerably infinite<br />


◮ learnability?<br />


◮ not every language can be characterized by a finite device (a grammar): there are only denumerably many finite devices, but nondenumerably many languages<br />


◮ when we study formal languages, we study a scale of increasingly complex sets of strings that can be distinguished by finite resources<br />

◮ we will talk about a hierarchy of grammars and languages<br />


Grammars<br />

◮ a formal grammar is a deductive system of axioms and rules of inference, which generates the sentences of a language as its theorems<br />

◮ by convention, a grammar contains:<br />

◮ one axiom: the initial symbol (S)<br />

◮ a finite number of rules (R) of the form φ → ω, where φ and ω are strings<br />


◮ grammars have two alphabets:<br />

◮ a terminal alphabet (V_T)<br />

◮ a non-terminal alphabet (V_N)<br />

◮ the strings we are interested in are strings over the terminal alphabet<br />


An example<br />

V_T (the terminal alphabet) = {a, b}<br />

V_N (the nonterminal alphabet) = {S, A, B}<br />

S (the initial symbol, a member of V_N)<br />

R (the set of rules) = { S → ABS, S → e, AB → BA, BA → AB, A → a, B → b }<br />


(4) Let Σ = V_T ∪ V_N. A formal grammar G is a quadruple ⟨V_T, V_N, S, R⟩, where V_T and V_N are finite disjoint sets, S is a distinguished member of V_N, and R is a finite set of ordered pairs in Σ* V_N Σ* × Σ*, i.e., the left-hand side of every rule contains at least one nonterminal symbol.<br />
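The conditions in definition (4) can be checked mechanically for a concrete grammar. The following sketch assumes single-character symbols, represents each rule φ → ω as a pair of Python strings, and writes e as the empty string; `is_formal_grammar` is a hypothetical helper name. It is applied to the example grammar with V_T = {a, b} and V_N = {S, A, B}.

```python
def is_formal_grammar(VT, VN, S, R):
    """Check the conditions of definition (4) for single-character symbols."""
    sigma = VT | VN
    return (
        VT.isdisjoint(VN)                # V_T and V_N are disjoint
        and S in VN                      # S is a member of V_N
        and all(
            set(lhs) <= sigma            # left side is a string over Sigma
            and set(rhs) <= sigma        # right side is a string over Sigma
            and any(sym in VN for sym in lhs)   # left side has a nonterminal
            for lhs, rhs in R
        )
    )

VT = {"a", "b"}
VN = {"S", "A", "B"}
R = {("S", "ABS"), ("S", ""), ("AB", "BA"), ("BA", "AB"), ("A", "a"), ("B", "b")}

print(is_formal_grammar(VT, VN, "S", R))  # True
```

Finiteness of the sets is implicit here, since Python sets are always finite.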


(5) Given a grammar G = ⟨V_T, V_N, S, R⟩, a derivation is a sequence of strings x_1, x_2, ..., x_n (n ≥ 1) such that x_1 = S and for each x_i (2 ≤ i ≤ n), x_i is obtained from x_{i−1} by one application of some rule in R.<br />
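Definition (5) can be illustrated with the example rules S → ABS, S → e, AB → BA, BA → AB, A → a, B → b. The sketch below assumes that "one application of a rule" means replacing a single occurrence of the rule's left-hand side by its right-hand side; `successors` and `is_derivation` are hypothetical names, and e is written as the empty string.

```python
RULES = [("S", "ABS"), ("S", ""), ("AB", "BA"),
         ("BA", "AB"), ("A", "a"), ("B", "b")]

def successors(x, rules=RULES):
    """All strings obtainable from x by one application of one rule."""
    out = set()
    for lhs, rhs in rules:
        start = x.find(lhs)
        while start != -1:               # try every occurrence of lhs
            out.add(x[:start] + rhs + x[start + len(lhs):])
            start = x.find(lhs, start + 1)
    return out

def is_derivation(seq, start_symbol="S", rules=RULES):
    """True if seq begins with S and each step applies exactly one rule."""
    return (len(seq) >= 1 and seq[0] == start_symbol
            and all(b in successors(a, rules)
                    for a, b in zip(seq, seq[1:])))

print(is_derivation(["S", "ABS", "AB", "BA", "bA", "ba"]))  # True
print(is_derivation(["S", "ab"]))                           # False
```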


(6) A grammar G generates a string x ∈ V_T* if there is a derivation x_1, ..., x_n by G such that x_n = x.<br />


(7) The language generated by a grammar G, denoted L(G), is the set of all strings generated by G.<br />
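For the example grammar with rules S → ABS, S → e, AB → BA, BA → AB, A → a, B → b, a finite part of L(G) can be listed by searching through derivations breadth-first. This is only a sketch: the length bound used for pruning is an assumption that is safe for this particular grammar (only S → e shortens a string, and S occurs at most once), not a general method for arbitrary grammars. All names are hypothetical.

```python
from collections import deque

RULES = [("S", "ABS"), ("S", ""), ("AB", "BA"),
         ("BA", "AB"), ("A", "a"), ("B", "b")]
TERMINALS = {"a", "b"}

def successors(x):
    """All strings obtainable from x by one application of one rule."""
    out = set()
    for lhs, rhs in RULES:
        start = x.find(lhs)
        while start != -1:
            out.add(x[:start] + rhs + x[start + len(lhs):])
            start = x.find(lhs, start + 1)
    return out

def generated_strings(max_len):
    """Terminal strings of length <= max_len that have a derivation from S."""
    seen = {"S"}
    queue = deque(["S"])
    result = set()
    while queue:
        x = queue.popleft()
        if set(x) <= TERMINALS and len(x) <= max_len:
            result.add(x)                # x is a string over V_T
        for y in successors(x):
            # prune forms too long to end in a string of length <= max_len
            if len(y) <= max_len + 1 and y not in seen:
                seen.add(y)
                queue.append(y)
    return result

print(sorted(generated_strings(2)))  # ['', 'ab', 'ba']
```

Within the bound, every string found contains as many occurrences of a as of b, which matches what the rules suggest: each use of S → ABS introduces one A and one B, and the swapping rules only reorder them.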
