28.10.2014 Views

Machine Translation Using Constraint-Based Synchronous Grammar

Machine Translation Using Constraint-Based Synchronous Grammar

Machine Translation Using Constraint-Based Synchronous Grammar

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

TSINGHUA SCIENCE AND TECHNOLOGY<br />

ISSN 1007-0214 06/16 pp295-306<br />

Volume 11, Number 3, June 2006<br />

<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong><br />

<strong>Synchronous</strong> <strong>Grammar</strong> *<br />

WONG Fai ( 黄 辉 ), DONG Mingchui ( 董 名 垂 ) ** , HU Dongcheng ( 胡 东 成 )<br />

Department of Automation, Tsinghua University, Beijing 100084, China<br />

Abstract: A synchronous grammar based on the formalism of context-free grammar was developed by generalizing<br />

the first component of production that models the source text. Unlike other synchronous grammars,<br />

the grammar allows multiple target productions to be associated to a single production rule which can be<br />

used to guide a parser to infer different possible translational equivalences for a recognized input string according<br />

to the feature constraints of symbols in the pattern. An extended generalized LR algorithm was<br />

adapted to the parsing of the proposed formalism to analyze the syntactic structure of a language. The<br />

grammar was used as the basis for building a machine translation system for Portuguese to Chinese translation.<br />

The empirical results show that the grammar is more expressive when modeling the translational<br />

equivalences of parallel texts for machine translation and grammar rewriting applications.<br />

Key words: constraint-based synchronous grammar; machine translation; Portuguese to Chinese<br />

translation<br />

Introduction<br />

In machine translation, analysis of the structural deviations<br />

of the languages pairs is key to transforming one<br />

language into another. This analysis requires a large<br />

number of structural transformations, both grammatically<br />

and conceptually. The problems of syntactic<br />

complexity and word sense ambiguity have been the<br />

major obstacles to high-quality translations. Several<br />

approaches have been proposed to overcome the obstacles<br />

and, hence, improve the quality of translation<br />

systems.<br />

Ranbow and Satta [1] stated that much of theoretical<br />

linguistics can be formulated in a very natural manner<br />

as correspondences between layers of representations.<br />

*<br />

**<br />

Received: 2004-11-10<br />

Supported by the Center of Scientific and Technological Research<br />

of University of Macao (CATIVO: 3678)<br />

To whom correspondence should be addressed.<br />

E-mail: dmc@sftw.umac.mo; Tel: 86-10-82902115<br />

Similarly, many problems in natural language processing,<br />

in particular language translation and grammar<br />

rewriting systems, can be expressed as transduction<br />

through the use of synchronous formalisms [2-6] . <strong>Synchronous</strong><br />

grammars are becoming more and more<br />

popular for the formal description of parallel texts representing<br />

translations. The underlying idea of such<br />

formalisms is to combine two generative devices<br />

through pairing their productions in such a way that<br />

links the right-hand side non-terminal symbols in the<br />

paired productions. However, such formalisms are less<br />

expressive and unable to express mutual translations<br />

that have different lengths and crossing dependencies.<br />

Moreover, synchronous formalisms do not deal with<br />

unification and feature structures, as in unificationbased<br />

formalisms, that give patterns additional power<br />

for describing constraints on features. For example,<br />

multiple context-free grammar [4] uses functions for<br />

the non-terminal symbols in the productions to further<br />

interpret the symbols in the target generation. Inversion<br />

transduction grammar (ITG) [7,8] has been proposed for


296<br />

simultaneously bracketing parallel corpora as a variant<br />

of syntax directed translation schema [9] . But these<br />

formalisms lack the expressive ability to describe discontinuous<br />

constituents in linguistic expressions. The<br />

generalized multitext grammar (GMTG) [5,10] maintains<br />

two sets of productions as components, one for each<br />

language, for modeling parallel texts. Although GMTG<br />

is more expressive and can be used to express independent<br />

rewriting, the lack of flexibility in the description<br />

constraints on the features associated with a nonterminal<br />

makes it difficult to develop a practical machine<br />

translation system.<br />

Lexical-semantic formalisms and unification-based<br />

grammar formalisms such as lexical function grammar<br />

(LFG) [11-13] use constituent structures (c-structure) and<br />

functional structures (f-structure) for semantic interpretation<br />

of words. Head-driven phrase structural grammar<br />

(HPSG) [14] expresses the units of linguistic information<br />

such as syntactic, semantic, as well as pragmatic<br />

information of utterances in typed feature structures<br />

[15] to facilitate computationally precise descriptions<br />

of natural language syntax and semantics. The<br />

descriptive power of these grammars may be defined<br />

specifically enough to give correct translations of individual<br />

usages of words and phrases. However, translation<br />

systems based on these formalisms are not possible<br />

without much more efficient parsing and disambiguation<br />

algorithms for these formalisms. Moreover,<br />

the construction of these formalisms needs deeper<br />

grammatical knowledge to develop grammatical rules.<br />

The integration of syntactical and semantic information<br />

requires the use of many features making the task<br />

of writing specific constraint grammars too difficult for<br />

a non-specialist. Another limitation of translation applications<br />

is that these formalisms are not ready to describe<br />

the relations between two-language pairs that researchers<br />

need for translation systems.<br />

This paper describes a variation of synchronous<br />

grammar, the constraint-based synchronous grammar<br />

(CSG), based on the formalism of context-free grammar.<br />

The CSG uses feature structures such as in unification-based<br />

grammar to generalize the sentential patterns<br />

in the source text while grouping the corresponding<br />

target rewriting rules for each production in a<br />

vector representing the possible translation patterns for<br />

the source production. The rule choice for target<br />

Tsinghua Science and Technology, June 2006, 11(3): 295-306<br />

generation is based on the feature’s constraints of the<br />

non-terminal symbols in the pattern. The motivation is<br />

three-fold. First, synchronous formalisms have been<br />

proposed to model parallel texts, since such algorithms<br />

can infer the synchronous structures of texts for two<br />

different languages through the grammatical representation<br />

of their syntactic deviations. These synchronous<br />

structures are quite useful for the analysis of language<br />

pairs in machine translation systems. Secondly, augmentation<br />

of the synchronous models with feature<br />

structures enhances the pattern recognition with additional<br />

power to describe gender, number, agreement,<br />

etc., since the descriptive power of unification-based<br />

grammars is considerably greater than that of the classical<br />

CFG [11] . Finally, the combining of the notational<br />

and intuitive simplicity of the CFG with the power of<br />

the feature constraints provides grammar formalism<br />

with a better descriptive power than the CFG to give<br />

more efficient parsing and generation controlled by the<br />

feature constraints of symbols to provide better word<br />

sense and syntax disambiguation.<br />

The paper describes the formalism of the CSG with<br />

a parsing algorithm for the formalism. The CSG is then<br />

used to construct a machine translation system for<br />

Portuguese to Chinese translation.<br />

1 <strong>Constraint</strong>-<strong>Based</strong> <strong>Synchronous</strong><br />

<strong>Grammar</strong>s<br />

The CSGs are defined using the syntax of context-free<br />

grammar for synchronous language pairs. The formalism<br />

consists of a set of generative productions with<br />

each production constructed by a pair of context free<br />

grammar with/without syntactic head and link constraints<br />

for the non-terminal symbols in the patterns.<br />

The first component (on the right hand side of the productions)<br />

represents the sentential patterns of the<br />

source language, the source component, while the second<br />

component represents the translation patterns in<br />

the target language, the target component. Unlike other<br />

synchronous formalisms, the production target component<br />

consists of one or more generative rules associated<br />

with zero or one or more controlled conditions<br />

based on the features of the non-terminal symbols of<br />

the source rule for describing the possible generation<br />

correspondences in the target translation. The source


WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 297<br />

components in the CSG are then generalized by handling<br />

the feature’s constraints in the target component,<br />

which also helps reduce the grammar size. For example,<br />

the following is one of the productions used in the<br />

Portuguese to Chinese translation system:<br />

S→NP 1 VP* NP 2 PP NP 3 {[NP 1 VP 1 NP 3 VP 2 NP 2 ;<br />

VP cate =vb1,VP s:sem =NP 1sem , VP io:sem =NP 2sem ,VP o:sem = NP 3sem ],<br />

[NP 1 VP NP 3 NP 2 ; VP cate = vb0,VP s:sem =NP 1sem ,<br />

VP io:sem =NP 2sem ]} (1)<br />

The production has two components besides the reduced<br />

syntactic symbol on the left hand side, the first<br />

modeling Portuguese and the second Chinese. The target<br />

component in this production consists of two generative<br />

rules maintained in a vector, each of which involves<br />

control conditions based on the symbol features<br />

from the source component, which are used as the selectional<br />

preferences during parsing. These constraints<br />

in the parsing/generation algorithm are used for inferring<br />

not only the input structure to decide which structures<br />

are possible or probable, but also the output text<br />

structure for the target translation. For example, the<br />

condition expression: VP cate =vb1, VP s:sem =NP 1sem ,<br />

VP io:sem =NP 2sem , VP o:sem =NP 3sem , specifies that the<br />

senses of the first, second, and third nouns (NPs) in the<br />

input strings match the sense of the subject and the direct<br />

and indirect objects governed by the verb, VP,<br />

with category type vb1. When the condition is satisfied,<br />

the source structure is successfully recognized and the<br />

corresponding structure of the target language, NP 1<br />

VP 1 NP 3 VP 2 NP 2 , is also determined.<br />

Non-terminal symbols in the source and target rules<br />

are linked if they are given the same index “subscripts”<br />

for multiple occurrences, such as NP in the production:<br />

S→NP 1 VP NP 2 PP NP 3 [NP 1 VP * NP 3 NP 2 ]. Symbols<br />

that appear only once in both the source and target<br />

rules, such as VP, are implicitly linked to give the synchronous<br />

rewriting. Linked non-terminal symbols must<br />

be derived from a sequence of synchronized pairs.<br />

Consider the production: S→NP 1 VP NP 2 PP NP 3 [NP 1<br />

VP * NP 3 NP 2 ], where the second NP (NP 2 ) in the<br />

source rule corresponds to the third NP (NP 2 ) in the<br />

target rule and the third NP (NP 3 ) in the source rule<br />

corresponds to the second NP (NP 3 ) in the target pattern,<br />

while the first NP (NP 1 ) and VP correspond to<br />

their counterparts in the source and target rules. The<br />

symbol marked by an “*” is designated as the head<br />

element in the pattern, with the features of the designated<br />

head symbol propagated to the reduced nonterminal<br />

symbol on the left side of the production rule<br />

to achieve the feature inheritance in the CSG formalism.<br />

The use of feature structures associated with nonterminal<br />

symbols will be explained further in Section<br />

2.2.<br />

When modeling natural languages, especially when<br />

processing language pairs, non-standard linguistic<br />

phenomena such as crossing dependencies, and discontinuous<br />

constituents must be carefully analyzed due to<br />

the structural differences between the two languages, in<br />

particular for languages from different families such as<br />

Portuguese and Chinese [16,17] . Linguistic expressions can<br />

vanish or appear in translation. For example, the preposition<br />

(PP) in the source rule does not show up in any of<br />

the target rules in Production (1). In contrast, Production<br />

(2) allows the Chinese characters “ 本 ” and “ 辆 ” to appear<br />

in the target rules as modifiers of the noun (NP) together<br />

with the quantifier (num) as the proper translation<br />

for the source text. This explicitly relaxes the synchronization<br />

constraint, so that the two components<br />

can be rewritten independently.<br />

NP→num NP * {[num 本 NP; NP sem =SEM _book ],<br />

[num 辆 NP; NP sem =SEM _automobile ]} (2)<br />

An important strength of the CSG is its expressive<br />

power to describe discontinuous constituents. Chinese<br />

frequently uses discontinuously distributed combination<br />

words. For example, take the sentence pair [“Ele<br />

vendeu-me todas as uvas. (He sell me all the grapes.)”,<br />

“ 他 把 所 有 的 葡 萄 卖 给 了 我 ”]. The Chinese preposition<br />

“ 把 ” and the verb “ 卖 给 了 ” should be paired with<br />

the Portuguese verb “vendeu” which causes a fan-out<br />

with the discontinuous constituent in the Chinese component.<br />

(The term fan-out describes a word whose<br />

translation is paired with discontinuous words in<br />

the target language, e.g., “vendeu[-pro] [NP]” in<br />

Portuguese gives the similar English translation “sell<br />

[NP] to [pro]”, so “vendeu”, in this case, corresponds<br />

to both “sell” and “to”.) The following fragment of<br />

CSG productions represents such relationships.<br />

S→NP 1 VP * NP 2 NP 3 {[NP 1 VP 1 NP 3 VP 2 NP 2 ;<br />

VP cate = vb0,…],...}<br />

VP→vendeu*{[ 把 , 卖 给 了 ;∅]} (3)<br />

In Production (3), the discontinuous constituents of


298<br />

VP from the source rule are represented by VP 1 and<br />

VP 2 in the target rule, where the “superscripts” are<br />

added to indicate the pairing to VP in the target component.<br />

The corresponding translation constituents in<br />

the lexicalized production are separated by commas<br />

representing the discontinuity between constituents<br />

“ 把 ” and “ 卖 给 了 ” in the target translation. During the<br />

rewriting phase, the corresponding constituents will be<br />

used to replace the syntactic symbols in the pattern rule.<br />

2 Definitions and Features<br />

Representation<br />

2.1 Definitions<br />

Let L be a context-free language defined over the terminal<br />

symbol V T which is generated by the contextfree<br />

grammar G using the non-terminal symbol V N disjointed<br />

with V T , the starting symbol S, and the productions<br />

of the form A → w where A is in V N and w in (V N<br />

∪V T )*. Let Z be a set of integers, with each nonterminal<br />

symbol in V N assigned to an integer, Γ(V N ) =<br />

{W ω |W∈V N , ω∈Z}. The elements of Γ(V N ) are indexed<br />

non-terminal symbols. Then extend the rule to<br />

include the set of terminal symbols V T’ as the translation<br />

in the target language, disjoint from V T ,<br />

(V T ∩V T’ =∅). Let R={r 1 , …, r n | r i ∈ (Γ(V N )∪V T’ ), 1≤i<br />

≤n} be a finite set of rules and C={c 1 , …, c m } be a finite<br />

set of constraints over the associated features of<br />

(Γ(V N )∪V T ), where the features of non-terminal Γ(V N ),<br />

the syntactic symbols, are inherited from the designated<br />

head element during rule reduction. A target rule<br />

is defined as a pair [r∈R * , c∈C * ] in γ, where γ =<br />

R*×C* has the form [r, c]. Then, define ψ(γ i ) to denote<br />

the number of conjunct features being considered in<br />

the associated constraint to determine the degree of<br />

generalization for a constraint. So, the rules γ i and γ j<br />

are orderable, γ i ≺ γ j , if ψ(γ i )≥ψ(γ j ) (or γ i γ j , if<br />

ψ(γ i )


WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 299<br />

Table 1 CSG productions<br />

S → NP 1 VP* NP 2 PP {[NP 1 VP 1 NP 3 VP 2 NP 2 ; VP cate =vb1, VP s:sem =NP 1sem ,<br />

NP 3 VP io:sem =NP 2sem , VP o:sem = NP 3sem ], [NP 1 VP NP 3 NP 2 ;<br />

VP cate = vb0, VP s:sem =NP 1sem , VP io:sem =NP 2sem , VP o:sem =<br />

NP 3sem ]}<br />

VP → v {[v ; ∅]} (5)<br />

NP → det NP* {[NP ; ∅]} (6)<br />

(4)<br />

NP → num NP* {[num 本 NP; NP sem =SEM _book ]} (7)<br />

NP → n {[n; ∅]} (8)<br />

NP → pro {[pro; ∅]} (9)<br />

PP → p {[p; ∅]} (10)<br />

n → José {[ 若 泽 ; ∅]}| (11)<br />

livro<br />

{[ 书 ; ∅]}<br />

pro → ele {[ 他 ; ∅]} (12)<br />

v → deu {[ 给 了 ; ∅]} | (13)<br />

comprou<br />

{[ 向 , 买 了 ; ∅]}<br />

num → um {[ 一 ; ∅]} (14)<br />

p → a {∅} (15)<br />

det → o {∅} (16)<br />

Note: As with the head element designation in productions, the only symbol from the RHS of the production in the left hand side (LHS) will be the<br />

head element. Thus, no head mark “*” is given for such rules and we assume that “VP → v*” implies “VP → v”.<br />

determined target rules in Q. The translation of an input<br />

string s essentially consists of three steps. First, the<br />

input string is parsed using the source rules from the<br />

productions. Secondly, the link constraints are propagated<br />

from the source rule to the target component to<br />

determine and build a target derivation sequence. Finally,<br />

the translation of the input string is generated<br />

from the target derivation sequence.<br />

2.2 Features representation<br />

In the CSG, linguistic entities are modeled as feature<br />

structures which give patterns additional power for describing<br />

gender, number, semantics, attributes, and<br />

number of arguments required by a verb, and so on.<br />

This information is encoded in the commonly used attribute<br />

value matrices (AVMs) attached to each of the<br />

lexical and syntactic symbols in the CSG. The attribute<br />

or feature names are typically written in upper case in<br />

the AVMs with the values written to the right of the<br />

feature name in lower case:<br />

⎡<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢<br />

⎢⎣<br />

CAT<br />

SUBJ<br />

DOBJ<br />

IOBJ<br />

TEN<br />

PER<br />

NUM<br />

vb0<br />

hum<br />

obj<br />

hum<br />

pp<br />

3rd<br />

sg<br />

⎤<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥<br />

⎥⎦<br />

(17)<br />

This structure allows specification of syntactic dependencies<br />

such as agreement and subcategorization in<br />

the patterns. Unlike other unification-based grammars<br />

[12,15] , the unification is not completed but only the<br />

conditions that are explicitly expressed in the rule constraints<br />

are tested and unified. The unification process<br />

can be performed in a constant time. The use of feature<br />

constraints has to be restricted to maintain the parsing<br />

and generation algorithm efficiency, and especially to<br />

prevent generation of a large number of ambiguous<br />

structure candidates. The word selection in the target<br />

language can also be achieved by checking features.<br />

The parsing and generation algorithm propagates the<br />

features information to the reduced symbol from the


300<br />

designated head element in the pattern to realize the<br />

features inheritance. Features can either be put in the<br />

lexical dictionary isolated from the formalism to simplify<br />

construction of the analytical grammar or can be<br />

explicitly encoded in the preterminal rules as:<br />

pro→José:[CAT:pro,NUM:sg,GEN:masc,SEM:hum] {[ 若 泽 ;<br />

∅]};<br />

n→livro:[CAT:n,NUM:sg,GEN:masc,SEM:artifact+book]<br />

{[ 书 ; ∅]},<br />

where the features set is bracketed and separated by a<br />

semi-colon, and the name and the value of a feature are<br />

delimited by a colon to represent the feature pair.<br />

Another way to enhance the CSG formalism is to<br />

apply soft preferences instead of hard constraints in the<br />

features unification process. Two problems can arise.<br />

First, a single lexical item often has more than one<br />

combination of feature values in natural languages, i.e.,<br />

one word may have several translations according to<br />

the different senses and the pragmatic uses of the word,<br />

which is known as word senses disambiguation [18] .<br />

Secondly, the conventional feature unification method<br />

can only tell if the process is successful. For example,<br />

if relatively minor conditions fail during the unification,<br />

all the related candidates are rejected without any<br />

flexibility in choosing the next preferable or most<br />

probable candidate. The importance of conditions is<br />

associated by a weight with each feature structure. The<br />

matching features are then ranked according to their<br />

weights rather than the order of the lexical items expressed<br />

in grammars or dictionaries. In the current system,<br />

each symbol has an original weight, with the preference<br />

measurements when checking the feature constraints<br />

used to create a penalty to reduce the weight to<br />

the effective weight of those associated by features in a<br />

particular context. Features with the largest weight are<br />

chosen as the most preferable content.<br />

3 Parsing Algorithm<br />

The CSG formalism can be parsed by any CFG parsing<br />

algorithm including the Earley [19,20] and generalized<br />

LR algorithms [21,22] augmented to take into account the<br />

features’ constraints and the inference of the target<br />

structure. The parsing algorithm used for the formalism<br />

described here is based on the generalized LR algorithm<br />

developed for translation systems. Since the<br />

method uses a parsing table, it is considerably more efficient<br />

than Earley’s non-complied method which has<br />

Tsinghua Science and Technology, June 2006, 11(3): 295-306<br />

to compute a set of LR items at each parsing stage [23] .<br />

The generalized LR algorithm was first introduced by<br />

Tomita for parsing the augmented context-free grammar<br />

that can ingeniously handle non-determinism and<br />

ambiguity through the use of a graph-structured stack<br />

while retaining much of the advantages of the standard<br />

LR parsing, especially when the grammar is close to<br />

the LR grammars. The method is quite suitable for<br />

parsing natural languages with grammatical features<br />

such as prepositional phrase attachments which are inherently<br />

ambiguous.<br />

The generalized LR algorithm actually is a variation<br />

of standard LR parsing which uses a shift-reduce approach<br />

by using an extended LR parsing table to guide<br />

its actions by allowing multiple action entries such as<br />

shift/reduce and reduce/reduce to handle nondeterministic<br />

parsing with pseudo-parallelism. For this formalism,<br />

the parsing table was extended by including the<br />

features’ constraints and target rules into the actions<br />

table. The strategy is thus to parse the source rules of<br />

the CSG productions through the normal shift actions<br />

proposed by the parsing table, while checking the<br />

associated conditions to determine if the active<br />

reduction is valid depending on weather the working<br />

symbols in the patterns fulfill the features’ constraints.<br />

3.1 CSG parsing table<br />

Figure 1 shows an extended LR(1) parsing table for<br />

Productions (4)-(16) as constructed by using the LR<br />

table construction method [24] extended to consider the<br />

rule components of productions by associating the corresponding<br />

target rules and constraints. The parsing table<br />

includes a compact ACTION-GOTO table and a<br />

CONSTRAINT-RULE table. The ACTION-GOTO table<br />

is indexed by a state symbol s (row) and a symbol x<br />

∈V N ∪V T , including the end marker “⊥”. The entry<br />

ACTION[s, x] can be one of sn, rm, acc, or blank. sn<br />

denotes a shift action representing GOTO[s, x]=n<br />

which defines the next state for the parser; rm denotes<br />

a reduction using the m-th production located in the entry<br />

of CONSTRAINT-RULE in state s; acc denotes the<br />

accept action and blank indicates a parsing error. The<br />

CONSTRAINT-RULE table is indexed by the state<br />

symbol s (row) and the number of productions m that<br />

may be applied for reduction in state s. The entry<br />

CONSTRAINT-RULE[s, m] consists of a set of productions<br />

together with the target rules and features


WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 301<br />

State<br />

ACTIONs/GOTOs<br />

Reduced rules<br />

pro num n v det p NP VP PP S ⊥ o a um ele José livro deu comprou constraints/target rules<br />

0 s8 s9 s1 s11 s7 s6 s5 s2 s1 s4 s3<br />

1 r1 (1) pro → ele {[ 他 ; ∅]}<br />

2 r1 (1) num → um<br />

3 r1 (1) n → livro {[ 书 ; ∅]}<br />

4 r1 (1) n → José {[ 若 泽 ; ∅]}|<br />

5 r1 (1) det → o<br />

6 acc<br />

7 s14 s15 s12 s13<br />

8 r1 (1) NP → pro<br />

9 s8 s9 s1 s11 s16 s5 s2 s1 s4 s3<br />

10 r1 (1) NP → n<br />

11 s8 s9 s1 s11 s17 s5 s2 s1 s4 s3<br />

12 r1 (1) v → deu {[ 给 了 ; ∅]}<br />

13 r1 (1) v → comprou {[ 向 , 买 了 ; ∅]}<br />

14 r1 (1) VP → v<br />

15 s8 s9 s1 s11 s18 s5 s2 s1 s4 s3<br />

16 r1 (1) NP → num NP*<br />

17 r1 (1) NP → det NP* {[NP; ∅]}<br />

18 s21 s20 s19<br />

19 r1 (1) p → a<br />

20 s8 s9 s1 s11 s22 s5 s2 s1 s4 s3<br />

21 r1 (1) PP → p<br />

22 r1<br />

(1) S → NP 1 VP* NP 2 PP NP 3<br />

{[...]}<br />

Fig. 1 Extended LR(1) parsing table<br />

constraints that are used for validating if the active<br />

parsing node can be reduced and to identify the corresponding<br />

target generative rule for the reduced<br />

production.<br />

For simplicity, the productions used for building the<br />

parsing table are deterministic so no conflicting actions<br />

such as shift/reduce and reduce/reduce, appear in the<br />

constructed parsing table.<br />

3.2 CSG parser<br />

In the parsing process, the algorithm maintains a number<br />

of parsing processes in parallel, each of which<br />

represents an individual parsed result to handle nondeterministic<br />

cases. Each process has a stack and scans<br />

the input string from left-to-right. Actually, each process<br />

does not maintain its own separate stack, but the<br />

multiple stacks are represented by using a single directed<br />

acyclic graph, a graph-structured stack, for parsing<br />

multiple possible structures. The process has two<br />

major components, shift(i) and reduce(i), which are<br />

called at each position i=0, 1, …, n in an input string I<br />

= x 1 x 2 …x n . The shift(i) process shifts the top stack vertex<br />

v on x i from its current state s to some successor<br />

state s’ by creating a new leaf v’; establishes an edge<br />

from v’ to the top stack v; and makes v’ the new top<br />

stack vertex. The reduce(i) process executes a reduction<br />

action on production p by following the chain of<br />

parent links down from the top stack vertex v to the<br />

ancestor vertex from which the process began scanning<br />

for p, while removing intervening vertices off the stack.<br />

Every reduction action in reduce(i) has a set C of ordered<br />

constraints, c 1 ≺ …≺ c m , in the production, each<br />

of which is associated with a target rule that may be<br />

the probable corresponding target structure for the production,<br />

depending on whether the paired constraint is<br />

satisfied according to the features of the parsed string p.<br />

Before reduction, the constraints c j (1≤j≤m) are<br />

tested in order starting from the most specific one with<br />

the evaluation process stopping once a positive result


302<br />

is obtained from the evaluation. The corresponding<br />

target rule for the parsed string is determined and attached<br />

to the reduced syntactic symbol which will be<br />

used to rewrite the target translation in the generation<br />

PARSE (grammar, x 1 … x n )<br />

x n+1 ⇐ ⊥<br />

U i ⇐ ∅<br />

U 0 ⇐ v 0<br />

(0≤i≤n)<br />

for each terminal symbol x i<br />

P ⇐ ∅<br />

for each node v ∈ U i−1<br />

P ⇐ P∪v<br />

(1≤i≤n)<br />

if ACTION[STATE(v), x i ] = “shift s’”, SHIFT(v, s’)<br />

for each “reduce p” ∈ ACTION[STATE(v), x i ],<br />

REDUCE(v, p)<br />

if “acc” ∈ ACTION[STATE(v), x i ], accept<br />

if U i = ∅, reject<br />

SHIFT(v, s)<br />

if v’ ∈ U i s.t. STATE(v’)=s and ANCESTOR(v’, 1)=v and<br />

else<br />

state transition δ(v, x)=v’<br />

do nothing<br />

create a new node v’<br />

s.t. STATE(v’)=s and ANCESTOR(v’, 1)=v and state<br />

transition δ(v, x)=v’<br />

U i ⇐ U i ∪v’<br />

UNIFY(v, p)<br />

for “constraint c j ” ∈ CONSTRAINT(STATE(v)) (1≤j≤<br />

m, c 1 ≺ …≺ c m )<br />

if ξ(c j , p)= “true”<br />

TARGET(v) ⇐ j<br />

return “success”<br />

(ξ(∅, p)= “true”)<br />

Tsinghua Science and Technology, June 2006, 11(3): 295-306<br />

phase. At the same time, the features information is inherited<br />

from the designated head production element.<br />

The parsing algorithm for the CSG formalism is given<br />

in Fig. 2.<br />

REDUCE(v, p)<br />

for each possible reduced parent v 1 ’ ∈ ANCESTOR(v,<br />

RHS(p))<br />

if UNIFY(v, p)= “success”<br />

s” ⇐ GOTO(v 1 ’, LHS(p))<br />

if node v” ∈ U i−1 s.t. STATE(v”)=s”<br />

else<br />

if δ(v 1 ’, LHS(p))=v”<br />

else<br />

do nothing<br />

if node v 2 ’ ∈ ANCESTOR(v”, 1)<br />

else<br />

let v c ” s.t. ANCESTOR(v c ”, 1)=v 1 ’ and<br />

STATE(v c ”)=s”<br />

for each “reduce p” ∈ ACTION[STATE(v c ”), x i ],<br />

REDUCE(v c ”, p)<br />

if v” ∈ P<br />

else<br />

let v c ” s.t. ANCESTOR(v c ”, 1)=v 1 ’ and<br />

STATE(v c ”)=s”<br />

for each “reduce p” ∈ ACTION[STATE(v c ”), x i ],<br />

REDUCE(v c ”, p)<br />

create a new node v n<br />

s.t. STATE(v n )=s” and ANCESTOR(v n , 1)=v 1 ’ and<br />

state transition δ (v n , x)=v 1 ’<br />

U i−1 ⇐ U i−1 ∪v n<br />

current reduction failed<br />

Fig. 2 Modified generalized LR parsing algorithm<br />

The parser is a function of two arguments PARSE<br />

(grammar, x 1 … x n ), where the grammar is provided as<br />

a parsing table. The parser calls the functions SHIFT(v,<br />

s) and REDUCE(v, p) to process the shifting and rule<br />

reductions as described. The UNIFY(v, p) function is<br />

called for every possible reduction in REDUCE(v, p)<br />

to verify the legal reduction and select the target rule<br />

for the source structure for synchronization. The<br />

function TARGET(v) is called after unification passed<br />

to determine the j-th target rule for correspondence.<br />

4 Application to Portuguese to<br />

Chinese <strong>Machine</strong> <strong>Translation</strong><br />

4.1 Portuguese to Chinese translation system<br />

The formalism described in Section 3 was used to<br />

develop a Portuguese to Chinese translation system.<br />

The system is a transfer-based translation system using<br />

the formalism of the CSG as its analytical grammar.<br />

Unlike other transfer-based translation systems


WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 303<br />

that individually complete the major components of<br />

analysis, transfer, and generation in the pipeline using<br />

different sets of representation rules for structure<br />

analysis and transformation [25] , the present system uses<br />

only a single CSG set for the translation task. Since the<br />

structures of parallel languages are synchronized in the<br />

formalism, their structural differences are also captured<br />

and described by the grammar. Hence, translation of an<br />

input text essentially involves three steps. First, the<br />

structure of an input sentence s is analyzed using the<br />

source component’s rules from the CSG productions<br />

by the augmented generalized LR parsing algorithm.<br />

Secondly, the link constraints determined during the<br />

rule reduction process are propagated to the corresponding<br />

target rules R (as a selection of target rules)<br />

to construct a target derivation sequence Q. Finally, the<br />

derivation sequence Q is used to generate a translation<br />

of the input sentence s by referencing the set of generative<br />

rules R attached to the corresponding constituent<br />

nodes in the parsed tree.<br />

Figure 3 shows the architecture of the Portuguese to<br />

Chinese translation system, with the data flow in the<br />

translation process illustrated by the arrows. The input<br />

sentence is first analyzed by a morphological analyzer<br />

to restore the lexical item to its canonical form.<br />

Compared to English, Portuguese has a highly developed<br />

morphology and is relatively richer in inflectional<br />

variation. For example, each verb has at<br />

least 50 different conjugations according to distinctions<br />

such as number, person, voice, mood, and tense. The<br />

Portuguese morphological analyzer developed by<br />

Wong et al. [26] was used to analyze the input word<br />

and to retrieve a list of possible candidates that have<br />

different feature values including the part of speech. Ambiguous<br />

words are handled by the sense disambiguation<br />

module and the parser.<br />

Fig. 3 Portuguese to Chinese translation system architecture<br />

After analyzing the form of the lexical items and assigning<br />

a set of possible features to each word in the<br />

sentence, the next step is to determine the proper<br />

grammatical categories for the words using the<br />

Portuguese tagger [27] , which is a probabilistic tagging<br />

model smoothed by interpolating the probabilities of<br />

lexical features into the word probabilities to evaluate<br />

unknown words. As described in Section 3, soft preferences<br />

were used instead of hard constraints by associating<br />

a weight to each feature structure. During the<br />

translation process, no feature candidates are removed<br />

from the candidates list, but their weights are adjusted<br />

if the specific candidate cannot be hit in each analytical<br />

process. For example, in the tagging process, if the<br />

carrying grammar category of a feature does not match<br />

the tagged result, a penalty is given to reduce the<br />

weight so that the feature is given a lower priority.<br />

The search space during parsing is narrowed by using<br />

a word sense disambiguation module based on context<br />

collocation to select the most probable features for<br />

words and to put these features at the top of the features<br />

list. The parsing process will then use these as the<br />

default features for testing the constraint conditions defined<br />

in the CSG. The input sentence is then parsed to<br />

produce a parsed tree representing the source sentence<br />

structure, where each constituent unit of the representation<br />

structure is attached to the corresponding<br />

structure of the target language. For example, the


304<br />

parsed structure of the sentence “Ele deu um livro ao<br />

José. (He gave a book to José)” together with target<br />

correspondences is illustrated in Fig. 4. Figure 5 gives<br />

the final translation “ 他 给 了 若 泽 一 本 书 ” for the sentence<br />

after rendering the generation process based on<br />

the synchronous structures produced from the parser.<br />

Fig. 4 Parsed structure of the sentence “Ele deu um<br />

livro ao José (He gave a book to José)” with the target<br />

correspondences<br />

Fig. 5 Target translation after processing by the generation<br />

module<br />

4.2 Other applications<br />

The CSG can be used to simulate other synchronous<br />

models such as ITG [8] and independent rewriting systems<br />

by maintaining only one rule in the target component<br />

and relaxing the associated target rule constraints.<br />

In this case, the CSG model can be used to extract linguistic<br />

information from parallel texts, including the<br />

discovery of hidden structures in texts from different<br />

languages, by learning from parallel corpora. Collocation<br />

extraction and expression, for example, are quite<br />

important for developing translation systems where<br />

collocation data can contribute to improved performance<br />

in almost every processing component. Collocations,<br />

like idioms, belong to the set of phrases in a language<br />

that describe habitual word combinations which<br />

have particular meanings relative to the base. While<br />

Tsinghua Science and Technology, June 2006, 11(3): 295-306<br />

they are often realized idiomatically, collocational<br />

meanings can also be found in regular compositional<br />

phrases. There is a clear consensus in the literature that<br />

collocation data are very useful for parsing, word sense<br />

disambiguation, text generation, and other applications,<br />

especially for machine translation. Collocational data<br />

expressions can be easily achieved through the CSG<br />

formalism, since they can be incorporated as part of<br />

the analytical grammars for parsing during analysis<br />

and translation.<br />

Recently, there has been an increasing interest in<br />

controlled languages (an offspring of machine translation),<br />

culminating in a series of CLAW workshops on<br />

controlled language applications [28] . These have given<br />

impetus to both monolingual and multilingual guidelines<br />

and applications using controlled language for<br />

many different languages. Controlled languages are<br />

subsets of natural languages whose grammars and dictionaries<br />

have been restricted to reduce or eliminate<br />

both ambiguity and complexity. In a transfer-based<br />

system, controlled translation involves three processing<br />

phases that control the input text, the transfer routines,<br />

and the generation component on the target side.<br />

An alternative way to facilitate the controlled translation<br />

process is to limit the language grammars through<br />

the modeling of parallel (synchronous) structures for<br />

the language pairs, e.g., by using the CSG formalism.<br />

The generalization of source rules in the CSG allows<br />

flexible description of the common (universal) structure<br />

of natural languages, which greatly reduces the<br />

grammar size. With the CSG formalism, the output<br />

translation can be easily controlled with the constraints<br />

of the corresponding target component according to the<br />

input text features and the contextual information.<br />

5 Conclusions<br />

This paper describes a synchronous grammar based on<br />

the syntax of context-free grammar, called the CSG.<br />

The CSG source components are generalized to represent<br />

the common structures of languages. Unlike in<br />

other synchronous grammars, each source rule is associated<br />

with a set of target productions, where each target<br />

rule is connected with a constraint over the source<br />

pattern features. The set of target rules are grouped into


WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 305<br />

a vector ordered by the specificity of the constraints.<br />

The objective of this formalism is to allow parsing and<br />

generating algorithms to infer different possible translation<br />

equivalences for an input sentence according to<br />

the linguistic features. A modified generalized LR<br />

parsing algorithm was adapted to the parsing of<br />

this formalism to analyze the syntactic structure<br />

of Portuguese in a machine translation system. The<br />

CSG was then used to construct a machine translation<br />

system for Portuguese to Chinese translation. Other<br />

potential applications of the CSG include parallel<br />

text modeling, identification of collocation data<br />

expressions, and controlled translation.<br />

References<br />

[1] Ranbow O, Satta G. <strong>Synchronous</strong> models of language. In:<br />

Proceedings of 34th Annual Meeting of the Association for<br />

Computational Linguistics. University of California, Santa<br />

Cruz, California, USA, 1996: 116-123.<br />

[2] Lewis P, Stearns R. Syntax-directed transduction. Journal<br />

of the Association for Computing <strong>Machine</strong>ry, 1968, 15(3):<br />

465-488.<br />

[3] Shieber S, Schabes Y. <strong>Synchronous</strong> tree adjoining grammar.<br />

In: Proceedings of the 13th International Conference<br />

on Computational Linguistic. Helsinki, 1990.<br />

[4] Seki H, Matsumura T, Fujii M, Kasami T. On multiple<br />

context-free grammars. Theoretical Computer Science,<br />

1991, 88(2): 191-229.<br />

[5] Melamed I D. Multitext grammars and synchronous parsers.<br />

In: Proceedings of NAACL/HLT. Edmonton, 2003:<br />

79-86.<br />

[6] Wong F, Hu D C, Mao Y H, Dong M C. A flexible example<br />

annotation schema: <strong>Translation</strong> corresponding tree representation.<br />

In: Proceedings of the 20th International<br />

Conference on Computational Linguistics. Geneva,<br />

Switzerland, 2004: 1079-1085.<br />

[7] Wu D. <strong>Grammar</strong>less extraction of phrasal translation examples<br />

from parallel texts. In: Proceedings of TMI-95,<br />

Sixth International Conference on Theoretical and Methodological<br />

Issues in <strong>Machine</strong> <strong>Translation</strong>, vol. 2. Leuven,<br />

Belgium, 1995: 354-372.<br />

[8] Wu D. Stochastic inversion transduction grammars and bilingual<br />

parsing of parallel corpora. Computational Linguistics,<br />

1997, 23(3): 377-404.<br />

[9] Aho A, Ullman J. Syntax directed translations and the<br />

pushdown assembler. Journal of Computer and System<br />

Sciences, 1969, 3: 37-56.<br />

[10] Melamed I D, Satta G, Wellington B. Generalized multitext<br />

grammars. In: Proceedings of 42nd Annual Meeting of the<br />

Association for Computational Linguistics. Barcelona,<br />

Spain, 2004: 661-668.<br />

[11] Kaplan R, Bresnan J. Lexical-functional grammar: A formal<br />

system for grammatical representation. In: Bresnan J,<br />

ed. The Mental Representation of Grammatical Relations.<br />

Cambridge, Mass.: MIT Press, 1982: 173-281.<br />

[12] Kaplan R. The formal architecture of lexical-functional<br />

grammar. Information Science and Engineering, 1989, 5:<br />

305-322.<br />

[13] Bresnan J. Lexical-Functional Syntax. Oxford: Blackwell,<br />

2001.<br />

[14] Pollard C, Sag I. Information-<strong>Based</strong> Syntax and Semantics.<br />

Stanford: CSLI, 1989.<br />

[15] Pollard C, Sag I. Head-Driven Phrase Structure <strong>Grammar</strong>.<br />

Chicago: University of Chicago Press, 1994.<br />

[16] Wong F, Mao Y H, Dong Q F, Qi Y H. Automatic<br />

translation: Overcome the barriers between European and<br />

Chinese languages. In: Proceedings (CD Version) of First<br />

International UNL Open Conference. Suzhou, China, 2001.<br />

[17] Wong F, Mao Y H. Framework of electronic dictionary<br />

system for Chinese and Romance languages. Automatique<br />

des Langues (TAL): Les Dictionnaires Électroniques, 2003,<br />

44(2): 225-245.<br />

[18] Ide N, Veronis J. Word sense disambiguation: The state of<br />

the art. Computational Linguistics, 1998, 24(1): 1-41.<br />

[19] Earley J. An efficient context-free parsing algorithm [Dissertation].<br />

Computer Science Department, Carnegie-<br />

Mellon University, 1968.<br />

[20] Earley J. An efficient context-free parsing algorithm.<br />

CACM, 1970, 13(2): 94-102.<br />

[21] Tomita M. An efficiency augmented context-free parsing algorithm.<br />

Computational Linguistics, 1987, 13(1-2): 31-46.<br />

[22] Rekers J. Parser generation for interactive environments.<br />

[Dissertation]. University of Amsterdam, 1992.<br />

[23] Tomita M. Generalized LR Parsing. Carnegie-Mellon University:<br />

Kluwer Academic Publishers, 1991.<br />

[24] Aho A V, Sethi R, Ullman J D. Compiler: Principles, Techniques<br />

and Tools. Massachusetts: Addison-Wesley, 1986.<br />

[25] Hutchins W J, Somers H L. An Introduction to <strong>Machine</strong><br />

<strong>Translation</strong>. New York: Academic Press, 1992.<br />

[26] Wong F, Dong M C, Li Y P, Mao Y H. Semantic and morphological<br />

analysis and mouse tracking technology


306<br />

in translation tool. In: Proceedings of Computational<br />

Methods in Engineering and Science (EPMESC IX).<br />

Macao, 2003: 525-532.<br />

[27] Wong F, Chao S, Hu D C, Mao Y H. Interpolated probabilistic<br />

tagging model optimized with genetic algorithm. In:<br />

Proceedings of the 3rd International Conference on <strong>Machine</strong><br />

Learning and Cybernetics. Shanghai, China, 2004:<br />

2569-2574.<br />

Tsinghua Science and Technology, June 2006, 11(3): 295-306<br />

[28] Carl M. Data-assisted controlled translation. In: Joint Conference<br />

Combining the 8th International Workshop of the<br />

European Association for <strong>Machine</strong> <strong>Translation</strong> and the 4th<br />

Controlled Language Applications Workshop (EAMT-<br />

CLAW 03). Dublin City, 2003: 16-24.<br />

Tsinghua Betters Ties with MIT<br />

BEIJING, Feb.18, 2006—Tsinghua University, famed as the “cradle of engineers,” is often dubbed “Chinese MIT<br />

(Massachusetts Institute of Technology)” on the Chinese mainland. The recent visit of Dr. Susan Hockfield, president<br />

of MIT, to Tsinghua University brought the relationship between the two universities even closer.<br />

“Universities must foster many new kinds of collaboration,” Hockfield said in her speech at Tsinghua. “We must<br />

work across the borders between universities and industry, between academic disciplines, between institutions, and<br />

between nations.”<br />

Tsinghua-MIT Beijing Urban Design Joint Studio has been one of the co-operative programs between Tsinghua<br />

and MIT. The idea was mooted in 1979 and turned into reality six years later when the first joint studio opened in<br />

Beijing. The joint studio has been held every two years, ever since. In each studio, directed by two to three professors<br />

from both universities, some 20 MIT graduates together with some 10 Tsinghua counterparts do field study,<br />

urban design and joint review in selected places in Beijing for a month. Tsinghua and MIT will hold the 10th joint<br />

studio between June and July, 2006. And an exhibition of the research fruits yielded by participating students will<br />

be held.<br />

Over the years, Tsinghua and MIT have started other collaborative programs. In 1996, the School of Economics<br />

and Management of Tsinghua began a partnership with the MIT Sloan School of Management, developing programmes<br />

to prepare China’s management for business in the global arena. Since then, more than 40 faculty members<br />

have visited MIT Sloan for at least half a year for teaching and research experience. The joint International<br />

Master’s in Business Administration (IMBA) have attracted applicants not only from China, but also from other<br />

countries. In fact, 26% of the students in the program are international students.<br />

In the same co-operative spirit, the Tsinghua China Finance Research Centre was founded in July 2002. Since<br />

then, more than 200 students have received training through various seminars at the centre.<br />

“Universities will need to learn from each other and to draw on best practices in education and research wherever<br />

they originate,” Hockfield said. She is the first university president to be given an honourary doctorate by Tsinghua<br />

University in its 95-year history.<br />

(Source: China Daily, http://news.xinhuanet.com/english/2006-02/18/content_4196376.htm)<br />

(http://news.tsinghua.edu.cn, partly selected)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!