TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 06/16 pp295-306
Volume 11, Number 3, June 2006

Machine Translation Using Constraint-Based Synchronous Grammar*

WONG Fai (黄辉), DONG Mingchui (董名垂)**, HU Dongcheng (胡东成)
Department of Automation, Tsinghua University, Beijing 100084, China
Abstract: A synchronous grammar based on the formalism of context-free grammar was developed by generalizing the first component of the production that models the source text. Unlike other synchronous grammars, this grammar allows multiple target productions to be associated with a single production rule, which can be used to guide a parser to infer different possible translational equivalences for a recognized input string according to the feature constraints of symbols in the pattern. An extended generalized LR algorithm was adapted to the parsing of the proposed formalism to analyze the syntactic structure of a language. The grammar was used as the basis for building a machine translation system for Portuguese to Chinese translation. The empirical results show that the grammar is more expressive when modeling the translational equivalences of parallel texts for machine translation and grammar rewriting applications.

Key words: constraint-based synchronous grammar; machine translation; Portuguese to Chinese translation
Introduction

In machine translation, analysis of the structural deviations of language pairs is key to transforming one language into another. This analysis requires a large number of structural transformations, both grammatical and conceptual. The problems of syntactic complexity and word sense ambiguity have been the major obstacles to high-quality translation. Several approaches have been proposed to overcome these obstacles and, hence, improve the quality of translation systems.
Rambow and Satta [1] stated that much of theoretical linguistics can be formulated in a very natural manner as correspondences between layers of representations.
* Received: 2004-11-10; Supported by the Center of Scientific and Technological Research of University of Macao (CATIVO: 3678)
** To whom correspondence should be addressed. E-mail: dmc@sftw.umac.mo; Tel: 86-10-82902115
Similarly, many problems in natural language processing, in particular language translation and grammar rewriting systems, can be expressed as transduction through the use of synchronous formalisms [2-6]. Synchronous grammars are becoming more and more popular for the formal description of parallel texts representing translations. The underlying idea of such formalisms is to combine two generative devices by pairing their productions in such a way that links the right-hand side non-terminal symbols in the paired productions. However, such formalisms are less expressive and unable to express mutual translations that have different lengths and crossing dependencies. Moreover, synchronous formalisms do not deal with unification and feature structures, as unification-based formalisms do, which give patterns additional power for describing constraints on features. For example, multiple context-free grammar [4] uses functions for the non-terminal symbols in the productions to further interpret the symbols in the target generation. Inversion transduction grammar (ITG) [7,8] has been proposed for simultaneously bracketing parallel corpora as a variant of syntax-directed translation schema [9]. But these formalisms lack the expressive ability to describe discontinuous constituents in linguistic expressions. The generalized multitext grammar (GMTG) [5,10] maintains two sets of productions as components, one for each language, for modeling parallel texts. Although GMTG is more expressive and can be used to express independent rewriting, its lack of flexibility in describing constraints on the features associated with a non-terminal makes it difficult to develop a practical machine translation system.
Lexical-semantic formalisms and unification-based grammar formalisms such as lexical function grammar (LFG) [11-13] use constituent structures (c-structures) and functional structures (f-structures) for the semantic interpretation of words. Head-driven phrase structure grammar (HPSG) [14] expresses units of linguistic information such as the syntactic, semantic, and pragmatic information of utterances in typed feature structures [15] to facilitate computationally precise descriptions of natural language syntax and semantics. The descriptive power of these grammars may be defined specifically enough to give correct translations of individual usages of words and phrases. However, translation systems based on these formalisms are not practical without much more efficient parsing and disambiguation algorithms. Moreover, the construction of these formalisms requires deep grammatical knowledge to develop the grammar rules. The integration of syntactic and semantic information requires the use of many features, making the task of writing specific constraint grammars too difficult for a non-specialist. Another limitation for translation applications is that these formalisms are not designed to describe the relations between the two languages of a pair that researchers need for translation systems.
This paper describes a variation of synchronous grammar, the constraint-based synchronous grammar (CSG), based on the formalism of context-free grammar. The CSG uses feature structures, as in unification-based grammar, to generalize the sentential patterns of the source text, while grouping the corresponding target rewriting rules for each production in a vector representing the possible translation patterns for the source production. The rule choice for target generation is based on the feature constraints of the non-terminal symbols in the pattern. The motivation is three-fold. First, synchronous formalisms have been proposed to model parallel texts, since such algorithms can infer the synchronous structures of texts for two different languages through the grammatical representation of their syntactic deviations. These synchronous structures are quite useful for the analysis of language pairs in machine translation systems. Secondly, augmentation of the synchronous models with feature structures gives the pattern recognition additional power to describe gender, number, agreement, etc., since the descriptive power of unification-based grammars is considerably greater than that of the classical CFG [11]. Finally, combining the notational and intuitive simplicity of the CFG with the power of feature constraints provides a grammar formalism with better descriptive power than the CFG, giving more efficient parsing and generation controlled by the feature constraints of symbols for better word sense and syntax disambiguation.

The paper describes the formalism of the CSG together with a parsing algorithm for the formalism. The CSG is then used to construct a machine translation system for Portuguese to Chinese translation.
1 Constraint-Based Synchronous Grammars
The CSGs are defined using the syntax of context-free grammar for synchronous language pairs. The formalism consists of a set of generative productions, each constructed from a pair of context-free grammar rules, with or without syntactic heads and link constraints for the non-terminal symbols in the patterns. The first component (on the right-hand side of the production) represents the sentential patterns of the source language, the source component, while the second component represents the translation patterns in the target language, the target component. Unlike other synchronous formalisms, the target component of a production consists of one or more generative rules, each associated with zero or more control conditions based on the features of the non-terminal symbols of the source rule, describing the possible generation correspondences in the target translation. The source components in the CSG are then generalized by handling the feature constraints in the target component, which also helps reduce the grammar size. For example, the following is one of the productions used in the Portuguese to Chinese translation system:
S → NP1 VP* NP2 PP NP3 {[NP1 VP^1 NP3 VP^2 NP2; VPcate=vb1, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem], [NP1 VP NP3 NP2; VPcate=vb0, VPs:sem=NP1sem, VPio:sem=NP2sem]} (1)
The production has two components besides the reduced syntactic symbol on the left-hand side, the first modeling Portuguese and the second Chinese. The target component in this production consists of two generative rules maintained in a vector, each of which involves control conditions based on the symbol features from the source component, which are used as selectional preferences during parsing. These constraints are used in the parsing/generation algorithm to infer not only the input structure, deciding which structures are possible or probable, but also the output text structure for the target translation. For example, the condition expression VPcate=vb1, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem specifies that the senses of the first, second, and third nouns (NPs) in the input string match the senses of the subject and the indirect and direct objects governed by the verb, VP, with category type vb1. When the condition is satisfied, the source structure is successfully recognized and the corresponding structure of the target language, NP1 VP^1 NP3 VP^2 NP2, is also determined.
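The constraint-guided choice between the two target rules of Production (1) can be sketched in Python. The data structures (`TargetRule`, `Production`, `select_target`) and the flattened encoding of the feature equations are illustrative assumptions, not part of the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class TargetRule:
    pattern: list       # target-side symbols, e.g. ["NP1", "VP^1", "NP3", "VP^2", "NP2"]
    constraints: dict   # feature equations that must hold for this rule to fire

@dataclass
class Production:
    lhs: str
    source: list        # source-side pattern; "VP*" marks the head element
    targets: list       # candidate target rules, most specific first

def select_target(prod, feats):
    """Return the first target rule whose constraints are all satisfied."""
    for rule in prod.targets:
        if all(feats.get(k) == v for k, v in rule.constraints.items()):
            return rule
    return None

# Production (1): two target rules guarded by the verb category and the
# semantic agreement between the verb's roles and the three NPs.
p1 = Production(
    lhs="S",
    source=["NP1", "VP*", "NP2", "PP", "NP3"],
    targets=[
        TargetRule(["NP1", "VP^1", "NP3", "VP^2", "NP2"],
                   {"VP.cate": "vb1", "VP.s:sem": "NP1.sem",
                    "VP.io:sem": "NP2.sem", "VP.o:sem": "NP3.sem"}),
        TargetRule(["NP1", "VP", "NP3", "NP2"],
                   {"VP.cate": "vb0", "VP.s:sem": "NP1.sem",
                    "VP.io:sem": "NP2.sem"}),
    ],
)

# A parsed string whose verb has category vb0 selects the second rule.
feats = {"VP.cate": "vb0", "VP.s:sem": "NP1.sem", "VP.io:sem": "NP2.sem"}
print(select_target(p1, feats).pattern)  # ['NP1', 'VP', 'NP3', 'NP2']
```

If no constraint set is satisfied, no target structure is produced for this production, mirroring a failed reduction.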
Non-terminal symbols in the source and target rules are linked if they are given the same index subscripts for multiple occurrences, such as NP in the production S → NP1 VP NP2 PP NP3 [NP1 VP* NP3 NP2]. Symbols that appear only once in both the source and target rules, such as VP, are implicitly linked to give the synchronous rewriting. Linked non-terminal symbols must be derived from a sequence of synchronized pairs. Consider the production S → NP1 VP NP2 PP NP3 [NP1 VP* NP3 NP2], where the second NP (NP2) in the source rule corresponds to the third NP (NP2) in the target rule and the third NP (NP3) in the source rule corresponds to the second NP (NP3) in the target pattern, while the first NP (NP1) and VP correspond to their counterparts in the source and target rules. The symbol marked by an “*” is designated as the head element in the pattern, with the features of the designated head symbol propagated to the reduced non-terminal symbol on the left side of the production rule to achieve feature inheritance in the CSG formalism. The use of feature structures associated with non-terminal symbols is explained further in Section 2.2.
When modeling natural languages, especially when processing language pairs, non-standard linguistic phenomena such as crossing dependencies and discontinuous constituents must be carefully analyzed due to the structural differences between the two languages, particularly for languages from different families such as Portuguese and Chinese [16,17]. Linguistic expressions can vanish or appear in translation. For example, the preposition (PP) in the source rule does not show up in any of the target rules in Production (1). In contrast, Production (2) allows the Chinese characters “本” and “辆” to appear in the target rules as modifiers of the noun (NP) together with the quantifier (num) as the proper translation for the source text. This explicitly relaxes the synchronization constraint, so that the two components can be rewritten independently.
NP→num NP * {[num 本 NP; NP sem =SEM _book ],<br />
[num 辆 NP; NP sem =SEM _automobile ]} (2)<br />
An important strength of the CSG is its expressive power to describe discontinuous constituents. Chinese frequently uses discontinuously distributed combination words. For example, take the sentence pair [“Ele vendeu-me todas as uvas. (He sold me all the grapes.)”, “他把所有的葡萄卖给了我”]. The Chinese preposition “把” and the verb “卖给了” should be paired with the Portuguese verb “vendeu”, which causes a fan-out with the discontinuous constituent in the Chinese component. (The term fan-out describes a word whose translation is paired with discontinuous words in the target language, e.g., “vendeu[-pro] [NP]” in Portuguese gives the similar English translation “sell [NP] to [pro]”, so “vendeu”, in this case, corresponds to both “sell” and “to”.) The following fragment of CSG productions represents such relationships.
S → NP1 VP* NP2 NP3 {[NP1 VP^1 NP3 VP^2 NP2; VPcate=vb0, …], …}
VP → vendeu* {[把, 卖给了; ∅]} (3)
In Production (3), the discontinuous constituents of VP from the source rule are represented by VP^1 and VP^2 in the target rule, where the superscripts are added to indicate the pairing to VP in the target component. The corresponding translation constituents in the lexicalized production are separated by commas, representing the discontinuity between the constituents “把” and “卖给了” in the target translation. During the rewriting phase, the corresponding constituents are used to replace the syntactic symbols in the pattern rule.
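A minimal sketch of this rewriting step, assuming contiguous constituents are stored as strings and fan-out words as tuples of pieces (the helper `rewrite` and the slot notation `VP^k` are illustrative, not from the paper):

```python
# Rewriting a target pattern that contains a fan-out: the lexical target
# of VP is a tuple of discontinuous pieces ("把", "卖给了"), and the
# superscripted slots VP^1, VP^2 select the corresponding piece.
def rewrite(pattern, translations):
    out = []
    for sym in pattern:
        if "^" in sym:                       # discontinuous piece, e.g. "VP^2"
            name, idx = sym.split("^")
            out.append(translations[name][int(idx) - 1])
        else:
            out.append(translations[sym])
    return "".join(out)

translations = {
    "NP1": "他",
    "NP2": "我",
    "NP3": "所有的葡萄",
    "VP": ("把", "卖给了"),                  # vendeu → 把 … 卖给了 (fan-out)
}

target = rewrite(["NP1", "VP^1", "NP3", "VP^2", "NP2"], translations)
print(target)  # 他把所有的葡萄卖给了我
```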
2 Definitions and Features Representation

2.1 Definitions
Let L be a context-free language defined over the terminal symbols V_T, generated by the context-free grammar G with the non-terminal symbols V_N disjoint from V_T, the start symbol S, and productions of the form A → w, where A is in V_N and w in (V_N ∪ V_T)*. Let Z be a set of integers, with each non-terminal symbol in V_N assigned an integer, Γ(V_N) = {W_ω | W ∈ V_N, ω ∈ Z}. The elements of Γ(V_N) are indexed non-terminal symbols. Then extend the rule to include the set of terminal symbols V_T′, as the translation in the target language, disjoint from V_T (V_T ∩ V_T′ = ∅). Let R = {r_1, …, r_n | r_i ∈ (Γ(V_N) ∪ V_T′), 1 ≤ i ≤ n} be a finite set of rules and C = {c_1, …, c_m} be a finite set of constraints over the associated features of (Γ(V_N) ∪ V_T), where the features of the non-terminals Γ(V_N), the syntactic symbols, are inherited from the designated head element during rule reduction. A target rule is defined as a pair [r ∈ R*, c ∈ C*] in γ, where γ = R* × C* has the form [r, c]. Then define ψ(γ_i) to denote the number of conjunct features considered in the associated constraint, which determines the degree of generalization of a constraint. The rules γ_i and γ_j are thus orderable: γ_i ≺ γ_j if ψ(γ_i) ≥ ψ(γ_j), and γ_i ≻ γ_j if ψ(γ_i) < ψ(γ_j).
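In code, this ordering amounts to sorting the target rules by the size of their constraint sets, so the most specific rule is tried first. A small sketch, with the (pattern, constraints) tuple encoding being an assumption for illustration:

```python
# Order target rules so the most specific constraint (largest number of
# conjunct features, psi) is tested first; an empty constraint set is
# the most general and comes last.
def psi(target_rule):
    _, constraints = target_rule
    return len(constraints)

rules = [
    ("NP1 VP NP3 NP2", {"VP.cate": "vb0", "VP.s:sem": "NP1.sem",
                        "VP.io:sem": "NP2.sem"}),
    ("NP1 VP^1 NP3 VP^2 NP2", {"VP.cate": "vb1", "VP.s:sem": "NP1.sem",
                               "VP.io:sem": "NP2.sem", "VP.o:sem": "NP3.sem"}),
    ("NP1 VP NP2", {}),
]

ordered = sorted(rules, key=psi, reverse=True)
print([psi(r) for r in ordered])  # [4, 3, 0]
```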
Table 1 CSG productions

S → NP1 VP* NP2 PP NP3 {[NP1 VP^1 NP3 VP^2 NP2; VPcate=vb1, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem], [NP1 VP NP3 NP2; VPcate=vb0, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem]} (4)
VP → v {[v; ∅]} (5)
NP → det NP* {[NP; ∅]} (6)
NP → num NP* {[num 本 NP; NPsem=SEM_book]} (7)
NP → n {[n; ∅]} (8)
NP → pro {[pro; ∅]} (9)
PP → p {[p; ∅]} (10)
n → José {[若泽; ∅]} | livro {[书; ∅]} (11)
pro → ele {[他; ∅]} (12)
v → deu {[给了; ∅]} | comprou {[向, 买了; ∅]} (13)
num → um {[一; ∅]} (14)
p → a {∅} (15)
det → o {∅} (16)

Note: As with the head element designation in productions, when the right-hand side (RHS) of a production contains only one symbol, that symbol is the head element. Thus, no head mark “*” is given for such rules, and “VP → v” is assumed to imply “VP → v*”.
determined target rules in Q. The translation of an input string s essentially consists of three steps. First, the input string is parsed using the source rules of the productions. Secondly, the link constraints are propagated from the source rules to the target components to determine and build a target derivation sequence. Finally, the translation of the input string is generated from the target derivation sequence.
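The generation step (the third of these) can be sketched as a recursive walk over the target derivation sequence. The nested-tuple encoding of derivations and the example sentence “José deu um livro a ele”, built from the lexicon of Table 1, are illustrative assumptions:

```python
# Generate a translation from a target derivation: each node is a
# (pattern, children) pair, where pattern slots are either linked
# constituents (looked up in children) or literal target tokens.
def generate(derivation):
    pattern, children = derivation
    out = []
    for slot in pattern:
        sub = children.get(slot, slot)        # literal tokens pass through
        out.append(sub if isinstance(sub, str) else generate(sub))
    return "".join(out)

# Derivation for "José deu um livro a ele" under Productions (4)-(16):
# the root uses target rule [NP1 VP NP3 NP2] (VP.cate = vb0), and NP2
# is rewritten by Production (7), which inserts the classifier 本.
q = (["NP1", "VP", "NP3", "NP2"], {
    "NP1": "若泽",
    "VP": "给了",
    "NP3": "他",
    "NP2": (["num", "本", "NP"], {"num": "一", "NP": "书"}),
})
print(generate(q))  # 若泽给了他一本书
```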
2.2 Features representation

In the CSG, linguistic entities are modeled as feature structures, which give patterns additional power for describing gender, number, semantics, attributes, the number of arguments required by a verb, and so on. This information is encoded in the commonly used attribute-value matrices (AVMs) attached to each of the lexical and syntactic symbols in the CSG. The attribute or feature names are typically written in upper case in the AVMs, with the values written to the right of the feature names in lower case:
⎡ CAT   vb0 ⎤
⎢ SUBJ  hum ⎥
⎢ DOBJ  obj ⎥
⎢ IOBJ  hum ⎥
⎢ TEN   pp  ⎥
⎢ PER   3rd ⎥
⎣ NUM   sg  ⎦   (17)
This structure allows the specification of syntactic dependencies such as agreement and subcategorization in the patterns. Unlike other unification-based grammars [12,15], unification is not carried out in full: only the conditions that are explicitly expressed in the rule constraints are tested and unified, so the unification process can be performed in constant time. The use of feature constraints has to be restricted to maintain the efficiency of the parsing and generation algorithm, and especially to prevent the generation of a large number of ambiguous candidate structures. Word selection in the target language can also be achieved by checking features.
The parsing and generation algorithm propagates the feature information from the designated head element in the pattern to the reduced symbol to realize feature inheritance. Features can either be put in the lexical dictionary, isolated from the formalism to simplify construction of the analytical grammar, or be explicitly encoded in the preterminal rules as:

n → José: [CAT:n, NUM:sg, GEN:masc, SEM:hum] {[若泽; ∅]};
n → livro: [CAT:n, NUM:sg, GEN:masc, SEM:artifact+book] {[书; ∅]},

where the feature set is bracketed and the name and the value of each feature are delimited by a colon to represent the feature pair.
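A sketch of this restricted unification, with the lexicon entries encoded as flat attribute-value dictionaries (the `check` helper and this encoding are assumptions for illustration):

```python
# Only the conditions explicitly named in a rule constraint are tested,
# so each check costs O(number of listed conditions) — a constant for a
# fixed rule — rather than full recursive unification of two AVMs.
lexicon = {
    "José":  {"CAT": "n", "NUM": "sg", "GEN": "masc", "SEM": "hum"},
    "livro": {"CAT": "n", "NUM": "sg", "GEN": "masc", "SEM": "artifact+book"},
}

def check(constraint, avm):
    """Test the explicitly listed feature equations against one AVM."""
    return all(avm.get(feat) == val for feat, val in constraint.items())

print(check({"NUM": "sg", "SEM": "hum"}, lexicon["José"]))   # True
print(check({"SEM": "hum"}, lexicon["livro"]))               # False
```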
Another way to enhance the CSG formalism is to apply soft preferences instead of hard constraints in the feature unification process. Two problems can arise with hard constraints. First, a single lexical item often has more than one combination of feature values in natural languages, i.e., one word may have several translations according to the different senses and pragmatic uses of the word, a problem known as word sense disambiguation [18]. Secondly, the conventional feature unification method can only tell whether the process succeeded. For example, if relatively minor conditions fail during unification, all the related candidates are rejected without any flexibility in choosing the next preferable or most probable candidate. With soft preferences, the importance of the conditions is expressed by a weight associated with each feature structure. The matching features are then ranked according to their weights rather than the order of the lexical items expressed in grammars or dictionaries. In the current system, each symbol has an original weight, and the preference measurements made when checking the feature constraints create a penalty that reduces this weight to the effective weight of the associated features in a particular context. The features with the largest weight are chosen as the most preferable content.
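The weighting scheme described above can be sketched as follows; the penalty value and the data layout are illustrative assumptions, since the paper does not give concrete numbers:

```python
# Soft preferences: a failed condition lowers a candidate's weight by a
# penalty instead of rejecting it outright; the candidate with the
# largest effective weight is chosen.
PENALTY = 0.3  # illustrative per-condition penalty

def effective_weight(candidate, feats):
    weight = candidate["weight"]
    for feat, val in candidate["constraints"].items():
        if feats.get(feat) != val:
            weight -= PENALTY
    return weight

# Classifier choice from Production (2): 本 for books, 辆 for vehicles.
candidates = [
    {"target": "本", "weight": 1.0, "constraints": {"NP.sem": "SEM_book"}},
    {"target": "辆", "weight": 1.0, "constraints": {"NP.sem": "SEM_automobile"}},
]

feats = {"NP.sem": "SEM_book"}
best = max(candidates, key=lambda c: effective_weight(c, feats))
print(best["target"])  # 本
```

Unlike hard unification, the mismatching candidate “辆” survives with a reduced weight, so it remains available as the next-best choice.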
3 Parsing Algorithm

The CSG formalism can be parsed by any CFG parsing algorithm, including the Earley [19,20] and generalized LR algorithms [21,22], augmented to take into account the feature constraints and the inference of the target structure. The parsing algorithm used for the formalism described here is based on the generalized LR algorithm, adapted for translation systems. Since the method uses a parsing table, it is considerably more efficient than Earley's non-compiled method, which has to compute a set of LR items at each parsing stage [23]. The generalized LR algorithm was first introduced by Tomita for parsing augmented context-free grammar; it ingeniously handles non-determinism and ambiguity through the use of a graph-structured stack while retaining many of the advantages of standard LR parsing, especially when the grammar is close to the LR grammars. The method is quite suitable for parsing natural languages with grammatical features such as prepositional phrase attachments, which are inherently ambiguous.
The generalized LR algorithm is actually a variation of standard LR parsing which uses a shift-reduce approach with an extended LR parsing table that guides its actions, allowing multiple action entries such as shift/reduce and reduce/reduce to handle non-deterministic parsing with pseudo-parallelism. For this formalism, the parsing table was extended by including the feature constraints and target rules in the action table. The strategy is thus to parse the source rules of the CSG productions through the normal shift actions proposed by the parsing table, while checking the associated conditions to determine whether the active reduction is valid, depending on whether the working symbols in the patterns fulfill the feature constraints.
3.1 CSG parsing table

Figure 1 shows an extended LR(1) parsing table for Productions (4)-(16), constructed by using the LR table construction method [24] extended to consider the rule components of productions by associating the corresponding target rules and constraints. The parsing table consists of a compact ACTION-GOTO table and a CONSTRAINT-RULE table. The ACTION-GOTO table is indexed by a state symbol s (row) and a symbol x ∈ V_N ∪ V_T, including the end marker “⊥”. The entry ACTION[s, x] can be one of sn, rm, acc, or blank: sn denotes a shift action representing GOTO[s, x]=n, which defines the next state for the parser; rm denotes a reduction using the m-th production located in the entry of CONSTRAINT-RULE in state s; acc denotes the accept action; and blank indicates a parsing error. The CONSTRAINT-RULE table is indexed by the state symbol s (row) and the number of productions m that may be applied for reduction in state s. The entry CONSTRAINT-RULE[s, m] consists of a set of productions together with the target rules and features
[Fig. 1 Extended LR(1) parsing table. For states 0-22, the table gives the ACTION/GOTO entries over the symbols pro, num, n, v, det, p, NP, VP, PP, S, ⊥ and the words o, a, um, ele, José, livro, deu, comprou (shift actions sn; reductions r1; acc in state 6), together with the CONSTRAINT-RULE entries of the reducing states: state 1: pro → ele {[他; ∅]}; state 2: num → um; state 3: n → livro {[书; ∅]}; state 4: n → José {[若泽; ∅]}; state 5: det → o; state 8: NP → pro; state 10: NP → n; state 12: v → deu {[给了; ∅]}; state 13: v → comprou {[向, 买了; ∅]}; state 14: VP → v; state 16: NP → num NP*; state 17: NP → det NP* {[NP; ∅]}; state 19: p → a; state 21: PP → p; state 22: S → NP1 VP* NP2 PP NP3 {[…]}.]
constraints that are used to validate whether the active parsing node can be reduced and to identify the corresponding target generative rule for the reduced production.

For simplicity, the productions used for building the parsing table are deterministic, so no conflicting actions, such as shift/reduce and reduce/reduce, appear in the constructed parsing table.
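The shape of such a table can be sketched with a dictionary keyed by (state, symbol). The states and entries below are a hypothetical fragment in the style of Fig. 1, not a transcription of it:

```python
# ACTION/GOTO entries: ("s", n) = shift/goto state n; ("r", prod) =
# reduce by prod, whose CONSTRAINT-RULE entry then selects the target
# rule; "acc" = accept; a missing entry is a parsing error (blank).
ACTION = {
    (0, "ele"): ("s", 1),
    (1, "deu"): ("r", "pro -> ele"),   # reduce on lookahead of the verb
    (0, "pro"): ("s", 2),              # goto after reducing to pro
    (2, "deu"): ("r", "NP -> pro"),
}

def action(state, symbol):
    return ACTION.get((state, symbol), "error")

print(action(0, "ele"))   # ('s', 1)
print(action(1, "deu"))   # ('r', 'pro -> ele')
print(action(3, "um"))    # error
```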
3.2 CSG parser

In the parsing process, the algorithm maintains a number of parsing processes in parallel, each of which represents an individual parse result, to handle non-deterministic cases. Each process has a stack and scans the input string from left to right. In fact, each process does not maintain its own separate stack; instead, the multiple stacks are represented by a single directed acyclic graph, a graph-structured stack, for parsing multiple possible structures. The process has two major components, shift(i) and reduce(i), which are called at each position i = 0, 1, …, n in an input string I = x1 x2 … xn. The shift(i) process shifts the top stack vertex v on xi from its current state s to some successor state s′ by creating a new leaf v′, establishing an edge from v′ to the top stack vertex v, and making v′ the new top stack vertex. The reduce(i) process executes a reduction action on production p by following the chain of parent links down from the top stack vertex v to the ancestor vertex from which the process began scanning for p, while removing intervening vertices from the stack. Every reduction action in reduce(i) has a set C of ordered constraints, c1 ≺ … ≺ cm, in the production, each of which is associated with a target rule that may be the probable corresponding target structure for the production, depending on whether the paired constraint is satisfied according to the features of the parsed string p.
Before reduction, the constraints cj (1 ≤ j ≤ m) are tested in order, starting from the most specific one, with the evaluation process stopping once a positive result is obtained. The corresponding target rule for the parsed string is then determined and attached to the reduced syntactic symbol, which will be used to rewrite the target translation in the generation phase. At the same time, the feature information is inherited from the designated head element of the production. The parsing algorithm for the CSG formalism is given in Fig. 2.

PARSE(grammar, x1 … xn)
    xn+1 ⇐ ⊥
    Ui ⇐ ∅ (0 ≤ i ≤ n)
    U0 ⇐ v0
    for each terminal symbol xi (1 ≤ i ≤ n)
        P ⇐ ∅
        for each node v ∈ Ui−1
            P ⇐ P ∪ v
            if ACTION[STATE(v), xi] = “shift s′”, SHIFT(v, s′)
            for each “reduce p” ∈ ACTION[STATE(v), xi], REDUCE(v, p)
            if “acc” ∈ ACTION[STATE(v), xi], accept
        if Ui = ∅, reject

SHIFT(v, s)
    if ∃ v′ ∈ Ui s.t. STATE(v′) = s and ANCESTOR(v′, 1) = v and state transition δ(v, x) = v′
        do nothing
    else
        create a new node v′ s.t. STATE(v′) = s and ANCESTOR(v′, 1) = v and state transition δ(v, x) = v′
        Ui ⇐ Ui ∪ v′

UNIFY(v, p)
    for each “constraint cj” ∈ CONSTRAINT(STATE(v)) (1 ≤ j ≤ m, c1 ≺ … ≺ cm)
        if ξ(cj, p) = “true” (ξ(∅, p) = “true”)
            TARGET(v) ⇐ j
            return “success”

REDUCE(v, p)
    for each possible reduced parent v1′ ∈ ANCESTOR(v, RHS(p))
        if UNIFY(v, p) = “success”
            s″ ⇐ GOTO(v1′, LHS(p))
            if ∃ node v″ ∈ Ui−1 s.t. STATE(v″) = s″
                if δ(v1′, LHS(p)) = v″
                    do nothing
                else if ∃ node v2′ ∈ ANCESTOR(v″, 1)
                    let vc″ s.t. ANCESTOR(vc″, 1) = v1′ and STATE(vc″) = s″
                    for each “reduce p” ∈ ACTION[STATE(vc″), xi], REDUCE(vc″, p)
                else if v″ ∈ P
                    let vc″ s.t. ANCESTOR(vc″, 1) = v1′ and STATE(vc″) = s″
                    for each “reduce p” ∈ ACTION[STATE(vc″), xi], REDUCE(vc″, p)
            else
                create a new node vn s.t. STATE(vn) = s″ and ANCESTOR(vn, 1) = v1′ and state transition δ(vn, x) = v1′
                Ui−1 ⇐ Ui−1 ∪ vn
        else
            current reduction failed

Fig. 2 Modified generalized LR parsing algorithm
The parser is a function of two arguments, PARSE(grammar, x1 … xn), where the grammar is provided as a parsing table. The parser calls the functions SHIFT(v, s) and REDUCE(v, p) to process the shifts and rule reductions as described. The UNIFY(v, p) function is called for every possible reduction in REDUCE(v, p) to verify that the reduction is legal and to select the target rule for the source structure for synchronization. The function TARGET(v) is set once unification succeeds, recording the j-th target rule for correspondence.
4 Application to Portuguese to<br />
Chinese <strong>Machine</strong> <strong>Translation</strong><br />
4.1 Portuguese to Chinese translation system<br />
The formalism described in Section 3 was used to<br />
develop a Portuguese to Chinese translation system.<br />
The system is a transfer-based translation system using<br />
the formalism of the CSG as its analytical grammar.<br />
Unlike other transfer-based translation systems
WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 303<br />
that implement the major components of<br />
analysis, transfer, and generation as separate pipeline stages using<br />
different sets of representation rules for structure<br />
analysis and transformation [25], the present system uses<br />
only a single set of CSG rules for the translation task. Since the<br />
structures of parallel languages are synchronized in the<br />
formalism, their structural differences are also captured<br />
and described by the grammar. Hence, translation of an<br />
input text essentially involves three steps. First, the<br />
structure of an input sentence s is analyzed using the<br />
source component’s rules from the CSG productions<br />
by the augmented generalized LR parsing algorithm.<br />
Secondly, the link constraints determined during the<br />
rule reduction process are propagated to the corresponding<br />
target rules R (as a selection of target rules)<br />
to construct a target derivation sequence Q. Finally, the<br />
derivation sequence Q is used to generate a translation<br />
of the input sentence s by referencing the set of generative<br />
rules R attached to the corresponding constituent<br />
nodes in the parsed tree.<br />
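The three steps can be sketched abstractly as a composition of components. Everything below (the function names, the toy behavior of each stand-in) is illustrative only, standing in for the real GLR parser, the constraint-driven target-rule selection, and the generation component:<br />

```python
def translate(sentence, analyze, select_targets, generate):
    tree = analyze(sentence)            # 1. parse with the source-side rules
    derivation = select_targets(tree)   # 2. propagate link constraints,
                                        #    choosing one target rule per node
    return generate(derivation)         # 3. emit target text from the
                                        #    selected generative rules

# Toy stand-ins for the three components:
analyze = lambda s: ("S", s.split())              # trivial "parse"
select_targets = lambda t: list(reversed(t[1]))   # pretend the chosen target
                                                  # rule reverses constituents
generate = lambda d: " ".join(d)

assert translate("a b c", analyze, select_targets, generate) == "c b a"
```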
Figure 3 shows the architecture of the Portuguese to<br />
Chinese translation system, with the data flow in the<br />
translation process illustrated by the arrows. The input<br />
sentence is first analyzed by a morphological analyzer<br />
to restore the lexical item to its canonical form.<br />
Compared to English, Portuguese has a highly developed<br />
morphology with considerably richer inflectional<br />
variation. For example, each verb has at<br />
least 50 different conjugations according to distinctions<br />
such as number, person, voice, mood, and tense. The<br />
Portuguese morphological analyzer developed by<br />
Wong et al. [26] was used to analyze the input word<br />
and to retrieve a list of possible candidates that have<br />
different feature values including the part of speech. Ambiguous<br />
words are handled by the sense disambiguation<br />
module and the parser.<br />
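As an illustration only (the entries below are hand-written examples, not the analyzer’s actual lexicon or feature inventory), the morphological analyzer’s output can be pictured as a map from surface forms to candidate feature bundles carrying the canonical form and feature values:<br />

```python
# Hypothetical lexicon fragment: surface form -> candidate analyses.
LEXICON = {
    "deu": [
        {"lemma": "dar", "pos": "verb", "tense": "past",
         "person": 3, "number": "sg"},
    ],
    "livro": [
        {"lemma": "livro", "pos": "noun", "gender": "m", "number": "sg"},
    ],
}

def analyze(word):
    """Return the candidate feature bundles for a surface form;
    unknown words get a placeholder analysis for later disambiguation."""
    return LEXICON.get(word.lower(), [{"lemma": word, "pos": "unknown"}])

assert analyze("deu")[0]["lemma"] == "dar"
assert analyze("Livro")[0]["pos"] == "noun"
```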
Fig. 3 Portuguese to Chinese translation system architecture<br />
After analyzing the form of the lexical items and assigning<br />
a set of possible features to each word in the<br />
sentence, the next step is to determine the proper<br />
grammatical categories for the words using the<br />
Portuguese tagger [27] , which is a probabilistic tagging<br />
model smoothed by interpolating the probabilities of<br />
lexical features into the word probabilities to evaluate<br />
unknown words. As described in Section 3, soft preferences<br />
were used instead of hard constraints by associating<br />
a weight with each feature structure. During the<br />
translation process, no feature candidates are removed<br />
from the candidate list; instead, their weights are adjusted<br />
whenever a candidate fails to match in an analytical<br />
step. For example, in the tagging process, if the<br />
grammatical category carried by a feature does not match<br />
the tagged result, a penalty is applied to reduce the<br />
weight so that the feature is given a lower priority.<br />
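The penalty scheme can be sketched as follows. The weight values and the factor PENALTY are illustrative choices, not the published parameters; only the behavior (down-weight and reorder, never delete) reflects the description above:<br />

```python
PENALTY = 0.5  # illustrative down-weighting factor

def apply_tagging_result(candidates, tagged_pos):
    """Reduce the weight of candidates whose category disagrees with the
    tagger, then reorder so preferred candidates are tried first."""
    for cand in candidates:
        if cand["pos"] != tagged_pos:
            cand["weight"] *= PENALTY
    candidates.sort(key=lambda c: c["weight"], reverse=True)
    return candidates

cands = [{"pos": "noun", "weight": 1.0}, {"pos": "verb", "weight": 1.0}]
ranked = apply_tagging_result(cands, "verb")
assert ranked[0]["pos"] == "verb"   # preferred candidate moved to the front
assert len(ranked) == 2             # nothing was removed from the list
```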
The search space during parsing is narrowed by using<br />
a word sense disambiguation module based on context<br />
collocation to select the most probable features for<br />
words and to put these features at the top of the feature<br />
list. The parsing process then uses these as the<br />
default features for testing the constraint conditions defined<br />
in the CSG. The input sentence is then parsed to<br />
produce a parsed tree representing the source sentence<br />
structure, where each constituent unit of the representation<br />
structure is attached to the corresponding<br />
structure of the target language. For example, the
parsed structure of the sentence “Ele deu um livro ao<br />
José. (He gave a book to José)” together with target<br />
correspondences is illustrated in Fig. 4. Figure 5 gives<br />
the final translation “他给了若泽一本书” for the sentence<br />
after the generation process is applied to<br />
the synchronous structures produced by the parser.<br />
Fig. 4 Parsed structure of the sentence “Ele deu um<br />
livro ao José (He gave a book to José)” with the target<br />
correspondences<br />
Fig. 5 Target translation after processing by the generation<br />
module<br />
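Generation from the synchronized structures can be pictured as walking the parsed tree while emitting each node’s children in the order given by its selected target rule. The tree encoding below is invented for illustration and is hand-built to mirror the Fig. 4 example, where the indirect object moves before the direct object on the Chinese side:<br />

```python
def generate(node):
    """Emit target words by permuting children per the target-side rule."""
    if isinstance(node, str):     # leaf: a target-language word
        return [node]
    children, order = node        # order = target-side child permutation
    out = []
    for i in order:
        out.extend(generate(children[i]))
    return out

# "Ele deu um livro ao José": the VP's target rule swaps the two objects,
# yielding 他 给了 若泽 一本书 on the Chinese side.
tree = (["他", "给了", (["一本书", "若泽"], [1, 0])], [0, 1, 2])
assert "".join(generate(tree)) == "他给了若泽一本书"
```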
4.2 Other applications<br />
The CSG can be used to simulate other synchronous<br />
models such as ITG [8] and independent rewriting systems<br />
by maintaining only one rule in the target component<br />
and relaxing the associated target rule constraints.<br />
In this case, the CSG model can be used to extract linguistic<br />
information from parallel texts, including the<br />
discovery of hidden structures in texts from different<br />
languages, by learning from parallel corpora. Collocation<br />
extraction and representation, for example, are quite<br />
important for developing translation systems, where<br />
collocation data can contribute to improved performance<br />
in almost every processing component. Collocations,<br />
like idioms, belong to the set of phrases in a language<br />
that describe habitual word combinations which<br />
have particular meanings relative to the base. While<br />
they are often realized idiomatically, collocational<br />
meanings can also be found in regular compositional<br />
phrases. There is a clear consensus in the literature that<br />
collocation data are very useful for parsing, word sense<br />
disambiguation, text generation, and other applications,<br />
especially for machine translation. Collocation<br />
expressions can be represented directly in the CSG<br />
formalism, since they can be incorporated as part of<br />
the analytical grammars used for parsing during analysis<br />
and translation.<br />
Recently, there has been an increasing interest in<br />
controlled languages (an offspring of machine translation),<br />
culminating in a series of CLAW workshops on<br />
controlled language applications [28] . These have given<br />
impetus to both monolingual and multilingual guidelines<br />
and applications using controlled language for<br />
many different languages. Controlled languages are<br />
subsets of natural languages whose grammars and dictionaries<br />
have been restricted to reduce or eliminate<br />
both ambiguity and complexity. In a transfer-based<br />
system, controlled translation involves three processing<br />
phases that control the input text, the transfer routines,<br />
and the generation component on the target side.<br />
An alternative way to facilitate the controlled translation<br />
process is to limit the language grammars through<br />
the modeling of parallel (synchronous) structures for<br />
the language pairs, e.g., by using the CSG formalism.<br />
The generalization of source rules in the CSG allows<br />
flexible description of the common (universal) structure<br />
of natural languages, which greatly reduces the<br />
grammar size. With the CSG formalism, the output<br />
translation can be easily controlled with the constraints<br />
of the corresponding target component according to the<br />
input text features and the contextual information.<br />
5 Conclusions<br />
This paper describes a synchronous grammar based on<br />
the syntax of context-free grammar, called the CSG.<br />
The CSG source components are generalized to represent<br />
the common structures of languages. Unlike in<br />
other synchronous grammars, each source rule is associated<br />
with a set of target productions, where each target<br />
rule is connected with a constraint over the source<br />
pattern features. The set of target rules is grouped into<br />
a vector ordered by the specificity of the constraints.<br />
The objective of this formalism is to allow parsing and<br />
generating algorithms to infer different possible translation<br />
equivalences for an input sentence according to<br />
the linguistic features. A modified generalized LR<br />
parsing algorithm was adapted to the parsing of<br />
this formalism to analyze the syntactic structure<br />
of Portuguese in a machine translation system. The<br />
CSG was then used to construct a machine translation<br />
system for Portuguese to Chinese translation. Other<br />
potential applications of the CSG include parallel<br />
text modeling, identification of collocation data<br />
expressions, and controlled translation.<br />
References<br />
[1] Rambow O, Satta G. Synchronous models of language. In:<br />
Proceedings of 34th Annual Meeting of the Association for<br />
Computational Linguistics. University of California, Santa<br />
Cruz, California, USA, 1996: 116-123.<br />
[2] Lewis P, Stearns R. Syntax-directed transduction. Journal<br />
of the Association for Computing Machinery, 1968, 15(3):<br />
465-488.<br />
[3] Shieber S, Schabes Y. Synchronous tree adjoining grammar.<br />
In: Proceedings of the 13th International Conference<br />
on Computational Linguistic. Helsinki, 1990.<br />
[4] Seki H, Matsumura T, Fujii M, Kasami T. On multiple<br />
context-free grammars. Theoretical Computer Science,<br />
1991, 88(2): 191-229.<br />
[5] Melamed I D. Multitext grammars and synchronous parsers.<br />
In: Proceedings of NAACL/HLT. Edmonton, 2003:<br />
79-86.<br />
[6] Wong F, Hu D C, Mao Y H, Dong M C. A flexible example<br />
annotation schema: Translation corresponding tree representation.<br />
In: Proceedings of the 20th International<br />
Conference on Computational Linguistics. Geneva,<br />
Switzerland, 2004: 1079-1085.<br />
[7] Wu D. Grammarless extraction of phrasal translation examples<br />
from parallel texts. In: Proceedings of TMI-95,<br />
Sixth International Conference on Theoretical and Methodological<br />
Issues in Machine Translation, vol. 2. Leuven,<br />
Belgium, 1995: 354-372.<br />
[8] Wu D. Stochastic inversion transduction grammars and bilingual<br />
parsing of parallel corpora. Computational Linguistics,<br />
1997, 23(3): 377-404.<br />
[9] Aho A, Ullman J. Syntax directed translations and the<br />
pushdown assembler. Journal of Computer and System<br />
Sciences, 1969, 3: 37-56.<br />
[10] Melamed I D, Satta G, Wellington B. Generalized multitext<br />
grammars. In: Proceedings of 42nd Annual Meeting of the<br />
Association for Computational Linguistics. Barcelona,<br />
Spain, 2004: 661-668.<br />
[11] Kaplan R, Bresnan J. Lexical-functional grammar: A formal<br />
system for grammatical representation. In: Bresnan J,<br />
ed. The Mental Representation of Grammatical Relations.<br />
Cambridge, Mass.: MIT Press, 1982: 173-281.<br />
[12] Kaplan R. The formal architecture of lexical-functional<br />
grammar. Information Science and Engineering, 1989, 5:<br />
305-322.<br />
[13] Bresnan J. Lexical-Functional Syntax. Oxford: Blackwell,<br />
2001.<br />
[14] Pollard C, Sag I. Information-Based Syntax and Semantics.<br />
Stanford: CSLI, 1989.<br />
[15] Pollard C, Sag I. Head-Driven Phrase Structure Grammar.<br />
Chicago: University of Chicago Press, 1994.<br />
[16] Wong F, Mao Y H, Dong Q F, Qi Y H. Automatic<br />
translation: Overcome the barriers between European and<br />
Chinese languages. In: Proceedings (CD Version) of First<br />
International UNL Open Conference. Suzhou, China, 2001.<br />
[17] Wong F, Mao Y H. Framework of electronic dictionary<br />
system for Chinese and Romance languages. Traitement<br />
Automatique des Langues (TAL): Les Dictionnaires Électroniques, 2003,<br />
44(2): 225-245.<br />
[18] Ide N, Veronis J. Word sense disambiguation: The state of<br />
the art. Computational Linguistics, 1998, 24(1): 1-41.<br />
[19] Earley J. An efficient context-free parsing algorithm [Dissertation].<br />
Computer Science Department, Carnegie-<br />
Mellon University, 1968.<br />
[20] Earley J. An efficient context-free parsing algorithm.<br />
CACM, 1970, 13(2): 94-102.<br />
[21] Tomita M. An efficient augmented-context-free parsing algorithm.<br />
Computational Linguistics, 1987, 13(1-2): 31-46.<br />
[22] Rekers J. Parser generation for interactive environments<br />
[Dissertation]. University of Amsterdam, 1992.<br />
[23] Tomita M. Generalized LR Parsing. Carnegie-Mellon University:<br />
Kluwer Academic Publishers, 1991.<br />
[24] Aho A V, Sethi R, Ullman J D. Compiler: Principles, Techniques<br />
and Tools. Massachusetts: Addison-Wesley, 1986.<br />
[25] Hutchins W J, Somers H L. An Introduction to Machine<br />
Translation. New York: Academic Press, 1992.<br />
[26] Wong F, Dong M C, Li Y P, Mao Y H. Semantic and morphological<br />
analysis and mouse tracking technology
in translation tool. In: Proceedings of Computational<br />
Methods in Engineering and Science (EPMESC IX).<br />
Macao, 2003: 525-532.<br />
[27] Wong F, Chao S, Hu D C, Mao Y H. Interpolated probabilistic<br />
tagging model optimized with genetic algorithm. In:<br />
Proceedings of the 3rd International Conference on Machine<br />
Learning and Cybernetics. Shanghai, China, 2004:<br />
2569-2574.<br />
[28] Carl M. Data-assisted controlled translation. In: Joint Conference<br />
Combining the 8th International Workshop of the<br />
European Association for Machine Translation and the 4th<br />
Controlled Language Applications Workshop (EAMT-<br />
CLAW 03). Dublin City, 2003: 16-24.<br />