TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 06/16 pp295-306
Volume 11, Number 3, June 2006

Machine Translation Using Constraint-Based Synchronous Grammar*

WONG Fai (黄辉), DONG Mingchui (董名垂)**, HU Dongcheng (胡东成)
Department of Automation, Tsinghua University, Beijing 100084, China
Abstract: A synchronous grammar based on the formalism of context-free grammar was developed by generalizing the first component of the production that models the source text. Unlike other synchronous grammars, this grammar allows multiple target productions to be associated with a single production rule, which can be used to guide a parser to infer different possible translational equivalences for a recognized input string according to the feature constraints of symbols in the pattern. An extended generalized LR algorithm was adapted to the parsing of the proposed formalism to analyze the syntactic structure of a language. The grammar was used as the basis for building a machine translation system for Portuguese to Chinese translation. The empirical results show that the grammar is more expressive when modeling the translational equivalences of parallel texts for machine translation and grammar rewriting applications.

Key words: constraint-based synchronous grammar; machine translation; Portuguese to Chinese translation
Introduction

In machine translation, analysis of the structural deviations of language pairs is key to transforming one language into another. This analysis requires a large number of structural transformations, both grammatical and conceptual. The problems of syntactic complexity and word sense ambiguity have been the major obstacles to high-quality translation. Several approaches have been proposed to overcome these obstacles and, hence, improve the quality of translation systems.
Rambow and Satta [1] stated that much of theoretical linguistics can be formulated in a very natural manner as correspondences between layers of representations.
* Received: 2004-11-10; Supported by the Center of Scientific and Technological Research of University of Macao (CATIVO: 3678)
** To whom correspondence should be addressed. E-mail: dmc@sftw.umac.mo; Tel: 86-10-82902115
Similarly, many problems in natural language processing, in particular language translation and grammar rewriting systems, can be expressed as transduction through the use of synchronous formalisms [2-6]. Synchronous grammars are becoming more and more popular for the formal description of parallel texts representing translations. The underlying idea of such formalisms is to combine two generative devices by pairing their productions in such a way that links the right-hand side non-terminal symbols in the paired productions. However, such formalisms are less expressive and unable to express mutual translations that have different lengths and crossing dependencies. Moreover, synchronous formalisms do not deal with unification and feature structures, as unification-based formalisms do, which give patterns additional power for describing constraints on features. For example, multiple context-free grammar [4] uses functions for the non-terminal symbols in the productions to further interpret the symbols in the target generation. Inversion transduction grammar (ITG) [7,8] has been proposed for simultaneously bracketing parallel corpora as a variant of syntax-directed translation schema [9]. But these formalisms lack the expressive ability to describe discontinuous constituents in linguistic expressions. The generalized multitext grammar (GMTG) [5,10] maintains two sets of productions as components, one for each language, for modeling parallel texts. Although GMTG is more expressive and can be used to express independent rewriting, its lack of flexibility in describing constraints on the features associated with a non-terminal makes it difficult to develop a practical machine translation system.
Lexical-semantic formalisms and unification-based grammar formalisms such as lexical function grammar (LFG) [11-13] use constituent structures (c-structures) and functional structures (f-structures) for the semantic interpretation of words. Head-driven phrase structure grammar (HPSG) [14] expresses units of linguistic information such as the syntactic, semantic, and pragmatic information of utterances in typed feature structures [15] to facilitate computationally precise descriptions of natural language syntax and semantics. The descriptive power of these grammars may be defined specifically enough to give correct translations of individual usages of words and phrases. However, translation systems based on these formalisms are not practical without much more efficient parsing and disambiguation algorithms. Moreover, the construction of these formalisms requires deep grammatical knowledge to develop the grammar rules. The integration of syntactic and semantic information requires the use of many features, making the task of writing specific constraint grammars too difficult for a non-specialist. Another limitation for translation applications is that these formalisms are not designed to describe the relations between the two languages of a pair that researchers need for translation systems.
This paper describes a variation of synchronous grammar, the constraint-based synchronous grammar (CSG), based on the formalism of context-free grammar. The CSG uses feature structures, as in unification-based grammar, to generalize the sentential patterns of the source text, while grouping the corresponding target rewriting rules for each production in a vector representing the possible translation patterns for the source production. The rule choice for target generation is based on the feature constraints of the non-terminal symbols in the pattern. The motivation is three-fold. First, synchronous formalisms have been proposed to model parallel texts, since such algorithms can infer the synchronous structures of texts for two different languages through the grammatical representation of their syntactic deviations. These synchronous structures are quite useful for the analysis of language pairs in machine translation systems. Secondly, augmentation of the synchronous models with feature structures gives the pattern recognition additional power to describe gender, number, agreement, etc., since the descriptive power of unification-based grammars is considerably greater than that of the classical CFG [11]. Finally, combining the notational and intuitive simplicity of the CFG with the power of feature constraints provides a grammar formalism with better descriptive power than the CFG, giving more efficient parsing and generation controlled by the feature constraints of symbols for better word sense and syntax disambiguation.

The paper describes the formalism of the CSG together with a parsing algorithm for the formalism. The CSG is then used to construct a machine translation system for Portuguese to Chinese translation.
1 Constraint-Based Synchronous Grammars
The CSGs are defined using the syntax of context-free grammar for synchronous language pairs. The formalism consists of a set of generative productions, each constructed from a pair of context-free grammar rules, with or without syntactic heads and link constraints for the non-terminal symbols in the patterns. The first component (on the right-hand side of the production) represents the sentential patterns of the source language, the source component, while the second component represents the translation patterns in the target language, the target component. Unlike other synchronous formalisms, the target component of a production consists of one or more generative rules, each associated with zero or more control conditions based on the features of the non-terminal symbols of the source rule, describing the possible generation correspondences in the target translation. The source components in the CSG are then generalized by handling the feature constraints in the target component, which also helps reduce the grammar size. For example, the following is one of the productions used in the Portuguese to Chinese translation system:
S → NP1 VP* NP2 PP NP3 {[NP1 VP^1 NP3 VP^2 NP2; VPcate=vb1, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem], [NP1 VP NP3 NP2; VPcate=vb0, VPs:sem=NP1sem, VPio:sem=NP2sem]} (1)
The production has two components besides the reduced syntactic symbol on the left-hand side, the first modeling Portuguese and the second Chinese. The target component in this production consists of two generative rules maintained in a vector, each of which involves control conditions based on the symbol features from the source component, which are used as selectional preferences during parsing. These constraints are used in the parsing/generation algorithm to infer not only the input structure, deciding which structures are possible or probable, but also the output text structure for the target translation. For example, the condition expression VPcate=vb1, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem specifies that the senses of the first, second, and third nouns (NPs) in the input string match the senses of the subject and the indirect and direct objects governed by the verb, VP, with category type vb1. When the condition is satisfied, the source structure is successfully recognized and the corresponding structure of the target language, NP1 VP^1 NP3 VP^2 NP2, is also determined.
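The constraint-guided choice between the two target rules of Production (1) can be sketched in Python. The data structures (`TargetRule`, `Production`, `select_target`) and the flattened encoding of the feature equations are illustrative assumptions, not part of the paper's implementation:

```python
from dataclasses import dataclass

@dataclass
class TargetRule:
    pattern: list       # target-side symbols, e.g. ["NP1", "VP^1", "NP3", "VP^2", "NP2"]
    constraints: dict   # feature equations that must hold for this rule to fire

@dataclass
class Production:
    lhs: str
    source: list        # source-side pattern; "VP*" marks the head element
    targets: list       # candidate target rules, most specific first

def select_target(prod, feats):
    """Return the first target rule whose constraints are all satisfied."""
    for rule in prod.targets:
        if all(feats.get(k) == v for k, v in rule.constraints.items()):
            return rule
    return None

# Production (1): two target rules guarded by the verb category and the
# semantic agreement between the verb's roles and the three NPs.
p1 = Production(
    lhs="S",
    source=["NP1", "VP*", "NP2", "PP", "NP3"],
    targets=[
        TargetRule(["NP1", "VP^1", "NP3", "VP^2", "NP2"],
                   {"VP.cate": "vb1", "VP.s:sem": "NP1.sem",
                    "VP.io:sem": "NP2.sem", "VP.o:sem": "NP3.sem"}),
        TargetRule(["NP1", "VP", "NP3", "NP2"],
                   {"VP.cate": "vb0", "VP.s:sem": "NP1.sem",
                    "VP.io:sem": "NP2.sem"}),
    ],
)

# A parsed string whose verb has category vb0 selects the second rule.
feats = {"VP.cate": "vb0", "VP.s:sem": "NP1.sem", "VP.io:sem": "NP2.sem"}
print(select_target(p1, feats).pattern)  # ['NP1', 'VP', 'NP3', 'NP2']
```

If no constraint set is satisfied, no target structure is produced for this production, mirroring a failed reduction.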
Non-terminal symbols in the source and target rules are linked if they are given the same index subscripts for multiple occurrences, such as NP in the production S → NP1 VP NP2 PP NP3 [NP1 VP* NP3 NP2]. Symbols that appear only once in both the source and target rules, such as VP, are implicitly linked to give the synchronous rewriting. Linked non-terminal symbols must be derived from a sequence of synchronized pairs. Consider the production S → NP1 VP NP2 PP NP3 [NP1 VP* NP3 NP2], where the second NP (NP2) in the source rule corresponds to the third NP (NP2) in the target rule and the third NP (NP3) in the source rule corresponds to the second NP (NP3) in the target pattern, while the first NP (NP1) and VP correspond to their counterparts in the source and target rules. The symbol marked by an “*” is designated as the head element in the pattern, with the features of the designated head symbol propagated to the reduced non-terminal symbol on the left side of the production rule to achieve feature inheritance in the CSG formalism. The use of feature structures associated with non-terminal symbols is explained further in Section 2.2.
When modeling natural languages, especially when processing language pairs, non-standard linguistic phenomena such as crossing dependencies and discontinuous constituents must be carefully analyzed due to the structural differences between the two languages, particularly for languages from different families such as Portuguese and Chinese [16,17]. Linguistic expressions can vanish or appear in translation. For example, the preposition (PP) in the source rule does not show up in any of the target rules in Production (1). In contrast, Production (2) allows the Chinese characters “本” and “辆” to appear in the target rules as modifiers of the noun (NP) together with the quantifier (num) as the proper translation for the source text. This explicitly relaxes the synchronization constraint, so that the two components can be rewritten independently.
NP→num NP * {[num 本 NP; NP sem =SEM _book ],<br />
[num 辆 NP; NP sem =SEM _automobile ]} (2)<br />
An important strength of the CSG is its expressive power to describe discontinuous constituents. Chinese frequently uses discontinuously distributed combination words. For example, take the sentence pair [“Ele vendeu-me todas as uvas. (He sold me all the grapes.)”, “他把所有的葡萄卖给了我”]. The Chinese preposition “把” and the verb “卖给了” should be paired with the Portuguese verb “vendeu”, which causes a fan-out with the discontinuous constituent in the Chinese component. (The term fan-out describes a word whose translation is paired with discontinuous words in the target language, e.g., “vendeu[-pro] [NP]” in Portuguese gives the similar English translation “sell [NP] to [pro]”, so “vendeu”, in this case, corresponds to both “sell” and “to”.) The following fragment of CSG productions represents such relationships.
S → NP1 VP* NP2 NP3 {[NP1 VP^1 NP3 VP^2 NP2; VPcate=vb0, …], …}
VP → vendeu* {[把, 卖给了; ∅]} (3)
In Production (3), the discontinuous constituents of VP from the source rule are represented by VP^1 and VP^2 in the target rule, where the superscripts are added to indicate the pairing to VP in the target component. The corresponding translation constituents in the lexicalized production are separated by commas, representing the discontinuity between the constituents “把” and “卖给了” in the target translation. During the rewriting phase, the corresponding constituents are used to replace the syntactic symbols in the pattern rule.
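A minimal sketch of this rewriting step, assuming contiguous constituents are stored as strings and fan-out words as tuples of pieces (the helper `rewrite` and the slot notation `VP^k` are illustrative, not from the paper):

```python
# Rewriting a target pattern that contains a fan-out: the lexical target
# of VP is a tuple of discontinuous pieces ("把", "卖给了"), and the
# superscripted slots VP^1, VP^2 select the corresponding piece.
def rewrite(pattern, translations):
    out = []
    for sym in pattern:
        if "^" in sym:                       # discontinuous piece, e.g. "VP^2"
            name, idx = sym.split("^")
            out.append(translations[name][int(idx) - 1])
        else:
            out.append(translations[sym])
    return "".join(out)

translations = {
    "NP1": "他",
    "NP2": "我",
    "NP3": "所有的葡萄",
    "VP": ("把", "卖给了"),                  # vendeu → 把 … 卖给了 (fan-out)
}

target = rewrite(["NP1", "VP^1", "NP3", "VP^2", "NP2"], translations)
print(target)  # 他把所有的葡萄卖给了我
```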
2 Definitions and Features Representation

2.1 Definitions
Let L be a context-free language defined over the terminal symbols V_T, generated by the context-free grammar G with the non-terminal symbols V_N disjoint from V_T, the start symbol S, and productions of the form A → w, where A is in V_N and w in (V_N ∪ V_T)*. Let Z be a set of integers, with each non-terminal symbol in V_N assigned an integer, Γ(V_N) = {W_ω | W ∈ V_N, ω ∈ Z}. The elements of Γ(V_N) are indexed non-terminal symbols. Then extend the rule to include the set of terminal symbols V_T′, as the translation in the target language, disjoint from V_T (V_T ∩ V_T′ = ∅). Let R = {r_1, …, r_n | r_i ∈ (Γ(V_N) ∪ V_T′), 1 ≤ i ≤ n} be a finite set of rules and C = {c_1, …, c_m} be a finite set of constraints over the associated features of (Γ(V_N) ∪ V_T), where the features of the non-terminals Γ(V_N), the syntactic symbols, are inherited from the designated head element during rule reduction. A target rule is defined as a pair [r ∈ R*, c ∈ C*] in γ, where γ = R* × C* has the form [r, c]. Then define ψ(γ_i) to denote the number of conjunct features considered in the associated constraint, which determines the degree of generalization of a constraint. The rules γ_i and γ_j are thus orderable: γ_i ≺ γ_j if ψ(γ_i) ≥ ψ(γ_j), and γ_i ≻ γ_j if ψ(γ_i) < ψ(γ_j).
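In code, this ordering amounts to sorting the target rules by the size of their constraint sets, so the most specific rule is tried first. A small sketch, with the (pattern, constraints) tuple encoding being an assumption for illustration:

```python
# Order target rules so the most specific constraint (largest number of
# conjunct features, psi) is tested first; an empty constraint set is
# the most general and comes last.
def psi(target_rule):
    _, constraints = target_rule
    return len(constraints)

rules = [
    ("NP1 VP NP3 NP2", {"VP.cate": "vb0", "VP.s:sem": "NP1.sem",
                        "VP.io:sem": "NP2.sem"}),
    ("NP1 VP^1 NP3 VP^2 NP2", {"VP.cate": "vb1", "VP.s:sem": "NP1.sem",
                               "VP.io:sem": "NP2.sem", "VP.o:sem": "NP3.sem"}),
    ("NP1 VP NP2", {}),
]

ordered = sorted(rules, key=psi, reverse=True)
print([psi(r) for r in ordered])  # [4, 3, 0]
```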
Table 1 CSG productions

S → NP1 VP* NP2 PP NP3 {[NP1 VP^1 NP3 VP^2 NP2; VPcate=vb1, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem], [NP1 VP NP3 NP2; VPcate=vb0, VPs:sem=NP1sem, VPio:sem=NP2sem, VPo:sem=NP3sem]} (4)
VP → v {[v; ∅]} (5)
NP → det NP* {[NP; ∅]} (6)
NP → num NP* {[num 本 NP; NPsem=SEM_book]} (7)
NP → n {[n; ∅]} (8)
NP → pro {[pro; ∅]} (9)
PP → p {[p; ∅]} (10)
n → José {[若泽; ∅]} | livro {[书; ∅]} (11)
pro → ele {[他; ∅]} (12)
v → deu {[给了; ∅]} | comprou {[向, 买了; ∅]} (13)
num → um {[一; ∅]} (14)
p → a {∅} (15)
det → o {∅} (16)

Note: As with the head element designation in productions, when the right-hand side (RHS) of a production contains only one symbol, that symbol is the head element. Thus, no head mark “*” is given for such rules, and “VP → v” is assumed to imply “VP → v*”.
determined target rules in Q. The translation of an input string s essentially consists of three steps. First, the input string is parsed using the source rules of the productions. Secondly, the link constraints are propagated from the source rules to the target components to determine and build a target derivation sequence. Finally, the translation of the input string is generated from the target derivation sequence.
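The generation step (the third of these) can be sketched as a recursive walk over the target derivation sequence. The nested-tuple encoding of derivations and the example sentence “José deu um livro a ele”, built from the lexicon of Table 1, are illustrative assumptions:

```python
# Generate a translation from a target derivation: each node is a
# (pattern, children) pair, where pattern slots are either linked
# constituents (looked up in children) or literal target tokens.
def generate(derivation):
    pattern, children = derivation
    out = []
    for slot in pattern:
        sub = children.get(slot, slot)        # literal tokens pass through
        out.append(sub if isinstance(sub, str) else generate(sub))
    return "".join(out)

# Derivation for "José deu um livro a ele" under Productions (4)-(16):
# the root uses target rule [NP1 VP NP3 NP2] (VP.cate = vb0), and NP2
# is rewritten by Production (7), which inserts the classifier 本.
q = (["NP1", "VP", "NP3", "NP2"], {
    "NP1": "若泽",
    "VP": "给了",
    "NP3": "他",
    "NP2": (["num", "本", "NP"], {"num": "一", "NP": "书"}),
})
print(generate(q))  # 若泽给了他一本书
```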
2.2 Features representation

In the CSG, linguistic entities are modeled as feature structures, which give patterns additional power for describing gender, number, semantics, attributes, the number of arguments required by a verb, and so on. This information is encoded in the commonly used attribute-value matrices (AVMs) attached to each of the lexical and syntactic symbols in the CSG. The attribute or feature names are typically written in upper case in the AVMs, with the values written to the right of the feature names in lower case:
⎡ CAT   vb0 ⎤
⎢ SUBJ  hum ⎥
⎢ DOBJ  obj ⎥
⎢ IOBJ  hum ⎥
⎢ TEN   pp  ⎥
⎢ PER   3rd ⎥
⎣ NUM   sg  ⎦   (17)
This structure allows the specification of syntactic dependencies such as agreement and subcategorization in the patterns. Unlike other unification-based grammars [12,15], unification is not carried out in full: only the conditions that are explicitly expressed in the rule constraints are tested and unified, so the unification process can be performed in constant time. The use of feature constraints has to be restricted to maintain the efficiency of the parsing and generation algorithm, and especially to prevent the generation of a large number of ambiguous candidate structures. Word selection in the target language can also be achieved by checking features.
The parsing and generation algorithm propagates the feature information from the designated head element in the pattern to the reduced symbol to realize feature inheritance. Features can either be put in the lexical dictionary, isolated from the formalism to simplify construction of the analytical grammar, or be explicitly encoded in the preterminal rules as:

n → José: [CAT:n, NUM:sg, GEN:masc, SEM:hum] {[若泽; ∅]};
n → livro: [CAT:n, NUM:sg, GEN:masc, SEM:artifact+book] {[书; ∅]},

where the feature set is bracketed and the name and the value of each feature are delimited by a colon to represent the feature pair.
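A sketch of this restricted unification, with the lexicon entries encoded as flat attribute-value dictionaries (the `check` helper and this encoding are assumptions for illustration):

```python
# Only the conditions explicitly named in a rule constraint are tested,
# so each check costs O(number of listed conditions) — a constant for a
# fixed rule — rather than full recursive unification of two AVMs.
lexicon = {
    "José":  {"CAT": "n", "NUM": "sg", "GEN": "masc", "SEM": "hum"},
    "livro": {"CAT": "n", "NUM": "sg", "GEN": "masc", "SEM": "artifact+book"},
}

def check(constraint, avm):
    """Test the explicitly listed feature equations against one AVM."""
    return all(avm.get(feat) == val for feat, val in constraint.items())

print(check({"NUM": "sg", "SEM": "hum"}, lexicon["José"]))   # True
print(check({"SEM": "hum"}, lexicon["livro"]))               # False
```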
Another way to enhance the CSG formalism is to apply soft preferences instead of hard constraints in the feature unification process. Two problems can arise with hard constraints. First, a single lexical item often has more than one combination of feature values in natural languages, i.e., one word may have several translations according to the different senses and pragmatic uses of the word, a problem known as word sense disambiguation [18]. Secondly, the conventional feature unification method can only tell whether the process succeeded. For example, if relatively minor conditions fail during unification, all the related candidates are rejected without any flexibility in choosing the next preferable or most probable candidate. With soft preferences, the importance of the conditions is expressed by a weight associated with each feature structure. The matching features are then ranked according to their weights rather than the order of the lexical items expressed in grammars or dictionaries. In the current system, each symbol has an original weight, and the preference measurements made when checking the feature constraints create a penalty that reduces this weight to the effective weight of the associated features in a particular context. The features with the largest weight are chosen as the most preferable content.
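The weighting scheme described above can be sketched as follows; the penalty value and the data layout are illustrative assumptions, since the paper does not give concrete numbers:

```python
# Soft preferences: a failed condition lowers a candidate's weight by a
# penalty instead of rejecting it outright; the candidate with the
# largest effective weight is chosen.
PENALTY = 0.3  # illustrative per-condition penalty

def effective_weight(candidate, feats):
    weight = candidate["weight"]
    for feat, val in candidate["constraints"].items():
        if feats.get(feat) != val:
            weight -= PENALTY
    return weight

# Classifier choice from Production (2): 本 for books, 辆 for vehicles.
candidates = [
    {"target": "本", "weight": 1.0, "constraints": {"NP.sem": "SEM_book"}},
    {"target": "辆", "weight": 1.0, "constraints": {"NP.sem": "SEM_automobile"}},
]

feats = {"NP.sem": "SEM_book"}
best = max(candidates, key=lambda c: effective_weight(c, feats))
print(best["target"])  # 本
```

Unlike hard unification, the mismatching candidate “辆” survives with a reduced weight, so it remains available as the next-best choice.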
3 Parsing Algorithm

The CSG formalism can be parsed by any CFG parsing algorithm, including the Earley [19,20] and generalized LR algorithms [21,22], augmented to take into account the feature constraints and the inference of the target structure. The parsing algorithm used for the formalism described here is based on the generalized LR algorithm, adapted for translation systems. Since the method uses a parsing table, it is considerably more efficient than Earley's non-compiled method, which has to compute a set of LR items at each parsing stage [23]. The generalized LR algorithm was first introduced by Tomita for parsing augmented context-free grammar; it ingeniously handles non-determinism and ambiguity through the use of a graph-structured stack while retaining many of the advantages of standard LR parsing, especially when the grammar is close to the LR grammars. The method is quite suitable for parsing natural languages with grammatical features such as prepositional phrase attachments, which are inherently ambiguous.
The generalized LR algorithm is actually a variation of standard LR parsing which uses a shift-reduce approach with an extended LR parsing table that guides its actions, allowing multiple action entries such as shift/reduce and reduce/reduce to handle non-deterministic parsing with pseudo-parallelism. For this formalism, the parsing table was extended by including the feature constraints and target rules in the action table. The strategy is thus to parse the source rules of the CSG productions through the normal shift actions proposed by the parsing table, while checking the associated conditions to determine whether the active reduction is valid, depending on whether the working symbols in the patterns fulfill the feature constraints.
3.1 CSG parsing table

Figure 1 shows an extended LR(1) parsing table for Productions (4)-(16), constructed by using the LR table construction method [24] extended to consider the rule components of productions by associating the corresponding target rules and constraints. The parsing table consists of a compact ACTION-GOTO table and a CONSTRAINT-RULE table. The ACTION-GOTO table is indexed by a state symbol s (row) and a symbol x ∈ V_N ∪ V_T, including the end marker “⊥”. The entry ACTION[s, x] can be one of sn, rm, acc, or blank: sn denotes a shift action representing GOTO[s, x]=n, which defines the next state for the parser; rm denotes a reduction using the m-th production located in the entry of CONSTRAINT-RULE in state s; acc denotes the accept action; and blank indicates a parsing error. The CONSTRAINT-RULE table is indexed by the state symbol s (row) and the number of productions m that may be applied for reduction in state s. The entry CONSTRAINT-RULE[s, m] consists of a set of productions together with the target rules and features
[Fig. 1 Extended LR(1) parsing table. For states 0-22, the table gives the ACTION/GOTO entries over the symbols pro, num, n, v, det, p, NP, VP, PP, S, ⊥ and the words o, a, um, ele, José, livro, deu, comprou (shift actions sn; reductions r1; acc in state 6), together with the CONSTRAINT-RULE entries of the reducing states: state 1: pro → ele {[他; ∅]}; state 2: num → um; state 3: n → livro {[书; ∅]}; state 4: n → José {[若泽; ∅]}; state 5: det → o; state 8: NP → pro; state 10: NP → n; state 12: v → deu {[给了; ∅]}; state 13: v → comprou {[向, 买了; ∅]}; state 14: VP → v; state 16: NP → num NP*; state 17: NP → det NP* {[NP; ∅]}; state 19: p → a; state 21: PP → p; state 22: S → NP1 VP* NP2 PP NP3 {[…]}.]
constraints that are used to validate whether the active parsing node can be reduced and to identify the corresponding target generative rule for the reduced production.

For simplicity, the productions used for building the parsing table are deterministic, so no conflicting actions, such as shift/reduce and reduce/reduce, appear in the constructed parsing table.
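The shape of such a table can be sketched with a dictionary keyed by (state, symbol). The states and entries below are a hypothetical fragment in the style of Fig. 1, not a transcription of it:

```python
# ACTION/GOTO entries: ("s", n) = shift/goto state n; ("r", prod) =
# reduce by prod, whose CONSTRAINT-RULE entry then selects the target
# rule; "acc" = accept; a missing entry is a parsing error (blank).
ACTION = {
    (0, "ele"): ("s", 1),
    (1, "deu"): ("r", "pro -> ele"),   # reduce on lookahead of the verb
    (0, "pro"): ("s", 2),              # goto after reducing to pro
    (2, "deu"): ("r", "NP -> pro"),
}

def action(state, symbol):
    return ACTION.get((state, symbol), "error")

print(action(0, "ele"))   # ('s', 1)
print(action(1, "deu"))   # ('r', 'pro -> ele')
print(action(3, "um"))    # error
```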
3.2 CSG parser

In the parsing process, the algorithm maintains a number of parsing processes in parallel, each of which represents an individual parse result, to handle non-deterministic cases. Each process has a stack and scans the input string from left to right. In fact, each process does not maintain its own separate stack; instead, the multiple stacks are represented by a single directed acyclic graph, a graph-structured stack, for parsing multiple possible structures. The process has two major components, shift(i) and reduce(i), which are called at each position i = 0, 1, …, n in an input string I = x1 x2 … xn. The shift(i) process shifts the top stack vertex v on xi from its current state s to some successor state s′ by creating a new leaf v′, establishing an edge from v′ to the top stack vertex v, and making v′ the new top stack vertex. The reduce(i) process executes a reduction action on production p by following the chain of parent links down from the top stack vertex v to the ancestor vertex from which the process began scanning for p, while removing intervening vertices from the stack. Every reduction action in reduce(i) has a set C of ordered constraints, c1 ≺ … ≺ cm, in the production, each of which is associated with a target rule that may be the probable corresponding target structure for the production, depending on whether the paired constraint is satisfied according to the features of the parsed string p.
Before reduction, the constraints cj (1 ≤ j ≤ m) are tested in order, starting from the most specific one, with the evaluation process stopping once a positive result is obtained. The corresponding target rule for the parsed string is then determined and attached to the reduced syntactic symbol, which will be used to rewrite the target translation in the generation phase. At the same time, the feature information is inherited from the designated head element of the production. The parsing algorithm for the CSG formalism is given in Fig. 2.

PARSE(grammar, x1 … xn)
    xn+1 ⇐ ⊥
    Ui ⇐ ∅ (0 ≤ i ≤ n)
    U0 ⇐ v0
    for each terminal symbol xi (1 ≤ i ≤ n)
        P ⇐ ∅
        for each node v ∈ Ui−1
            P ⇐ P ∪ v
            if ACTION[STATE(v), xi] = “shift s′”, SHIFT(v, s′)
            for each “reduce p” ∈ ACTION[STATE(v), xi], REDUCE(v, p)
            if “acc” ∈ ACTION[STATE(v), xi], accept
        if Ui = ∅, reject

SHIFT(v, s)
    if ∃ v′ ∈ Ui s.t. STATE(v′) = s and ANCESTOR(v′, 1) = v and state transition δ(v, x) = v′
        do nothing
    else
        create a new node v′ s.t. STATE(v′) = s and ANCESTOR(v′, 1) = v and state transition δ(v, x) = v′
        Ui ⇐ Ui ∪ v′

UNIFY(v, p)
    for each “constraint cj” ∈ CONSTRAINT(STATE(v)) (1 ≤ j ≤ m, c1 ≺ … ≺ cm)
        if ξ(cj, p) = “true” (ξ(∅, p) = “true”)
            TARGET(v) ⇐ j
            return “success”

REDUCE(v, p)
    for each possible reduced parent v1′ ∈ ANCESTOR(v, RHS(p))
        if UNIFY(v, p) = “success”
            s″ ⇐ GOTO(v1′, LHS(p))
            if ∃ node v″ ∈ Ui−1 s.t. STATE(v″) = s″
                if δ(v1′, LHS(p)) = v″
                    do nothing
                else if ∃ node v2′ ∈ ANCESTOR(v″, 1)
                    let vc″ s.t. ANCESTOR(vc″, 1) = v1′ and STATE(vc″) = s″
                    for each “reduce p” ∈ ACTION[STATE(vc″), xi], REDUCE(vc″, p)
                else if v″ ∈ P
                    let vc″ s.t. ANCESTOR(vc″, 1) = v1′ and STATE(vc″) = s″
                    for each “reduce p” ∈ ACTION[STATE(vc″), xi], REDUCE(vc″, p)
            else
                create a new node vn s.t. STATE(vn) = s″ and ANCESTOR(vn, 1) = v1′ and state transition δ(vn, x) = v1′
                Ui−1 ⇐ Ui−1 ∪ vn
        else
            current reduction failed

Fig. 2 Modified generalized LR parsing algorithm
The parser is a function of two arguments, PARSE(grammar, x1 … xn), where the grammar is provided as a parsing table. The parser calls the functions SHIFT(v, s) and REDUCE(v, p) to process the shifts and rule reductions as described. The UNIFY(v, p) function is called for every possible reduction in REDUCE(v, p) to verify that the reduction is legal and to select the target rule for the source structure for synchronization. The function TARGET(v) is set once unification succeeds, recording the j-th target rule for correspondence.
4 Application to Portuguese to<br />
Chinese <strong>Machine</strong> <strong>Translation</strong><br />
4.1 Portuguese to Chinese translation system<br />
The formalism described in Section 3 was used to<br />
develop a Portuguese to Chinese translation system.<br />
The system is a transfer-based translation system using<br />
the formalism of the CSG as its analytical grammar.<br />
Unlike other transfer-based translation systems
WONG Fai ( 黄 辉 ) et al:<strong>Machine</strong> <strong>Translation</strong> <strong>Using</strong> <strong>Constraint</strong>-<strong>Based</strong> … 303<br />
that implement the major components of<br />
analysis, transfer, and generation as separate pipeline stages using<br />
different sets of representation rules for structure<br />
analysis and transformation [25], the present system uses<br />
only a single set of CSG rules for the translation task. Since the<br />
structures of parallel languages are synchronized in the<br />
formalism, their structural differences are also captured<br />
and described by the grammar. Hence, translation of an<br />
input text essentially involves three steps. First, the<br />
structure of an input sentence s is analyzed using the<br />
source component’s rules from the CSG productions<br />
by the augmented generalized LR parsing algorithm.<br />
Secondly, the link constraints determined during the<br />
rule reduction process are propagated to the corresponding<br />
target rules R (as a selection of target rules)<br />
to construct a target derivation sequence Q. Finally, the<br />
derivation sequence Q is used to generate a translation<br />
of the input sentence s by referencing the set of generative<br />
rules R attached to the corresponding constituent<br />
nodes in the parsed tree.<br />
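The three steps can be sketched abstractly as a composition of components. Everything below (the function names, the toy behavior of each stand-in) is illustrative only, standing in for the real GLR parser, the constraint-driven target-rule selection, and the generation component:<br />

```python
def translate(sentence, analyze, select_targets, generate):
    tree = analyze(sentence)            # 1. parse with the source-side rules
    derivation = select_targets(tree)   # 2. propagate link constraints,
                                        #    choosing one target rule per node
    return generate(derivation)         # 3. emit target text from the
                                        #    selected generative rules

# Toy stand-ins for the three components:
analyze = lambda s: ("S", s.split())              # trivial "parse"
select_targets = lambda t: list(reversed(t[1]))   # pretend the chosen target
                                                  # rule reverses constituents
generate = lambda d: " ".join(d)

assert translate("a b c", analyze, select_targets, generate) == "c b a"
```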
Figure 3 shows the architecture of the Portuguese to<br />
Chinese translation system, with the data flow in the<br />
translation process illustrated by the arrows. The input<br />
sentence is first analyzed by a morphological analyzer<br />
to restore the lexical item to its canonical form.<br />
Compared to English, Portuguese has a highly developed<br />
morphology with considerably richer inflectional<br />
variation. For example, each verb has at<br />
least 50 different conjugations according to distinctions<br />
such as number, person, voice, mood, and tense. The<br />
Portuguese morphological analyzer developed by<br />
Wong et al. [26] was used to analyze the input word<br />
and to retrieve a list of possible candidates that have<br />
different feature values including the part of speech. Ambiguous<br />
words are handled by the sense disambiguation<br />
module and the parser.<br />
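As an illustration only (the entries below are hand-written examples, not the analyzer’s actual lexicon or feature inventory), the morphological analyzer’s output can be pictured as a map from surface forms to candidate feature bundles carrying the canonical form and feature values:<br />

```python
# Hypothetical lexicon fragment: surface form -> candidate analyses.
LEXICON = {
    "deu": [
        {"lemma": "dar", "pos": "verb", "tense": "past",
         "person": 3, "number": "sg"},
    ],
    "livro": [
        {"lemma": "livro", "pos": "noun", "gender": "m", "number": "sg"},
    ],
}

def analyze(word):
    """Return the candidate feature bundles for a surface form;
    unknown words get a placeholder analysis for later disambiguation."""
    return LEXICON.get(word.lower(), [{"lemma": word, "pos": "unknown"}])

assert analyze("deu")[0]["lemma"] == "dar"
assert analyze("Livro")[0]["pos"] == "noun"
```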
Fig. 3 Portuguese to Chinese translation system architecture<br />
After analyzing the form of the lexical items and assigning<br />
a set of possible features to each word in the<br />
sentence, the next step is to determine the proper<br />
grammatical categories for the words using the<br />
Portuguese tagger [27] , which is a probabilistic tagging<br />
model smoothed by interpolating the probabilities of<br />
lexical features into the word probabilities to evaluate<br />
unknown words. As described in Section 3, soft preferences<br />
were used instead of hard constraints by associating<br />
a weight with each feature structure. During the<br />
translation process, no feature candidates are removed<br />
from the candidate list; instead, their weights are adjusted<br />
whenever a candidate fails to match in an analytical<br />
step. For example, in the tagging process, if the<br />
grammatical category carried by a feature does not match<br />
the tagged result, a penalty is applied to reduce the<br />
weight so that the feature is given a lower priority.<br />
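The penalty scheme can be sketched as follows. The weight values and the factor PENALTY are illustrative choices, not the published parameters; only the behavior (down-weight and reorder, never delete) reflects the description above:<br />

```python
PENALTY = 0.5  # illustrative down-weighting factor

def apply_tagging_result(candidates, tagged_pos):
    """Reduce the weight of candidates whose category disagrees with the
    tagger, then reorder so preferred candidates are tried first."""
    for cand in candidates:
        if cand["pos"] != tagged_pos:
            cand["weight"] *= PENALTY
    candidates.sort(key=lambda c: c["weight"], reverse=True)
    return candidates

cands = [{"pos": "noun", "weight": 1.0}, {"pos": "verb", "weight": 1.0}]
ranked = apply_tagging_result(cands, "verb")
assert ranked[0]["pos"] == "verb"   # preferred candidate moved to the front
assert len(ranked) == 2             # nothing was removed from the list
```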
The search space during parsing is narrowed by using<br />
a word sense disambiguation module based on context<br />
collocation to select the most probable features for<br />
words and to put these features at the top of the feature<br />
list. The parsing process then uses these as the<br />
default features for testing the constraint conditions defined<br />
in the CSG. The input sentence is then parsed to<br />
produce a parsed tree representing the source sentence<br />
structure, where each constituent unit of the representation<br />
structure is attached to the corresponding<br />
structure of the target language. For example, the
parsed structure of the sentence “Ele deu um livro ao<br />
José. (He gave a book to José)” together with target<br />
correspondences is illustrated in Fig. 4. Figure 5 gives<br />
the final translation “他给了若泽一本书” for the sentence<br />
after the generation process is applied to<br />
the synchronous structures produced by the parser.<br />
Fig. 4 Parsed structure of the sentence “Ele deu um<br />
livro ao José (He gave a book to José)” with the target<br />
correspondences<br />
Fig. 5 Target translation after processing by the generation<br />
module<br />
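Generation from the synchronized structures can be pictured as walking the parsed tree while emitting each node’s children in the order given by its selected target rule. The tree encoding below is invented for illustration and is hand-built to mirror the Fig. 4 example, where the indirect object moves before the direct object on the Chinese side:<br />

```python
def generate(node):
    """Emit target words by permuting children per the target-side rule."""
    if isinstance(node, str):     # leaf: a target-language word
        return [node]
    children, order = node        # order = target-side child permutation
    out = []
    for i in order:
        out.extend(generate(children[i]))
    return out

# "Ele deu um livro ao José": the VP's target rule swaps the two objects,
# yielding 他 给了 若泽 一本书 on the Chinese side.
tree = (["他", "给了", (["一本书", "若泽"], [1, 0])], [0, 1, 2])
assert "".join(generate(tree)) == "他给了若泽一本书"
```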
4.2 Other applications<br />
The CSG can be used to simulate other synchronous<br />
models such as ITG [8] and independent rewriting systems<br />
by maintaining only one rule in the target component<br />
and relaxing the associated target rule constraints.<br />
In this case, the CSG model can be used to extract linguistic<br />
information from parallel texts, including the<br />
discovery of hidden structures in texts from different<br />
languages, by learning from parallel corpora. Collocation<br />
extraction and representation, for example, are quite<br />
important for developing translation systems, where<br />
collocation data can contribute to improved performance<br />
in almost every processing component. Collocations,<br />
like idioms, belong to the set of phrases in a language<br />
that describe habitual word combinations which<br />
have particular meanings relative to the base. While<br />
they are often realized idiomatically, collocational<br />
meanings can also be found in regular compositional<br />
phrases. There is a clear consensus in the literature that<br />
collocation data are very useful for parsing, word sense<br />
disambiguation, text generation, and other applications,<br />
especially for machine translation. Collocation<br />
expressions can be represented directly in the CSG<br />
formalism, since they can be incorporated as part of<br />
the analytical grammars used for parsing during analysis<br />
and translation.<br />
Recently, there has been an increasing interest in<br />
controlled languages (an offspring of machine translation),<br />
culminating in a series of CLAW workshops on<br />
controlled language applications [28] . These have given<br />
impetus to both monolingual and multilingual guidelines<br />
and applications using controlled language for<br />
many different languages. Controlled languages are<br />
subsets of natural languages whose grammars and dictionaries<br />
have been restricted to reduce or eliminate<br />
both ambiguity and complexity. In a transfer-based<br />
system, controlled translation involves three processing<br />
phases that control the input text, the transfer routines,<br />
and the generation component on the target side.<br />
An alternative way to facilitate the controlled translation<br />
process is to limit the language grammars through<br />
the modeling of parallel (synchronous) structures for<br />
the language pairs, e.g., by using the CSG formalism.<br />
The generalization of source rules in the CSG allows<br />
flexible description of the common (universal) structure<br />
of natural languages, which greatly reduces the<br />
grammar size. With the CSG formalism, the output<br />
translation can be easily controlled with the constraints<br />
of the corresponding target component according to the<br />
input text features and the contextual information.<br />
5 Conclusions<br />
This paper describes a synchronous grammar based on<br />
the syntax of context-free grammar, called the CSG.<br />
The CSG source components are generalized to represent<br />
the common structures of languages. Unlike in<br />
other synchronous grammars, each source rule is associated<br />
with a set of target productions, where each target<br />
rule is connected with a constraint over the source<br />
pattern features. The set of target rules is grouped into<br />
a vector ordered by the specificity of the constraints.<br />
The objective of this formalism is to allow parsing and<br />
generating algorithms to infer different possible translation<br />
equivalences for an input sentence according to<br />
the linguistic features. A modified generalized LR<br />
parsing algorithm was adapted to the parsing of<br />
this formalism to analyze the syntactic structure<br />
of Portuguese in a machine translation system. The<br />
CSG was then used to construct a machine translation<br />
system for Portuguese to Chinese translation. Other<br />
potential applications of the CSG include parallel<br />
text modeling, identification of collocation data<br />
expressions, and controlled translation.<br />
References<br />
[1] Rambow O, Satta G. Synchronous models of language. In:<br />
Proceedings of 34th Annual Meeting of the Association for<br />
Computational Linguistics. University of California, Santa<br />
Cruz, California, USA, 1996: 116-123.<br />
[2] Lewis P, Stearns R. Syntax-directed transduction. Journal<br />
of the Association for Computing Machinery, 1968, 15(3):<br />
465-488.<br />
[3] Shieber S, Schabes Y. Synchronous tree adjoining grammar.<br />
In: Proceedings of the 13th International Conference<br />
on Computational Linguistic. Helsinki, 1990.<br />
[4] Seki H, Matsumura T, Fujii M, Kasami T. On multiple<br />
context-free grammars. Theoretical Computer Science,<br />
1991, 88(2): 191-229.<br />
[5] Melamed I D. Multitext grammars and synchronous parsers.<br />
In: Proceedings of NAACL/HLT. Edmonton, 2003:<br />
79-86.<br />
[6] Wong F, Hu D C, Mao Y H, Dong M C. A flexible example<br />
annotation schema: Translation corresponding tree representation.<br />
In: Proceedings of the 20th International<br />
Conference on Computational Linguistics. Geneva,<br />
Switzerland, 2004: 1079-1085.<br />
[7] Wu D. Grammarless extraction of phrasal translation examples<br />
from parallel texts. In: Proceedings of TMI-95,<br />
Sixth International Conference on Theoretical and Methodological<br />
Issues in Machine Translation, vol. 2. Leuven,<br />
Belgium, 1995: 354-372.<br />
[8] Wu D. Stochastic inversion transduction grammars and bilingual<br />
parsing of parallel corpora. Computational Linguistics,<br />
1997, 23(3): 377-404.<br />
[9] Aho A, Ullman J. Syntax directed translations and the<br />
pushdown assembler. Journal of Computer and System<br />
Sciences, 1969, 3: 37-56.<br />
[10] Melamed I D, Satta G, Wellington B. Generalized multitext<br />
grammars. In: Proceedings of 42nd Annual Meeting of the<br />
Association for Computational Linguistics. Barcelona,<br />
Spain, 2004: 661-668.<br />
[11] Kaplan R, Bresnan J. Lexical-functional grammar: A formal<br />
system for grammatical representation. In: Bresnan J,<br />
ed. The Mental Representation of Grammatical Relations.<br />
Cambridge, Mass.: MIT Press, 1982: 173-281.<br />
[12] Kaplan R. The formal architecture of lexical-functional<br />
grammar. Information Science and Engineering, 1989, 5:<br />
305-322.<br />
[13] Bresnan J. Lexical-Functional Syntax. Oxford: Blackwell,<br />
2001.<br />
[14] Pollard C, Sag I. Information-Based Syntax and Semantics.<br />
Stanford: CSLI, 1989.<br />
[15] Pollard C, Sag I. Head-Driven Phrase Structure Grammar.<br />
Chicago: University of Chicago Press, 1994.<br />
[16] Wong F, Mao Y H, Dong Q F, Qi Y H. Automatic<br />
translation: Overcome the barriers between European and<br />
Chinese languages. In: Proceedings (CD Version) of First<br />
International UNL Open Conference. Suzhou, China, 2001.<br />
[17] Wong F, Mao Y H. Framework of electronic dictionary<br />
system for Chinese and Romance languages. Traitement<br />
Automatique des Langues (TAL): Les Dictionnaires Électroniques, 2003,<br />
44(2): 225-245.<br />
[18] Ide N, Veronis J. Word sense disambiguation: The state of<br />
the art. Computational Linguistics, 1998, 24(1): 1-41.<br />
[19] Earley J. An efficient context-free parsing algorithm [Dissertation].<br />
Computer Science Department, Carnegie-<br />
Mellon University, 1968.<br />
[20] Earley J. An efficient context-free parsing algorithm.<br />
CACM, 1970, 13(2): 94-102.<br />
[21] Tomita M. An efficient augmented-context-free parsing algorithm.<br />
Computational Linguistics, 1987, 13(1-2): 31-46.<br />
[22] Rekers J. Parser generation for interactive environments<br />
[Dissertation]. University of Amsterdam, 1992.<br />
[23] Tomita M. Generalized LR Parsing. Carnegie-Mellon University:<br />
Kluwer Academic Publishers, 1991.<br />
[24] Aho A V, Sethi R, Ullman J D. Compiler: Principles, Techniques<br />
and Tools. Massachusetts: Addison-Wesley, 1986.<br />
[25] Hutchins W J, Somers H L. An Introduction to Machine<br />
Translation. New York: Academic Press, 1992.<br />
[26] Wong F, Dong M C, Li Y P, Mao Y H. Semantic and morphological<br />
analysis and mouse tracking technology
in translation tool. In: Proceedings of Computational<br />
Methods in Engineering and Science (EPMESC IX).<br />
Macao, 2003: 525-532.<br />
[27] Wong F, Chao S, Hu D C, Mao Y H. Interpolated probabilistic<br />
tagging model optimized with genetic algorithm. In:<br />
Proceedings of the 3rd International Conference on Machine<br />
Learning and Cybernetics. Shanghai, China, 2004:<br />
2569-2574.<br />
[28] Carl M. Data-assisted controlled translation. In: Joint Conference<br />
Combining the 8th International Workshop of the<br />
European Association for Machine Translation and the 4th<br />
Controlled Language Applications Workshop (EAMT-<br />
CLAW 03). Dublin City, 2003: 16-24.<br />