CNGL Annual Report 2012

Director: Prof. Josef van Genabith
School of Computing
Dublin City University
Dublin 9

Deputy Director: Prof. Vincent Wade
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2

Associate Director: Dr. Páraic Sheridan
School of Computing
Dublin City University
Dublin 9

info@cngl.ie
www.cngl.ie

Dublin City University
Trinity College Dublin
University College Dublin
University of Limerick


Preface

THE CENTRE FOR NEXT GENERATION LOCALISATION (CNGL) IS A CENTRE FOR SCIENCE, ENGINEERING AND TECHNOLOGY (CSET) FUNDED BY SCIENCE FOUNDATION IRELAND (SFI) AND INDUSTRY PARTNERS.

Centres for Science, Engineering and Technology (CSETs) help link scientists and engineers in partnerships across academia and industry to address crucial research questions, foster the development of new and existing Irish-based technology companies, attract industry that could make an important contribution to Ireland and its economy, and expand educational and career opportunities in Ireland in science and engineering. CSETs are expected to exhibit outstanding research quality, intellectual breadth, active collaboration, flexibility in responding to new research opportunities, and integration of research and education in the fields that SFI supports. SFI is a key organisation in the implementation of Ireland’s National Development Plan (NDP 2007–2013) and the Strategy for Science, Technology and Innovation 2006–2013 (SSTI). A sum of €8.2 billion has been allocated for scientific research under the NDP and SSTI, of which SFI is responsible for investing €1.4 billion. SFI will continue to invest in the academic researchers and research teams who are most likely to generate new knowledge, leading-edge technologies and competitive enterprises in the fields of science and engineering.

SFI Vision

Ireland will be a global knowledge leader that places scientific and engineering research at the core of its society to power economic development and social progress.

This centre is supported by Science Foundation Ireland (grant 07/CE/I1142) and the National Development Plan 2007–2013.



Table of Contents

Executive Summary
CSET Leadership
Management Team Biosketches
CNGL Overview
Integrated Language Technologies
Digital Content Management
Next Generation Localisation
Systems Framework
Year 5 Demonstrator Programme
Industry Partnerships and Technology Transfer
Management and Governance
Education and Outreach
Appendix 1: People and Partnerships
Appendix 2: Outputs


Centre for Next Generation Localisation Annual Report 2012

Executive Summary

“Our work is guided by the vision of enabling people to interact with content, products, services and other people in their own language, according to their own culture, and according to their own personal needs.”

Localisation is the process of adapting digital content to culture, locale and linguistic environment. It is a key enabling multiplier technology for the global manufacturing, software, services, and content creation and distribution industries, unlocking markets that would otherwise be unavailable. Localisation also has a social dimension: many communities find themselves on the wrong side of the “digital divide”, with vital information (health, hygiene, food, education, etc.) unavailable in their local languages, with potentially disastrous consequences. Localisation technologies and processes can make a significant contribution to bridging this divide.

The CNGL partnership has focused on both the commercial and the societal dimensions of localisation, concentrating on the challenges of volume, access and personalisation. Volume: the amount of content to be localised massively outstrips human translation capacity. Access: mobile devices enable ubiquitous access on the go to perishable and frequently updated information, both corporate and user-generated, across interaction modalities such as speech and image. Personalisation: information is most useful when adapted to the user, device, background information, knowledge and task at hand. In terms of a slogan: “the person is the ultimate locale”.

Over the last five years (2007–2012), CNGL has made great strides in connecting the localisation industry with cutting-edge research in language technologies, content management, workflow, community and human factors, and software engineering. Today the question is no longer whether to use machine translation, but how best to use it. Today the question is no longer whether to use user-generated content in customer support, but how best to use it. Today the question is no longer whether to use collaborative, community-based localisation models, but how best to use them. These step-changes are based on scientific progress. Over its first funding period, CNGL has produced more than 400 peer-reviewed research papers, 21 PhD students, 39 innovation and software disclosures and 9 patent applications, and has secured €15.8M in additional research income, growing the CNGL research ecosystem.

Key to the success of CNGL is close collaboration with the CNGL industry partners, which focuses and sharpens the research. Without this, the step-change in localisation would not have been possible. Taking research out of the lab is a core objective of CNGL: to date, four CNGL start-up and spin-out companies (Xcelerator Machine Translations, Digital Linguistics, Scream Technologies and Emizar) and the not-for-profit social localisation organisation The Rosetta Foundation are strong testimony to this. Additionally, spin-out candidate Wripl is preparing for launch in 2013.

CNGL is preparing for the future: in 2012, the successful CNGL II application, coordinated and led by CNGL Deputy Director Prof. Vincent Wade, secured core SFI funding of €10.5M for the next 30 months. CNGL II focuses on Global Intelligent Content, based on the concept of the Global Content Value Chain, in which services interact with content to make it self-describing, self-aware and self-adapting across language barriers, modalities and interaction platforms, tuned to context and user. Prof. Wade will take over as CNGL Director in March 2013. Prof. Wade is an experienced and accomplished international research leader. Please give him all your support.

To conclude, I would like to say to all our research students, postdoctoral researchers, principal investigators, technical, operations, and education and outreach staff, to all our industry partners, to all our start-up companies, and to the researchers and staff in our extended CNGL research ecosystem: thank you! You make this happen!

Prof. Josef van Genabith
Director, CNGL


CSET Leadership

CSET Contact Information

CNGL
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: info@cngl.ie

Management Team

Director, Co-Leader: Integrated Language Technologies Track
Prof. Josef van Genabith
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: josef@computing.dcu.ie

Deputy Director, Track Leader: Digital Content Management
Prof. Vincent Wade
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
Phone: +353 1 896 1765
Fax: +353 1 677 2204
Email: vincent.wade@cs.tcd.ie

Associate Director
Dr. Páraic Sheridan
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6706
Fax: +353 1 700 6702
Email: psheridan@computing.dcu.ie

Track Leaders

Co-Track Leader: Integrated Language Technologies
Prof. Nick Campbell
Centre for Language and Communication Studies
Trinity College Dublin
Dublin 2
Phone: +353 1 896 1626
Fax: +353 1 896 2941
Email: nick.campbell@tcd.ie

Track Leader: Systems Framework
Dr. Saturnino Luz
School of Computer Science and Statistics
Trinity College Dublin
Dublin 2
Phone: +353 1 896 3686
Fax: +353 1 677 2204
Email: luzs@cs.tcd.ie

Track Leader: Next Generation Localisation
Mr. Reinhard Schäler
Department of Computer Science and Information Systems
University of Limerick
Limerick
Phone: +353 61 202 881
Fax: +353 61 202 734
Email: reinhard.schaler@ul.ie

Operations Team

Commercial Development Manager
Mr. Steve Gotz
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6710
Fax: +353 1 700 6702
Email: sgotz@computing.dcu.ie



LRC Administrator
Ms. Geraldine Harrahill
Department of Computer Science and Information Systems
University of Limerick
Limerick
Phone: +353 61 202 881
Fax: +353 61 202 734
Email: geraldine.harrahill@ul.ie

Financial Administrator
Ms. Fiona Maguire
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6708
Fax: +353 1 700 6702
Email: fmaguire@computing.dcu.ie

Centre Administrator
Ms. Sophie Matabaro
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6707
Fax: +353 1 700 6702
Email: smatabaro@computing.dcu.ie

Centre Secretary
Ms. Eithne McCann
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6700
Fax: +353 1 700 6702
Email: emccann@computing.dcu.ie

Project Manager
Ms. Hilary McDonald
School of Computer Science and Statistics
O’Reilly Institute
Trinity College Dublin
Dublin 2
Phone: +353 1 896 4244
Fax: +353 1 677 2204
Email: mcdonah@scss.tcd.ie

Intellectual Property Manager
Mr. Stephen Roantree
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6720
Fax: +353 1 700 6702
Email: sroantree@computing.dcu.ie

Systems Administrator
Mr. Joachim Wagner
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6915
Fax: +353 1 700 6702
Email: jwagner@computing.dcu.ie

Education and Outreach Team

Education and Outreach Manager
Ms. Cara Greene
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6704
Fax: +353 1 700 6702
Email: cgreene@computing.dcu.ie

Marketing and Communications Officer
Ms. Laura Grehan
School of Computing
Dublin City University
Dublin 9
Phone: +353 1 700 6705
Fax: +353 1 700 6702
Email: lgrehan@computing.dcu.ie

LRC Manager
Mr. Karl Kelly
Department of Computer Science and Information Systems
University of Limerick
Limerick
Phone: +353 61 202 748
Fax: +353 61 202 734
Email: karl.kelly@ul.ie


Management Team Biosketches

Centre Director and Co-Leader, Integrated Language Technologies Track:
Prof. Josef van Genabith
Department: School of Computing
University: Dublin City University

Brief Biography

Prof. Josef van Genabith is the founder and Director of the Centre for Next Generation Localisation (CNGL) and an Associate Professor in the DCU School of Computing. He graduated in Electronic Engineering and English from RWTH Aachen (Germany) in 1988 and received his PhD in Linguistics from the University of Essex (U.K.) in 1993. He worked as a researcher at the University of Essex (1991–1992) and at the Institut für Maschinelle Sprachverarbeitung (IMS), Universität Stuttgart (Germany) (1992–1996). He joined the School of Computing at DCU as Lecturer in 1996, became Senior Lecturer in 1999 and Associate Professor in 2002. He was Chair of the Programme Board for the B.Sc. in Applied Computational Linguistics (DCU) from 1997 to 2001. In 2001 he became Director of the National Centre for Language Technology (NCLT), which he developed to its current 40+ members and research grant income of over €5M since 2001 (excluding CNGL). He has led research projects funded by Science Foundation Ireland (SFI), Enterprise Ireland (EI) and the European Union (EU), and was awarded an SFI Principal Investigator award in 2004. He became a Visiting Researcher at IBM’s Dublin Center for Advanced Studies (CAS) in 2003 and a Faculty Fellow in 2004. He has supervised 18 PhD and 6 M.Sc. by Research students to graduation. He is (joint) author of more than 150 peer-reviewed international research publications, including publications in the journals Computational Linguistics, Machine Translation, Artificial Intelligence, Research on Language and Computation, and Natural Language Engineering, and at the ACL, EACL, COLING, EMNLP and IJCNLP conferences.

Research Interests

Prof. van Genabith works on localisation, machine translation, multilingual treebank-based deep grammar acquisition, and statistical parsing and generation.

Career Highlights

• 2012: General Chair, COLING 2014, Dublin, Ireland
• 2012: Recipient of the DCU 2012 President’s Research Award for Science and Engineering
• 2010–present: META-NET (Multilingual Europe Technology Alliance EU Network of Excellence) Executive Board and Technology Council member
• 2007–present: Advisory Board, European Association for Computational Linguistics (EACL)
• 2007–present: Director and Lead PI, SFI CNGL CSET Award (€16.8M)
• 2005–present: Faculty Fellow, IBM Center for Advanced Studies (CAS), Dublin
• 2004–2005: Visiting Scientist, IBM Center for Advanced Studies (CAS), Dublin
• 2004–2009: SFI Principal Investigator, Science Foundation Ireland, GramLab, €839K
• 2001–2008: Director, National Centre for Language Technology (NCLT), DCU
• 1997–2001: Chair of Programme Board, B.Sc. in Applied Computational Linguistics (ACL), DCU



Deputy Centre Director, Track Leader: Digital Content Management:
Prof. Vincent P. Wade
Department: Discipline of Intelligent Systems, School of Computer Science and Statistics
University: Trinity College Dublin

Brief Biography

Prof. Vincent Wade is Deputy Director of the Centre for Next Generation Localisation (CNGL) and Head of the Discipline of Intelligent Systems at the School of Computer Science and Statistics, Trinity College Dublin. The Discipline of Intelligent Systems comprises four research groups: the Knowledge and Data Engineering Group, the Computational Linguistics Group, the Graphics, Vision and Visualisation Group, and the Artificial Intelligence Group. The Discipline has 21 academics and more than 150 full-time postgraduate (PhD) students and research fellows.

Prof. Wade graduated from UCD with a B.Sc. (Hons) in Computer Science (1987) and received his M.Sc. and PhD degrees in Computer Science from TCD. He holds the position of Associate Professor in the School of Computer Science and Statistics and in 2002 was awarded Fellowship of Trinity College for his contribution to research in the areas of knowledge management and adaptive technologies. In 1999 he founded the Centre for Learning Technology, which has pioneered the innovation and development of eLearning technologies in the University. He was also awarded the position of Visiting Scientist at the IBM Center for Advanced Studies for his research in adaptive hypermedia and knowledge management (2005–2008). He was Research Director of the Knowledge and Data Engineering Research Group (1995–2007).

Prof. Wade is the author of over 150 scientific papers in peer-reviewed research journals and international conferences and has received eight ‘best paper’ awards for publications at IEEE, IFIP and AACE conferences within the last nine years. He has been guest editor of IEEE Communications as well as a reviewer for many IEEE and ACM journals, including IEEE Communications, IEEE Network, IEEE Intelligent Systems, ACM Transactions on the Web, and IEEE Transactions on Learning Technologies. Prof. Wade is a scientific programme committee member for many prestigious international conferences, including IEEE’s IM and NOMS, ACM Hypertext and the WWW conference series. He was co-chair of the Adaptive Hypermedia Conference (AH2006), held in Dublin in June 2006, and General Co-chair of IEEE IM 2011, held at TCD in May 2011. He has been responsible for fourteen major EU research projects under the EU ACTS and IST Research Programmes, as well as national research projects funded under the SFI PI Programme, HEA PRTLI and several Science Foundation Ireland/Enterprise Ireland Technology Innovation Development Awards. He has been responsible for the commercialisation of research and is a co-founder of ‘Empower The User’, an innovative start-up company in the area of personalisation and soft skills training.

Research Interests

Prof. Wade’s research focuses on Knowledge Engineering, in particular adaptive web systems, dynamic personalisation, adaptive management and control systems, and process management. His research has been applied in several technology application areas, including eLearning and management systems for next-generation networks and distributed services. Since 1991, he has been TCD’s Principal Investigator for over fifteen EU research projects under the EU RACE, Telematics, ESPRIT, ACTS and IST research programmes. He was also PI for ADAPT (2005–2007) and Pudecas (2005–2007), funded under the Technology Innovation Research Programme (Enterprise Ireland), and PI for the HEA-sponsored MZONES project (2002–2006).



Associate Director:
Dr. Páraic Sheridan
Department: School of Computing
University: Dublin City University

Brief Biography

Dr. Páraic Sheridan is Associate Director at CNGL. He received his B.Sc. degree (1st class honours) in Computer Applications from Dublin City University (DCU) in 1989. He then completed an M.Sc. by research in Computer Applications at DCU in 1991, studying the use of Natural Language Processing in Information Retrieval. This was followed in 1994 by an M.S. degree in Computational Linguistics at Carnegie Mellon University (CMU) in Pittsburgh, PA. His study at CMU was funded by Claris Corporation (Dublin), for whom he researched the use of Translation Memories in the software localisation process. He completed his doctoral work in 1998 at the Swiss Federal Institute of Technology (ETH) Zürich with a dissertation on Cross-Language Information Retrieval. While at ETH he also helped develop the SPIDER information retrieval system, which was commercialised and spun out from ETH into the EuroSpider company.

Dr. Sheridan then joined TextWise LLC, a start-up company in Syracuse, NY, spun out from Syracuse University based on research by Prof. Elizabeth Liddy in the area of Natural Language Processing and Information Retrieval. Over the course of a 10-year career at TextWise, Dr. Sheridan held a variety of positions in research management, programme management and product management, ultimately rising to the position of Chief Scientist at the company. This reflected his work on the CINDOR cross-language search system, initially a government-funded research project, which was then commercialised and marketed by TextWise in the enterprise search space. Dr. Sheridan also led the effort to adapt the CINDOR product to the needs of the U.S. Intelligence Community, developing a cross-language English–Arabic query translation module to integrate with standard enterprise search platforms.



Co-Track Leader: Integrated Language Technologies:
Prof. Nick Campbell
Department: Centre for Language and Communication Studies (CLCS)
University: Trinity College Dublin

Brief Biography

Prof. Nick Campbell is SFI Stokes Professor of Speech & Communication Technology at Trinity College Dublin. He received his Ph.D. degree in Experimental Psychology from the University of Sussex in the U.K., and previously worked at the Japanese National Institute of Information and Communications Technology and as Chief Researcher in the Department of Acoustics and Speech Research, Advanced Telecommunications Research Institute International (ATR), Kyoto, Japan, where he also served as Research Director for the JST/CREST Expressive Speech Processing and the SCOPE “Robot’s Ears” projects. He was first invited as a Research Fellow to the IBM U.K. Scientific Centre, where he developed algorithms for speech synthesis, and later to AT&T Bell Laboratories, where he worked on the synthesis of Japanese. He served as Senior Linguist at the Edinburgh University Centre for Speech Technology Research before joining ATR in 1990. His research interests are based on large speech databases and include nonverbal speech processing, concatenative speech synthesis, and prosodic information modelling. He spends his spare time working with postgraduate students as Visiting Professor at the School of Information Science, Nara Institute of Science and Technology (NAIST), Nara, Japan, and was also Visiting Professor at Kobe University, Kobe, Japan, for 10 years.

Research Interests

Prof. Nick Campbell’s background is in experimental psychology and linguistics, but most of his experience is in speech technology. Prof. Campbell is an advocate of corpus-based approaches and has pioneered advanced (and paradigm-shifting) methods of speech synthesis and natural conversational speech collection in a multimodal environment. His principal interest is in speech prosody, and he is extending this research to social interaction to show how the voice is used in discourse to express personal relations as well as propositional content. Most of his previous work has used speech materials collected in Japan; through his move to Ireland, he can confirm the universality of his previous findings for both Irish and Hiberno-English. Ultimately, Prof. Campbell is working to produce a friendlier speech-based human-machine interface for web-based information, customer services, games and robotics, while trying to understand how humans achieve such often perfect communication.

Career Highlights

• 2010–2015: Science Foundation Ireland Principal Investigator, FastNet Summary Focus on Actions in Social Talk; Network Enabling Technology (€1.23M)
• Oct. 2011–present: Member, Spoken Language Technical Committee, IEEE Signal Processing Society
• Feb. 2011: Vice President, European Language Resources Association
• Nov. 2010–present: Board Member, European Language Resources Association (ELRA)
• 2009–present: Board member, International Speech Communication Association
• 2005–present: Board member, Japan British Association of the Kansai
• Member, International Phonetic Association
• Member, Coordinating Committee on Speech I/O Database Assessment
• Member, International Committee, Acoustical Society of Japan
• Member, International Speech Communication Association
• Member (adherent), Institute of Acoustics, U.K.
• Member, Acoustical Society of America
• Member, Acoustical Society of Japan
• Member, IEEE Signal Processing Society



Track Leader: Next Generation Localisation:
Mr. Reinhard Schäler
Department: Department of Computer Science and Information Systems
University: University of Limerick

Brief Biography

Reinhard Schäler has been involved in the localisation industry in a variety of roles since 1987. He is the founder and editor of Localisation Focus – The International Journal of Localisation, a founding editor of the Journal of Specialised Translation (JosTrans), a former member of the editorial board of Multilingual Computing (October 1997 to January 2007, covering 70 issues), a founder and CEO of The Institute of Localisation Professionals (TILP), and a member of OASIS. He has attracted more than €5.5M in research funding and has published more than 50 articles, book chapters and conference papers on language technologies and localisation. He has been an invited speaker at EU and international government-organised conferences in Africa, the Middle East, South America and Asia. In 2009, he founded The Rosetta Foundation, a non-profit organisation and charity aiming to make knowledge available in every language. He is a lecturer at the Department of Computer Science and Information Systems (CSIS), University of Limerick (UL), and the founder and director of the Localisation Research Centre (LRC) at UL, established in 1995.

Research Interests

Schäler’s main research area is the automation of localisation workflows and the application of tools and technologies to the localisation of digital content, including translation, engineering and testing. He has been researching approaches to Machine Translation (MT) and Computer Assisted Translation (CAT) systems since 1990. His research on different approaches to Example-Based Machine Translation (EBMT) contributed to the work now carried out by The Rosetta Foundation, which uses translation tools and technologies to provide translation and localisation services with the support of volunteer translators, project managers and engineers.

Career Highlights

• Establishment of the Localisation Research Centre (LRC), 1995, £250K.
• Establishment of the Grad. Dip./M.Sc. in Software Localisation at University of Limerick in 1997.
• EU-funded IGNITE project on Linguistic Infrastructure for Localisation: Language Data, Tools and Standards, together with four European industrial partners; total budget €3.5M, 2005–2007.
• Invited keynotes: Localisation and Internationalisation of Software for Export, Florianópolis, Brazil (November 2004); Manufacturers’ Association for Information Technology (MAIT), New Delhi, India (December 2004); The First International Conference on Persian Script & Language Localisation, Supreme Council of ICT and Iran Telecom Research Centre, Tehran, Iran (May 2005); The IEEE Professional Communication Society, International Professional Communication Conference, Limerick, Ireland (July 2005); LISA Forum Cairo, The Localisation Industry Standards Association, Cairo, Egypt (December 2005); Multilingual Web, Madrid, Spain (October 2010).
• Establishment of The Rosetta Foundation in the summer of 2009, a not-for-profit organisation (charity) promoting equality via language and cultural diversity through access to digital knowledge and information independent of language.
• Establishment of the Dynamic Coalition for a Global Localisation Platform: Localisation4all, under the umbrella of the United Nations Internet Governance Forum (IGF) in 2009.



Track Leader: Systems Framework:
Dr. Saturnino Luz
Department: School of Computer Science and Statistics
University: Trinity College Dublin

Brief Biography

Dr. Saturnino Luz has worked on the development of novel technologies for human-computer interfaces in the areas of computer-supported cooperative work, spoken language systems, natural language processing, dialogue management, and design support tools for multimodal systems. He has been a Lecturer in Computer Science at Trinity College since 2001, where he supervises PhD and M.Sc. students in the areas of natural language processing, computer-supported cooperative work, human-computer interaction and machine learning. Dr. Luz has participated in a number of Irish- and EU-funded research projects, working on computing support for connected communities, dialogue systems engineering, technology for medical team meetings, and various topics in machine learning. He has served on the programme committees of several international conferences and on the editorial boards of international journals. He has been a member of the Association for Computing Machinery (ACM) since 1994 and contributes regularly to ACM Computing Reviews.

Research Interests

Dr. Luz’s research focuses on the theoretical bases of computer-supported collaboration, more specifically processes related to information structuring and retrieval in scenarios encompassing multimedia data and multimodal interaction. He is also interested in natural language parsing, text classification and dialogue systems, particularly from a human-factors research perspective.

Career Highlights

• Principal Investigator of the ECOMMET project on Enhanced Computing Support for Multidisciplinary Medical Team Meetings, funded by Science Foundation Ireland.
• Principal Investigator of a Basic Research project

on content indexing for multimedia meeting<br />

recordings, funded by Enterprise Ireland.<br />

} Review selected as a Computing Review highlight;<br />

featured as profiled reviewer in acknowledgement<br />

of his contributions to that publication (2004).<br />

} Invited talks at the University of Ulster (2002),<br />

at the German Research Centre for Artificial<br />

Intelligence (2003), at the University of South Africa<br />

(2004), at the Seminar on New Trends in Corpus<br />

Linguistics for Language Teaching and Translation<br />

Studies (Granada, Spain, 2008), and at KTH<br />

(Stockholm, Sweden, 2010).<br />

} Chaired the programme committee of the Irish<br />

Human-Computer Interaction Conference (2009)<br />

and co-chaired the Special Track on Supporting<br />

Collaboration among Healthcare Workers at the<br />

IEEE International Symposium on Computer-Based<br />

Medical Systems (2008-2010).<br />

} Served as member of the Editorial Board of<br />

Information from 2000 to 2003.



Education and Outreach Manager
Ms. Cara Greene
Department: School of Computing
University: Dublin City University

Research Interests

Greene has a B.Sc. in Applied Computational Linguistics from Dublin City University. She then became a learning support and resource teacher before returning to DCU to undertake a PhD in Information and Communication Technology (ICT). She is currently writing up her PhD thesis, part-time, on integrating ICT into the secondary school curriculum. The thesis investigates whether integrating ICT into the curriculum can produce inclusive curricula that cater to the needs of all students, with and without learning difficulties. After completing her PhD, Greene plans to research the impact of education programmes provided by large research centres on the number of students taking up the subjects these programmes promote at third level.

Brief Biography

Cara Greene is Education and Outreach (E&O) Manager at the Centre for Next Generation Localisation (CNGL). The E&O Programme is split into two areas: Education and Outreach. The Education Programme aims to provide educational and training opportunities at all levels of education in key areas of the localisation industry, ranging from primary school courses to professional localisation training courses. It also provides professional development and research support to CNGL students and staff, as well as to others in the localisation industry. The Outreach Programme encompasses developing public-facing projects, hosting conferences and industry events, and promoting CNGL research in the media.

Career Highlights

• Nominated for the DCU President’s Award for Civic Engagement 2010.
• Member of the Third Level Education and Outreach (TREO) Communications and Evaluation working groups.
• Research paper selected for presentation at the Young Researchers Consortium at ICCHP 2006.
• Awarded the DCU Chancellor’s Medal at graduation in 2002.


CNGL Overview



Localisation: Global Challenges and Opportunities

Localisation is the industrial process of adapting (digital) content to culture, locale and linguistic environment, at high quality and speed and at low cost. It is a key enabling multiplier technology for the global manufacturing, software, services and content distribution industries, unlocking markets that would otherwise be unavailable. Importantly, the true potential of localisation goes well beyond opening up business opportunities across the globe: many communities find themselves on the wrong side of the “digital divide”, with vital information (hygiene, health, food, education etc.) not available in local languages, with potentially disastrous consequences. Localisation technologies and processes can make a considerable contribution to bridging this divide. The CNGL partnership and research focus on both the commercial and the societal dimensions of localisation.

The Centre for Next Generation Localisation (CNGL, 2007-2012) is an Industry-Academia partnership funded jointly by Science Foundation Ireland (SFI) and industry partners. The university partners are DCU (Dublin City University, lead institution), TCD (Trinity College Dublin), UCD (University College Dublin) and UL (University of Limerick). Industry partners include Microsoft Ireland, Symantec Ireland, Dai Nippon Printing (Japan), SDL, Translations.com (Alchemy), CAPITA (Applied Language Solutions), Welocalize, VistaTEC and SpeechStorm, bringing together some of the world’s leading software, publishing and localisation companies in the CNGL partnership.

The CNGL vision is to enable people to interact with content, products, services and each other in their own language, culture and context, and according to their own personal needs.

To realise this vision, the CNGL research programme concentrates on the challenges of Volume, Access and Personalisation. Volume: the amount of content is growing dramatically and massively outstrips human translation capacity. Access: while traditional localisation is text-, print- and (full) screen/keyboard-based, mobile devices enable ubiquitous access to information on the go, involving additional interaction modalities such as speech and image, as well as corporate and user-generated content. Personalisation: while traditional localisation is coarse-grained (focusing on a geographic locale and language, e.g. the Middle East), information is most useful if adapted to the user, device, background information/knowledge and task at hand. In terms of a slogan, “the person is the ultimate locale”.

The three axes Volume, Access and Personalisation define the “Localisation Cube” (Figure 1). The CNGL mission (derived from its vision) is to develop processes and technologies that can address each point in the cube at configurable quality and speed.

Figure 1. The Localisation Cube (and traditional Enterprise Localisation technologies)

Traditional enterprise localisation technologies tend to focus on large and well-managed localisation workflows with predictable corporate content, targeting the lower, front, right-most part of the Localisation Cube (Figure 1) and leaving large parts of the Cube unaddressed.

Next Generation Localisation, by contrast, is based on a set of flexible and adaptive technologies and processes that allow us to address each point in the Localisation Cube at configurable quality and speed. The CNGL research programme concentrates on three focal points in the Cube (Figure 2):¹

¹ Note that volume here refers to a single localisation request: while traditional bulk or enterprise localisation projects may involve the translation of millions of words into many languages, a single customer care interaction may involve only a few hundred words and one or two languages. However, the total effect is, of course, cumulative: millions of customer care interactions will generate very large total volumes.



Figure 2: CNGL Focal Points in the Localisation Cube

The Bulk Localisation Workflow (BLW) scenario targets large-volume localisation tasks, with and without human pre- and post-editing, that are familiar from large localisation projects. The focus is on both corporate and NGO content; on automation (translation technologies, in particular machine translation); and on the optimal integration of novel social and collaborative localisation models (including crowd-sourcing). This is supported by open standards and a flexible, open, web-services-based localisation platform that supports a wide range of workflows, from standard corporate workflows to novel collaborative ones.

The Personalised Multilingual Customer Care (PMCC) scenario focuses on supporting global customers who interact with online and perishable corporate and user-generated multilingual content (e.g. product blogs). It provides for frequent content updates, multi-modal access (speech and image, in addition to the more traditional text-based modalities) and increased levels of personalisation in real-time interactions, without (or with minimal) human pre- and post-processing interventions.

The Personalised Multilingual Social Networking (PMSN) scenario focuses fully on user-generated content (UGC, in contrast to corporate content) and on the highly perishable content prevalent on social networking and messaging sites. It involves high levels of personalisation and full use of all access modalities, developing CNGL technologies to monitor and manage information for customer support and to link social networking activities across linguistic barriers.

This conceptualisation (the Localisation Cube), the factoring of challenges and opportunities into three dimensions (Volume, Access and Personalisation) and the implementation of the Next Generation Localisation technologies in terms of the flexible and adaptive CNGL Components Framework (rather than a single monolithic one-size-fits-all system) have served us well. They have guided the CNGL research programme and have allowed us to anticipate and respond flexibly to many of the recent challenges and opportunities in the localisation space, including the:

• massive increase in multilingual user-generated content (UGC), in addition to professionally edited corporate content, from user forums and social networking sites
• growing importance of UGC in localisation and community-based customer support models
• emergence and impact of novel social and community-based localisation in both for-profit and not-for-profit localisation operations
• increasing number of non-governmental organisations (NGOs) world-wide targeting the global “digital divide”, striving to provide access to information in the local language as a basic human right
• increasing number of SMEs (rather than just multinationals) targeting global markets, with localisation needs markedly different from those of the multinationals

Supporting the Global Customer and Promoting the Multilingual Society

In CNGL project Year 5 (2012) we focus in particular on two related themes, representing the key commercial and social dimensions of CNGL research: (i) Supporting the Global Customer and (ii) Promoting the Multilingual Society.



Addressing the Challenges and Making the Most of the Opportunities: Charting the CNGL Research Map

The CNGL mission is to develop flexible and adaptive next-generation localisation technologies and processes that allow us to address any point in the space defined by the Localisation Cube (Figure 1) at configurable quality and speed. This realises the CNGL vision of enabling people to interact with content, products, services and other people in their own language, according to their own culture, and according to their own personal needs. This mission directly determines the structure of the core CNGL research programme (Figure 3):

Figure 3: Organisation of the CNGL Research Programme

The CNGL research programme intertwines four major research tracks (as well as a demonstrator programme). Two of the tracks, Integrated Language Technologies (ILT) and Digital Content Management (DCM), are basic research tracks. The remaining two, Next Generation Localisation (LOC) and Systems Framework (SF), are more applied, integrating research tracks.

LOC: technological advances from ILT and DCM need to be integrated into workflows and blueprints of Next Generation Localisation. LOC researches the life-cycle of digital content, including content design and development and standards. It evaluates sophisticated language and content management technologies for integration into novel collaborative, community-driven and social localisation models. It also provides technology support for such models in the form of an open, modular, component- and web-services-based architecture built on the SOLAS technology platform.

SF: SF research focuses on underexplored software engineering aspects of complex multilingual digital content management. These include requirements analysis, user interface design, the development of WebWOZ (a web-based Wizard-of-Oz technology platform), rapid prototyping systems, semantic interoperability, adaptive workflows and web-based service architectures. SF also coordinates the development and evolution of the CNGL demonstrator systems.

ILT: ILT research focuses on Machine Translation (MT), Speech Technology and Text Analytics. It provides the support technologies for translation and interaction automation across language and modality (text and speech) barriers, based on the MaTrEx MT and MUSE Speech Technology platforms.

DCM: DCM research focuses on combining Adaptive Hypermedia (AH) with Cross-Lingual and Multimodal (Text, Image and Speech) Information Retrieval (IR) technologies to find, slice, dice and recompose content. This supports the CNGL information access and personalisation agenda in a multilingual setting, based on the Adaptive Engine technology platform.

CNGL Demonstrator Systems

Demonstrator systems are a core part of CNGL research. The demonstrators provide focal points for project cohesion and collaboration, combining technologies and teams from across the CNGL research tracks and from academic and industry partners. They are an essential component of overall project evaluation and contribute platforms for research and experimentation across all of CNGL. They showcase CNGL technologies to the outside world and ground CNGL research outputs in commercial as well as non-profit societal applications.



During CNGL project Years 1 to 3, the demonstrator systems focused on the three core use scenarios in the space defined by the Localisation Cube (Figure 2): the Bulk Localisation Workflow (BLW) scenario, the Personalised Multilingual Customer Care (PMCC) scenario and the Personalised Multilingual Social Networking (PMSN) scenario. Building on this work, in project Year 4 (2011) the demonstrators showcased a broad industry storyline around the “Supporting the Global Customer” theme. In CNGL project Year 5 (2012), the focus was on advancing and showcasing the demonstrators with the most commercial and industry impact, as well as those showing promising research directions for the future.

CNGL Outreach and Technology-Transfer Activities

Technology transfer is a key CNGL objective, converting research outputs into economic and social impact. CNGL carefully manages IP in close collaboration with researchers and industry partners, and fosters an entrepreneurial spirit within the CNGL research community.

Fostering interest in science and technology, in particular information technology, in education (within and outside CNGL) and among the general public is a further key objective for CNGL. We offer a wide range of activities, including:

• projects for first, second, third and fourth level education
• professional development and communication within CNGL
• communication and dissemination in relevant professional research and industry sectors, as well as to the general public

Changes and Developments in the CNGL Consortium

CNGL operates in a dynamic and fast-changing environment in both our research and business sectors, particularly in the localisation space. The year 2012 saw a strongly increased focus on the commercialisation of CNGL research expertise, in particular through the growth and traction of CNGL spin-out and start-up companies and not-for-profit organisations.

ILT technologies underpin three start-up companies:

• Xcelerator Machine Translations, through its KantanMT product (www.kantanmt.com), operates in the space of cloud-based, scalable provision of personalised and adaptive MT services that are easy to configure, manage and operate.
• Scream Technologies (www.screamtechnologies.com) specialises in creating synthetic voices from human actors, enabling the end user to create human-sounding synthetic speech and control how it sounds. Scream’s product enables enterprise customers to find a voice that represents them. They can then use that voice for all announcements, interactive voice response, telephone or advertising, without ever needing to return to a recording studio.
• Digital Linguistics (www.digitallinguistics.com) uses machine-learning-based text classification technologies for quality assurance (QA) in localisation projects.

DCM technologies underpin two start-up companies:

• Emizar (www.emizar.com) focuses on customer care applications based on the adaptive and personalised dicing, slicing and recomposition of digital content.
• Wripl (www.wripl.com) offers Personalisation-as-a-Service across websites, improving a user’s experience as they browse across multiple content management systems (CMSs) to solve a particular task. Wripl is currently in spin-out preparation mode.

LOC technologies underpin:


• The Rosetta Foundation (www.therosettafoundation.org), a not-for-profit organisation that provides localisation services to NGOs and social causes based on novel, community-based localisation models (to date involving 2,600+ volunteers), supported by the CNGL SOLAS technology platform.



These developments (including the 2,600 volunteers engaging with The Rosetta Foundation and the €1.25M in venture capital raised by Xcelerator and Scream Technologies) clearly show the social and economic relevance of the CNGL research programme.

Commercialisation activities are strongly underpinned by SFI- and Enterprise Ireland-funded activities, which take research all the way from the lab into the market. These include 6 Technology Innovation Development Award grants, 5 Enterprise Ireland Feasibility Awards, 3 Commercialisation Fund Awards and 2 Innovation Partnerships.

On the academic and research side, CNGL extends a warm welcome to Prof. Qun Liu, formerly Director of the Natural Language Processing Lab of the Chinese Academy of Sciences in Beijing, as the new Professor of Machine Translation at DCU. Prof. Liu’s expertise in cutting-edge machine translation and language technology research, and his international standing in the research community, make a key contribution to CNGL and substantially strengthen CNGL’s expertise in multilingual technologies.

Research Highlights 2012

Due to space constraints, we can only provide a brief preview of a few selected highlights below. For full details, please consult the subsequent sections of the 2012 CNGL Annual Report.

Research Outputs 2012

Research performance and output in 2012 were strong. CNGL again substantially outperformed its research KPI targets (Table 1), with 92 conference publications and 26 journal, book and book chapter publications: a total of 118 against a combined target of 62 for the reporting period. Since 2007, CNGL has published a total of 411 research publications against a target of 291 (Table 2), outperforming overall targets by a factor of 1.5.

Table 1: CNGL 2012 Research KPIs against Targets

CNGL Research Outputs 2012                  Actuals   Targets
Journal papers, book chapters and books          26        12
Conference publications                          92        50
Conferences/workshops hosted                     17         8

Table 2: CNGL 2007–2012 Cumulative Research KPIs against Targets

CNGL Research Outputs 2007–2012             Actuals   Targets
Journal papers, book chapters and books          63        43
Conference publications                         348       237
Conferences/workshops hosted                     58        39

ILT: highlights include:

• best paper awards (Vogel and Mamani Sánchez, 2012; Emms and Franco-Penya, 2012)
• winning the SANCL-2012 Web Parsing challenge, organised by Google at NAACL-HLT 2012 (Le Roux, Foster, Wagner, Kaljahi and Bryl, 2012)
• strong speech technology publications: 6 journal papers, 2 book chapters and 5 conference papers at ICASSP and Interspeech 2012
• a strong CNGL presence at COLING 2012 (Mumbai, India), with a total of 15 full, short and workshop papers on MT and Text Analytics
• the award of COLING 2014, to be held in Dublin, to CNGL partner DCU, with CNGL Director Prof. Josef van Genabith as General Chair

ILT researchers have worked closely with CNGL industry partners and start-up companies: with VistaTEC and Digital Linguistics on text classification for MT quality assessment, with Symantec on tuning MT to user-generated content, and with Xcelerator and Welocalize on integrating MT and translation memory (TM) technologies. ILT researchers are involved in two new EU FP7 MT projects (QTLaunchPad and the EXPERT Marie Curie PhD Graduate School) and lead the Abu-MaTran FP7 Academia-Industry partnership project (Dr. Antonio Toral).



DCM: highlights include the publication of over 30 peer-reviewed papers in international journals (e.g. ACM CSUR, UMUAI and Journal IR) and at major international conferences (ACM Hypertext, SIGIR 2012, CIKM 2012, COLING 2013, AAAI 2012, ACL 2012 and TPDL 2013). DCM research has made significant advances in the personalisation and dynamic aggregation of user-generated content, corporate content, and open content harvested from the open web. This has led to industry trials in the application areas of Personalised Multilingual Customer Care and Personalisation-as-a-Service.

Research has progressed on structural content analysis for web slicing, and the MOODfinger framework for affective news retrieval has been further developed. DCM researchers (Prof. Owen Conlan and Prof. Vincent Wade) won two SFI TIDA grants for research in personalisation, which have led to the planning of the spin-out companies Emizar and Wripl. A third TIDA grant, for work on the automated slicing of content for reuse and repurposing, has been secured (Prof. Vincent Wade), and work will commence in early 2013.

Trials and evaluations of DCM technology, including the Personalised Multilingual Customer Care portal and the Personalised Multilingual Information Retrieval demonstrator, were conducted in collaboration with Microsoft and Symantec. TCD (Prof. Vincent Wade) also established the Enterprise Ireland Technology Centre for Technology Enhanced Learning (the Learnovate Centre), which is allied to CNGL.

LOC: highlights include the continued development of SOLAS, a flexible, open-source, open-standards-, component- and web-services-based platform. SOLAS supports both standard and innovative social, collaborative and distributed localisation workflows. It consists of two main strands: SOLAS Match and SOLAS Productivity.

SOLAS Productivity uses a standardised data container, open web service APIs, and a common orchestration and process management module. Together these connect to any number of component technologies developed by academic and industrial partners within CNGL, as well as to third-party technologies and tools.

SOLAS Match provides ground-breaking and intuitive technology for the seamless and user-friendly matching of community translation tasks with volunteer translators.

The close collaboration between LOC and The Rosetta Foundation makes CNGL technologies directly available to social localisation operations. In return, CNGL technologies are tested with the Foundation's currently 2,600+ volunteers.

SF: highlights include strong progress in human-factors and interaction design research, substantial contributions to standardisation (the W3C ITS and OASIS XLIFF standards), and interoperability research for systems and services architectures.

Doherty, Karamanis and Luz (2012) investigate the impact of work contexts on the use of MT in localisation operations. The CNGL Wizard-of-Oz platform has been made open source and is available online (www.webwoz.com).

A Linked Open Data approach has been used for end-to-end content management and localisation integration (Lewis et al., 2012). This work involves SOLAS and the MaTrEx CNGL platform technologies, as well as provenance tracking and visualisation, and was carried out in close collaboration with CNGL partners Microsoft and VistaTEC.

Substantial progress has also been made in instrumenting CAT tools to capture the post-editing of MT output and in visualising online community analytics, in close collaboration with CNGL partners Welocalize and Symantec.

Commercialisation<br />

Translating research outputs into economic and social<br />

impact is a key objective for <strong>CNGL</strong>: Table 3 shows a<br />

total of 10 invention and software disclosures, 1 patent<br />

application and 1 spin-out company (against targets<br />

of 20, 4 and 2, respectively) for <strong>2012</strong>. <strong>CNGL</strong> engages<br />

strongly in spin-out and start-up companies as well<br />

as in not-for-profit social operations. The Rosetta<br />

Foundation (www.therosettafoundation.org) focuses<br />

on localisation support for NGOs (and other not-for-profit<br />

organisations) using a novel social and collaborative<br />

localisation platform. Emizar (www.emizar.com) focuses<br />

on digital content and personalisation technologies for<br />

customer support. Xcelerator Machine Translations,<br />

through its KantanMT product (www.kantanmt.com),<br />

provides cloud-based MT technology that automatically<br />

builds highly scalable custom MT engines from data<br />

resources uploaded by the client, requiring minimal technical<br />

expertise on the client's part. Digital Linguistics<br />

(www.digitallinguistics.com) uses stylometrics and<br />

text classification technologies developed in ILT for<br />

translation quality review. Scream Technologies<br />

(www.screamtechnologies.com) offers custom text-to-speech<br />

systems based on ILT technologies. Additionally, spin-out<br />

candidate Wripl (www.wripl.com) offers personalisation-as-a-service<br />

across websites, drawing on DCM research.


Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 23<br />

Table 3: <strong>2012</strong> IP KPIs against Targets<br />

<strong>CNGL</strong> KPIs | <strong>2012</strong> Actuals | Targets<br />

Patent applications | 1 | 4<br />

Invention and software disclosures | 10 | 20<br />

Spin-outs | 1 | 2<br />

Outreach<br />

The <strong>CNGL</strong> Education and Outreach Programme<br />

concentrates on first, second, third and fourth level<br />

education, and outreach to industry and the general<br />

public.<br />

<strong>CNGL</strong> founded the All Irish Linguistics Olympiad<br />

(AILO) in 2009 and has organised the competition since<br />

then. In <strong>2012</strong>, more than 400 secondary level students<br />

participated in the national competitions and the top<br />

four individual students represented Ireland at the<br />

International Linguistics Olympiad (ILO) <strong>2012</strong> in Slovenia.<br />

To promote African languages in the Information<br />

Society, the University of Limerick’s MSc in Multilingual<br />

Computing and Localisation will be delivered by<br />

distance learning and co-hosted by the United Nations<br />

Economic Commission for Africa at its Information<br />

Training Centre for Africa in Addis Ababa, Ethiopia.<br />

Growing the <strong>CNGL</strong> Research Eco-System <strong>2012</strong><br />

<strong>CNGL</strong> has been highly successful in attracting<br />

competitive research funding nationally and<br />

internationally, rapidly developing a research eco-system<br />

around the <strong>CNGL</strong> core, based on a large number<br />

of affiliated EU projects (under the FP7 programme),<br />

SFI-funded programmes, and <strong>CNGL</strong> business-development<br />

activities funded through Enterprise Ireland programmes<br />

or direct contract research co-operations. The major currently<br />

active projects, listed in Table 4, provide further<br />

evidence of the growing international research standing<br />

and recognition of <strong>CNGL</strong>, as well as of the relevance<br />

and commercialisation potential of the <strong>CNGL</strong><br />

research programme.<br />

Planning for the Future<br />

With the end of project Year 5 in <strong>2012</strong>, <strong>CNGL</strong> has<br />

completed its original funding cycle (2007–<strong>2012</strong>)<br />

and is completing a number of key ongoing research,<br />

commercialisation and outreach projects in a non-costed<br />

extension in 2013. <strong>CNGL</strong> has been a resounding success,<br />

generating (to date) more than 400 peer-reviewed<br />

publications, 21 PhD theses, 39 invention and software<br />

disclosures, 9 patent applications, 4 commercial spin-out<br />

and start-up companies, 1 not-for-profit spin-out, strong<br />

industry-academia partnerships and a total of €15.8m<br />

of additional competitive research, development and<br />

commercialisation funding growing the <strong>CNGL</strong> Research<br />

Eco-System.<br />

At the same time, <strong>CNGL</strong> has won<br />

further substantial competitive funding from Science<br />

Foundation Ireland, initially for 30 months, to continue<br />

<strong>CNGL</strong> into the future with a core grant of €10.5m<br />

(<strong>CNGL</strong>II: March 2013 – September 2016). <strong>CNGL</strong>II is<br />

an evolution of <strong>CNGL</strong> that expands its remit<br />

from localisation to a broader focus on Digital Content<br />

Management in a Global Intelligent Content setting,<br />

based on the concept of a Global Content Value<br />

Chain in which services interact with content to make<br />

it self-describing, self-aware and self-adapting across<br />

language barriers, modalities and interaction platforms,<br />

tuned to context and user. The <strong>CNGL</strong>II application was<br />

successfully led by Prof. Vincent Wade (TCD), who will<br />

take over as Director of <strong>CNGL</strong> on 1 March 2013.



<strong>CNGL</strong> OVERVIEW<br />

Table 4: <strong>CNGL</strong> Research Eco-System: Income Received from Active Affiliated Research Projects <strong>2012</strong><br />

Project | Funding Body | € contribution (to <strong>CNGL</strong> partner)<br />

QT Launch Pad | EC – FP7 | 477,960<br />

LT-Web – Language Technology in the Web | EC – FP7 | 396,391<br />

CENDARI (Collaborative European Digital Archive Infrastructure) | EC – FP7 | 120,000<br />

EXPERT (EXPloiting Empirical appRoaches to Translation) | EC – FP7 – Marie Curie | 481,000<br />

Abu-MaTran (Automatic building of Machine Translation) | EC – FP7 – Marie Curie | 365,966<br />

IRCSET Data Mining for Industrial Apps – PhD Sponsorship | Phorest | 24,000<br />

IRCSET Data Mining for Industrial Apps – PhD Sponsorship | Irish Research Council for Science, Engineering and Technology (IRCSET) | 48,000<br />

Learning Technology Centre | Enterprise Ireland (EI) | 3,000,000<br />

EI Commercialisation with Xcelerator Machine Translations | Enterprise Ireland (EI) | 152,000<br />

EI Feasibility Grant – Adaptive Solutions for Patent Translation | Enterprise Ireland (EI) | 15,000<br />

EI Innovation Voucher with Cipherion Translations | Enterprise Ireland (EI) | 5,000<br />

EI Innovation Voucher with IntelImpact | Enterprise Ireland (EI) | 5,000<br />

EI Feasibility Study – Critical Data Auditor | Enterprise Ireland (EI) | 8,827<br />

EI Innovation Voucher with FFiG | Enterprise Ireland (EI) | 5,000<br />

EI Feasibility Grant – Wripl | Enterprise Ireland (EI) | 15,000<br />

EI Innovation Partnership Programme with Pixalert – Critical Data Auditor | Enterprise Ireland (EI) | 40,400<br />

EI Commercialisation Fund Ata-Bot | Enterprise Ireland (EI) | 244,381<br />

PoliMon4Cloud | Technology Innovation Development Award (TIDA) | 76,384<br />

Integrated Software Suite to provide Next Generation Personalised Multilingual Customer Care | Technology Innovation Development Award (TIDA) | 67,748<br />

MT & TM Integration | Technology Innovation Development Award (TIDA) | 86,427<br />

UNITE (Personalised Cross-site Personalisation) | Technology Innovation Development Award (TIDA) | 60,000<br />

Iterative Retraining of Machine Translation with Post-edits to Increase Post-Editing Productivity in Localisation Workflows | Technology Innovation Development Award (TIDA) | 99,218<br />

Linguabox: Automated Open Content Repurposing Service to support Personalized eLearning | Technology Innovation Development Award (TIDA) | 87,768<br />

iOmegaT – An Instrumented Replayable Computer-Aided-Translation Tool | Technology Innovation Development Award (TIDA) | 92,273<br />

Total | | 5,973,743


Integrated Language Technologies



Strand Name: Integrated Language Technologies<br />

AREA CO-ORDINATORS:<br />

PROF. JOSEF VAN GENABITH, DUBLIN CITY UNIVERSITY<br />

PROF. NICK CAMPBELL, TRINITY COLLEGE DUBLIN<br />

Participant Names and Affiliation<br />

Industrial Collaborators<br />

Mr. Takeshi Fukunaga, Dai Nippon Printing<br />

Mr. Tom Gray, SpeechStorm<br />

Mr. John Dixon, Applied Language Solutions<br />

Dr. Fred Hollowood, Symantec<br />

Mr. Paul McManus, SDL<br />

Mr. Enda McDonnell, Alchemy Software Development<br />

Mr. Phil Ritchie, VistaTEC<br />

Dr. Johann Roturier, Symantec<br />

Mr. Dag Schmidtke, Microsoft<br />

International Collaborators<br />

Prof. Walter Daelemans, Antwerp, Belgium<br />

Prof. Mikel Forcada, Alicante, Spain<br />

Prof. Bernd Möbius, Stuttgart, Germany<br />

Prof. Khalil Sima’an, Amsterdam, Netherlands<br />

Prof. Eiichiro Sumita, ATR, Japan<br />

Prof. Antal van den Bosch, Tilburg, Netherlands<br />

Prof. François Yvon, Paris, France<br />

Faculty<br />

Prof. Nick Campbell Trinity College Dublin ILT Co-Leader, ILT2 Leader<br />

Dr. Peter Cahill University College Dublin ILT2 Co-Leader<br />

Prof. Julie Carson-Berndsen University College Dublin ILT2 Co-Leader<br />

Dr. Martin Emms Trinity College Dublin ILT3<br />

Dr. Christer Gobl Trinity College Dublin ILT2<br />

Prof. Qun Liu Dublin City University ILT1<br />

Dr. Dorothy Kenny Dublin City University ILT1<br />

Dr. Saturnino Luz Trinity College Dublin ILT3<br />

Prof. Ailbhe Ní Chasáide Trinity College Dublin ILT2<br />

Dr. Sharon O’Brien Dublin City University ILT1<br />

Prof. Josef van Genabith Dublin City University ILT Co-Leader, ILT1 Leader, ILT3<br />

Dr. Carl Vogel Trinity College Dublin ILT3 Leader<br />

Research Integration Officer<br />

Dr. Declan Groves<br />

Dublin City University



INTEGRATED LANGUAGE TECHNOLOGIES<br />

Postdoctoral Researchers<br />

Dr. Ergun Biçici Dublin City University ILT1<br />

Dr. Joao Cabral University College Dublin ILT2<br />

Dr. Yvette Graham Dublin City University Affiliated<br />

Dr. Ingmar Steiner University College Dublin ILT2<br />

Dr. Erwan Moreau Trinity College Dublin ILT3<br />

Dr. Sara Morrissey Dublin City University ILT1<br />

Dr. Sudip Kumar Naskar Dublin City University ILT1<br />

Dr. Irena Yanushevskaya Trinity College Dublin ILT2<br />

Dr. Xiaofeng Wu Dublin City University ILT1<br />

Dr. Junhui Li Dublin City University ILT1<br />

PhD Students<br />

Mr. Mohamed Abou-Zleikha University College Dublin ILT2<br />

Mr. Zeeshan Ahmed University College Dublin ILT2<br />

Ms. Hala Al-Maghout Dublin City University ILT1<br />

Mr. Pratyush Banerjee Dublin City University ILT1<br />

Ms. Hanna Béchara Dublin City University ILT1<br />

Mr. Sandipan Dandapat Dublin City University ILT1<br />

Mr. Stephen Doherty Dublin City University ILT1<br />

Ms. Amelie Dorn Trinity College Dublin ILT2<br />

Mr. Hector Hugo Franco Penya Trinity College Dublin ILT3<br />

Mr. John Kane Trinity College Dublin ILT2<br />

Mr. Mark Kane University College Dublin ILT2<br />

Mr. Gerard Lynch Trinity College Dublin ILT3<br />

Mr. Alfredo Maldonado Guerra Trinity College Dublin ILT3<br />

Ms. Liliana Mamani Sanchez Trinity College Dublin ILT3<br />

Ms. Neasa Ní Chiaráin Trinity College Dublin ILT2<br />

Mr. Udochukwu Kalu Ogbureke University College Dublin ILT2<br />

Ms. Maria O’Reilly Trinity College Dublin ILT2<br />

Mr. Ankit Srivastava Dublin City University ILT1<br />

Ms. Eva Szekely University College Dublin ILT2<br />

Mr. Christoph Wendler Trinity College Dublin ILT2<br />

Ms. Amalia Zahra University College Dublin ILT2



Funding<br />

<strong>2012</strong> Funding from SFI<br />

<strong>CNGL</strong> (07/CE/I1142): €1,064,77<br />

SFI TIDA Award Iterative Retraining of Machine<br />

Translation with Post-edits to increase Post-Editing<br />

Productivity in Localisation Workflows €99,218<br />

SFI TIDA Award MT & TM Integration €86,427<br />

<strong>2012</strong> Funding from Other Sources<br />

van Genabith: EU FP7 QT Launch Pad: €477,960<br />

van Genabith: EU FP7 LT Web: €87,290<br />

van Genabith: EU FP7 EXPERT Marie Curie PhD Training: €481,000<br />

Toral: EU FP7 PEOPLE Abu-MaTran: €365,966



Research Overview: Integrated Language Technologies (ILT)<br />

Goals<br />

Human languages are a core medium for representing,<br />

storing and sharing knowledge and information. The<br />

objective of the ILT track is to perform basic and applied<br />

research in language technologies (LTs) supporting<br />

content processing and management across languages<br />

and modalities (text and speech). ILT1 focuses on<br />

advancing machine translation (MT), ILT2 on speech<br />

input and output as well as speech translation, and<br />

ILT3 on text classification and annotation. The three<br />

groups work closely together on integrated technologies<br />

providing core <strong>CNGL</strong> language-based services.<br />

Research Barriers and Methodologies to Address Them<br />

ILT1: Machine Translation<br />

Statistical Machine Translation (SMT), in particular<br />

Phrase-Based SMT (PB-SMT, as implemented in, e.g., the Moses<br />

platform), has been a game-changer in both research<br />

and commercial applications of MT. At the same time,<br />

SMT is reaching a performance plateau, with disruptive<br />

improvements in translation quality requiring massive<br />

increases in training data. Traditional PB-SMT uses<br />

string-based information. Substantial improvements<br />

are expected through the use of richer (linguistically or<br />

distributionally motivated) signals, including syntactic<br />

and semantic information, in machine learning-based<br />

approaches to MT. Mining and translating noisy user-generated<br />

content (UGC) is becoming increasingly<br />

important in global business intelligence and customer<br />

support operations. However, UGC is highly challenging<br />

for MT trained on “clean” professionally-edited data. MT<br />

is being applied to an increasing number of domains and text<br />

types, and novel domain adaptation techniques are required<br />

to ensure optimal MT output quality. Improvements in<br />

MT components (such as alignment) can improve overall<br />

MT performance. Most system combination and hybrid<br />

MT approaches can profit from better machine learning<br />

technologies. Technologies need to be developed to<br />

support fully language-independent quality estimation/<br />

prediction (without access to a reference translation)<br />

that treats the MT system as a black box. Finally, optimal<br />

integration of translation technologies requires full<br />

consideration of the human in the loop.<br />

Almaghout et al. (<strong>2012</strong>a, b) show how linguistically-motivated<br />

sophisticated syntactic information enriching<br />

synchronous context free grammars (SCFGs) can improve<br />

state-of-the-art hierarchical phrase-based SMT<br />

(HPB-SMT) systems. Graham and van Genabith (<strong>2012</strong>) present<br />

a statistical, deep syntax, LFG-based decoder and MT<br />

system. Banerjee et al. (<strong>2012</strong>a) develop a translation-quality-driven<br />

supplementary training data selection<br />

model for tuning MT to user-generated content. Banerjee<br />

et al. (<strong>2012</strong>b) compare normalisation and supplementary<br />

training data based approaches to MT of UGC. Pecina<br />

et al. (<strong>2012</strong>) present approaches to adapting log-linear<br />

weight vectors to achieve optimal translation for different<br />

domains given a generic training set without retraining.<br />

Tu et al. (<strong>2012</strong>) show how compact representations of<br />

alignment alternatives can improve MT. Dandapat et al.<br />

(<strong>2012</strong>) develop an efficient system combination approach<br />

integrating EBMT, SMT, TM and IR-based technologies.<br />
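The idea of selecting supplementary training data for UGC can be illustrated with a much simpler, widely used criterion. The sketch below is a stand-in under stated assumptions, not the translation-quality-driven model of Banerjee et al. (2012a): it ranks candidate sentences by the cross-entropy difference criterion of Moore and Lewis (2010), computed here with add-one-smoothed unigram models of in-domain UGC and of general text, so that sentences resembling the in-domain data come first. All function names and the toy sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List

def train_unigram(sentences: List[str]) -> Callable[[str], float]:
    """Return an add-one-smoothed unigram log-probability function."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen tokens
    return lambda tok: math.log((counts[tok] + 1) / (total + vocab))

def cross_entropy(logprob: Callable[[str], float], sentence: str) -> float:
    """Average negative log-probability per token."""
    toks = sentence.lower().split()
    return -sum(logprob(t) for t in toks) / max(len(toks), 1)

def rank_by_ce_difference(candidates: List[str], in_domain: List[str],
                          general: List[str]) -> List[str]:
    """Most in-domain-like candidates first (lowest H_in - H_gen)."""
    lp_in, lp_gen = train_unigram(in_domain), train_unigram(general)
    scored = [(cross_entropy(lp_in, s) - cross_entropy(lp_gen, s), s)
              for s in candidates]
    return [s for _, s in sorted(scored)]

# Hypothetical toy data: forum-style UGC versus formal text.
in_domain = ["my laptop wont boot lol", "pls help antivirus blocks wifi"]
general = ["the committee approved the annual budget",
           "the minister addressed parliament"]
candidates = ["parliament approved the budget",
              "antivirus blocks my wifi pls help"]
print(rank_by_ce_difference(candidates, in_domain, general))
```

In practice, only the top-ranked portion of a large general-domain pool would be added to the training data; real systems use higher-order language models rather than unigrams.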

The Second Workshop and Shared Task on Applying<br />

Machine Learning Techniques to Optimise the Division<br />

of Labour in Hybrid MT (ML4HMT-12) was co-organised<br />

by <strong>CNGL</strong> (van Genabith, Badia, Federmann, Melero,<br />

Costa-jussà and Okita, <strong>2012</strong>), and <strong>CNGL</strong> research teams<br />

contributed four submissions (Wu et al., <strong>2012</strong>; Okita et<br />

al., <strong>2012</strong>a; Okita et al., <strong>2012</strong>b; Okita, <strong>2012</strong>) to the shared<br />

task. Bicici et al. (2013, accepted for publication) show<br />

how quality prediction can be performed using language-independent<br />

features, treating MT systems as a black box.<br />

Doherty et al. (<strong>2012</strong>), Doherty and O’Brien (<strong>2012</strong>) and<br />

Doherty and Moorkens (2013) investigate human factors<br />

in translation technology integration using eye-tracking<br />

experiments, as well as studies on integrating SMT into<br />

professional translator training syllabi.<br />

ILT2: Speech and Machine Translation<br />

The analysis of voice characteristics, synthesis of<br />

expressive voices, linking speech with other modalities<br />

(such as facial expressions) and speech-to-speech<br />

translation are some of the core challenges in speech<br />

research.<br />

Kane et al. (<strong>2012</strong>) develop algorithms for automatically<br />

detecting creaky voice and facilitating its inclusion<br />

in speech synthesis. An Invention Disclosure for a<br />

new method for tracking changes in the voice with<br />

applications in speaker identity tracking and emotion<br />

detection has been filed. Székely et al. (<strong>2012</strong>) detect<br />

voice styles in audiobooks and build synthetic voices for<br />

those voice styles. Abou-Zleikha et al. (<strong>2012</strong>) present



novel work on pitch and duration modelling. Cabral<br />

et al. (<strong>2012</strong>) improve modelling of vocal cord vibration<br />

for better voice quality in speech synthesis. Székely et<br />

al. (<strong>2012</strong>) link synthetic speech voice style and facial<br />

expression. Ahmed et al. (<strong>2012</strong>) develop state-of-the-art<br />

phone-based hierarchical phrase-based machine<br />

translation (HPB-SMT) models.<br />

ILT3: Analytics<br />

Language data provide “unstructured” representations of<br />

information. Language technologies (LTs) are required to<br />

automatically extract structure from language data and to<br />

record this structure in the form of mark-up, annotation,<br />

metadata and explicit representations of information<br />

content, across languages and domains. In order to<br />

address these challenges, ILT3 develops sophisticated<br />

classification-based language technologies using a<br />

wide variety of features and approaches in document,<br />

sentence and sub-sentential classification problems in<br />

syntax, semantics and pragmatics, often with a focus on<br />

supporting MT. In addition, ILT3 has a strong focus on<br />

domain adaptation, concentrating in particular on user-generated<br />

content.<br />

Text classification developed by Dr. Carl Vogel’s team<br />

has produced two Invention Disclosures and a Patent<br />

Application (application no. 11169673.8-1527) with<br />

the European Patent Office, as well as a commercial<br />

licence for Digital Linguistics, a <strong>CNGL</strong> start-up company.<br />

Moreau and Vogel (<strong>2012</strong>) compare supervised and semi-supervised<br />

approaches to MT quality estimation. Lynch,<br />

Moreau and Vogel (<strong>2012</strong>) develop accurate classifiers<br />

to decide whether or not a text is a translation; for texts<br />

identified as translations, Lynch and Vogel (<strong>2012</strong>) predict<br />

the source language. Emms (<strong>2012</strong>) and Emms and Franco Penya<br />

(<strong>2012</strong>a, b) explore stochastic tree distance similarity<br />

measures and employ them for semantic role labelling<br />

(Emms and Franco Penya, <strong>2012</strong>c). Maldonado-Guerra<br />

and Emms (<strong>2012</strong>) develop methods to investigate the<br />

complex translation behaviour of multi-word expressions.<br />

Vogel and Mamani Sanchez (<strong>2012</strong>) predict the complex<br />

interplay between emoticons and hedges as social signals<br />

in user fora. The DCU-Paris 13 parsing team won the<br />

Web-Parsing Challenge and Shared Task organised by<br />

Google as part of SANCL-<strong>2012</strong> at NAACL-HLT <strong>2012</strong><br />

(Le Roux et al., <strong>2012</strong>), using the DCU LORG parser platform<br />

and domain adaptation techniques. In a collaboration<br />

between DCU, Heinrich Heine University in Düsseldorf,<br />

Charles University Prague and the British University<br />

of Dubai, Attia et al. (<strong>2012</strong>a, <strong>2012</strong>b)<br />

show how a combination of finite-state and<br />

machine-learning-based technologies can be used to produce<br />

wider-coverage lexical resources for Modern Standard<br />

Arabic from Arabic Gigaword corpus data, as well<br />

as how spell checking for Arabic can be improved. In<br />

collaboration with the Chinese Academy of Sciences and<br />

New York University, the DCU ILT3 team investigates the<br />

granularity of syntactic information required to improve<br />

sentiment analysis (Tu et al., <strong>2012</strong>).<br />

Hector-Hugo Franco-Penya, Dr. Alexandru Ceausu and Dr. Antonio Toral<br />

were among the many participants in the Hadoop Hackathon run by<br />

<strong>CNGL</strong> in March<br />
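Deciding whether a text is a translation is a binary text-classification problem. As a generic illustration only, the sketch below trains a multinomial naive Bayes classifier with add-one smoothing over lower-cased word unigrams. It is not the classifier or feature set of Lynch, Moreau and Vogel (2012); the class name, the labels and the tiny training set are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over lower-cased whitespace tokens."""

    def fit(self, texts: List[str], labels: List[str]) -> "NaiveBayesTextClassifier":
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.labels for w in self.word_counts[c]}
        return self

    def log_posterior(self, text: str, label: str) -> float:
        """Unnormalised log P(label | text) with add-one smoothing."""
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab) + 1  # +1 slot for unseen words
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.labels, key=lambda c: self.log_posterior(text, c))

# Hypothetical toy data: stilted "translationese" versus informal original text.
texts = ["it is to be noted that the said device",
         "in the framework of the said programme",
         "heads up the gadget is kinda slow",
         "honestly the app rocks"]
labels = ["translation", "translation", "original", "original"]
clf = NaiveBayesTextClassifier().fit(texts, labels)
print(clf.predict("the said framework is to be noted"))
```

Word unigrams are only the simplest choice; research on detecting translated text typically relies on richer stylistic signals, such as function-word or part-of-speech patterns.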

Year 5 Progress<br />

The final year of the initial funding cycle of <strong>CNGL</strong><br />

(2007–<strong>2012</strong>) has been dominated by strong research<br />

and publication outputs, writing-up of PhD theses<br />

(leading to six successful PhD completions), increased<br />

commercialisation activities translating research outputs<br />

into IP (Invention Disclosures, Patent Applications and<br />

Licences) and considerable time and effort spent on the<br />

<strong>CNGL</strong> final review and <strong>CNGL</strong>II application preparations.<br />

Despite the loss of some members of the research<br />

team who have taken up new positions in industry and<br />

academia, all research tracks in ILT continue to run ahead<br />

of schedule in close collaboration with <strong>CNGL</strong> industry<br />

partners and increased engagements with additional<br />

commercial entities.



Progress in ILT1: Machine Translation<br />

The ILT1 group continues to have an impressive<br />

publication record. Conference papers have been<br />

accepted at a number of world-renowned conferences,<br />

including the Annual Meeting of the Association for Computational Linguistics<br />

(ACL-<strong>2012</strong>, Jeju, Korea), the International Conference<br />

on Computational Linguistics (COLING-<strong>2012</strong>, Mumbai,<br />

India), the Conference of the European Association for Machine Translation<br />

(EAMT-<strong>2012</strong>, Trento, Italy), the Conference of the Association for<br />

Machine Translation in the Americas (AMTA-<strong>2012</strong>, San Diego, CA) and<br />

the Workshop on Statistical Machine Translation (WMT-<strong>2012</strong>,<br />

Montreal, Canada). COLING-<strong>2012</strong> was particularly<br />

successful, with a total of 9 MT papers at the main<br />

conference and COLING workshops.<br />

Data-driven statistical MT technologies are able to<br />

provide translations suitable for use in commercial<br />

settings, as evidenced by the dramatic increase in<br />

adoption and provision of MT services in the localisation<br />

industry. The question is no longer whether or not to<br />

use MT technologies, but how best to integrate MT into<br />

localisation and content management workflows. At<br />

the same time, statistical MT, in particular phrase-based<br />

statistical MT (PB-SMT), is reaching a performance<br />

plateau, with disruptive improvements in translation<br />

quality requiring massive increases in training data.<br />

Traditional PB-SMT uses string-based information.<br />

Substantial improvements are expected through<br />

the use of richer (linguistically or distributionally<br />

motivated) signals, including syntactic and semantic<br />

information, in machine learning-based approaches to<br />

MT. ILT1 has made key contributions to this research<br />

challenge, evidenced by <strong>CNGL</strong> publications at EAMT-<br />

<strong>2012</strong>, ACL-<strong>2012</strong> and WMT-<strong>2012</strong> from the DCU MT<br />

group. Almaghout et al. (<strong>2012</strong>) and Li et al. (<strong>2012</strong>a,<br />

b) show how linguistically-motivated sophisticated<br />

syntactic information enriching synchronous context<br />

free grammars (SCFGs) can improve state-of-the-art<br />

hierarchical phrase-based SMT (HPB-SMT) systems.<br />

Graham and van Genabith (<strong>2012</strong>) develop a deep syntax<br />

(Lexical-Functional Grammar)-based statistical MT<br />

system.<br />

With increasing volumes of content being generated<br />

by users (rather than professional writers), the need for<br />

mining and making this content (user fora, blogs, tweets)<br />

available across multiple languages has significantly<br />

increased. Coping with potentially noisy user-generated<br />

content (UGC) presents a major challenge for MT and<br />

novel training data selection models are crucial for tuning<br />

MT models to UGC. Working in close cooperation with<br />

<strong>CNGL</strong> industry partner Symantec, a key DCU MT group<br />

publication at COLING-<strong>2012</strong> (Banerjee et al., <strong>2012</strong>a)<br />

presents a translation-quality driven supplementary<br />

training data selection model for tuning MT to UGC,<br />

while Banerjee et al. (<strong>2012</strong>b) investigate<br />

whether text normalisation techniques are more<br />

productive in automatic translation of UGC than<br />

adding suitable supplementary training data. Tuning<br />

MT to diverse text types and content domains is a crucial<br />

factor in ensuring optimal quality. In many real-world<br />

application scenarios, however, a complete retraining of<br />

the MT system on domain-specific training material is not<br />

an option: it may be too costly, or suitable training<br />

material may simply not be available. A joint DCU MT group<br />

and Charles University Prague COLING-<strong>2012</strong> publication<br />

(Pecina et al., <strong>2012</strong>) presents approaches to adapting log-linear<br />

weight vectors to achieve improved translation for<br />

different domains given a generic training set, without the<br />

need for full retraining.<br />
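Adapting log-linear weight vectors, as in Pecina et al. (2012), can be made concrete. In a log-linear SMT model, each hypothesis e is scored as score(e) = Σ_i λ_i h_i(e). Changing only the weight vector λ can therefore change which hypothesis wins while the feature functions h_i, and the models behind them, stay untouched. The minimal sketch below assumes two hypothetical features, a translation-model log-probability `tm` and a language-model log-probability `lm`, with invented values. It shows only the scoring mechanism, not how Pecina et al. derive domain-specific weights.

```python
from typing import Dict, List, Tuple

# A hypothesis is (translation string, feature name -> feature value).
Hypothesis = Tuple[str, Dict[str, float]]

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight count as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_hypothesis(nbest: List[Hypothesis], weights: Dict[str, float]) -> str:
    """Return the translation with the highest log-linear score."""
    return max(nbest, key=lambda hyp: loglinear_score(hyp[1], weights))[0]

# Hypothetical n-best list; feature values are invented log-probabilities.
nbest: List[Hypothesis] = [
    ("the printing driver has crashed", {"tm": -2.0, "lm": -6.0}),
    ("the printer driver crashed", {"tm": -1.5, "lm": -8.0}),
]
generic_weights = {"tm": 1.0, "lm": 0.5}  # hypothetical generic tuning
domain_weights = {"tm": 1.0, "lm": 0.1}   # hypothetical domain-adapted tuning

print(best_hypothesis(nbest, generic_weights))
print(best_hypothesis(nbest, domain_weights))
```

Because only λ changes, adapting to a new domain in this setting amounts to re-tuning a handful of numbers rather than re-estimating the translation and language models.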

System combination and hybrid MT can improve MT<br />

quality: in partnership with DFKI (the German Research<br />

Center for Artificial Intelligence) and Barcelona Media<br />

(BM), the DCU <strong>CNGL</strong> MT group organised the Second<br />

Workshop and Shared Task on Applying Machine<br />

Learning Techniques to Optimise the Division of<br />

Labour in Hybrid MT (ML4HMT-12) in Mumbai, India,<br />

as a COLING-<strong>2012</strong> workshop (van Genabith, Badia,<br />

Federmann, Melero, Costa-jussà and Okita, <strong>2012</strong>).<br />

The DCU <strong>CNGL</strong> MT research teams contributed four<br />

submissions (Wu et al., <strong>2012</strong>; Okita et al., <strong>2012</strong>a; Okita<br />

et al., <strong>2012</strong>b; Okita, <strong>2012</strong>) to the shared task. System<br />

combination is usually most effective when the MT<br />

systems involved are quite diverse. Dandapat et al.<br />

(<strong>2012</strong>) develop an efficient system combination approach<br />

integrating EBMT, SMT, TM and IR-based technologies.
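System combination can be illustrated with one of its simplest forms: consensus-based hypothesis selection, which picks the output that agrees most with the other systems' outputs (a minimum-Bayes-risk-style choice). This is a sketch under stated assumptions, not the method of Dandapat et al. (2012) or of the ML4HMT-12 submissions. Similarity is plain unigram F1, and the four system outputs are invented.

```python
from collections import Counter
from typing import List

def unigram_f1(a: str, b: str) -> float:
    """Unigram F1 with clipped counts; equals 2*overlap / (len(a) + len(b))."""
    ta, tb = Counter(a.split()), Counter(b.split())
    overlap = sum((ta & tb).values())
    total = sum(ta.values()) + sum(tb.values())
    return 2.0 * overlap / total if total else 0.0

def consensus_select(hypotheses: List[str]) -> str:
    """Return the hypothesis with the highest summed similarity to the others."""
    def support(i: int) -> float:
        return sum(unigram_f1(hypotheses[i], other)
                   for j, other in enumerate(hypotheses) if j != i)
    return hypotheses[max(range(len(hypotheses)), key=support)]

# Hypothetical outputs from, e.g., EBMT, SMT, TM and IR-based components.
outputs = [
    "the file could not be opened",
    "the file can not be opened",
    "the file could not open",
    "file open failed",
]
print(consensus_select(outputs))
```

Selection of this kind benefits from diverse systems only if their errors differ. Richer combination methods go further and build a new output from fragments of several hypotheses instead of choosing one of them whole.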



Improvements in components of statistical MT systems<br />

can lead to better translation outputs. In a collaboration<br />

with the Chinese Academy of Sciences, Tsinghua<br />

University and New York University, Tu et al. (<strong>2012</strong>) show<br />

how compact representations of alignment alternatives<br />

(rather than using a single alignment) can improve MT.<br />
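One compact way to keep several word-alignment alternatives, instead of committing to a single best alignment, is a weighted alignment matrix. In such a matrix, each source-target link carries the normalised probability mass of the alignments that contain it. We do not claim this is the exact representation used by Tu et al. (2012). The sketch below simply builds such a matrix from a hypothetical n-best list of alignments with invented probabilities.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Link = Tuple[int, int]  # (source word index, target word index)

def weighted_alignment_matrix(
        nbest: List[Tuple[float, FrozenSet[Link]]]) -> Dict[Link, float]:
    """Map each link to the normalised probability mass of alignments containing it."""
    total = sum(prob for prob, _ in nbest)
    matrix: Dict[Link, float] = defaultdict(float)
    for prob, links in nbest:
        for link in links:
            matrix[link] += prob / total
    return dict(matrix)

# Hypothetical 3-best alignments for a two-word source sentence.
nbest = [
    (0.6, frozenset({(0, 0), (1, 1)})),
    (0.3, frozenset({(0, 0), (1, 2)})),
    (0.1, frozenset({(0, 1), (1, 1)})),
]
for link, weight in sorted(weighted_alignment_matrix(nbest).items()):
    print(link, round(weight, 2))
```

Downstream steps such as phrase extraction could then use these fractional link weights, for instance as fractional counts, rather than a single 0/1 alignment.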

MT and other translation technologies can only deliver their full benefit<br />

if full consideration is given to the human in the loop:<br />

Doherty et al. (<strong>2012</strong>) develop and validate a syllabus to<br />

teach translators SMT and related skills. Doherty and<br />

O’Brien (<strong>2012</strong>) examine the usability of MT output using eye<br />

tracking and find its quality for some target languages to<br />

be as good as that of the source, but detrimental to the user<br />

experience for others. Doherty and Moorkens (2013)<br />

present an evaluation of teaching translation technology<br />

to translators and identify several hurdles and solutions.<br />

Moorkens et al. (2013) use SMT output to remove<br />

inconsistencies in TM data and demonstrate resulting<br />

improvements in both TM and SMT quality.<br />

MT quality estimation is the task of predicting the<br />

quality of MT output without access to a reference<br />

translation. Ideally, this can be done without access to the<br />

internals of the MT system involved and in a language-independent<br />

way, i.e. without relying on language-specific<br />

resources that may require costly supervised<br />

training. Bicici et al. (2013, accepted for publication) show<br />

how quality prediction can be performed using language-independent<br />

features, treating MT systems as a black<br />

box. Parts of this research have been submitted as an<br />

Invention Disclosure.<br />
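Black-box, language-independent features can be computed from the source text and the MT output alone, without consulting the MT system's internals or language-specific resources. The three features below (length ratio, punctuation mismatch and preservation of numbers) are hypothetical examples chosen for illustration, not the feature set of Bicici et al. In a real system, a regressor trained on human quality judgements would map such feature vectors to a predicted quality score.

```python
import re
from typing import Dict

PUNCT = re.compile(r"[^\w\s]")           # any character that is neither word nor space
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")  # integers and simple decimals

def qe_features(source: str, target: str) -> Dict[str, float]:
    """Black-box features for one (source, MT output) pair."""
    src_toks, tgt_toks = source.split(), target.split()
    src_nums = set(NUMBER.findall(source))
    tgt_nums = set(NUMBER.findall(target))
    return {
        # Unusually long or short outputs often signal problems.
        "length_ratio": len(tgt_toks) / max(len(src_toks), 1),
        # Punctuation should largely carry over between languages.
        "punct_diff": float(abs(len(PUNCT.findall(source)) - len(PUNCT.findall(target)))),
        # Share of source numbers that reappear verbatim in the output.
        "number_match": len(src_nums & tgt_nums) / len(src_nums) if src_nums else 1.0,
    }

print(qe_features("Restart the computer after 5 minutes.",
                  "Starten Sie den Computer nach 5 Minuten neu."))
```

Whitespace tokenisation is itself an assumption. It breaks down for unsegmented scripts such as Chinese, where a character-based length would be a more neutral choice.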

Dr. Sharon O’Brien of DCU and <strong>CNGL</strong> alumnus Dr. Sergio Penkale<br />

of CAPITA pictured at the AMTA-<strong>2012</strong> Workshop on Post-editing<br />

Technology and Practice (WPTP) in San Diego, USA<br />

Progress in ILT2<br />

Although the focus of the PhD students has mainly been<br />

on thesis write-up, the ILT2 Speech Technology research<br />

groups at UCD and TCD have made significant progress<br />

in Year 5. Building on research conducted in previous<br />

years, there was significant further development of<br />

methodologies for analysis of voice characteristics<br />

and for text-to-speech synthesis of expressive voices.<br />

The ILT2 group at TCD has developed algorithms for automatically detecting creaky voice and has provided mechanisms to facilitate its inclusion in speech synthesis (Kane et al., 2012). Progress on this topic is reflected in two publications at the 2012 Interspeech conference<br />

and in one journal article. John Kane (TCD) has also filed<br />

an Invention Disclosure for a new method for tracking<br />

changes in the voice, which may be deployed in a wide<br />

range of applications from improved speech synthesis to<br />

speaker identity tracking and even emotion detection.<br />

Significant developments in the synthesis of expressive voices were made by researchers at the Speech Technology Group at UCD. One of the major contributions explores the variability of voice quality in audiobook corpora by detecting the voice styles present in such corpora and building synthetic voices for those styles (Székely et al., 2012). Work on pitch and duration modelling using novel exemplar-based generation techniques also improved the prosody and expressiveness of the synthetic speech (Abou-Zleikha et al., 2012).<br />

Research on modelling aspects of the voice source other than pitch has also been extended to permit better control of voice quality in speech synthesis (Cabral et al., 2012). This work uses the Liljencrants-Fant (LF) model to represent the signal produced by the vibration of the vocal cords in human speech production. One of the outcomes<br />

of the research on expressive speech synthesis is the<br />

WinkTalk system developed at UCD as part of the <strong>CNGL</strong><br />

Demonstrator Programme. This system is a multimodal<br />

speech synthesis platform which links facial expression to<br />

expressive voices (Székely et al., <strong>2012</strong>). It allows the user<br />

to control the voice style of the synthetic speech by facial<br />

expression, with the help of a web camera and tools for<br />

facial expression analysis. Another interesting application of expressive speech synthesis developed at UCD is its integration into speech-to-speech translation (Székely et al., 2012). The resulting prototype system, FEAST (Facial Expression-based Affective Speech), classifies the emotional state of the user and uses it to render the translated output in an appropriate voice style.<br />

In previous years, researchers from UCD (ILT2) and DCU (ILT3) collaborated successfully on integrating speech recognition with machine translation. This collaboration continued this year with work on phone-based hierarchical phrase-based machine translation, which yields better performance than conventional speech translation approaches (Ahmed et al., 2012).<br />

Progress in ILT3<br />

ILT3 continues to provide mark-up, annotation, metadata and knowledge through automatic linguistic analysis for the discovery, transformation and delivery of unstructured information across languages. ILT3 has maintained a strong focus on user-generated and possibly “noisy” content, as found on blogs, forums, tweets and social media in general. It also continues to expand its close collaboration with industry partners, concentrating on customer care, event detection and sentiment tracking scenarios.

This strand of research focuses on text classification and annotation. It holds that texts have infinitely many uses, each of which elicits its own classification decisions. No single form of annotation has maximal useful impact across all actual or potential uses. It is frequently useful to annotate in the domains of syntax and semantics and with respect to pragmatic function. However, one cannot expect every application that needs syntactic labels, for example, to benefit from the same class of labels or the same level of detail within a class. Sometimes LFG c-structure with f-structure annotation is necessary; sometimes part-of-speech tagging of lexical stems alone is sufficient. ILT3 research has addressed document, sentence and sub-sentential classification problems in syntax, semantics and pragmatics.<br />

Text classification is a core technology in CNGL. It is the subject of basic research on extending classification methods, and it is applied in various contexts, including domain tuning and translation quality assessment. The ILT3 team performs text classification from the perspective of linguistic theory, testing theories of language use in conjunction with other strands of ILT and CNGL. Dr. Carl Vogel’s team at TCD developed tools for CNGL external purposes and deployed them within our demonstrator activities. These tools have formed the basis of two Invention Disclosures, one of them in collaboration with Mr. Phil Ritchie of VistaTEC and Dr. David Lewis (CNGL SF2).<br />

The IP disclosures have culminated in both a Patent<br />

Application with the European Patent Office (application<br />

no. 11169673.8-1527) “Data processing system and<br />

method for assessing quality of a translation” and a<br />

Commercial Licence of this intellectual property to<br />

Digital Linguistics. This work has been developed further in four directions.

First, we compared supervised and less-supervised classification methods for the task of quality estimation (Moreau and Vogel, 2012; Moreau and Vogel, under review). The aim is to identify the parameters that determine which method is preferable.

Second, we have successfully deployed exactly this method to select items for training MT engines. Selection is based on the similarity between potential training items and the material intended for translation. This work, carried out in collaboration with the DCU MT team, is being written up for formal peer review.

Third, we have studied baselines for the automated processing of texts produced by language learners, in order to identify particular error types such as errors in preposition use (Lynch et al., 2012).

Finally, we have used automatically discoverable text features to analyse potential translations (Lynch and Vogel, 2012). This approach achieves approximately 80% accuracy on the binary classification problem of deciding whether a text is a translation or was originally written in English. It achieves the same accuracy on deciding which of several potential source languages the text was translated from. In this case, the texts were not learner texts but professional literary translations.<br />

Further basic advances in text classification methods have been explored in relation to the structural analyses of the sentences that make up texts, and to follow-on computation over the trees that model these analyses. Emms (2012) explored stochastic tree distances and their training with expectation-maximisation. Emms and Franco Penya (2012a, 2012b) establish empirical and analytical differences between the tree-difference metrics for distance and similarity proposed in the literature. Emms and Franco Penya (2012c) demonstrate how mappings between trees can be used to identify the fillers of the semantic roles of predicates.<br />



Multi-word expressions (MWEs) are challenging for text analytics and automatic translation. In the case of non-compositional MWEs, the meaning of the MWE is not simply a combination of the meanings of its constituent parts. This has a strong impact on tasks such as translation.

CNGL research (Maldonado-Guerra, 2011) on automatically assessing the compositionality of meaning in fixed-word expressions (collocations) has been productive. The system exploits the intuition that a highly compositional collocation tends to have considerable semantic overlap with its constituents, whereas a collocation with low compositionality shares little semantic content with them. This intuition is operationalised via three configurations that use cosine similarity measures to detect the semantic overlap between the collocation and its constituents. The system performed competitively on this task.

There are first-order and second-order approaches to vector encodings of word meaning. Maldonado-Guerra and Emms (2012) consider these systematically, introducing a matrix multiplication perspective on the second-order construction. They explore both the geometry this construction induces and its performance on supervised and unsupervised word sense disambiguation/discrimination tasks. Motivated in part by the matrix multiplication perspective, work has also been carried out on a variety of matrix consolidation and dimensionality reduction techniques.<br />
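The matrix multiplication view of second-order vectors and the cosine-overlap idea for compositionality can be illustrated together in a small sketch. It assumes a toy first-order co-occurrence matrix `M`, with one row per word and one column per context feature. A second-order vector for a context is the product of the context's bag-of-words count vector with `M`, which equals the sum of the first-order vectors of the words in the context. Compositionality is then scored as the mean cosine similarity between a collocation's vector and the vectors of its constituents. The vocabulary, the counts and the simple averaging of cosines are illustrative assumptions; this is not one of the three configurations used in the CNGL system.

```python
import math

# Hypothetical first-order co-occurrence matrix M: one row per word, one
# column per context feature; all counts are invented for illustration.
VOCAB = ["bank", "river", "account", "deposit"]
M = [
    [3.0, 4.0, 2.0, 5.0],  # bank
    [6.0, 0.0, 1.0, 0.0],  # river
    [0.0, 5.0, 0.0, 3.0],  # account
    [1.0, 4.0, 0.0, 2.0],  # deposit
]


def second_order(context_counts):
    """Second-order vector c^T M for a bag-of-words count vector c over VOCAB.

    Equivalent to summing the first-order rows of the words in the context,
    each weighted by how often it occurs.
    """
    return [
        sum(context_counts[i] * M[i][j] for i in range(len(VOCAB)))
        for j in range(len(M[0]))
    ]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compositionality(colloc_vec, constituent_vecs):
    """Mean cosine similarity between a collocation and each of its constituents."""
    return sum(cosine(colloc_vec, c) for c in constituent_vecs) / len(constituent_vecs)


# Second-order vector for the context "bank account deposit".
ctx = second_order([1, 0, 1, 1])
print(ctx)  # [4.0, 13.0, 2.0, 10.0]

# Two hypothetical collocation vectors scored against the same constituents.
constituents = [M[VOCAB.index("bank")], M[VOCAB.index("account")]]
high = compositionality([1.0, 9.0, 0.0, 6.0], constituents)  # overlaps its parts
low = compositionality([8.0, 0.0, 3.0, 0.0], constituents)   # shares little with them
print(high > low)  # True
```

Writing the second-order construction as `c^T M` is what makes matrix-level operations, such as dimensionality reduction applied to `M`, carry over directly to every context vector built from it.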

Ongoing work in ILT3 includes, for example, assessing whether information about linguistic hedges can serve as a useful feature in online fora provided by industry partners. The feature would predict whether postings come from individuals who will ultimately be rated as forum leaders. This is a natural development of our earlier work in this area, which has proved successful (Mamani Sanchez and Vogel, 2013; Vogel and Mamani Sanchez, 2012). That work combined Combinatory Categorial Grammar (CCG) representations of syntactic structures with n-grams of sub-lexical (orthographic and morphological) features and with sentence-level linguistic features.

First, we have observed that emoticon use is a kind of social signal. There are significant positive correlations between the use of positive emoticons and the propensity of posts to be rated as useful (and ultimately the within-forum rank of posters). There are also significant correlations between the use of negative emoticons and un-ranked posters (presumably individuals posting queries to expert users). Second, we have observed interacting effects of the use of linguistic hedges such as epistemic qualifiers: technical forum users who rate posts appear to prefer hedged responses.<br />

Parsing web data is challenging due to the scale and<br />

variety of the data. To ascertain the current state of the art<br />

with respect to domain adaptation, Google organised a<br />

shared task at the SANCL-<strong>2012</strong> workshop at NAACL-HLT<br />

<strong>2012</strong> (Montreal, Canada). The DCU-Paris 13 parsing team<br />

won the Web-Parsing Challenge and Shared Task (Le<br />

Roux et al., <strong>2012</strong>), using the DCU LORG parser platform<br />

and domain adaptation techniques. Lexical resources<br />

are a crucial ingredient of many LT applications and are<br />

challenging to obtain automatically for highly inflecting<br />

languages such as Arabic. Attia et al. (2012a, 2012b) show how a combination of finite-state and machine-learning-based technologies can be used to produce wide-coverage lexical resources for Modern Standard Arabic (MSA) from the Arabic Gigaword corpus together with data crawled from the Al Jazeera website. They also show how spell checking for MSA can be improved.<br />

Sentiment analysis is a key task in many LT applications.<br />

The DCU ILT3 team investigates the granularity of<br />

syntactic information required to improve sentiment<br />

analysis (Tu et al., <strong>2012</strong>).<br />

Collaborations<br />

Collaboration is at the core of CNGL. It includes close engagement with CNGL industry partners, university-based CNGL researchers and international collaborators, as well as CNGL participation in international research projects (including EU FP7-funded projects). Collaboration is particularly visible in our demonstrator systems, which draw on and combine research from the four CNGL research tracks with a focus on industry partner needs and requirements.<br />



Some of the ILT collaboration highlights in <strong>2012</strong> include:<br />

• In partnership with DCU and the National Centre for Language Technology (NCLT), CNGL was successful in a Marie Curie Mobility grant application for the EXPERT PhD Graduate School (€481K, DCU PI Prof. Josef van Genabith). The school comprises a total of 15 Marie Curie PhD fellowships (two of them at DCU) and three postdoctoral researchers. Led by the University of Wolverhampton (UK), EXPERT focuses on empirical approaches to (machine) translation. As part of their training, PhD students will spend time at DCU’s EXPERT university and industry partners.<br />

• In partnership with DCU and the National Centre for Language Technology (NCLT), CNGL was successful in an FP7 Support Action (SA) application called QTLaunchPad (€426K, DCU PI Prof. Josef van Genabith). Led by DFKI (German Research Centre for Artificial Intelligence), QTLaunchPad targets high-quality MT and funds two postdoctoral researcher positions at DCU.<br />

• In partnership with DCU, the National Centre for Language Technology and international collaborators, CNGL was successful in attracting €1M in funding as lead partner (DCU Lead PI Dr. Antonio Toral) in the EU FP7 Abu-MaTran project. The project focuses on enhancing industry-academia cooperation as a key means of tackling one of Europe’s biggest challenges: multilinguality.<br />

• CNGL and ILT1, in partnership with DCU and the National Centre for Language Technology, are continuing their strong engagement in the EU FP7 Machine Translation projects PANACEA, CoSyne, PLuTO and MultilingualWeb-LT, as well as in the META-NET/T4ME Network of Excellence:<br />

The CoSyne project focuses on multilingual content synchronisation for wikis. The project is led by the University of Amsterdam. DCU’s involvement centres on diagnostic, linguistically based evaluation of MT systems between multiple European languages.<br />

The PANACEA project aims to develop a platform for automatic, normalised annotation and cost-effective acquisition of language resources for human language technologies, centred on interoperable web services. The Universitat Pompeu Fabra (Spain) is co-ordinating the STREP project. DCU’s role centres on developing word- and phrase-aligned data resources (including bilingual dictionaries and transfer grammars) from the acquired parallel corpora and using these data to build MT systems.<br />

META-NET aims to mobilise and build a network of language technology research groups across Europe, including commercial providers of applications and services and other relevant stakeholders. DCU is heavily involved in dissemination activities. It also organises workshops and provides data sets and annotations for the use of machine learning techniques in MT system combination. In this way, the project aims to bridge the gap between the machine learning community and the MT research community. The network is led by DFKI (Germany). CNGL industry partners DNP, Microsoft, Symantec and Applied Language Solutions are members of META-NET.<br />

Pictured at the launch of the META-NET White Paper on The Irish Language in the Digital Age are Mr. Dinny McGinley T.D., Minister of State for the Gaeltacht (centre), and the White Paper’s authors, including Prof. Ailbhe Ní Chasaide of CNGL (second from right)<br />

The PLuTO (Patent Translations Online) project is a PSP project focused on delivering a solution for online patent translation, including the use of MT and TM technologies tuned to the patent domain. The project is co-ordinated by DCU, which is also responsible for the research and development of patent-tuned MT systems for multiple languages.<br />



CNGL, represented jointly by ILT1, LOC and SF2 (TCD, DCU and UL), together with CNGL industry partners Microsoft and VistaTEC, was successful in attracting funding for the EU FP7 Support Action MultilingualWeb-LT. This international consortium includes senior representatives of the translation industry as well as of standards bodies, and is coordinated by Prof. Felix Sasaki (DFKI, Germany). It supports research on the interoperability of language technologies on the web by defining new metadata standards.<br />

• Funding from DCU’s Ireland-India Fund and the Government of India’s India-Ireland Cooperative Science Programme is facilitating collaboration between CNGL and IIIT Hyderabad on translation systems between English and Indian languages. The funding has also enabled DCU to co-organise a COLING-2012 workshop on Machine Translation and Parsing in Indian Languages (MTPIL-2012) in Mumbai, India.<br />

• ILT teams have forged extensive international research collaborations and have published widely with colleagues from prestigious institutions in many countries, including China, the USA, the Czech Republic, Germany, Spain, the UAE, Belgium, Italy and Hungary.<br />

• Dr. Carl Vogel has become increasingly engaged in EU COST Action IS1004, WebDataNet. The Action explores how to conduct iScience that exploits the opportunities arising from Internet access to raw data and to research participants.<br />

• CNGL demonstrator systems bring together research teams across CNGL tracks, partner universities and industry partners:<br />

KantanMT – Moses on the Cloud: involves close collaboration between ILT and CNGL spin-out Xcelerator Machine Translations Ltd.<br />

PLuTO – Facilitating Patent Search with Machine Translation: involves active collaboration with the PLuTO FP7 project at DCU.<br />

Rapid MT Retraining: involves tight collaboration between ILT, SF, the FP7 PANACEA project at DCU, and the MultilingualWeb-LT project.<br />

WebWOZ – A Wizard of Oz Platform: involves close collaboration between ILT and SF.<br />

Text Classification for Bulk Localisation Review: involves active collaboration between ILT3, SF2 and industry partner VistaTEC.<br />

The CNGL Demonstrators Programme has promoted strong collaboration between ILT1 and ILT2 researchers in the demonstrator Personalising Speech for Interpersonal Communication (MySpeech). One highlight of this collaboration was the use of the Wizard-of-Oz framework to conduct a preliminary evaluation of the MySpeech system for foreign-language pronunciation training (Cabral et al., 2012).<br />

• ILT3 has collaborated closely with DCM, ILT1, SF, CNGL industry partners (particularly Symantec and VistaTEC) and affiliates (Digital Linguistics) on text classification for specific applications.<br />

• ILT1 and ILT3 have been collaborating closely with researchers at the National Centre for Language Technology (NCLT) on the following topics:
- using the output of the LFG annotation algorithm (AA) to improve MT evaluation;
- extending the German LFG AA feature set to improve parsing of the German side of the Europarl data;
- improving the LFG-inspired constituency-to-dependency conversion;
- integrating multi-word expressions (MWEs) into the LFG AA and into constituency parsing;
- tuning a number of statistical parsing architectures to user-generated data (including Twitter data and user forum data).<br />

• ILT1 and ILT3, in collaboration with the National Centre for Language Technology (NCLT), are continuing their close research cooperation with CNGL industry partner Symantec on tuning MT and text analytics technologies to analyse user-generated content. In addition to the existing collaboration (Pratyush Banerjee, PhD student with ILT1), Symantec is funding research on tuning language technologies to user-generated text in partnership with IRCSET (Irish Research Council for Science, Engineering and Technology). This research is carried out through a project, led by Dr. Jennifer Foster, involving one PhD student and one postdoctoral researcher.<br />

• ILT3 (Dr. Carl Vogel) is continuing collaborations with VistaTEC and Digital Linguistics, including preparations for joint publications. Engagement with Microsoft has also commenced. It is leading to the joint development, for 2013, of text classification methods and tools for detecting offensive content (both linguistic and non-linguistic) in user fora.<br />



• CNGL, through the DCU MT team in ILT1, is continuing to work with Prof. Mikel Forcada from the Universitat d’Alacant, Spain, following the success of his Walton Fellowship at CNGL in 2010. Prof. Forcada has continued to work with PhD student Sandipan Dandapat and visited CNGL again in 2012. Prof. Forcada and Prof. Khalil Sima’an from the University of Amsterdam are partnering with DCU in DELIQAT, an EU FP7 application in the area of high-quality MT.<br />

Year 5 also saw the arrival of new PhD students in ILT2: Christoph Wendler and Maria O’Reilly joined the CNGL speech group at TCD in April and May 2012, respectively.<br />

People<br />

Year 5 has been a very dynamic year in terms of arrivals and departures. Prof. Andy Way left for an industry appointment in June 2011, and it took time to find a new Professor of Machine Translation. In the interim, Prof. Josef van Genabith (CNGL Director) assumed the position of ILT co-track leader (along with Prof. Nick Campbell of TCD), in addition to his roles as CNGL Director and ILT1 lead.<br />

Prof. Qun Liu has joined DCU, <strong>CNGL</strong> and the NCLT as<br />

Professor of Machine Translation and leader of the MT<br />

group. Prof. Liu was the Director of the Natural Language<br />

Processing Research Group in the Institute of Computing<br />

Technology at the Chinese Academy of Sciences (CAS)<br />

in Beijing. He has over 150 research publications and his<br />

work is widely cited internationally. He has produced<br />

ground-breaking research in many aspects of statistical and<br />

rule-based machine translation as well as in Chinese word<br />

segmentation and NLP. He has successfully led a large number of research projects at CAS. His research interests span Chinese Natural Language Processing, Machine Translation and Information Extraction. Prof. Liu has quickly become embedded in CNGL and has made key contributions to an EU FP7 application that is currently under review.<br />

Some of the 11 visiting MSc and PhD scholars who worked with ILT<br />

during <strong>2012</strong> under <strong>CNGL</strong>’s postgraduate internship programme<br />

Eleven visiting MSc and PhD interns joined ILT over five<br />

months in <strong>2012</strong>, under <strong>CNGL</strong>’s postgraduate internship<br />

programme. The programme enables students to gain valuable experience as part of a highly regarded and continually growing research centre. This year’s programme attracted interns from institutions around the world, including in Italy, France, China and India. The<br />

internships covered a wide range of topics in Natural<br />

Language Processing and Machine Translation.<br />

Dr. Ergun Biçici joined <strong>CNGL</strong>, NCLT and the DCU MT<br />

team as a postdoctoral researcher from Koç University<br />

(Turkey) and is working on regression-based approaches<br />

for MT and parse quality estimation. Dr. Biçici has a<br />

strong background in machine learning and is<br />

contributing key expertise to the <strong>CNGL</strong> research teams.<br />

Dr. Ingmar Steiner joined the ILT2 group at UCD in June<br />

<strong>2012</strong> and worked jointly with the Speech Communication<br />

group at TCD. In December 2012, he moved to the Computational Linguistics and Phonetics department at DFKI, Saarbrücken, Germany, as a senior researcher to set up an Independent Research Group.<br />

Prof. Qun Liu joined <strong>CNGL</strong> at DCU as Professor of Machine Translation<br />

in <strong>2012</strong>



A number of ILT researchers moved on to roles at other<br />

academic institutions or transitioned to industry during<br />

<strong>2012</strong>.<br />

Achievements<br />

Postdoctoral researcher Dr. Junhui Li (ILT1 DCU) took up a postdoctoral position at the University of Maryland (USA), where he is continuing his research on machine translation.<br />

Dr. Pavel Pecina (ILT1 DCU) accepted an appointment as Associate Professor in Machine Translation at Charles University in Prague (Czech Republic).<br />

Dr. Yifan He (former CNGL ILT1 PhD student and MT postdoctoral researcher) accepted a postdoctoral researcher position at New York University (USA).<br />

Hala Almaghout, Sandipan Dandapat, Pratyush<br />

Banerjee, Ankit Srivastava, Stephen Doherty and Irena<br />

Yanushevskaya successfully defended their PhD vivas in<br />

<strong>2012</strong>.<br />

Dr. Sandipan Dandapat has taken up a lecturing position<br />

at IIT-Guwahati, Assam, India.<br />

After five years of dedicated contribution to CNGL, Dr. Peter Cahill left the ILT2 group to work full-time on his spin-out company, Scream Technologies. The start-up develops speech synthesis technology products with applications in areas as diverse as video games, customer support and advertising.<br />

John Kane from ILT2 submitted his thesis in September 2012 and is awaiting his defence. In the meantime, he left CNGL in October to take up a research position with the Fastnet project at TCD. PhD fellow Amelie Dorn left TCD in November 2012.<br />

Stephen Doherty was one of six ILT1 doctoral students to successfully<br />

defend their PhD theses during <strong>2012</strong><br />

Awards and Prizes<br />

• Prof. Josef van Genabith received the DCU President’s Research Award for Science and Engineering 2012.<br />

• Prof. Carl Vogel and Liliana Mamani Sanchez (TCD) were awarded a best paper prize for their work “Epistemic Signals and Emoticons Affect Kudos” at the 3rd IEEE International Conference on Cognitive Infocommunications in December 2012.<br />

• Dr. Martin Emms and Hector Franco-Penya (TCD) received a best paper award at the International Conference on Pattern Recognition Applications and Methods (ICPRAM 2012) in February 2012.<br />

• The DCU-Paris 13 team won the Web-Parsing Challenge and Shared Task organised by Google as part of SANCL-2012 at NAACL-HLT 2012 (Le Roux, Foster, Wagner, Kaljahi and Bryl, 2012), using the DCU LORG parser platform and domain adaptation techniques.<br />

• Prof. Josef van Genabith (CNGL/NCLT/DCU) has been appointed general chair of COLING 2014, to be held in Dublin in August 2014.<br />

UCD hosts Innovation and Applications in Speech Technology (IAST)<br />

Workshop in March



Prof. Josef van Genabith delivers his address before accepting the DCU<br />

President’s Research Award for Science and Engineering in February<br />

International Collaborations<br />

The EU FP7 projects are continuing on track, with PANACEA, PLuTO, CoSyne and T4ME/META-NET successfully passing their second-year reviews. Work within the PLuTO project on patent-language MT has attracted significant commercial interest and press coverage. The EU FP7 Support Action MultilingualWeb-LT is running at full strength. The new EU FP7 Support Action (SA) project QTLaunchPad commenced in June 2012, and recruitment for the new Marie Curie EXPERT PhD programme is under way.<br />

Industry Engagement<br />

During 2012, considerable effort was devoted to exploring avenues for the commercialisation of ILT research and to developing industrially relevant prototypes and proof-of-concept systems. In turn, there has been a significant increase in commercial interest in the research we are carrying out in CNGL.<br />

Technology Innovation Development Award (TIDA)<br />

projects are funded by Science Foundation Ireland to<br />

support the transition of basic research outputs from the<br />

lab to industrial applications, primarily through industry-strength implementations and road-testing in commercial environments.<br />

ILT1 (MT) has been successful in attracting funding for two TIDA projects. TMTPrime (Machine Translation and Translation Memory Integration in a Localisation Workflow, Dr. Declan Groves, DCU) started in mid-2012 and is developing an industry-strength application that optimally combines the outputs of Machine Translation (MT) systems with Translation Memory (TM) (fuzzy) matches, based on CNGL ILT1 basic research reported in He et al. (2010). The technology uses translation quality prediction to recommend either MT or TM output based on estimated post-editing effort. The project is particularly important because TMs are still the mainstay technology in many localisation operations and pricing models are based on TM reuse. TMTPrime technology guarantees that the MT/TM combination will have TM-based pricing as an upper bound, with potentially substantial savings through the use of MT.
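The recommendation step can be pictured as a simple decision rule. The sketch below is illustrative only and is not TMTPrime code: it assumes that a quality-estimation model (not shown) has already produced a predicted post-editing effort for each candidate, for example an estimated TER score where lower is better. The names `Candidate`, `recommend`, `choose_segments` and the `margin` parameter are hypothetical. Keeping the TM match as the default unless MT is predicted to be clearly cheaper to post-edit is one way to read the pricing upper bound described above; in practice that bound holds only as far as the predictions are accurate.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    origin: str              # "MT" or "TM"
    text: str
    predicted_effort: float  # hypothetical QE output, e.g. estimated TER; lower is better


def recommend(tm: Candidate, mt: Candidate, margin: float = 0.0) -> Candidate:
    """Prefer the TM match unless MT is predicted to need less post-editing.

    A positive margin makes switching to MT more conservative, so the TM
    match stays the default whenever the two predictions are close.
    """
    if mt.predicted_effort + margin < tm.predicted_effort:
        return mt
    return tm


def choose_segments(pairs, margin=0.05):
    """pairs: iterable of (tm_candidate, mt_candidate); returns the chosen texts."""
    return [recommend(tm, mt, margin).text for tm, mt in pairs]
```

For example, a TM match with predicted effort 0.35 and an MT output with predicted effort 0.12 would yield the MT output, while a margin of 0.3 would keep the TM match.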

Project partners include DCU, Symantec, VistaTEC and Welocalize. The second ILT1 TIDA (Dr. Antonio Toral, DCU) focuses on Iterative Retraining of an MT System with Post-Edits. This is particularly important because mistakes in MT output that have been corrected by professional human translators should be made available to the MT system as additional training material as quickly as possible, in order to prevent similar mistakes in future. Two challenges need to be overcome: (i) full retraining of a statistical MT system is time-consuming and computationally expensive; and (ii) post-edits generally constitute a small amount of additional data that is unlikely to sway a substantial statistical MT model. Both challenges are addressed in the TIDA, partly based on previous CNGL ILT1 basic research reported in Banerjee et al. (2012). Recruitment for the Retraining TIDA is under way.
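One common way to fold new post-edits into a statistical model without full retraining is to keep the underlying counts and update only the affected entries. The toy sketch below assumes a phrase table estimated by relative frequency and gives post-edited pairs a larger weight, so that a small amount of new data can still shift the model (challenge (ii)). It is a hypothetical illustration, not the method of the TIDA or of Banerjee et al. (2012); real SMT phrase tables also require word alignment, phrase extraction and several further feature scores.

```python
from collections import defaultdict


class IncrementalPhraseTable:
    """Toy translation table built from weighted phrase-pair counts.

    New pairs update the stored counts directly, so the original training
    corpus never has to be re-processed.
    """

    def __init__(self):
        self.pair_counts = defaultdict(float)  # (source, target) -> weighted count
        self.src_counts = defaultdict(float)   # source -> weighted count

    def add(self, pairs, weight=1.0):
        """Add phrase pairs; post-edits can be given weight > 1."""
        for src, tgt in pairs:
            self.pair_counts[(src, tgt)] += weight
            self.src_counts[src] += weight

    def prob(self, src, tgt):
        """Relative-frequency estimate of p(tgt | src)."""
        total = self.src_counts.get(src, 0.0)
        if total == 0.0:
            return 0.0
        return self.pair_counts.get((src, tgt), 0.0) / total
```

In a usage example, a table trained on nine ("save", "retten") pairs and one ("save", "speichern") pair gives p(speichern | save) = 0.1; adding a single post-edited ("save", "speichern") pair with weight 5 raises it to 6/15 = 0.4 without touching the other counts.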

Parts of CNGL’s MT technology have been licensed for evaluation to a new spin-out company, Xcelerator Machine Translations Ltd. Founded by Tony O’Dowd, previously CEO of CNGL industry partner Alchemy, Xcelerator provides MT solutions to individual translators and mid-sized localisation service providers through its cloud-based KantanMT platform. The company’s vision is to make machine translation simple to use for everyone.



Dr. Peter Cahill, formerly a full-time academic with CNGL ILT2 at UCD, heads the CNGL spin-out company Scream Technologies. The company specialises in creating synthetic voices from human actors, enabling companies to create human-sounding synthetic speech and control how it sounds. Dr. Cahill has been named as one of Ireland’s top technology and start-up leaders.

Continued collaborations between ILT3 and VistaTEC, Digital Linguistics and Symantec are planned. 2013 will also see direct engagement with Microsoft through the deployment, in the context of CNGLII, of text classification tools developed in the first CNGL cycle, particularly for classifying offensive content (both linguistic and non-linguistic).

Plans for 2013<br />

For ILT, 2012 (CNGL Year 5) was dominated by high research and publication output, a large number of ILT PhD students completing, strong industry engagement and extensive preparations for the second-cycle CNGL (CNGLII) application and site review.

ILT technologies will be spread across three key CNGLII themes supporting the Global Content Value Chain-based architecture of CNGLII: ILT3 (Text Analytics) will move into the CNGLII Curation theme, ILT1 (MT) into the Translation and Localisation theme, and ILT2 (Speech) into the Delivery and Interaction theme.

2013 will see the completion of a number of ILT-affiliated<br />

EU FP7 projects including the CoSyne and Panacea<br />

STREPs, the META-NET/T4ME Network of Excellence,<br />

and the PLuTO Public Private Partnership, all with key<br />

involvement and successful contributions from project<br />

partner DCU.<br />

At the same time, the ILT-affiliated EU FP7 Support Action QTLaunchPad will be in full swing in 2013. QTLaunchPad is charged with developing research and innovation scenarios, including community mobilisation and technology support for shared tasks in high-quality machine translation, focusing on novel quality metrics, quality estimation and specific MT quality barriers. QTLaunchPad partner DCU is contributing key expertise. Likewise, the prestigious EXPERT EU Marie Curie PhD graduate school and mobility programme was launched at the end of 2012, and PhD candidates will start in early 2013. EXPERT partner DCU will host two PhD students working on MT system combination and human-centric aspects of MT technology development. The EU FP7-funded MultilingualWeb-LT support action involves CNGL partners TCD, DCU, UL, Microsoft and VistaTEC, and continues to focus on developing important standards and interoperability for multilingual content management. The EU FP7 Abu-MaTran project (Dr. Antonio Toral) will tackle the multilingualism challenge through an industry-academia partnership.

The first <strong>CNGL</strong> funding cycle is going into a non-costed<br />

extension phase (December <strong>2012</strong> – November 2013),<br />

completing a small number of <strong>CNGL</strong> research and PhD<br />

projects and preparing and supporting the transition to<br />

<strong>CNGL</strong>II.<br />

ILT1: Machine Translation<br />

Prof. Qun Liu has fully taken charge of the DCU MT Group. He will drive cooperation with research partners, in particular at the Chinese Academy of Sciences, and will explore commercial opportunities in localisation with Chinese industry partners.

Walid Aransa (LIUM, France), Luong Ngoc Quang (LIG, France),<br />

Dr. Antonio Toral (DCU) pictured at the MT Marathon <strong>2012</strong> in<br />

Edinburgh in September



Commercial activities will remain a strong focus for ILT1 in 2013, in particular extended collaborations with CNGL start-up company Xcelerator and CNGL industry partners Welocalize and Symantec. The TMTPrime TIDA (Dr. Declan Groves) has produced mature TM/MT combination technologies based on automatic quality prediction, which will be showcased at global localisation industry events including GALA (Miami, 2013). The second TIDA, on efficient MT retraining technologies (Dr. Antonio Toral), will provide new opportunities to use user feedback (such as post-editing corrections) immediately to improve MT.

ILT2: Speech<br />

ILT2 will see the completion of ongoing PhD theses on prosody, speech-to-speech translation and emotive speech. Following the departure of Dr. Peter Cahill (UCD) to lead the CNGL spin-out company Scream Technologies, the remaining UCD speech group (Dr. Joao Cabral) will transition to Prof. Nick Campbell’s Delivery and Interaction theme at TCD early in 2013.

ILT3: Text Analytics<br />

ILT3 will complete the documentation of results on using ILT3 text-classification methods to select appropriate training items for MT systems in settings where little directly relevant training material is otherwise available.
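As an illustration of selecting training items for a target domain, the sketch below ranks candidate sentences by a cross-entropy-difference style score between an in-domain and a general language model, a well-known data selection technique in the spirit of Moore and Lewis. It is a swapped-in example, not ILT3’s classifier: add-one smoothed unigram models stand in for the classifier, and the function names and smoothing constant are assumptions.

```python
import math
from collections import Counter


def unigram_log_probs(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram log-probabilities over a shared vocabulary."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def select_training_data(candidates, in_domain, general, k):
    """Return the k candidate sentences that look most like the in-domain sample."""
    vocab = {tok for s in candidates + in_domain + general for tok in s.lower().split()}
    lp_in = unigram_log_probs(in_domain, vocab)
    lp_gen = unigram_log_probs(general, vocab)

    def score(sentence):
        toks = sentence.lower().split()
        # Per-token log-likelihood ratio, i.e. the negated cross-entropy difference.
        return sum(lp_in[t] - lp_gen[t] for t in toks) / max(len(toks), 1)

    return sorted(candidates, key=score, reverse=True)[:k]
```

With a small software-support sample as the in-domain data, sentences such as "restart the router" score above general-domain sentences such as "the dog sat", so the latter drop out first when k is small.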

The analysis of epistemic markers and social signals in expert forum contexts has shown promise. ILT3 will continue to develop these analyses and will seek additional funding for follow-on study, including through exploitation of the methods developed and the conclusions drawn from the study of linguistic and pragmatic behaviours in the Symantec user forum.

Participation is already planned in an upcoming CLEF shared task on text classification, covering the detection of predatory contributions in social networks and other authorship attribution exercises.

Work on text classification methods extends into <strong>CNGL</strong>II,<br />

with collaborations planned with VistaTEC, Digital<br />

Linguistics, Symantec and Microsoft.


Digital Content Management



Strand Name: Digital Content Management<br />

AREA CO-ORDINATOR:<br />

PROF. VINCENT WADE<br />

Participant Names and Affiliation

Industrial Collaborators

Dr. Fred Hollowood Symantec

Dr. Johann Roturier Symantec

Mr. Jason Rickard Symantec

Mr. Dag Schmidtke Microsoft

Dr. Alexander Troussov IBM

Mr. Takeshi Fukunaga Dai Nippon Printing

Mr. Hideyuki Suzuki Dai Nippon Printing

International Collaborators

Prof. Helen Ashman University of South Australia

Dr. Prasenjit Majumder DAIICT, Gandhinagar, India

Faculty<br />

Dr. Owen Conlan Trinity College Dublin DCM3<br />

Dr. Gareth Jones Dublin City University DCM1 Workpackage Leader<br />

Prof. Declan O’Sullivan Trinity College Dublin DCM2<br />

Dr. Claus Pahl Dublin City University DCM2<br />

Ms. Mary Sharp Trinity College Dublin DCM3<br />

Dr. Tony Veale University College Dublin DCM2 Workpackage Leader<br />

Prof. Vincent Wade Trinity College Dublin DCM3 Workpackage Leader<br />

Postdoctoral Researchers<br />

Dr. Declan Dagger Trinity College Dublin DCM3<br />

Dr. Yanfen Hao University College Dublin DCM2<br />

Prof. Séamus Lawless Trinity College Dublin DCM3<br />

Dr. Johannes Leveling Dublin City University DCM1<br />

Dr. Alexander O’Connor Trinity College Dublin DCM2<br />

Mr. Ian O’Keeffe Trinity College Dublin DCM3<br />

Dr. Melike Sah Trinity College Dublin DCM2<br />

Dr. Dong Zhou Trinity College Dublin DCM1



DIGITAL CONTENT MANAGEMENT<br />

PhD Students<br />

Mr. Yalemisew Mintesinot Abgaz Dublin City University DCM2<br />

Ms. Yi Chen Dublin City University DCM3<br />

Mr. Mourad El Moueddeb University College Dublin DCM2<br />

Ms. Bo Fu Trinity College Dublin DCM2<br />

Mr. Debasis Ganguly Dublin City University DCM3<br />

Mr. Mohammed Rami Ghorab Trinity College Dublin DCM1<br />

Mr. Brendan Spillane Trinity College Dublin DCM2<br />

Mr. Muhammad Javed Dublin City University DCM2<br />

Mr. Kevin Koidl Trinity College Dublin DCM3<br />

Mr. Killian Levacher Trinity College Dublin DCM2<br />

Mr. Guofu Li University College Dublin DCM2<br />

Ms. Wei Li Dublin City University DCM2<br />

Ms. Alejandra López Fernández University College Dublin DCM2<br />

Mr. Walid Magdy Dublin City University DCM1<br />

Mr. Jinming Min Dublin City University DCM1<br />

Ms. Catherine Mulwa Trinity College Dublin DCM3<br />

Mr. Neil Peirce Trinity College Dublin DCM3<br />

Mr. Ben Steichen Trinity College Dublin DCM3<br />

Research Assistants<br />

Mr. David Foley Trinity College Dublin DCM3<br />

Mr. Brian Gallagher Trinity College Dublin DCM3<br />

Ms. Yang Yang Trinity College Dublin DCM3<br />

Funding<br />

2012 Funding from SFI

CNGL (07/CE/I1142): €680,141

SFI TIDA Award – UNITE: Personalised Cross-site Personalisation: €60,000

2012 Funding from Other Sources

EC FP7 Cendari (TCD): €120,000

Enterprise Ireland Learning Technology Centre: €3,000,000

SFI TIDA Award – Linguabox: Automated Open Content Repurposing Service to support Personalised eLearning: €87,768

SFI TIDA Award – An Integrated Software Suite to provide Next Generation Personalised Multilingual Customer Care: €67,748



Research Overview: Digital Content Management (DCM)

Goals<br />

The key challenge of the DCM research track is to provide a step change in multilingual digital content management to enable the delivery of next generation localisation². DCM focuses on three areas: (i) user query enhancement; (ii) content metadata and knowledge model development; and (iii) adaptive content retrieval and dynamic composition of localised content (customised for the user’s needs and context of use). Because users need to access information across many content boundaries, DCM research covers not just traditional corporate content but also open corpus content, user-generated content (discussion fora, blogs, wikis) and social networking interactions (tweets, postings, shared ‘walls’, location check-ins, etc.). The DCM research track is divided into three work areas, called DCM1, DCM2 and DCM3:

• Enhancement of user queries based on user context information and feedback (DCM1)

• Automation and semi-automation of the generation of knowledge models and metadata, and of the identification of sentiment, required for digital content management and personalised (re)composition (DCM2)

• Support for dynamic composition of personalised digital content, customised for the user’s needs and context, across diverse content sources: corporate content, open corpora, user-generated content and content generated via social networking (DCM3)

This research is integrated across the CNGL research tracks via combined prototypes, experiments and the CNGL Demonstrators. DCM has demonstrated its ground-breaking technologies in many application domains, such as Personalised Multilingual Customer Care, Personalised Multilingual Social Networking, and Personalised Information and Learning Portals. Such demonstrator systems allow the DCM research to illustrate the impact of its technology as well as demonstrate the benefits of integration with all other CNGL research tracks. For example, DCM researchers collaborate with ILT’s experts on multilingual translation, speech recognition/synthesis for multimodal operation, and text analysis for enhanced understanding of the content.

² Next Generation Localisation seeks to enable people to interact with digital content, products, services and each other, in their own language, according to their own culture, and according to their own personal needs and preferences.

Research Barriers and Methodologies to Address Them

With the increasing volume of digital content and the diversity of sources from which it is created (e.g. corporate content, user-generated content, social networking, community content), it is becoming impossible to discover, manually annotate, slice and compose appropriate digital content rendered in the language and for the device suited to the intended users. In addition, next generation localisation is not just about corporate localisation: it must not only adapt to specific corporate localisation requirements but also satisfy the individual user’s need for information by adapting it to that user’s context, language, preferences and preferred delivery device. DCM research in Year 5 focused increasingly on addressing the problems of dynamic user-generated (multilingual) content as well as corporate and open web content. This increased integration of global social media into DCM research is a significant development in CNGL research.

The three principal areas of DCM research relate to the challenges of (i) more accurately identifying and selectively retrieving appropriate content; (ii) capturing and modelling knowledge in a structured, reusable way so that multilingual, heterogeneous content can be more easily managed and transformed; and (iii) supporting the user by harnessing adaptivity/personalisation (based on the user’s context) to give the user significantly improved exploration of the information he/she needs. This research also involves developing new ways to evaluate the impact and performance of adaptive (personalised) systems. A central theme running through all of these challenges is the need to provide information in a form that is tailored to the user’s requirements, preferences and context, and that not only responds directly to his/her initial queries but also delivers a unique information presentation tailored to his/her context, preferences and task.



The approach taken in the DCM research is to enhance<br />

and combine key aspects of Adaptive Hypermedia<br />

(AH) and Information Retrieval (IR) research to provide<br />

techniques, technology and prototype systems to<br />

implement advanced retrieval, slicing and adaptive<br />

composition of multilingual digital content. The DCM1<br />

work package addresses the issues of personalised and<br />

contextualised multilingual IR and, more specifically,<br />

query enhancement. DCM1 research includes the application of cross-lingual techniques to permit users to access information that is not in their native language. It also focuses on Personalised IR (PIR), which uses user modelling techniques to alter the behaviour of IR systems. The approach employs techniques from IR and AH to produce hybrid Adaptive IR systems.

The focus of DCM2 is on the metadata and knowledge models required by systems to provide this more intelligent behaviour. DCM2 includes work on generating, managing and linking structured knowledge in the form of ontologies, content knowledge models and metadata descriptions. The main focus of this work is on addressing shortcomings in current work on creating and sharing metadata between different intelligent systems, slicing content so that it can be more easily reused and recomposed (for personalisation), and deriving knowledge models to determine aspects of the content and user context, e.g. sentiment.

Finally, DCM3 focuses directly on recomposing and aggregating content and on evaluating the quality and impact of adaptive systems. A key aspect of this challenge is the source of the content. DCM3 investigates the automatic recomposition and aggregation of content from corporate information repositories, open documents, user fora and discussion lists, blogs, shared community content (wikis), social networking interactions and social media (tweets, postings, shared ‘walls’, etc.) in order to provide personalised responses for a user.

Although presented separately above, the three work packages are highly integrated. For example, the metadata models and knowledge models produced in DCM2 are utilised in DCM3 and DCM1. Likewise, the multilingual query enhancement and Personalised IR techniques developed in DCM1 are integrated with the adaptive hypermedia composition and social media aggregation techniques developed within DCM3.

DCM undergraduate intern Ciarán Porter of Trinity College Dublin<br />

(above right) presents his work on ‘Crowd Sourcing for Query<br />

Development and Relevance Judgement’ at the <strong>CNGL</strong> undergraduate<br />

intern showcase<br />

Year 5 Progress<br />

DCM research in Year 5 has achieved significant impact both in the quality of its scientific breakthroughs and in the demonstration of industrial potential. DCM has published over 30 peer-reviewed journal and international conference papers this year. Journal publications include scientific papers in ACM Computing Surveys, UMUAI, Journal of IR and Web Semantic Journal, while international conference papers include publications in ACM Hypertext, SIGIR, AAAI, UMAP, COLING, CIKM and TPDL.

Progress in DCM1<br />

The research conducted in DCM1 has continued to deliver significant advancements in adaptive information retrieval (IR). These advancements are achieved by enhancing both the queries a user submits to a search engine and the results that are returned. The techniques developed by DCM1 use contextual information about individual users, together with implicit and explicit feedback, to create more accurate or more appropriate queries, to improve the relevance of search results and to tailor the presentation of results to that individual.



Continued progress has been made in enhancing the existing Personalised Multilingual IR (PMIR) framework, which employs a number of algorithms to perform both query and result adaptation. This work is part of the DCM1 focus on intelligent content discovery and delivery which is both multilingual and personalised. The PMIR framework allows multilingual resource discovery and delivery using on-the-fly machine translation of user queries and content. The Microsoft Bing API is used to perform multilingual searches on the open web. Result lists are then personalised for the individual user before being presented. The framework is designed to enable new components and approaches to be easily integrated and tested as part of an overall search process. In 2012 the development of the framework was completed, and the completed framework was thoroughly evaluated in an experiment with real users in an authentic search setting. This evaluation demonstrated improvements in multilingual information retrieval using query and result adaptation based upon a multilingual user model. This research has been detailed in the high-impact Journal of User Modeling and User-Adapted Interaction, UMUAI (Ghorab et al., 2012a). The framework has also been successfully showcased at the 20th International Conference on User Modeling, Adaptation and Personalization, UMAP 2012, in Montréal, Canada (Ghorab et al., 2012b). An additional paper has been submitted to the 22nd International World Wide Web Conference and is awaiting review.
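The result-adaptation step can be sketched as a re-ranking function. This is a hypothetical illustration, not the PMIR algorithm of Ghorab et al.: it assumes the results have already been retrieved (for example via the Bing API) and, where needed, machine-translated, and it models the user with a bag of interest terms plus a per-language reading weight. The linear blend controlled by `alpha` and all function names are assumptions chosen for simplicity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def personalise(results, interest_terms, language_weights, alpha=0.5):
    """Re-rank engine results for one user.

    results: list of (text, language) in the search engine's original order.
    interest_terms: terms from a (hypothetical) user interest profile.
    language_weights: how well the user reads each language, in [0, 1].
    alpha: weight of the original engine rank versus the user profile.
    """
    profile = Counter(term.lower() for term in interest_terms)
    scored = []
    for rank, (text, language) in enumerate(results, start=1):
        similarity = cosine(Counter(text.lower().split()), profile)
        relevance = alpha * (1.0 / rank) + (1 - alpha) * similarity
        scored.append((relevance * language_weights.get(language, 0.0), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```

Blending the engine rank with profile similarity, rather than replacing it, keeps the search engine's own relevance signal while still letting a strongly matching lower-ranked result move up.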

DCM1 has continued research in cross-language IR and IR for low-resourced languages such as Bengali and Hindi. A variant of PRES (the patent retrieval score metric for evaluating patent retrieval effectiveness, developed previously by DCM1) was developed as an evaluation metric for the speech retrieval domain. PRES has had continued successful take-up in official patent retrieval benchmarking tasks at international conferences and competitions, e.g. CLEF 2012 (CLEF-IP).

DCM research has focused to a larger extent on processing user-generated queries and content such as tweets and SMS, as well as on processing noisy domain-specific data. DCM researchers have found that information retrieval tasks on such user-generated content can benefit from error correction (e.g. of OCR and spelling errors) and from handling domain terminology (e.g. abbreviations, acronyms and technical terms). DCM established a simple but strong retrieval baseline (without domain adaptation) which would have ranked among the top five participating groups at the international TRECMed 2011 event.

Collaborative research has continued with DCM3 to enhance techniques for personalising web search using social tagging data. Personalised query expansion is performed, which helps to solve the vocabulary mismatch problem (Zhou et al., 2012b). A novel query expansion framework has been developed which generates individual user models based upon data mined from the annotations a user has made and the resources the user has bookmarked on the social bookmarking platform Del.icio.us. This approach has been extensively evaluated using test collections created by crawling authentic social media data from the web. This has resulted in a high-impact publication in the most high-profile IR venue, the Journal of Information Retrieval (Zhou et al., 2012a).

Prof. Séamus Lawless of <strong>CNGL</strong> presents research on Web Search<br />

Personalization Using Social Data at TPDL <strong>2012</strong> in September <strong>2012</strong><br />

in Paphos, Cyprus<br />
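A much-simplified version of personalised query expansion from social bookmarking data is sketched below. It assumes the user model is simply the set of tags on the user’s own bookmarks, and expands a query with the tags the user most often applies together with the query terms. The function names and the normalisation are hypothetical, and this is not the framework of Zhou et al.; it only shows how a user’s own tag co-occurrences can add terms that bridge the vocabulary mismatch between a short query and relevant documents.

```python
from collections import Counter, defaultdict


def build_tag_model(bookmarks):
    """bookmarks: one list of tags per resource the user has bookmarked.

    Returns how many bookmarks each tag appears in, and how often each
    pair of tags is used together on the same bookmark.
    """
    tag_freq = Counter()
    co_occurrence = defaultdict(Counter)
    for tags in bookmarks:
        unique = {tag.lower() for tag in tags}
        tag_freq.update(unique)
        for a in unique:
            for b in unique:
                if a != b:
                    co_occurrence[a][b] += 1
    return tag_freq, co_occurrence


def expand_query(query, bookmarks, k=2):
    """Append the k tags most associated with the query terms in this user's bookmarks."""
    tag_freq, co_occurrence = build_tag_model(bookmarks)
    terms = query.lower().split()
    scores = Counter()
    for term in terms:
        for tag, count in co_occurrence.get(term, {}).items():
            if tag not in terms:
                # Normalise by how often the user applied the query term as a tag.
                scores[tag] += count / tag_freq[term]
    return terms + [tag for tag, _ in scores.most_common(k)]
```

For a user whose bookmarks tagged "python" are mostly also tagged "programming", the query "Python" is expanded to `["python", "programming"]` with `k=1`, whereas another user with different bookmarks would receive a different expansion.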

DCM1 researchers have also been involved in the<br />

organisation of various important IR events and<br />

workshops – none more so than the 36th <strong>Annual</strong> ACM<br />

SIGIR Conference, which <strong>CNGL</strong> will host in Dublin in July<br />

2013. SIGIR has significant leadership drawn from <strong>CNGL</strong><br />

(DCM) academics and staff:<br />

• General Chair – Dr. Gareth Jones

• Workshops Co-Chair – Prof. Vincent Wade

• Tutorials Co-Chair – Prof. Séamus Lawless

• Local Organising Chair – Prof. Séamus Lawless

• Publications Chairs – Dr. Liadh Kelly and Dr. Lorraine Goeuriot



Dr. Páraic Sheridan and Dr. Gareth Jones of CNGL introduce SIGIR 2013 to attendees at SIGIR 2012 in August 2012 in Portland, Oregon, USA. CNGL will host SIGIR 2013 in Dublin in July

In terms of publications, DCM1 has maintained significant publication success in top journals and high-impact conferences, including ACM Computing Surveys (#1 ranked journal in computer science in the world), Journal of Information Retrieval, Journal of User Modeling and User Adapted Interaction, UMAP 2012, TPDL 2012 and DocEng 2012. DCM has continued research in cross-language IR and IR for low-resourced languages such as Bengali and Hindi (Ganguly et al., 2012; Ganguly et al., 2012b; Ganguly et al., 2012c; Leveling, 2012).

Topic models are a recent research topic in IR; they can be used to model topical cohesion in digital content and to enhance IR effectiveness in general (Ganguly et al., 2012b; Ganguly et al., 2012c). Ongoing work in DCM aims to improve the user’s search experience, query formulation and navigation in search results through topic model visualisation. DCM research continues to focus on domain adaptation and domain-specific IR. In the medical domain, we conducted retrieval experiments on patient records from the TREC medical record retrieval track. DCM established a simple but strong retrieval baseline (without domain adaptation) which would have ranked among the top five participating groups on 2011 data (Leveling et al., 2012).

DCM also investigated adaptation of search to mobile devices (Leveling and Jones, 2012; Min et al., 2012). One experiment explored a novel method for rewriting textual content to make it fit on devices with limited screen size (i.e. mobile devices) while retaining readability.

DCM has also cooperated with researchers in other CNGL tracks and with commercial partners. DCM1 continued to integrate with research components produced by DCM2 and DCM3, as well as components from ILT, LOC and SF, as part of the overall CNGL Demonstrator Programme. The ClipArt search demo system was showcased in the SFI review and at the Localisation Innovation Showcase in 2011. It was adapted for mobile devices such as iPhones and iPads to provide mobile image search, and was showcased at SIGIR 2012.

In addition, collaboration with the machine translation<br />

research group in the ILT track investigated the<br />

combination of techniques from information retrieval<br />

and machine translation to speed up fuzzy matching for<br />

machine translation (Leveling et al., <strong>2012</strong>b).<br />
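The general idea of using IR to speed up fuzzy matching can be sketched as a two-stage lookup: an inverted index over word n-grams shortlists translation memory segments that share material with the input, and the more expensive similarity computation runs only on that shortlist. This is an assumption about the general approach, not the system of Leveling et al. (2012b). Python’s `difflib.SequenceMatcher.ratio` over tokens stands in for the edit-distance-based fuzzy match score used in commercial TM tools, and the bigram size and shortlist length are arbitrary; the class and function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def word_ngrams(tokens, n):
    """Set of word n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


class FuzzyTM:
    """Translation memory with an n-gram inverted index for candidate filtering."""

    def __init__(self, segments, n=2):
        self.n = n
        self.segments = [s.split() for s in segments]
        self.index = defaultdict(set)  # n-gram -> ids of segments containing it
        for seg_id, tokens in enumerate(self.segments):
            for gram in word_ngrams(tokens, n):
                self.index[gram].add(seg_id)

    def best_match(self, query, shortlist=10):
        """Return (segment, score) for the best fuzzy match, or (None, 0.0)."""
        q_tokens = query.split()
        # Stage 1 (IR): count shared n-grams to shortlist candidate segments.
        hits = defaultdict(int)
        for gram in word_ngrams(q_tokens, self.n):
            for seg_id in self.index.get(gram, ()):
                hits[seg_id] += 1
        candidates = sorted(hits, key=hits.get, reverse=True)[:shortlist]
        # Stage 2: compute the more expensive similarity only on the shortlist.
        best, best_score = None, 0.0
        for seg_id in candidates:
            score = SequenceMatcher(None, q_tokens, self.segments[seg_id]).ratio()
            if score > best_score:
                best, best_score = " ".join(self.segments[seg_id]), score
        return best, best_score
```

The saving comes from stage 1: for a large memory, only segments sharing at least one n-gram with the input are ever compared in full.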

The approaches above have been extensively evaluated on benchmark data provided by the organisers of TREC, CLEF, INEX and FIRE, as well as on collections created by crawling social media data (Leveling, 2012; Ganguly et al., 2012; Leveling et al., 2012; Leveling and Jones, 2012).

Two of our PhD students have completed internships at Microsoft Ireland in the area of multilingual querying and personalisation.

DCM1 researchers also organised various important IR<br />

events and workshops. <strong>CNGL</strong> co-organised the second<br />

workshop on Personalised Multilingual Hypertext<br />

Retrieval (PMHR <strong>2012</strong>) at Web Science <strong>2012</strong>. DCM1<br />

researchers also organised an evaluation task on<br />

personalised and collaborative information retrieval (PIR)<br />

at FIRE <strong>2012</strong>.<br />

A significant number of CNGL-supervised Masters dissertations and final-year undergraduate projects were submitted and graded in 2012. A DCM1-specific Masters dissertation, investigating “Selecting Appropriate Verticals for Web Search Results”, is currently under way at Trinity College Dublin under the supervision of Prof. Séamus Lawless.



Progress in DCM2

DCM2, the work package concerned with digital content knowledge modelling, extraction and organisation, recorded several key successes during 2012.

Research in structural content analysis for web slicing has progressed significantly. This has included the development and evaluation of a slicing tool that can extract important textual content from open-corpus web content. The system was evaluated in a successful user trial, which demonstrated the applicability of the technique to language learning. This work will be developed further in 2013 under the SFI TIDA programme to support language learning through user-relevant resources harvested from the open web. This research is expected to result in a completed PhD in early 2013.
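The internals of the DCM2 slicing tool are not described in this report. As a rough illustration of how important textual content can be separated from navigation and other page furniture, the sketch below swaps in a simple text-density heuristic: it keeps block-level elements that contain enough words and whose words are mostly not link text. The block tag set, the thresholds `min_words` and `max_link_ratio`, and the names `BlockCollector` and `slice_page` are assumptions for illustration, not parts of the DCM2 system.

```python
from html.parser import HTMLParser

# Assumption: these tags delimit the "blocks" that are kept or discarded.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "article", "section"}


class BlockCollector(HTMLParser):
    """Collects the text of each block element and counts how much of it is link text."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[tuple[str, int]] = []  # (block text, words inside <a>)
        self._words: list[str] = []
        self._link_words = 0
        self._in_link = 0
        self._skip = 0  # depth inside <script>/<style>

    def _flush(self) -> None:
        # Close the current block, if it contains any words.
        if self._words:
            self.blocks.append((" ".join(self._words), self._link_words))
        self._words, self._link_words = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style contents
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)

    def close(self):
        super().close()
        self._flush()


def slice_page(html: str, min_words: int = 8, max_link_ratio: float = 0.3) -> list[str]:
    """Return the text blocks that look like main content rather than navigation."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_words in parser.blocks:
        n_words = len(text.split())
        if n_words >= min_words and link_words / n_words <= max_link_ratio:
            kept.append(text)
    return kept
```

On a page whose navigation bar consists only of links, the link-ratio test discards the navigation while keeping a long paragraph of running text; short captions fall below `min_words`. A production slicer would also need to handle nested blocks and tune thresholds per site or language.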

The collaboration between DCM2 and DCM3 has continued in the area of Personalised Multilingual Customer Care, resulting in an SFI feasibility project that developed a commercial-strength version of the research software. The Emizar system provides users with a modern, supportive environment for personalised access to federated content across several support repositories.
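How Emizar combines content from different repositories is not detailed here. One common way to present federated results as a single personalised list is sketched below, purely as an assumption: each repository's raw scores are min-max normalised so they become comparable, scores are summed if the same item is returned by more than one repository (CombSUM), and each item is then boosted by the user's interest in its topic. The function `merge_federated`, the `boost` parameter and the topic-based `profile` are hypothetical.

```python
from collections import defaultdict


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one repository's scores to [0, 1]; constant scores map to 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_federated(results: dict[str, dict[str, float]],
                    topics: dict[str, str],
                    profile: dict[str, float],
                    boost: float = 0.5) -> list[tuple[str, float]]:
    """Merge per-repository result lists into one personalised ranking.

    results: repository -> {document id -> raw retrieval score}
    topics:  document id -> topic label
    profile: topic label -> user interest weight in [0, 1]
    """
    combined: dict[str, float] = defaultdict(float)
    for repo_scores in results.values():
        if not repo_scores:
            continue
        for doc, s in min_max(repo_scores).items():
            combined[doc] += s  # CombSUM if a document appears in several repositories
    personalised = {
        doc: s * (1 + boost * profile.get(topics.get(doc, ""), 0.0))
        for doc, s in combined.items()
    }
    return sorted(personalised.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with a forum repository returning `f1` and `f2` and a knowledge base returning `k1` and `k2`, the top item from each has a normalised score of 1.0; a user whose profile records a strong interest in “backup” then sees the knowledge-base article on backup (`k1`) ranked above the forum post on printing (`f1`).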

DCM2 researchers have also continued to work in the Digital Humanities domain, collaborating with CULTURA, an affiliated EU FP7 project coordinated at Trinity College Dublin.

DCM2 researchers have published at several key conferences in areas such as eLearning, Hypertext and Hypermedia, and have had several successful grant applications, including an SFI Technology Innovation Development Award (TIDA).

Research and development in DCM2 on lightweight subject models matured and coalesced in 2012 through the development of the MOODfinger framework for affective news retrieval. MOODfinger continuously gathers and indexes daily news from major web news sites, and performs an affective analysis of each new story to support later affective retrieval. Lightweight stereotypical models of familiar ideas are automatically acquired from the web and are used to identify the most interesting and most affect-rich parts of a news story. These models support affective query expansion and the subsequent affective summarisation of any retrieved news. Publications on MOODfinger were presented at top natural language processing (NLP) and web conferences in 2012, including ACL 2012 and WWW 2012, and the MOODfinger prototype (together with related natural language technologies developed within DCM2) was also showcased in public demonstrations at these conferences. MOODfinger represents both a culmination of work in DCM2 and a sound foundation for future work in affective text understanding, and it continues to be actively maintained and developed.
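MOODfinger's stereotype models are acquired automatically from the web; the sketch below only illustrates, with a tiny hand-written and entirely hypothetical lexicon, how such models could drive affective query expansion and the selection of affect-rich sentences. A query concept is expanded with the stereotypical properties whose affect matches the requested polarity, and sentences are then ranked by the summed affect of the expansion terms they contain. The lexicon, the affect scores and the function names are assumptions, not MOODfinger's actual data or interface.

```python
import re

# Hypothetical stereotype model: concept -> stereotypical properties.
STEREOTYPES = {
    "snake": ["cunning", "venomous", "slippery", "graceful"],
    "lion": ["brave", "proud", "fierce", "noble"],
}

# Hypothetical affect scores in [-1, 1] for the property words.
AFFECT = {
    "cunning": -0.4, "venomous": -0.9, "slippery": -0.6, "graceful": 0.7,
    "brave": 0.9, "proud": 0.3, "fierce": -0.2, "noble": 0.8,
}


def expand_query(concept: str, positive: bool) -> list[str]:
    """Expand a concept with stereotypical properties of the requested polarity."""
    props = STEREOTYPES.get(concept, [])
    if positive:
        chosen = [p for p in props if AFFECT.get(p, 0.0) > 0]
    else:
        chosen = [p for p in props if AFFECT.get(p, 0.0) < 0]
    return [concept] + chosen


def affect_rich_sentences(text: str, terms: list[str], top_n: int = 1) -> list[str]:
    """Rank sentences by the total absolute affect of the expansion terms they mention."""
    wanted = set(terms)
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    def score(sentence: str) -> float:
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(abs(AFFECT.get(w, 0.0)) for w in words if w in wanted)

    return sorted(sentences, key=score, reverse=True)[:top_n]
```

With this toy lexicon, `expand_query("snake", positive=False)` yields `["snake", "cunning", "venomous", "slippery"]`, and in a short story the sentence that mentions “venomous” and “slippery” is selected as the most affect-rich part.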

Much of this (MOODfinger) work in DCM2 has focused on the challenges posed by creative language use (that is, the non-obvious use of familiar words and ideas). Several publications showcase our achievements in this area, such as the monograph Exploding the Creativity Myth: The Computational Foundations of Linguistic Creativity (T. Veale, Bloomsbury Academic) and the collected volume Creativity and the Agile Mind (principal co-editor T. Veale, Mouton de Gruyter). We have helped shape European policy on computational creativity by contributing to expert consultation sessions with the European Commission, and have influenced the latest EU ICT call, which now explicitly lists Computational Creativity as a fundable objective. Building on work in DCM2, we have secured EU funding for an international coordination action to promote the field of computational creativity (PROSECCO: PROmoting the Scientific Exploration of Computational Creativity). The project will run for three years under the scientific leadership of T. Veale at UCD and, through its organisation of contact forums, summer schools and code camps, will serve as a force multiplier for disseminating the results of DCM2 research. The leadership role of DCM researchers in the computational creativity community was further emphasised by UCD’s organisation of the 3rd International Conference on Computational Creativity (ICCC 2012) in Dublin in May 2012, which received logistical and financial support from CNGL.



Mr. Seán Sherlock, T.D., Minister for Research and Innovation, launches the CNGL-affiliated Learnovate Centre in June 2012

Katrin Drescher of award sponsors Symantec presents the LRC Best Thesis Award to Prof. Vincent Wade, who accepts the award on behalf of Dr. Ben Steichen. Also pictured is Reinhard Schäler of LRC/CNGL.

Progress in DCM3

DCM3 is responsible for the development of systems that provide dynamic aggregation and composition of content, customised to the user’s needs and context. Such content can come from diverse sources, e.g. corporate knowledge bases, open web content and user-generated content. DCM research in personalised multilingual content has resulted in significant publications as well as industry collaboration. DCM has progressed both the personalisation and the dynamic aggregation of user-generated content (e.g. blogs, forum postings, messages), corporate content (e.g. product manuals, how-to guides, technical documentation), and content harvested from the open web. This research has seen the development of demonstrators and the international evaluation of the technology across multiple languages and countries. This research and evaluation resulted in an international prize for DCM researcher Ben Steichen and his supervisor Prof. Vinny Wade (LRC Best Thesis Award 2012).

Likewise, the ‘Personalisation as a Service’ research in DCM3 has reached maturity, with Invention Disclosures lodged and demonstrators being evaluated across multiple third-party websites.

A key impact of DCM3 research has been industry engagement in the evaluation of the technology and the resulting planning for two CNGL spinout companies, in the areas of Multilingual Personalised Customer Care (Emizar, www.emizar.com) and Personalisation-as-a-service (Wripl, www.wripl.com).

The Wripl cross-site personalisation system has undergone several refinements, and plugins for major content management system platforms, including WordPress, have been released. The Wripl team has concluded its SFI TIDA-funded programme and is collaborating with Enterprise Ireland on further developing the company and its product. From a research perspective, this work has been successfully evaluated in several experiments, and a PhD is expected to be completed in early 2013.

The Emizar project will complete its SFI TIDA feasibility study in 2013 and is collaborating with Enterprise Ireland to develop the product and company. A full launch and a technology licence agreement are expected in early 2013.



Neil Peirce presents his PhD work at the national final of the Thesis in 3 competition

Industry Engagement

Two of DCM’s PhD students (Rami Ghorab and Jinming Min) successfully completed internships at Microsoft Ireland in the area of multilingual querying and personalisation. Their Personalised Multilingual Information Retrieval demonstrator showed enhanced retrieval performance on Microsoft’s Clip Art collection and is fully integrated with Microsoft Bing search and machine translation tools.

Additionally, there is close and ongoing collaboration with Symantec in conducting user trials of the Personalised Multilingual Customer Care portal. This has led to CNGL DCM researchers making multiple presentations to senior vice presidents within Symantec, and to the planning of a comprehensive trial of CNGL technology using Symantec customer care content.

Research in DCM has led to invention disclosures and one patent application in 2012. As previously mentioned, two spinout companies have been planned for CNGL, namely Emizar and Wripl. These spinouts will involve technologies developed in DCM2 and DCM3.

SFI TIDA funding was sought for a third project (Linguabox) to investigate the potential for DCM2 technology to support the dynamic slicing and rightsizing of multimedia and user-generated content for learning. This application was successful and work will begin in 2013.

‘Team wripl’ visits Silicon Valley to connect with local entrepreneurs and companies. The visit was hosted by the Irish Technology Leadership Group (ITLG) thanks to wripl’s joint win in the SFI/TIDA Entrepreneurship course.

Achievements

• DCM research was published in over 30 international journals and conferences in 2012. Conference highlights included ACM Hypertext, COLING 2012, AAAI 2012, ACL 2012, CIKM 2012, TPDL 2012 and SIGIR 2012. Journal highlights included publications in ACM CSUR, UMUAI and Journal IR.
• DCM researchers were involved in organising various important IR and personalisation events and workshops during 2012, including FIRE 2012, NOMS 2012 and UMAP 2012, as well as in planning for SIGIR 2013, to be hosted in TCD.
• Prof. Vincent Wade was invited to deliver the keynote address at ICWL 2012, on Personalisation across Open Content and Social Media.
• Three PhD students graduated in 2012 in the areas of Multilingual IR, Adaptive Systems, and Multilingual Personalisation. A further two students have submitted PhD theses that are currently under examination.
• Two patents are pending from DCM research in the areas of dynamic content slicing and personalisation.
• Significant industry engagement, with joint trials and joint evaluations of multilingual personalisation technology, e.g. with Symantec and Microsoft.



• Two SFI TIDA grants were awarded to DCM Principal Investigators Prof. Vincent Wade and Prof. Owen Conlan for research in personalisation.
• Prof. Vincent Wade established the Enterprise Ireland Technology Centre for Technology Enhanced Learning, the Learnovate Centre. This centre, which is allied to CNGL, focuses on content technologies and innovative communication tools for informal learning in schools, universities and corporate training. DCM has established close collaboration with the new Centre, which provides a means of exploiting CNGL research results in the vertical sector of Learning and Education.
• Dr. Tony Veale was Local Chair for the 3rd International Conference on Computational Creativity (ICCC 2012) at UCD.
• Dr. Tony Veale delivered a keynote (on creative uses of WordNets) at an event in Oslo hosted by the National Library of Norway, an invited talk on affective stereotype acquisition at the ILIKS event in Toulouse, and a one-week invited course on linguistic creativity at an autumn school on Computational Creativity in Helsinki.
• Two spinout companies, Emizar (aggregation and personalisation of multilingual open content, user-generated content and corporate content for self-service customer care) and Wripl (Personalisation-as-a-service), were progressed for spinout in 2013.
• A new SFI TIDA award has been won by Prof. Wade for DCM research in automated slicing of content for reuse and repurposing. This award will further the development of the technology for informal and automated e-learning content.
• Two industry internships were successfully completed at Microsoft by PhD students from TCD and DCU.

Plans

CNGL II will be led by Prof. Wade, and in the new CNGL II, DCM research will principally be divided into three themes, namely Personalisation; Delivery and Interaction; and Search and Discovery. CNGL II will progress the research topics from DCM and build on the success of DCM research.

Prof. Séamus Lawless pitched Emizar’s investor-ready technology to hundreds of potential investors and business partners at Enterprise Ireland’s Big Ideas Showcase 2012. Emizar was subsequently profiled in the Sunday Business Post newspaper.

CNGL I has been granted a no-cost extension to complete and rigorously evaluate the DCM technology. This work (January to November 2013) will see the completion of a number of DCM PhDs as well as the trialling and evaluation of DCM technology, specifically the Personalised Multilingual Information Retrieval Framework, Multilingual User Models, the Personalised Multilingual Customer Care trial, and the Evaluation Framework and Tools for Adaptive (Personalised) Systems.

Conclusion

2012 was an extremely productive year for DCM in the two aspects crucial to CNGL, namely scientific excellence and industry impact. DCM researchers have also maintained and strengthened their leadership in their respective research areas, and DCM PhD students have completed their doctorates and progressed to positions in industry and academia.


Next Generation Localisation



Strand Name: Next Generation Localisation

Area Co-ordinator: Mr. Reinhard Schäler

Participant Names and Affiliation

Industrial Collaborators

Dr. Fred Hollowood, Symantec
Mr. Enda McDonnell, Alchemy Software Development
Mr. Phil Ritchie, VistaTEC
Mr. Dag Schmidtke, Microsoft

International Collaborators

Dr. Lynne Bowker, University of Ottawa, Canada
Mr. José Eduardo de Lucca, Universidade Federal de Santa Catarina, Brazil
Prof. Patrick Hall, Professor Emeritus, Open University, UK
Dr. James Hogan, Queensland University of Technology, Australia
Mr. Mahesh Kulkarni, CDAC Pune, India
Ms. Stefanie Scheeder, The Rosetta Foundation, Germany
Mr. Francis Tsang, Adobe, USA

Faculty

Dr. Jim Buckley, University of Limerick (LOC3 Leader)
Ms. Yvonne Cleary, University of Limerick (LOC1)
Mr. J.J. Collins, University of Limerick (LOC2 Leader)
Dr. Chris Exton, University of Limerick (LOC1 Leader)
Dr. Dorothy Kenny, Dublin City University (LOC2)
Dr. Liam Murray, University of Limerick (LOC2)
Dr. Sharon O’Brien, Dublin City University (LOC2)
Mr. Reinhard Schäler, University of Limerick (LOC1, LOC2, LOC3, PI)

Postdoctoral Researchers

Dr. David Filip, University of Limerick (LOC1)
Dr. Eoin Ó Conchúir, University of Limerick (LOC3)
Dr. Ian O’Keeffe, University of Limerick (LOC2)



PhD Students

Mr. Solomon Gizaw, University of Limerick (LOC3.2)
Mr. Rajat Gupta, University of Limerick (LOC2.4)
Mr. Joss Moorkens, University of Limerick (LOC2.2)
Ms. Lucía Morado Vázquez, University of Limerick (LOC1.2)
Mr. Aram Morera-Mesa, University of Limerick (LOC3.3)
Mr. Naoto Nishio, University of Limerick (LOC3.1)
Mr. Lorcan Ryan, University of Limerick (LOC1.1)
Mr. Asanka Wasala, University of Limerick (LOC2.1)

Funding

Funding from SFI

CNGL (07/CE/I1142): €306,611



Research Strand Overview: Next Generation Localisation (LOC)

Since its inception in the 1980s, the localisation industry has been strongly anchored to the expertise and quality of both industrial and academic assets in Ireland. Many of the key, defining elements of the current industry either originated with, or have strong roots in, the pioneering activities of Irish players in the industry. Indeed, to this day, research taking place in Ireland, in industry, in academia and in joint industry-academic ventures such as CNGL, is paving the way for the industry to adapt and evolve as it moves further into the 21st century and meets the challenges that await it.

The Next Generation Localisation (LOC) track in CNGL has a mission to produce world-leading research in (i) localisation content analysis, (ii) the evaluation of localisation component technologies, and (iii) service-oriented localisation architecture solutions, in collaboration with the academic and industrial partners in CNGL and beyond, validated by user communities in for-profit and not-for-profit enterprise localisation.

Taking a view that reaches beyond traditional avenues for profit and expansion, the LOC track focuses on using this research to ensure that Ireland retains its status in the field of localisation. As the independent international review panel that conducted CNGL’s Mid-Term Review in July 2011 stated, localisation is “a key industry for Ireland” and Ireland “must remain at the technological forefront in order to retain and grow this highly remunerative activity”.

LOC’s view that flexible architectures, as investigated by LOC researchers in the Service-Oriented Localisation Architecture Solution (SOLAS), are key to future innovative technology frameworks supporting emerging and future localisation scenarios was also confirmed by a second independent international review panel. This panel, which conducted CNGL’s Final Review in July 2012, commented that “the SOLAS architecture offers a solid reference implementation that addresses integration and workflow issues that companies like Adobe, Dell, and Intel are currently trying to address on their own”.

Having recruited four “additional high-end professional programmers” and allocated “more budgets to workflow”, as recommended by the reviewers in 2011, LOC has continued to develop the Service-Oriented Localisation Architecture Solution (SOLAS) apace. With the solution now split into two distinct frameworks, SOLAS Match and SOLAS Productivity, LOC is developing, in parallel, solutions that cover both the needs of the traditional, return-on-investment-based localisation industry and those of the increasingly important non-profit and non-market localisation communities.

Indeed, it is this approach that led the independent review panel to note in the CNGL final review that LOC, and by extension its spinoff The Rosetta Foundation, are pioneering “a novel, comprehensive localization model for organizations seeking to translate content for underserved communities. The panel feels this accomplishment has great societal impact that transcends the boundaries of Ireland and even the EU.”

The views of the international experts on both panels reflect those of industry thought leaders consulted by LOC at conferences such as GALA, Localisation World and, most recently, the LRC’s 17th Annual International Localisation Conference.

The following is a brief summary of the LOC track’s vision and goals, as agreed and realised in 2012.

Vision

We empower innovative community and social localisation efforts driving the most significant growth opportunity for the industry.

Goals

• Provide content authors with feedback on the quality (localisability) and reusability of their content, demonstrating the impact of good or bad quality source content on the localisation effort, specifically in the context of user-generated content
• Assess and evaluate component technologies for SOLAS, demonstrating the suitability and adaptability requirements for components, specifically in the context of the community and social localisation enterprise



• Develop the Service-Oriented Localisation Architecture Solution (SOLAS) as
   1. A demonstrator and testbed for innovative localisation solutions, demonstrating the innovative aspects of this framework in relation to existing mainstream paradigms, especially in the context of the emerging collaborative and social localisation enterprise.
   2. A unique suite of open source technologies that will be made available to serve the needs of all clients that require flexible, efficient and fully standards-compliant localisation solutions.
   3. The de facto localisation and translation technology for not-for-profit and development localisation activities.
• Integrate CNGL, third-party and open source components into SOLAS
• Publish research results in world-leading journals, both related and localisation-specific, according to agreed targets
• Continue to provide a forum for the publication of high-impact, innovative and scientific localisation research through the indexed, peer-reviewed and dedicated localisation journal, Localisation Focus: The International Journal of Localisation
• Report on research activities and solicit feedback at world-leading conferences, both related and localisation-specific, according to agreed targets
• Actively contribute to and provide leadership for international localisation initiatives (industry associations, standards groups, conferences)
• Expand the open source SOLAS code repository
• Build large and significant developer and user communities around the LOC effort within CNGL and beyond, according to agreed targets
• Demonstrate the industrial value and impact of the LOC research through active user engagement and trials, with reference to agreed metrics
• Work with CNGL towards a re-allocation of budgets to support a targeted SOLAS demonstrator development effort

Fundamental Research Barriers and Methodologies to Address Them

To convince large multinational content publishers to join open, standards-based, industry-wide initiatives, small and medium-sized publishers to invest in state-of-the-art technologies, and non-profit organisations to take advantage of a localisation framework, a solution is required that is scalable, modular, interoperable and affordable, together with a demonstrator framework capable of proving that the vision of an open localisation framework can be achieved. The risks involved in building such a system are considerable. Leading globalisation management systems have been developed by companies such as Idiom and GlobalSight (Ambassador). However, while they aimed to be comprehensive, they were not: for example, some services such as machine translation (MT) never became part of the core offering of these systems; additional service modules required by customers generally cannot be integrated (and where they can, only with significant investment); and reconfiguring workflows and adapting to increasingly dynamic localisation environments often carry prohibitive costs. Although these systems attracted significant investment for their development (in the region of $50 million in some cases), they never realised their projected market potential and return on investment.

Although existing systems demonstrate a good understanding of the basic technologies required for a stable corporate localisation framework, our research has shown that their overall architecture is not suitable as the backbone for a modular, extensible and dynamic framework that enables seamless data flows and allows for the automatic configuration and execution of tasks.

During Years 3 and 4 of CNGL, the original CNGL Bulk Localisation Workflow (BLW) demonstrator and the work within the Next Generation Localisation research area produced a first version of a service-oriented localisation architecture solution (SOLAS). SOLAS addresses the need for an open, highly configurable, loosely coupled aggregation of heterogeneous services that can meet the varying demands of enterprises, SMEs and the non-profit sector. At the same time, it enables organisations with software engineering competencies to leverage the infrastructure encapsulated in the demonstrator framework and tailor it to their specific needs through further component development. During Year 5 of CNGL, work on this demonstrator was branched into two parallel streams, allowing more rapid development, deployment and testing, coverage of additional use-case scenarios, closer integration of component technologies from across CNGL research areas, and connections with third-party technologies such as commercial MT and established web technology systems.

This branching allows the two resulting technologies, SOLAS Productivity and SOLAS Match, to move even further away from the family of platforms whose large footprints have often proven cost-prohibitive, and to refine the Service-Oriented Architecture (SOA) philosophy that enables the development of a component marketplace for the platform. SOLAS Productivity makes use of a standardised data container, open web service APIs, and a common orchestration and process management module, which connect to any number of component technologies developed by academic and industrial partners within CNGL as well as to third-party technologies and tools. SOLAS Match provides groundbreaking and intuitive technology for the seamless, user-friendly matching of community translation tasks with volunteer translators. This open source technology revolutionises the distribution and management of translation tasks by pairing simplified web interfaces with sophisticated back-end technologies. SOLAS Match increases speed and reduces overhead for these translation tasks, and as such is perfectly positioned to be adopted by any number of not-for-profit and non-market localisation organisations.
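SOLAS Match's actual matching logic is not described in this report. The sketch below shows one plausible approach, under our own assumptions, for matching community translation tasks with volunteers: a volunteer is eligible only if they cover the task's language pair, and eligible volunteers are ranked by the overlap (Jaccard similarity) between their interest tags and the task's tags, minus a small penalty for each task they already hold. The data classes, their fields and the weights are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """A community translation task (hypothetical fields)."""
    task_id: str
    source_lang: str
    target_lang: str
    tags: set[str] = field(default_factory=set)


@dataclass
class Volunteer:
    """A volunteer translator profile (hypothetical fields)."""
    name: str
    language_pairs: set[tuple[str, str]]
    tags: set[str] = field(default_factory=set)
    open_tasks: int = 0


def match_score(task: Task, vol: Volunteer, load_penalty: float = 0.1) -> Optional[float]:
    """Score a volunteer for a task, or return None if the language pair does not match."""
    if (task.source_lang, task.target_lang) not in vol.language_pairs:
        return None
    union = task.tags | vol.tags
    tag_overlap = len(task.tags & vol.tags) / len(union) if union else 0.0  # Jaccard
    return tag_overlap - load_penalty * vol.open_tasks


def recommend(task: Task, volunteers: list[Volunteer], k: int = 3) -> list[str]:
    """Return up to k volunteer names, best match first."""
    scored = []
    for vol in volunteers:
        score = match_score(task, vol)
        if score is not None:
            scored.append((score, vol.name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]
```

Filtering on the language pair first guarantees that only qualified volunteers are suggested and keeps the ranking step small; the workload penalty is one simple way to spread tasks across a volunteer community rather than always routing them to the same people.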

Through SOLAS technologies, researchers have gained access to a common, standards-based and interoperable open source localisation ecosystem for their research, similar to those available to the MT community with Moses and to the speech community with platforms such as the Festival Speech Synthesis System or the MuSE speech technology platform. SOLAS is the first working innovation platform developed in its entirety within CNGL.

Research Strand Overview: Next Generation Localisation

In LOC, research concentrates on improving key areas of localisation automation, such as the construction of a common, standards-based data model to develop, process and maintain localisation knowledge (LOC1) (Ryan, 2010; Morado Vázquez and Mooney, 2010; Anastasiou and Morado Vázquez, 2010; Anastasiou, 2010); the interoperability of suitable tools and technologies, the assessment of quality measurement methodologies, and the facilitation of crowdsourcing and collaboration (LOC2) (Nishio et al., 2010; Wasala et al., 2010; Gupta and Aouad, 2010; Exton et al., 2010); and the modelling of intelligent localisation processes, workflows and process management (LOC3) (Filip and O’Conchúir, 2011; Lenker et al., 2010; Lenker, 2010; Lenker and Anastasiou, 2010). The availability of a demonstrator system has been a prerequisite for advancing this research and for measuring its success.

The Service-Oriented Localisation Architecture Solution (SOLAS) has become an important focus for research in LOC for several reasons (Aouad et al., 2011; Ó Conchúir, 2011). It offers three elements shared across the framework: a common, standards-based (meta-)data container; a web services API for Next Generation Localisation communication and connectivity; and an orchestration and process management module (Morado and del Rey, 2011; Morado et al., 2011). Component technologies from industrial partners and third parties, as well as research components from across <strong>CNGL</strong> (Wasala et al., 2011), can be integrated into SOLAS with relative ease. This demonstrates in very real terms the benefits of individual components in an end-to-end localisation workflow, and provides a showcase for cross-<strong>CNGL</strong> industrial and academic collaboration. SOLAS originated in our research on a demonstrator system for bulk localisation workflows, but it now extends beyond this narrow field. It aims to provide frameworks for an entire open localisation ecosystem. These frameworks address the needs of large and medium-sized commercial enterprises, and also those of non-profit organisations, which require solutions that adapt easily to new languages, actors and workflows in a highly collaborative and dynamic environment.<br />



Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />

NEXT GENERATION LOCALISATION<br />

Against this background, the main initial objective was to develop a heterogeneous, loosely coupled platform. This was achieved through Component-Based Development (CBD) techniques. SOLAS integrates components that are connected through web services to realise a Service-Oriented Architecture (SOA). The components can also operate in stand-alone mode, which further increases the flexibility of this approach. The architecture also allows any future component developments from across <strong>CNGL</strong> to be integrated easily.<br />
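A small in-process sketch can illustrate the component-based, loosely coupled design described above. It makes two assumptions. First, a plain dictionary stands in for the shared (meta-)data container, whereas SOLAS uses an XLIFF-based container. Second, a function registry replaces the web service calls. `component`, `orchestrate` and the two toy components are hypothetical names and are not SOLAS or LocConnect APIs. Because each component is an ordinary function, it can also be called on its own, which mirrors the stand-alone mode.

```python
from typing import Callable, Dict, List

# Stand-in for the shared (meta-)data container (assumption: a plain dict).
Container = Dict[str, object]
Component = Callable[[Container], Container]

REGISTRY: Dict[str, Component] = {}

def component(name: str):
    """Decorator that registers a function as a named, independently callable component."""
    def register(fn: Component) -> Component:
        REGISTRY[name] = fn
        return fn
    return register

@component("segment")
def segment(c: Container) -> Container:
    # Naive sentence segmentation on full stops (illustrative only).
    parts = [s.strip() for s in str(c["text"]).split(".")]
    return {**c, "segments": [p for p in parts if p]}

@component("pseudo_translate")
def pseudo_translate(c: Container) -> Container:
    # Placeholder "translation" that brackets each segment.
    return {**c, "targets": [f"[{s}]" for s in c["segments"]]}

def orchestrate(workflow: List[str], c: Container) -> Container:
    """Run the named components in order, passing the container along."""
    for step in workflow:
        c = REGISTRY[step](c)
    return c

result = orchestrate(["segment", "pseudo_translate"], {"text": "Hello world. Goodbye."})
print(result["targets"])                       # ['[Hello world]', '[Goodbye]']
print(segment({"text": "A. B."})["segments"])  # stand-alone use: ['A', 'B']
```

Replacing the registry lookup with one remote call per component would bring the sketch closer to an SOA arrangement. That step is left out so the example stays self-contained.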

The initial SOLAS technology was demonstrated at the <strong>CNGL</strong> SFI Mid-Term Review in 2011, at the Localisation Innovation Showcase 2011 at Croke Park, and at the Autumn Scientific Committee Meeting in TCD. It generated significant interest from industrial collaborators, invited industry representatives, multinational publishers, SMEs, and the non-profit and government sectors.<br />

However, research and development on SOLAS progressed in <strong>2012</strong>, and collaboration with The Rosetta Foundation deepened to the point where UL granted The Rosetta Foundation an exclusive licence for SOLAS Match (also known as Translation eXchange). It then became apparent that SOLAS had more potential than this initial offering showed. The decision was therefore made to branch SOLAS development into two distinct yet connectable technologies. The first, SOLAS Productivity, continues the initial technology path described above. The second, SOLAS Match, is a new paradigm for enabling volunteer translation and localisation through intuitive, user-friendly interfaces backed by dynamic and powerful back-end technologies.<br />

The collaboration with The Rosetta Foundation has been very successful, as has the transfer of <strong>CNGL</strong> IP generated by LOC researchers to the Foundation and its 2,600+ volunteers (Wasala et al., 2011). Uptake and trials of <strong>CNGL</strong> output by the Foundation provide very valuable feedback to <strong>CNGL</strong> researchers. They also provide evidence of the value of this output to potential commercial parties, especially in the SME sector. The international independent review panel noted further evidence of the value of this collaboration in the Final Review of <strong>CNGL</strong> (July <strong>2012</strong>), citing the increase in “International reach and exposure to government and industry outside of the usual Ireland and EU-centric bodies: the creation of AGIS conferences, growing presence at W3C, OASIS (XLIFF TC), GALA, and Localization World, including diversification of funding (Canada Research Council)”.<br />

A Partner Group with The Rosetta Foundation was created with four goals. The first was to raise the visibility and develop the involvement of enterprises (for-profit and not-for-profit) that collaborate with and support community translation efforts. The second was to find ways to connect this effort to economic criteria that resonate with industrial partners. The third was to look for new enterprise partners, and the fourth was to seek alliances with non-profit and other organisations to promote these efforts.<br />

Other Relevant Work in the Field and How This Compares<br />

There are commercial efforts under way to develop proprietary automated localisation platforms. These platforms integrate process automation and management functionality with localisation and translation automation, such as terminology management, translation memory systems and machine translation. Large multinational content publishers, among them Oracle, SAP and Microsoft, have demonstrated the commercial viability of such solutions with their in-house systems. However, they have also shown the limits of proprietary solutions and have started exploring ways to connect their systems with third-party tools and technologies. One example is the connection between the open XML Localization Interchange File Format (XLIFF) and Microsoft's proprietary Localisation Exchange Format (LCX), which Microsoft and LOC researchers reported at the LRC XV conference in 2010 (Wasala et al., 2010). Oracle also presented its use of XLIFF in its localisation strategy at the LRC XVI conference in 2011. At FEISGILTT <strong>2012</strong> it became known that Microsoft will adopt XLIFF as a primary file format going forward, based in large part on the research initiated by Wasala et al. (2010 and <strong>2012</strong>).<br />
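For readers unfamiliar with XLIFF, the sketch below reads an XLIFF 1.2 document with Python's standard library and lists which units still lack a translation. Such a document uses the OASIS namespace `urn:oasis:names:tc:xliff:document:1.2`, with `file`, `body`, `trans-unit`, `source` and `target` elements. The sample file and the `read_units` helper are illustrative only and do not reflect any vendor's tooling.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Illustrative XLIFF 1.2 file with one translated and one untranslated unit.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source><target>Speichern</target></trans-unit>
      <trans-unit id="2"><source>Cancel</source></trans-unit>
    </body>
  </file>
</xliff>"""

def read_units(xliff_text):
    """Return (id, source, target-or-None) for every trans-unit in an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    units = []
    for tu in root.iterfind(".//x:trans-unit", NS):
        target = tu.find("x:target", NS)
        units.append((tu.get("id"),
                      tu.findtext("x:source", namespaces=NS),
                      target.text if target is not None else None))
    return units

units = read_units(SAMPLE)
print(units)                                          # [('1', 'Save', 'Speichern'), ('2', 'Cancel', None)]
print([uid for uid, _, tgt in units if tgt is None])  # untranslated: ['2']
```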

<strong>CNGL</strong> research is therefore at the forefront of many industry concerns. The SOLAS platform gives it a head start through its highly innovative approach to a wide variety of localisation requirements. As noted by the <strong>2012</strong> international independent review panel, these are requirements that “companies like Adobe, Dell, and Intel are currently trying to address on their own”.<br />

SOLAS is the first open, standards-based framework of its kind in the localisation space anywhere in the world. It already provides an integrated plug-and-play framework in which configurable component technologies can interoperate. As development and refinement continue, SOLAS Productivity will allow complementary technologies to be seamlessly connected and integrated into a core, functional, industrial-scale platform that is itself highly modular and extensible. SOLAS Match, for its part, will redefine the technological landscape of volunteer and development localisation with its open source translation space.<br />

Achievements<br />

Work Package LOC1<br />

The overall aim of LOC1 is to embed internationalisation<br />

and localisation issues into the design and development<br />

cycle of digital content production (Ryan, 2010), moving<br />

localisation up the value chain. The Work Package<br />

is divided into two sections, LOC1.1 Digital Content<br />

Production for Localisation and LOC1.2 Localisation<br />

Knowledge – Capture, Organisation, Use.<br />

LOC1 has produced highly innovative research results on an XLIFF-based (meta-)data container designed to identify, classify and leverage localisation knowledge captured in previous processes (Anastasiou and Morado Vázquez, 2010). The result is a localisation memory container (LMC). It is conceptually similar to established translation memory technology but focuses directly on localisation rather than “just” on translation requirements (Morado Vázquez and Mooney, 2010). The LMC is designed to improve the quality and consistency of the localisation process itself and to minimise errors in the final product. This work is closely linked to ILT and SF2 (data access, exchange and integrity issues).<br />
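The LMC's internal design is not detailed in this report. The sketch below assumes only that the LMC leverages earlier knowledge the way a translation memory leverages earlier translations, and shows what a fuzzy lookup might then look like. The in-memory `MEMORY` list, the `lookup` function and the 0.75 threshold are hypothetical. `difflib.SequenceMatcher` stands in for whatever similarity measure a real container would use.

```python
from difflib import SequenceMatcher

# Hypothetical memory: previously processed source segments paired with
# the localisation knowledge recorded for them.
MEMORY = [
    ("Click Save to keep your changes.", "'Save' must match the localised button label"),
    ("Your trial expires in 30 days.", "'30' is a runtime variable; do not translate"),
]

def lookup(segment, threshold=0.75):
    """Return (score, source, knowledge) for the most similar stored segment,
    or None if nothing reaches the (arbitrary) similarity threshold."""
    best = None
    for source, knowledge in MEMORY:
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, knowledge)
    return best

hit = lookup("Click Save to keep all your changes.")
print(hit[1])           # Click Save to keep your changes.
print(lookup("12345"))  # None
```

The returned knowledge travels with the match. This is the point where a localisation-oriented container would differ from a plain translation memory, which returns only a previous translation.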

LOC1 researchers have also produced innovative research into the benefits of developing a (meta-)data container, the Localisation Knowledge Repository (LKR) (Ryan, 2010; Ryan, 2011). The LKR developed as part of this research is based on a localisation taxonomy. This taxonomy allows localisation-relevant data to be stored, maintained and reused during content development.<br />

Lucía Morado Vázquez (LOC1.2) successfully passed her PhD viva in September. She completed her PhD under the supervision of Reinhard Schäler. She has since taken up a postdoctoral position in the Multilingual Information Processing Department of the Faculty of Translation and Interpretation, University of Geneva.<br />

Pictured at the launch of UL’s MSc in Multilingual Computing and Localisation, co-hosted by the UN in Africa, are (L-R) Solomon Gizaw, <strong>CNGL</strong>; Reinhard Schäler, <strong>CNGL</strong>; Prof. Don Barry, President, University of Limerick; and Ms. Aida Opoku-Mensah, United Nations Economic Commission for Africa (UNECA).<br />

The research into internationalisation and localisation knowledge leveraging aims to increase the quality, consistency and accessibility of content throughout the localisation process. It addresses content developers’ need for standards and guidelines. Content developers increasingly work with user-generated content, which is often of low quality. In this environment, the guidelines will help them prepare content that is more usable and readable for source language speakers, and more translatable for localisation professionals and technologies. The guidelines are drawn from both academic research and industrial best practice. LOC1 also has two representatives on the XLIFF Technical Committee.<br />

Three articles by Lorcan Ryan – on ‘Global Authoring<br />

Techniques’, ‘Global Diversity and Localisation Issues’<br />

and ‘Global Authoring Resources’ – were published in<br />

Communicator during <strong>2012</strong>.<br />

Work Package LOC2<br />

The Work Package is divided into four sections: LOC2.1 Addressing the Problem of Interoperability in Localisation Process Management; LOC2.2 Technology Evaluation – The User Perspective; LOC2.3 Service Descriptor Development (Web Services); and LOC2.4 Collaborative Localisation Platform: Crowdsourcing.<br />



LOC2 addresses quality assessment of translations in a crowdsourced and distributed localisation context (Gupta and Aouad, 2010; Exton et al., 2010; Anastasiou and Gupta, 2011). The evaluation metrics being specified target the quantitative and qualitative evaluation of translation memories (TMs), in order to determine whether inconsistencies propagate through them (Moorkens, 2011a; Moorkens, 2011b). This work also addresses more general evaluation metrics and methodologies throughout the localisation process. Joss Moorkens successfully defended his PhD thesis, entitled “Measuring Consistency in Translation Memories: A Mixed-Methods Case Study”, in July. Joss was supervised at DCU by Dr. Dorothy Kenny and Dr. Sharon O’Brien.<br />
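One simple way to surface the kind of inconsistency discussed above is to group TM entries by normalised source text and flag sources that have more than one distinct target. This sketch is purely illustrative and is not the method used in the thesis, which combined quantitative and qualitative analysis. The `find_inconsistencies` helper and the sample TM are hypothetical.

```python
from collections import defaultdict

def find_inconsistencies(tm):
    """Map each normalised source segment to its distinct targets,
    keeping only sources that were translated in more than one way."""
    targets = defaultdict(set)
    for source, target in tm:
        key = " ".join(source.split()).lower()  # collapse whitespace, ignore case
        targets[key].add(target.strip())
    return {src: sorted(t) for src, t in targets.items() if len(t) > 1}

tm = [
    ("Save the file.", "Speichern Sie die Datei."),
    ("Save  the file.", "Datei speichern."),
    ("Open the file.", "Öffnen Sie die Datei."),
]
print(find_inconsistencies(tm))
# {'save the file.': ['Datei speichern.', 'Speichern Sie die Datei.']}
```

Not every variant flagged this way is an error, since context can justify different translations. A check like this only narrows down candidates for human review.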

Research is also being carried out in the area of cultural adaptation, with a particular focus on multimedia content and on how such adaptation might be supported in interchange formats such as XLIFF (O’Keeffe, 2011b).<br />

Dr. David Filip played a central role in the organisation<br />

and delivery of the inaugural FEISGILTT event, which<br />

took place on 16-17 October in Seattle, USA. FEISGILTT <strong>2012</strong> (Federated Event for Interoperability<br />

Standardization in Globalization, Internationalization,<br />

Localization, and Translation Technologies) brought<br />

together experts from the language services industry,<br />

R&D labs that are exploring new interoperability<br />

solutions, and the various standards bodies instrumental<br />

in making such solutions accessible as conformable<br />

specifications. It offered a neutral venue where these<br />

stakeholders exchanged knowledge and experiences<br />

and discussed future directions for addressing the<br />

interoperability challenges facing the industry. FEISGILTT<br />

incorporated the 3rd International XLIFF Symposium.<br />

Lucía Morado Vázquez, Aram Morera Mesa, Dr. Chris Exton and Karl Kelly pictured at the LRC Summer School <strong>2012</strong>. The theme of this year’s Summer School was Mobile Application Development and Localisation.<br />

LOC2 is addressing component and data interoperability to enable efficient information exchange, specifically through the specification and use of standardised metadata (Wasala et al., 2010). Research from this work package continues to drive the development of several components within SOLAS. It also feeds back into ILT (development of automated translation technologies) and SF2. LOC2 is also specifying templates for the service descriptions needed for Service Level Agreements (SLAs) between localisation-oriented service providers and consumers. Web Services contract negotiation and agreement protocols will then be used to map abstract localisation units onto concrete services and components (Nishio et al., 2010).<br />
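The sketch below gives a rough picture of how an abstract localisation unit might be mapped onto a concrete service. It assumes that each service publishes a descriptor with a capability, its language pairs and SLA-like terms (turnaround, cost and a quality level). A requirement is then bound to the cheapest service that satisfies all of these terms. The descriptor fields, the selection rule and all sample values are hypothetical. They are not the LOC2 templates, nor any Web Services negotiation protocol.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Pair = Tuple[str, str]

@dataclass(frozen=True)
class ServiceDescriptor:
    name: str
    capability: str                  # e.g. "translation" or "review"
    language_pairs: FrozenSet[Pair]
    turnaround_h: int                # promised turnaround, in hours
    cost_per_word: float
    quality: int                     # self-declared level, 1 (low) to 5 (high)

@dataclass(frozen=True)
class Requirement:                   # stands in for an "abstract localisation unit"
    capability: str
    language_pair: Pair
    deadline_h: int
    budget_per_word: float
    min_quality: int

def bind(req: Requirement, services: List[ServiceDescriptor]) -> Optional[ServiceDescriptor]:
    """Return the cheapest service whose advertised terms satisfy every requirement."""
    candidates = [s for s in services
                  if s.capability == req.capability
                  and req.language_pair in s.language_pairs
                  and s.turnaround_h <= req.deadline_h
                  and s.cost_per_word <= req.budget_per_word
                  and s.quality >= req.min_quality]
    return min(candidates, key=lambda s: s.cost_per_word, default=None)

services = [
    ServiceDescriptor("mt-engine", "translation", frozenset({("en", "de")}), 1, 0.01, 2),
    ServiceDescriptor("lsp-a", "translation", frozenset({("en", "de"), ("en", "fr")}), 48, 0.12, 4),
    ServiceDescriptor("reviewer-b", "review", frozenset({("en", "de")}), 24, 0.05, 5),
]
print(bind(Requirement("translation", ("en", "de"), 72, 0.15, 3), services).name)  # lsp-a
print(bind(Requirement("translation", ("en", "fr"), 24, 0.15, 3), services))       # None
```

Returning `None` instead of a partial match leaves the next step to the caller, such as renegotiating terms or relaxing the requirement.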

Dr. David Filip has also led work on Internationalization<br />

Tag Set (ITS) Version 2.0 as co-chair of the<br />

MultilingualWeb-LT (Language Technology) Working<br />

Group. The Working Group aims to develop new W3C<br />

(World Wide Web Consortium) standards to support<br />

the translation and adaptation of Web content to local<br />

needs, from its creation through to its delivery to end<br />

users. By so doing, the new standards will help to remove<br />

language barriers to international trade and facilitate the<br />

free flow of information across language borders.<br />

At the <strong>CNGL</strong> Localisation Innovation Showcase in Limerick in September, Dr. David Filip demonstrated the <strong>CNGL</strong> demonstrator system CMS-LIONSolas Integration: Full Content Lifecycle Metadata Interoperability TestBed. Developed in collaboration with the SF track, this is a unique platform for testing complex metadata designs that span process areas across the full multilingual content life cycle. David showed how an RDF-based provenance store is used between a Web Content Management System (CMS) and XLIFF-based translation workflows. This demonstrates use cases for the round-tripping of Internationalization Tag Set (ITS) metadata between content generation and publication in HTML5/XML and localisation processes in XLIFF. The testbed therefore provides directly testable input to the standardisation working groups currently developing ITS, XLIFF and HTML5.<br />
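The following is a minimal sketch of the HTML5-to-XLIFF direction of such round-tripping. In ITS 2.0, the Translate data category is expressed in HTML5 through the native `translate` attribute. The code collects text runs together with their inherited `translate` flag and writes each run as an XLIFF 1.2 `trans-unit` with a matching `translate` attribute. The class and function names are hypothetical. Treating each text run as its own unit is a deliberate simplification: a production filter would keep the sentence whole and protect the non-translatable span with inline markup.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape

VOID = {"br", "hr", "img", "input", "meta", "link"}  # HTML elements without end tags

class TranslateScanner(HTMLParser):
    """Collect text runs together with their inherited HTML5 'translate' flag."""

    def __init__(self):
        super().__init__()
        self.stack = [True]   # document default: translatable
        self.runs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        attrs = dict(attrs)
        if "translate" in attrs:
            # "no" turns translation off; "yes" or an empty value turn it on
            # (other values are also treated as "yes" here for simplicity).
            flag = (attrs["translate"] or "").strip().lower() != "no"
        else:
            flag = self.stack[-1]  # no attribute: inherit from the parent element
        self.stack.append(flag)

    def handle_endtag(self, tag):
        if tag not in VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data.strip(), self.stack[-1]))

def to_xliff(runs, src="en", tgt="fr"):
    """Write each run as an XLIFF 1.2 trans-unit that carries the translate flag."""
    units = []
    for i, (text, ok) in enumerate(runs, start=1):
        flag = "yes" if ok else "no"
        units.append(f'<trans-unit id="{i}" translate="{flag}">'
                     f"<source>{escape(text)}</source></trans-unit>")
    return ('<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">'
            f'<file original="page.html" source-language="{src}" '
            f'target-language="{tgt}" datatype="html"><body>'
            + "".join(units) + "</body></file></xliff>")

scanner = TranslateScanner()
scanner.feed('<p>Download <span translate="no">SOLAS Match</span> today.</p>')
scanner.close()
print(scanner.runs)  # [('Download', True), ('SOLAS Match', False), ('today.', True)]

ns = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
flags = [tu.get("translate")
         for tu in ET.fromstring(to_xliff(scanner.runs)).iterfind(".//x:trans-unit", ns)]
print(flags)  # ['yes', 'no', 'yes']
```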



Dr. Ian O’Keeffe, postdoctoral researcher, left the University of Limerick during Quarter 3 <strong>2012</strong>. He is now Manager of Software Engineering/Development at Fidelity Investments. Postdoctoral researcher Dr. Eoin Ó Conchúir also left the University of Limerick in Quarter 3. He is now participating in the New Frontiers entrepreneur development programme.<br />

Joss Moorkens submitted his thesis, “Measuring Consistency in Translation Memories: A Mixed-Methods Case Study”, in <strong>2012</strong>. His work questioned a widely held assumption: that human-made translation memories lead to higher-quality, faster and cheaper translations because they provide access to a large body of high-quality bilingual or multilingual language resources produced by professional human translators. His research combined an examination of large volumes of authentic translation memories acquired from <strong>CNGL</strong> partners with qualitative research involving industry experts. Its results clearly correct this view and counsel caution. Joss’s thesis has already attracted enquiries and significant interest from academia and industry alike.<br />

Dr. Thomas Arend, International Product Lead at Twitter, addresses the LRC Conference <strong>2012</strong> on the theme “Social Localisation at Twitter – translating the world in 140 Characters”<br />

Work Package LOC3<br />

The LOC3 Work Package is divided into three sections: LOC3.1 Localisation Workflow Specifications for Enterprise Localisation; LOC3.2 Taxonomy of Personalisation for Generating Personalised Content; and LOC3.3 Localisation Workflow Mining.<br />

The overall aim of LOC3 is localisation workflow re-engineering and recommendation, together with the empirical definition of attributes and terms relevant to generating personalised localised content (Morera, Aouad et al., 2011a; Morera, Aouad et al., 2011b). As part of this research, proposed localisation workflows have been empirically evaluated against current industry practice (Lenker et al., 2010; Lenker, 2011a; Lenker, 2011b).<br />

Another focus is the research, design and experimental implementation of a workflow recommendation system. This system takes into account a list of the most relevant tasks in a localisation process. It then uses a decision tree to select the tasks that should be part of the workflow, according to the specific quality requirements, time constraints and cost constraints of the project at hand. Aram Morera has advanced his research on the identification and description of workflow patterns in social localisation. This research has led to a workflow recommender for specific social localisation scenarios, ranging from charitable to non-profit to for-profit approaches. The identification of these patterns has revealed serious shortcomings in current technologies, which the SOLAS development team in LOC is now addressing. Aram is expected to submit his thesis on this research in the first half of 2013.<br />
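The decision-tree selection described above can be sketched as a few nested conditions over cost, quality and time. The task names, the three-level quality scale and every threshold below are hypothetical placeholders. This report does not give the actual recommender's tree or task list.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Project:
    quality: str            # "high", "medium" or "low" (hypothetical scale)
    days: int               # time available before delivery
    budget_per_word: float  # in any currency unit

def recommend_workflow(p: Project) -> List[str]:
    """Walk a small hand-built decision tree over cost, quality and time
    and return the localisation tasks to include, in order."""
    if p.budget_per_word < 0.05:            # cost branch: too little for human translation
        tasks = ["machine translation"]
        if p.quality != "low":
            tasks.append("post-editing")
        return tasks
    tasks = ["human translation"]
    if p.quality == "high":                 # quality branch
        tasks.append("review")
        if p.days >= 5:                     # time branch: extra QA only if time allows
            tasks.append("linguistic QA")
    elif p.quality == "medium" and p.days >= 3:
        tasks.append("spot check")
    return tasks

print(recommend_workflow(Project("high", 10, 0.20)))   # ['human translation', 'review', 'linguistic QA']
print(recommend_workflow(Project("medium", 1, 0.02)))  # ['machine translation', 'post-editing']
```

In a data-driven version, the thresholds would be learned from past project outcomes rather than written by hand. The structure of the tree would stay the same.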

The final area of research concerns personalisation in localisation. This involves considering individual preferences, gathered explicitly or implicitly, to go beyond the traditional ‘locale’ or ‘community interest’. The aim is to define personalisation attributes empirically and to demonstrate their feasibility and relevance for generating adequate personalised content. Research conducted within this work package includes the specification and development of demonstrator crowdsourcing localisation environments and platforms (Lenker, 2010; Lenker and Anastasiou, 2010). Solomon Gizaw has focused on two areas: identifying communication patterns in cross-cultural information exchange, and applying personalisation techniques to a community-based translation and localisation environment. He has analysed a large amount of actual user data from live communication exchanges. He plans to use the results to adapt SOLAS to the requirements and needs of specific users, rather than just locales. Solomon plans to submit his thesis in the first half of 2013.<br />
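Personalisation layered on top of locale can be pictured as a lookup chain. Explicitly stated preferences override implicitly inferred ones, which in turn override locale defaults. The sketch below uses Python's `collections.ChainMap` for this. The attribute names, the `fr-CA` defaults and the precedence order are illustrative assumptions, not SOLAS behaviour.

```python
from collections import ChainMap

# Illustrative locale-level defaults: the traditional, locale-only view.
LOCALE_DEFAULTS = {
    "fr-CA": {"language": "fr", "measurement": "metric", "formality": "formal"},
}

def resolve_profile(locale, explicit=None, implicit=None):
    """Merge preferences: explicit (stated by the user) beat implicit
    (inferred from behaviour), which beat the locale defaults."""
    return dict(ChainMap(explicit or {}, implicit or {}, LOCALE_DEFAULTS.get(locale, {})))

profile = resolve_profile(
    "fr-CA",
    explicit={"formality": "informal"},
    implicit={"formality": "formal", "reading_level": "plain"},
)
print(profile["formality"], profile["reading_level"], profile["language"])  # informal plain fr
```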



Industry Engagement<br />

LOC has collaborated closely with its main industrial partners, especially Symantec, VistaTEC and Microsoft. Input from The Rosetta Foundation’s international collaborators also proved valuable. GlobalSight was open sourced, and The Rosetta Foundation was established as a spin-off from the University of Limerick and <strong>CNGL</strong>. Following these developments, LOC also collaborated closely with The Rosetta Foundation and Welocalize. Engagement with industrial partners took place through site visits and focused one-to-one meetings with LOC researchers.<br />

Through the SOLAS platform, LOC supports the development of a <strong>CNGL</strong> open localisation platform. In addition to serving as a test bed for <strong>CNGL</strong> research in the different work packages, the platform will provide large multinational publishers with a solid case study for the viability of open standards for the negotiation of localisation data and localisation knowledge. This will give them the arguments needed to migrate from a closed, proprietary localisation scenario to a more open, interconnected and interoperable framework. The platform will also encourage small and medium-sized enterprises to take up localisation and process automation solutions. It will also create new business opportunities and support smaller firms in scaling up their localisation offerings. More than 40 individuals and companies have so far joined the Dynamic Coalition for a Global Localisation Platform: Localisation for All, initiated by LOC and The Rosetta Foundation. We expect the platform to generate increased activity in sectors of the localisation industry. Some first indicators show that growth by a factor of 100 is not out of reach in certain sectors. Subsequently, we expect employment in these sectors to rise, driven by growth in translation and localisation as well as in technical support and development.<br />

The opportunities and the requirements for SOLAS,<br />

especially in the non-profit sector, are significant. In 2007,<br />

almost 1.5 million non-profits were registered with the US<br />

Tax Authorities and non-profits reported US$1.9 trillion in<br />

revenue and US$4.3 trillion in assets. From 1998 to 2005,<br />

non-profit employment grew 16.4 per cent, compared to<br />

6.2 per cent for overall employment in the US.<br />

It is in the nature of non-profits to deal with multilingual and multicultural constituencies. Surprisingly, no adequate technology is available to support their localisation and translation activities.<br />

In Ireland, the non-profit sector employs more than 100,000 people with pay costs in the order of €3.5bn. It has revenues of more than €6bn and holds assets valued at more than €3.5bn. The sector is, perhaps, the principal source of social capital in Irish society, with more than 560,000 people engaged as volunteers and more than 50,000 people engaged in its governance. In scale, the non-profit sector in Ireland is at least comparable to, if not greater than, agriculture or tourism as a source of employment.<br />

Research into SOLAS by <strong>CNGL</strong>, with subsequent development of this framework through The Rosetta Foundation, has the potential to make Ireland the hub for internationally traded localisation and translation services for the worldwide non-profit sector. That sector has revenues of more than US$1.9 trillion in the USA alone. Indeed, as the international independent review panel stated in its review of <strong>CNGL</strong> in July <strong>2012</strong>, “<strong>CNGL</strong>’s goal of making significant societal impact is illustrated by the potentially ground-breaking social localization concept, embodied in a spinout (The Rosetta Foundation).”<br />

Achievements (grouped by category)<br />

Operational Management and Governance<br />

• Ongoing research collaboration with the <strong>CNGL</strong> ILT track (e.g. in the area of MT), DCM (e.g. in the area of personalisation) and SF (e.g. in the area of interoperability and metadata)<br />

• Ongoing active engagement with LOC’s international collaborators<br />

• Ongoing engagement with world-leading standards associations, including Unicode and the World Wide Web Consortium (W3C)<br />

• Participation and programme input to the world’s leading localisation events, including Localization World and GALA<br />

• Engagement with the non-profit sector, including The Wheel, the Irish umbrella body for non-profits, which represents close to 2,000 Irish non-profit enterprises, and Dochas, which represents Irish-based overseas aid organisations<br />

• Collaboration with Translate.org.za, one of the developers of one of the most widely used open source localisation tools, and its principal, Dwayne Bailey<br />



Research Programme<br />

LOC1<br />

• Continuing contributions to the development of the OASIS XLIFF standard (as members of the Technical Committee)<br />

• RDF-XLIFF mapping (contacts from LREC: Thierry Declerck, Tobias Wunner and John McCrae (DERI); also Dr. David Lewis, SF, and Dr. Alex O’Connor, SF)<br />

• Successful PhD defence by Lucía Morado Vázquez, who is now employed as a postdoctoral researcher at the University of Geneva, Switzerland<br />

• Significant contributions to the knowledge of content development for global markets<br />

• Filing of invention disclosures<br />

• Integration of LOC1 components into the overall SOLAS framework<br />

• Successful PhD defence by Lorcan Ryan<br />

LOC2<br />

• Significant contribution to the knowledge of localisation resource evaluation and interoperability<br />

• Further research and implementation of the LocConnect component<br />

• Integration of LOC2 components into the overall SOLAS framework<br />

• Further research and assessment of quality and consistency in Translation Memories, including the successful PhD defence by Joss Moorkens of his thesis “Measuring Consistency in Translation Memories: A Mixed-Methods Case Study”. This work involved significant industry input and has led to substantial industry interest in its outcomes.<br />

• Further research and implementation of the Localisation Service Descriptor component<br />

• Further research and implementation of the Quality Assessment Engine component<br />

• Cross-strand collaboration with ILT1<br />

• Asanka Wasala writing up PhD thesis<br />

• Filing of invention disclosures for several research demonstrators<br />

LOC3<br />

• Research and implementation of the Workflow Recommendation Engine component<br />

• Investigation of industrial workflows<br />

• Investigation of data transfer practices for Term Bases and Glossaries<br />

• PhD students approaching the write-up stage, reporting very significant results from their research into localisation service descriptors, strategies to surpass the established concept of locale in localisation, and community-based social localisation workflows<br />

• Filing of invention disclosures for several research demonstrators<br />

• Integration of LOC3 components into the overall SOLAS framework<br />

LOC Overall<br />

• Collaboration with the United Nations Internet Governance Forum (IGF)<br />

• Support for the University of Limerick and the United Nations Economic Commission for Africa’s launch of the MSc in Multilingual Computing and Localisation. The programme is to be delivered through distance learning and co-hosted by UNECA at its Information Training Centre for Africa (ITCA) in Addis Ababa, Ethiopia. Its aim is to promote African languages in the Information Society.<br />

• Invention disclosures for the Localisation Knowledge Repository (LKR), Automated Optimal Machine Translation System Selection supporting XLIFF, XLIFF Phoenix, LocConnect and Workflow Recommender<br />

• Further development of the integrated SOLAS system and its branching into the SOLAS Productivity and SOLAS Match products<br />

Industry Partner Engagement<br />

• Alchemy Software Development played an integral part in the <strong>2012</strong> LRC Summer School, preparing and presenting materials related to mobile device localisation.<br />

• The LRC <strong>Annual</strong> Conference featured contributions from industry partners Symantec and Welocalize, as well as presentations from <strong>CNGL</strong> spin-out The Rosetta Foundation.<br />



• The LRC Best Thesis Award <strong>2012</strong> was sponsored by Symantec Ireland.<br />

Tech Transfer Activities<br />

The following invention disclosures have been filed with<br />

the Technology Transfer office at UL:<br />

2006167 – Deed of Assignment of Intellectual Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February – Localisation Knowledge Repository (LKR)<br />

2006166 – Deed of Assignment of Intellectual Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February – Automated Optimal Machine Translation System Selection Supporting XML Localization Interchange File Format (XLIFF)<br />

2006165 – Deed of Assignment of Intellectual Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February – XLIFF Phoenix<br />

2006164 – Deed of Assignment of Intellectual Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February – LocConnect – Localisation Orchestration Framework<br />

2006163 – Deed of Assignment of Intellectual Property Rights (1702<strong>2012</strong>), <strong>2012</strong>/February – Workflow Recommender<br />

} Localization World Paris and Seattle – The Rosetta Foundation was invited to exhibit at both the European and the American events.<br />

} LOC postdoctoral researcher Dr. David Filip launched<br />

FEISGILTT <strong>2012</strong>, a new federated event dedicated<br />

to Interoperability Standardization in Globalization,<br />

Internationalization, Localisation, and Translation<br />

Technologies.<br />

} Launch of AGIS Africa initiative, in collaboration with<br />

the Rosetta Foundation, United Nations Economic<br />

Commission for Africa, GALA and the University of<br />

Limerick.<br />

Plans<br />

The Next Generation Localisation area will work with The Rosetta Foundation and with the United Nations Internet Governance Forum (IGF) working group ‘Dynamic Coalition for a Global Open Localization Platform: Localization for All’ on the further development of SOLAS, leading to its deployment as an Open Localisation Platform supported by the SF1 and SF2 <strong>CNGL</strong> research areas.<br />

Education and Outreach<br />

} The 10th <strong>Annual</strong> LRC Internationalisation and Localisation Summer School took place from 13 to 15 June <strong>2012</strong> in Limerick. It focused on mobile application development and localisation and was presented by a mix of <strong>CNGL</strong> industrial partners (Alchemy Software Development), PhD students, academic staff and UL students.<br />

} Localisation Focus – The International Journal of Localisation was published and distributed to libraries and subscribers, and was also made available online free of charge at www.localisation.ie. Direct download links were sent to all members of the LRC mailing list (approximately 2,500).<br />

} LRC XVII, the annual LRC conference, took place in Limerick on 20-21 September <strong>2012</strong> and also featured the <strong>CNGL</strong> Innovation Showcase <strong>2012</strong>.<br />

} Launch and support of the MSc in Global Computing<br />

and Localisation by distance learning.<br />

Reinhard Schäler (second from left) presented on “Opportunities and<br />

Growth in Africa” at GALA <strong>2012</strong> in Monaco in March. Pictured with<br />

Reinhard are Renée Salzman (GALA Co-Founder), Hans Fenstermacher<br />

(GALA CEO) and María José Velasco (GALA founding member and<br />

Mondragón Lingua)<br />

The Rosetta Foundation was launched in 2009 by the President of UL and is supported by <strong>CNGL</strong> through formal decisions of its Integration and Management Committees. It works with more than 2,600 volunteers in over 40 languages and with 50 partner organisations, including Special Olympics Europe Eurasia/International, Trócaire, the London School of Hygiene and Tropical Medicine, and Ruhama. IP developed by <strong>CNGL</strong> has been transferred to The Rosetta Foundation to support its technology platform, and The Rosetta Foundation has become a UL Campus Company. The Rosetta Foundation has already provided valuable feedback to <strong>CNGL</strong> research, which has resulted in joint publications (ASLIB Translation and the Computer, 2010). The platform serves as a test bed for the SOLAS research carried out in LOC, specifically SOLAS Match, allowing that research to demonstrate its viability and to measure the improvements it achieves in the localisation process. This work has been documented in at least two MSc theses in <strong>2012</strong> that were not funded by <strong>CNGL</strong>.<br />

The LOC track publishes ‘Localisation Focus – the International Journal<br />

of Localisation’<br />

In line with this research, the platform is being released as open source, with the aim of making SOLAS Match the de facto platform for non-industrial, non-profit and non-market localisation and translation activities, driving social localisation as defined by LOC researchers and supporting the social agenda of <strong>CNGL</strong>.<br />

The independent international review panel commented on the results of this work in the <strong>CNGL</strong> Final Review: “The most visible success here is without a doubt the Rosetta Foundation spinoff, which pioneers a novel, comprehensive localization model for organizations seeking to translate content for underserved communities. The panel feels this accomplishment has great societal impact that transcends the boundaries of Ireland and even the EU.”<br />

Testing is underway with a focus on demonstrating<br />

the viability of the SOLAS Match platform with a<br />

subset of projects within the Rosetta Foundation.<br />

The publication of specifications and an invitation for “open” contributions (such as from the African Network for Localisation; the Centre for Development of Advanced Computing (CDAC) in Pune, India; the micro-lending organisation KIVA; and other organisations such as TechSoup Global or Zafen), the creation of the component repository, and the demonstration of “open” interoperability in collaboration with industry associations such as GALA and Interoperability Now are ongoing priorities in this area.<br />

Improvements will be demonstrated and measured in relation to particular tasks, e.g. MT and MT post-editing, and in relation to the overall process, e.g. user interaction evaluation, (re-)use of localisation knowledge and flexible workflow specification supported by the platform. Each section of each LOC work package is associated with one particular aspect of this demonstrator, and each will contribute to improving the performance of the overall platform by connecting its component technologies to the localisation platform. This will enable us to measure the impact of these technologies on the performance of the overall localisation workflow.<br />

The LOC research track will support The Rosetta Foundation in the development and deployment of SOLAS, which in turn will provide valuable feedback from a concrete implementation scenario to the scientific research carried out within LOC and other <strong>CNGL</strong> areas. Now that the platform can be demonstrated, additional component technologies from other <strong>CNGL</strong> research areas are being considered for integration.<br />

The LOC research strand of <strong>CNGL</strong> will be subsumed in <strong>CNGL</strong>II under the Translation and Localisation Challenge (T&L) and the Interoperability and Analytics Challenge (I&A). T&L2 will focus on Social Localisation and continue the research and development of the service-oriented localisation architecture solution (SOLAS), initiated under <strong>CNGL</strong> as the bulk localisation demonstrator. The work will focus on resolving current problems in the correct identification of resources for localisation (SOLAS Match), and on identifying and developing an adequate support infrastructure of language technologies and resources in an ad hoc and dynamic setting.


Systems Framework



Strand Name: Systems Framework<br />

AREA CO-ORDINATOR: DR. SATURNINO LUZ<br />

Participant Names and Affiliation<br />

Industrial Collaborators<br />

Prof. Andy Way Capita<br />

Mr. Takeshi Fukunaga Dai Nippon Printing<br />

Mr. Dag Schmidtke Microsoft<br />

Dr. Alexander Troussov IBM<br />

Mr. David Clarke Welocalize<br />

Dr. Olga Beregovaya Welocalize<br />

Mr. Phil Richie VistaTEC<br />

Dr. Fred Hollowood Symantec<br />

Mr. Jason Rickard Symantec<br />

International Collaborators<br />

Dr. Alistair Edwards University of York<br />

Dr. Masood Masoodian The University of Waikato<br />

Prof. Michael McTear University of Ulster<br />

Prof. Chris Mellish University of Aberdeen<br />

Faculty<br />

Prof. Julie Carson-Berndsen University College Dublin SF1<br />

Dr. Gavin Doherty Trinity College Dublin SF1, SF2<br />

Prof. Josef van Genabith Dublin City University SF2<br />

Dr. David Lewis Trinity College Dublin SF2 Leader<br />

Dr. Saturnino Luz Trinity College Dublin SF1 Leader<br />

Mr. Reinhard Schäler University of Limerick SF1, SF2<br />

Prof. Vincent Wade Trinity College Dublin SF1, SF2<br />

Postdoctoral Researchers<br />

Mr. Dominic Jones Trinity College Dublin SF2<br />

Dr. Nikiforos Karamanis* Trinity College Dublin SF1<br />

Dr. Anton Gerdelan* Trinity College Dublin SF2



SYSTEMS FRAMEWORK<br />

PhD Students<br />

Mr. John McAuley Trinity College Dublin SF2<br />

Mr. John Moran Trinity College Dublin SF2<br />

Ms. Ilana Rozanes Trinity College Dublin SF1<br />

Ms. Anne Schneider Trinity College Dublin SF1<br />

Mr. Stephan Schlögl Trinity College Dublin SF1<br />

Technicians<br />

Mr. Leroy Finn Trinity College Dublin SF2<br />

* Affiliated postdoctoral researchers<br />

Funding<br />

<strong>2012</strong> Funding from SFI<br />

€342,924<br />

SFI TIDA ‘iOmegaT: Instrumented CAT Tool’<br />

(12/TIDA/I2424) €92,273 over 12 months<br />

<strong>2012</strong> Funding from Other Sources<br />

EC FP7 Coordination and Support Action Language<br />

Technology Web – €149,280 to TCD over two years<br />

(UL, DCU, Microsoft and VistaTEC are also partners)



Research Overview: Systems Framework (SF)<br />

Goals<br />

The Systems Framework track seeks to ensure that basic language technologies can be effectively integrated to form next generation localisation systems that meet high standards of usability, and to facilitate the use of such technologies in advanced research prototypes that creatively explore novel design spaces for interactive systems. SF aims to produce a system services architecture and a system design methodology to support the integration of linguistic technologies, localisation workflow and digital content management. The ultimate goal is to enable rapid, iterative and instrumented integration of industrial software and academic research prototypes, and to support their evaluation by providing: a software integration platform based on open standards; guidelines and tools for developing workflows and applications using this platform; and methods for iterative prototyping and user studies. From a research perspective, SF focuses on the study of users (and potential users) of language technology-enabled systems in real work contexts, on the investigation of novel interaction design techniques, and on system support for the development of speech- and language-enabled applications.<br />

The two work packages, SF1 and SF2, pursue these objectives from different perspectives. The Interaction Design Work Package (SF1) deals primarily with human-factors research and explores the design of novel systems incorporating language technology. The Systems Service Architecture Work Package (SF2) has a dual role in <strong>CNGL</strong>: it acts as coordinator and facilitator of practical systems integration for the <strong>CNGL</strong> Demonstrator Programme, and it conducts research into service integration and service management techniques. These two roles are interrelated: the Demonstrator Programme, owing to its size and variety, offers a unique interoperability and evaluation laboratory that operates over a wide range of linguistic and digital content processing services and applications.<br />

The specific goals for <strong>2012</strong> were to (1) provide continued support for the demonstrator activities and incorporate lessons learned into service and metadata models that are contributing to international standards activities; (2) analyse and create theories based on the workplace studies conducted in various work contexts, with a focus on the work of medical interpreters; (3) report research results in journal and conference publications; (4) further disseminate and evaluate the Wizard-of-Oz system; and (5) conduct further evaluation of language technologies in interactive contexts (e.g. speech-to-speech systems). These goals were satisfactorily met: several papers were published, substantive contributions were made to extensions of the ITS (W3C) and XLIFF (OASIS) standards, and three PhD theses were submitted.<br />

Research Barriers and Methodologies to Address Them<br />

As noted in previous reports, we have identified a gap<br />

between language technology and systems development<br />

methodologies (including both systems and interaction<br />

design issues) which seems to extend beyond the<br />

usual issues in putting together demonstrator systems<br />

and research prototypes. The research done by SF has<br />

attempted to bridge this gap.<br />

John Moran of TCD presents at AMTA-<strong>2012</strong> Workshop on Post-editing<br />

Technology and Practice (WPTP) in San Diego, USA.


From an interaction design perspective, SF has investigated methods for incorporating work contexts into the analysis of requirements for natural language generation systems, with special focus on MT technology in a localisation context (Doherty, Karamanis and Luz, <strong>2012</strong>; Karamanis, Luz and Doherty, 2011); ethnographic methods for the study of multilingual situations, with a focus on the work of medical interpreters (Rozanes, Luz and Doherty, 2011); and rapid prototyping and evaluation methods for interactive language technologies, such as systems that combine speech input/output with MT (Schlögl et al., 2011; Schneider and Luz, 2011).<br />

From a software engineering perspective, SF has successfully promoted the adoption of a Service Oriented Architecture (SOA) approach across <strong>CNGL</strong>, integrating different technologies into a range of applications spanning the use scenarios addressed by <strong>CNGL</strong>. This has allowed individual components, tools and platforms to retain autonomy in their choice of software technology, provided they adhere to common interoperability models. The overall strategy was to employ existing standards as much as possible, by defining a common model based on W3C standard languages addressing provenance, internationalisation and the semantic web, and to interface with standards used in localisation, such as XLIFF, for integration with work done in the LOC strand. SF also examines specific service management issues in the context of its support for the demonstrator systems, namely support for the management of online communities, the unobtrusive monitoring of post-editing effort across different configurations of SMT and other localisation support technologies, and interoperable linked data formats for content management and localisation integration.<br />

Dominic Jones presents his PhD work at the national final of the Thesis in 3 competition<br />

Year 5 Progress<br />

SF activities in Year 5 consisted largely of analysing and publishing the results of research conducted over the previous 18 months, with a focus on the completion of PhD theses. Two papers appeared in major HCI journals (van der Sluis, Luz et al., <strong>2012</strong>; Doherty, Karamanis and Luz, <strong>2012</strong>), one will appear in the proceedings of the ACM Computer Supported Cooperative Work conference (Kane, Toussaint and Luz, 2013), and four others are in preparation (for submission to the journals ‘Interacting with Computers’ and ‘Computer Supported Cooperative Work’ and to the conferences ACL 2013 and Interact 2013). Research on service, content and language resource interoperability was published at WWW <strong>2012</strong> (Filip, Lewis and Sasaki, <strong>2012</strong>) and LREC <strong>2012</strong> (Lewis et al., <strong>2012</strong>), and a paper and a book chapter on community management were also published. Several presentations and talks were given, including at the <strong>CNGL</strong> review meeting and the <strong>CNGL</strong> Scientific Committee Meeting, as well as at industrially focused events such as the Multilingual Web workshops. Two international workshops were organised. The first, on Multilingual Web and Linked Open Data, was held in Dublin in June. The second (FEISGILTT <strong>2012</strong>), organised in collaboration with LOC and co-located with Localization World in Seattle in September, focused on standardisation and interoperability issues around globalisation, internationalisation, localisation and translation. SF members also contributed significantly to the <strong>CNGL</strong>II proposal.



Accomplishments, Impact and Plans<br />

We have concluded the study of support for collaborative aspects of translation work and of the impact of language technology, particularly MT, and reported this work in two major journals, Machine Translation and Computer Supported Cooperative Work (Karamanis, Luz and Doherty, 2011; Doherty, Karamanis and Luz, <strong>2012</strong>). The work on language generation in cross-cultural settings has also been concluded and published in the premier HCI journal (van der Sluis et al., <strong>2012</strong>). The Wizard-of-Oz platform has been fully deployed online and released as an open source project. Experiments and interviews to assess wizard performance and the usability of the platform have been concluded, and a paper is in preparation for submission to the journal ‘Interacting with Computers’. Community management trials with Symantec were completed, as were MT post-editing trials with professional translators at Welocalize and with crowdsourced translators. The Welocalize trials resulted in post-editing machine translation (PEMT) analytics solutions being licensed to Welocalize, while the crowdsourced trials demonstrated strong MT improvement resulting from selective training based on PEMT logging. Further details of ongoing activities and plans for future work are given below.<br />

Fieldwork for Language Technologies in Work Contexts<br />

SF PhD student Ilana Rozanes has completed the elaboration of a grounded theory of the work of medical interpreters. This work spanned two years of extensive observation of medical interpreters at work, interviews, data collection and data coding. Results are currently being written up for publication as journal and HCI conference papers, which will explore the data and theory in the context of designing language-technology applications for use by interpreters in medical settings, drawing on <strong>CNGL</strong> technology.<br />

Figure 4: <strong>CNGL</strong> Wizard-of-Oz Homepage<br />

Wizard-of-Oz Platform<br />

The <strong>CNGL</strong> Wizard-of-Oz platform (WebWOZ) was made available online in 2011 (http://www.webwoz.com). Since then it has been used to gather data on patterns of usage of the tool and on wizard performance. Specific projects (e.g. HCI coursework projects) were designed to assess the tool under controlled conditions. A paper describing the results of these activities is being written for submission to a journal.<br />

Stephan Schlögl has submitted his PhD thesis, and his viva is scheduled for January 2013. The WebWOZ software has now been released under an open source licence, and we plan to use and extend it in <strong>CNGL</strong>II.<br />

Interaction Design for Speech-to-Speech Translation<br />

Complementing our published work (Schneider and Luz, 2011; Schneider), a further experiment has been conducted on the use of speech recognition in an instructional task. Results are currently being written up for a paper to be submitted to ACL 2013 or SIGdial 2013. The overall aim of this line of research is to assess potential mismatches between intrinsic and extrinsic evaluation methods for component language technologies. In this case, we examined how well the intrinsic metric, word error rate, correlates with extrinsic measures of task success, and proposed alternative methods for identifying potential communication difficulties in automatic speech recognition (ASR)-mediated communication.<br />
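The speech-to-speech experiment in this section compares an intrinsic ASR metric, word error rate (WER), with extrinsic task success. For readers unfamiliar with the metric, a minimal Python sketch of the standard WER computation follows. It is illustrative only, is not the scoring code used in the CNGL experiment, and assumes simple whitespace tokenisation with no text normalisation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Uses a word-level Levenshtein alignment. Whitespace tokenisation is a
    simplifying assumption; real ASR scoring also normalises case and punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j]: minimum edits turning the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                       # deletion
                dist[i][j - 1] + 1,                       # insertion
                dist[i - 1][j - 1] + (0 if same else 1),  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "turn the red knob to the left"
    hypothesis = "turn the bed knob left"  # one substitution, two deletions
    print(f"WER = {word_error_rate(reference, hypothesis):.3f}")  # WER = 0.429
```

Because insertions are counted, WER can exceed 1.0, and two hypotheses with the same WER can differ greatly in whether their errors disrupt the task; that gap between the number and its communicative effect is the kind of mismatch this line of research examines.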



Figure 5: End-to-end localisation workflow monitoring using RDF Provenance, through integration of CMS-LION, SOLAS, MaTrex and other components<br />

Linked Data for Content Management and Localisation Integration<br />

Integration of Content Management Systems and Localisation Workflows remains a challenge, with no established standard. Meanwhile, content, including user-generated content, is increasingly generated and revised in a continuous stream, and existing push-based integration between content management and localisation systems constrains both the agility needed to support new content processing modes (e.g. crowdsourcing) and upstream feedback from translators. This activity therefore provides a standards-based, linked data-oriented approach to agile multilingual content management.<br />

The approach supports both push- and pull-based CMS-Localisation interactions via a common Resource Description Framework (RDF) Provenance Model. This is implemented in a system, CMS-LION, which populates the RDF Provenance Model from exchanges of XLIFF files within a localisation workflow operated by LOC’s SOLAS platform. The model uses the RDF Open Provenance Vocabulary to log all CMS-Localisation interactions and content transformations. This allows standard SPARQL queries to be used for workflow monitoring and for extracting translation corpora from fresh post-edits, enabling immediate retraining of an MT engine based on MaTrex from DCU and the bi-text corpora processing chains developed in the PANACEA project. Over several retraining iterations, this approach showed a 25% improvement in BLEU scores within a single crowdsourced translation job.<br />
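The BLEU improvement reported above relies on the standard n-gram precision metric. A minimal corpus-level BLEU sketch follows for reference. It assumes one reference per segment, whitespace tokenisation and no smoothing, and it is not the evaluation script used in the CNGL retraining experiments; published comparisons would normally use an established implementation so that scores remain comparable.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU: one reference per segment, whitespace tokens, no smoothing."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-grams, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the modified n-gram precisions, uniform weights.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the file could not be saved to the disk"]
    hyps = ["the file could not be saved on the disk"]
    print(f"BLEU = {corpus_bleu(hyps, refs):.3f}")
```

Counts are pooled over the whole test set before the precisions are taken, so in a larger job a single short segment with no 4-gram match does not by itself drive the score to zero.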

To demonstrate this approach, a crowdsourced translation application has been implemented with a Drupal front end, through which users can create and contribute to translation jobs in XLIFF. An RDFLogger component converts the XLIFF document into RDF provenance statements and logs these to a triple store. The Sesame triple store used here is an open source Java framework for storing, querying and reasoning with RDF. An RDF Provenance Visualiser has been implemented for exploring the outcomes of process steps. This platform was also used in a prototype integration with translation quality assurance data gathered in translation review processes conducted by VistaTEC.
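The RDFLogger step described in this section (XLIFF in, provenance statements out, then queried for monitoring and for post-edited bitext) can be sketched in standard-library Python. This is a hypothetical illustration, not CMS-LION code: it uses W3C PROV-O term IRIs in place of the Open Provenance vocabulary named above, an in-memory Python set in place of the Sesame triple store, a small pattern-matching function in place of SPARQL, and invented names (`log_xliff`, `post_edited_pairs`, the `example.org` namespace and its `stepType` and `text` properties).

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"  # W3C PROV-O, standing in for the vocabulary CMS-LION used
EX = "http://example.org/loc#"       # hypothetical namespace for this sketch


def log_xliff(xliff_text, job_id, step, agent):
    """Hypothetical RDFLogger: turn one XLIFF exchange into (s, p, o) triples."""
    root = ET.fromstring(xliff_text)
    activity = f"{EX}{job_id}/{step}"
    triples = {
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, EX + "stepType", step),
        (activity, PROV + "wasAssociatedWith", EX + agent),
    }
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        source = unit.find("x:source", XLIFF_NS)
        target = unit.find("x:target", XLIFF_NS)
        if source is None or target is None:
            continue  # untranslated unit: nothing to record for this step
        src = f"{EX}{job_id}/src/{unit.get('id')}"
        tgt = f"{EX}{job_id}/{step}/tgt/{unit.get('id')}"
        triples |= {
            (src, EX + "text", "".join(source.itertext())),
            (tgt, EX + "text", "".join(target.itertext())),
            (tgt, PROV + "wasDerivedFrom", src),
            (tgt, PROV + "wasGeneratedBy", activity),
        }
    return triples


def match(triples, s=None, p=None, o=None):
    """Triple-pattern lookup; None plays the role of a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def post_edited_pairs(triples):
    """(source, post-edited target) pairs, e.g. as fresh MT retraining data."""
    text = {s: o for s, _, o in match(triples, p=EX + "text")}
    pairs = []
    for activity, _, _ in match(triples, p=EX + "stepType", o="post-edit"):
        for tgt, _, _ in match(triples, p=PROV + "wasGeneratedBy", o=activity):
            for _, _, src in match(triples, s=tgt, p=PROV + "wasDerivedFrom"):
                pairs.append((text[src], text[tgt]))
    return sorted(pairs)


SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save file</source><target>Datei speichern</target></trans-unit>
      <trans-unit id="2"><source>Open</source></trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    store = set()  # stands in for the triple store
    store |= log_xliff(SAMPLE_XLIFF, "job42", "mt", "mt-engine")
    store |= log_xliff(SAMPLE_XLIFF, "job42", "post-edit", "translator7")
    print(post_edited_pairs(store))  # [('Save file', 'Datei speichern')]
```

In a triple-store deployment the `post_edited_pairs` lookup would correspond to a SPARQL `SELECT` with triple patterns joined on the target and activity variables; the nested loops above perform the same join by hand.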



Figure 6: Petri – A community analytics tool developed for identifying and tracking community guru behaviour at Symantec<br />

This experience, and the demonstration using CMS-LION and SOLAS, have enabled <strong>CNGL</strong> to promote a strong vision of end-to-end interoperability and monitoring of localisation workflows. This, in turn, has fed into new metadata definitions related to translation provenance in the new version of the Internationalization Tag Set being developed by the Language Technology-Web project via the W3C’s MultilingualWeb-Language Technology working group. Feedback on this approach has also been provided to the XLIFF Technical Committee and the W3C PROV working group. This capability has also positioned <strong>CNGL</strong> well for continued international collaboration at the intersection of linked data and language resource technology research, both through collaboration on workshops in the area and through two project proposals submitted for EU funding.<br />

Visual Analytics for Online Communities<br />

Visual analytics can help users to extract knowledge from massive amounts of data, make sound decisions based on evidence and increase their understanding of complex online processes. However, applications are generally developed with a focus on the researcher or the analyst and lack a clear context for the end user. This research investigates the potential of visual analytics for online communities, evaluating how to extract knowledge from communication data and represent it visually to support evidence-based decision making and the understanding of complex processes in online communities.<br />

An initial visual analytics tool was developed for the Stack Exchange Super-User meta community. The tool visualises the community’s social and temporal interaction patterns and provides collaboration support in the form of visualisation bookmarking, view sharing and threaded discussion. Based on this experience, a revised tool was tailored to the community management requirements of customer support staff at Symantec, enabling an evaluation of innovative methodologies and tools for developing such applications.<br />

The tool, Petri, was designed to encourage a more analytical approach to online community management, based on cycles of observation and intervention. We conducted several interviews and design workshops with Symantec’s online community team to help formalise a set of requirements, which were then used to inform the design. Petri enables community managers to analyse their community from multiple perspectives, shifting between phases of explorative and confirmative analysis, and to identify users who could prove valuable to the community over time. An explorative evaluation conducted with five members of the Symantec community management team found the visualisation tool to be both useful and usable. This work therefore proposes a new approach to online community management, built upon cycles of analysis and informed intervention and supported by advanced visual analytics technologies, and it establishes a set of design requirements that can be revisited by other researchers interested in online community visualisation.<br />
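The report does not describe how Petri identifies users who could prove valuable to the community over time. Purely as a hypothetical illustration of the kind of temporal interaction pattern such a tool might surface, the sketch below counts, for each user and month, how many distinct other members they answered, and flags users who sustain that help across several months. The reply-log format, the function names and the thresholds are all invented for this sketch and are not Petri's method.

```python
from collections import defaultdict

# Hypothetical reply log: (month, responder, asker) for each answer posted in a forum thread.
SAMPLE_REPLIES = [
    ("2012-01", "ann", "bob"), ("2012-01", "ann", "cat"),
    ("2012-02", "ann", "dan"), ("2012-02", "ann", "eve"),
    ("2012-01", "bob", "ann"), ("2012-02", "bob", "bob"),
    ("2012-03", "cat", "dan"), ("2012-03", "cat", "eve"),
]


def helped_per_month(replies):
    """Map each user to {month: number of distinct other users they answered}."""
    helped = defaultdict(lambda: defaultdict(set))
    for month, responder, asker in replies:
        if responder != asker:  # self-replies are not help given to others
            helped[responder][month].add(asker)
    return {user: {month: len(askers) for month, askers in months.items()}
            for user, months in helped.items()}


def sustained_helpers(replies, min_helped=2, min_months=2):
    """Users who answered at least min_helped distinct users in at least min_months months."""
    counts = helped_per_month(replies)
    return sorted(user for user, months in counts.items()
                  if sum(1 for n in months.values() if n >= min_helped) >= min_months)


if __name__ == "__main__":
    print(helped_per_month(SAMPLE_REPLIES)["ann"])  # {'2012-01': 2, '2012-02': 2}
    print(sustained_helpers(SAMPLE_REPLIES))        # ['ann']
```

In a visual analytics setting, numbers like these would feed a view rather than a verdict: the community manager explores who is flagged and why, then confirms or rejects the pattern, in line with the cycle of observation and intervention described above.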

Instrumenting CAT Tools to Evaluate Post-editing of SMT<br />

Machine translation (MT) evaluation metrics based on n-gram co-occurrence statistics are cheap to compute, and their value in comparative research is well documented. However, their value as a standalone measure of MT output quality is questionable. Manual methods of MT evaluation, in contrast, are expensive. This work is developing a low-cost means of acquiring MT evaluation data in an operationalised manner in a commercial post-edited MT context. To this end, OmegaT, a popular open source CAT tool, has been augmented to capture post-editing keystrokes and other CAT tool actions, and to record these in an open XML log file so that they can be analysed by workflow managers.<br />

The resulting tool, termed instrumented OmegaT or iOmegaT, has been deployed at a large <strong>CNGL</strong> industrial partner, Welocalize. As a result, large quantities of translation process field data have been gathered from production tests in which individual translator speed was measured for segments translated using MT and for segments that were not (human translation). So far, over half a million words in approximately 60,000 sentences have been translated by more than 50 translators using iOmegaT, and this number is growing monthly as more productivity tests are carried out. This uniquely large data set is currently being analysed to determine: whether automated MT metrics or string distance calculations correlate with post-editing (PE) time; whether patterns in keystroke and other translation process (TP) field data provide insight into the MT post-editing process; whether features of source sentences that correlate with increased post-editing time across multiple languages can be identified; and what volume of PEMT data needs to be gathered to form reliable analyses of MT engines.<br />

Industry Engagement and Future Plans<br />

Prof. Felix Sasaki of DFKI and Dag Schmidtke of Microsoft Ireland confer with Dr. Mark Davis, President of the Unicode Consortium, via video link at the W3C Multilingual Web Workshop at TCD<br />

Strong industry engagement through deployment and<br />

trialling of tools has been conducted with Welocalize<br />

and Symantec, resulting in one technology licence.<br />

Further close collaboration is being undertaken together<br />

with LOC and ILT through the W3C’s MultilingualWeb-<br />

Language Technology working group. These and<br />

earlier engagements are resulting in on-going industry<br />

collaboration at the level of proposal writing, especially<br />

<strong>CNGL</strong>II, FP7 and Science Foundation Ireland/Enterprise<br />

Ireland Technology Innovation Development Award<br />

(TIDA). The WOZ and CMS-LION systems now also<br />

form core platforms for research in interactivity,<br />

interoperability and analytics in <strong>CNGL</strong>II.


Year 5 Demonstrator Programme


Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong> 81<br />


Goals

The CNGL Demonstrator Programme aims to do three things. It promotes and guides collaborative scientific work between CNGL partners and between research tracks. It showcases the relevance of CNGL research to industry and to society in general. It also provides regular milestones for assessing the collective progress and impact of CNGL.

Research Challenges and Methods

The Demonstrator Programme has achieved these objectives through a rolling programme of engagement. Multiple teams of collaborating researchers across CNGL tracks each addressed specific use scenarios in response to industry needs. Each team developed a demonstrator system iteratively and presented it at twice-yearly showcases. The Demonstrator Programme balanced three sets of needs: the scientific needs of individual researchers and PhD topics, the diverse and evolving requirements of industry partners, and the Centre’s own needs in advancing collaborative research and IP commercialisation. It has also tracked and assessed the progress made visible through the demonstrator systems. It communicated this progress internally, to reviewers and advisers, and to industry and the general public at CNGL Localisation Innovation Showcase events. CNGL has carefully developed and resourced a flexible coordination structure that enabled the programme to address its challenges effectively and in a timely fashion.

Work on advancing the demonstrator systems was conducted by demonstration teams with members drawn from across universities, research tracks and industry partners. Demonstrator systems must exhibit potential industrial impact. They are also vehicles for scientific collaboration and instances of model-driven interoperability. These three factors therefore form the basis for evaluating demonstrator systems. Evaluations are recorded to track increasing maturity across the Demonstrator Programme, and to track links to the peer-reviewed publications produced by the Centre.

Achievements in Year 5 (2012)

The Demonstrator Programme accomplished four major milestones in 2012:

1. In July, a showcase of selected demonstrator systems was presented to a panel of distinguished international reviewers as part of CNGL’s Year 5 Review site visit.
2. The Programme’s work on Metadata Semantics for Next Generation Localisation, and its instantiation in demonstrator systems, is receiving international recognition. It is exerting a coordinated impact on both of the major current international standardisation efforts in localisation: the W3C working group on Multilingual Web – Language Technology and the OASIS XLIFF Technical Committee.
3. A large set of the demonstrator systems was showcased at a final public event at the Localisation Research Centre conference in Limerick in September.
4. Several of the demonstrator systems have successfully graduated to the CNGL Commercialisation Programme and are now receiving seed funding from various sources to further develop their commercial potential.

Another important achievement for the Demonstrator Programme has been showing the real benefits of active resource curation and its role in improving the quality and performance of language technology components.

As shown below in Figure 7, the CNGL Demonstrator Programme collectively covers a range of content processing scenarios. These run from community management, to multimodal interaction, to personalised discovery and consumption of content. Processes both to translate content and to slice and recompose it are core to these activities. The role of language technology (e.g. text analytics, machine translation and speech processing) in these scenarios is supported by the active curation of language resources.



Figure 7: Content Processing Scenarios covered by the Demonstrator Programme<br />

Curating language resources as a secondary output of content processing activities promises significant, progressive improvement of language technology components through the systematic targeting and reuse of these resources. Such active curation has already significantly improved Statistical Machine Translation (SMT) performance within a crowd-sourced translation project. There, rapid retraining of SMT was made possible by the active curation and reuse of human translation corrections. This work has led to funding being secured (SFI TIDA feasibility funding) to develop further rapid SMT retraining techniques.

Demonstrator Showcases

As mentioned above, the demonstrator systems were showcased at two events in 2012: the CNGL Year 5 Review (July) and the CNGL Localisation Innovation Showcase at the Localisation Research Centre conference (September). The following is an overview of the key systems showcased at these events.

The first set of demonstrators highlights CNGL commercialisation outputs that have emerged from the Demonstrator Programme:

• Text Classification for Bulk Localisation Review [Digital Linguistics/TCD – ILT/SF]: Phil Ritchie (Digital Linguistics) and Gerard Lynch demonstrated Review Sentinel, a software-as-a-service offering for scalable and consistent language quality management from CNGL spinout Digital Linguistics. This product directly licenses and commercialises a CNGL academic/industrial collaboration. It reduces linguistic review costs while ensuring the highest levels of style and brand consistency.

• Wripl – Personalisation-as-a-Service across Websites [TCD – DCM]: Kevin Koidl and Brian Gallagher demonstrated non-invasive cross-site personalisation. This work improves the experience of users as they browse across multiple content management systems (CMS) to complete a particular task. As the user moves from site to site, the system learns about their task and gives each CMS hints on which content to recommend. Wripl has been developed with the support of Science Foundation Ireland (SFI)/Enterprise Ireland (EI) Technology Innovation Development Award (TIDA) funding. Its development is now supported by the Enterprise Ireland Commercialisation Development Fund.



• Emizar – Personalised Retrieval Composition and Presentation [TCD/Symantec – DCM]: Dr. Alex O’Connor presented the Personalised Multilingual Customer Care Adaptive Portal. This system combines formal technical content with social content harvested from user forums. It presents users with tailored, task-specific solutions to customer support and technical support problems, across languages and levels of expertise. The system has already received SFI/EI TIDA funding and is undergoing trials with industrial reference customers.

• KantanMT – Moses on the Cloud [DCU/Xcelerator – ILT]: Tony O’Dowd (Xcelerator) and Dr. Declan Groves demonstrated how ILT-based machine translation technology and know-how are being leveraged commercially to provide cloud-based MT services. Tony O’Dowd has formed a DCU spinout, Xcelerator, to commercialise this technology, which already has over 400 mid-sized client LSPs. Xcelerator secured US$1.2 million in funding from a syndicate of investors, which will allow for the creation of 25 new development jobs.

• iOmegaT – An Instrumented CAT Tool and its Use in a Commercial Machine Translation Study [TCD/Welocalize – SF]: John Moran demonstrated how his instrumented version of the CAT tool OmegaT was used to collate post-editing time data during commercial MT evaluation projects conducted by Welocalize. Such data could prove vital in assessing the post-editing effort and quality of machine translation. It could also help assess the performance of different MT offerings in a commercial translation setting. This technology has also now secured SFI/EI TIDA feasibility funding for 2013.

• PLuTO – Facilitating Patent Search with Machine Translation [DCU/FP7 – ILT]: Dr. John Tinsley demonstrated work by the EU-funded PLuTO project. The project has developed in-browser software that allows patent search professionals to carry out personalised translations on the fly. The technology uses statistical machine translation that has been adapted to the patent domain and deployed as a web service. It is now supported by the EI Commercialisation Development Fund for further development at DCU.

• SOLAS Match – Leveraging Community Translation [UL/Rosetta Foundation – LOC]: Dr. Eoin Ó Conchúir demonstrated how SOLAS Match is used as a collaborative localisation platform for community-based volunteer translators. It is being rolled out at the Rosetta Foundation, a non-profit CNGL spinout, where it supports a cohort of 6,000 volunteer translators.

• Rapid SMT Re-training [DCU/TCD/MLW-LT/PANACEA – ILT/SF]: Dr. Antonio Toral (affiliated project PANACEA) and Leroy Finn showed how a statistical machine translation (SMT) engine is re-trained using post-edits from non-professional translators. CNGL provides CMS-LION, which offers crowd-sourced post-editing integrated with Content Management Systems (CMS). PANACEA provides a web service for machine translation and workflows for retraining the SMT engine. This work has resulted in additional SFI/EI TIDA feasibility funding to develop more rapid SMT retraining techniques.
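Several of the demonstrators above depend on turning human post-edits into new SMT training data. A minimal Python sketch of that curation step might filter post-edited segment pairs and write them as line-aligned parallel text: one segment per line, with source and target files matched by line number. This is the plain-text corpus layout that SMT toolkits such as Moses train from. The filtering rules, file names and function names are assumptions for illustration. Neither CMS-LION nor PANACEA is shown, and no retraining is performed.

```python
import tempfile
from pathlib import Path


def curate_post_edits(records):
    """Filter (source, post_edit) pairs before reuse as SMT training data.

    Hypothetical curation rules: collapse internal whitespace so each
    segment stays on one line, drop pairs with an empty side, and drop
    exact duplicates (keeping the first occurrence).
    """
    seen = set()
    curated = []
    for source, post_edit in records:
        src = " ".join(source.split())
        tgt = " ".join(post_edit.split())
        if not src or not tgt or (src, tgt) in seen:
            continue
        seen.add((src, tgt))
        curated.append((src, tgt))
    return curated


def write_parallel_corpus(pairs, out_dir, stem="postedits", src_lang="en", tgt_lang="de"):
    """Write line-aligned source/target files, one segment per line."""
    out = Path(out_dir)
    src_path = out / f"{stem}.{src_lang}"
    tgt_path = out / f"{stem}.{tgt_lang}"
    src_path.write_text("".join(s + "\n" for s, _ in pairs), encoding="utf-8")
    tgt_path.write_text("".join(t + "\n" for _, t in pairs), encoding="utf-8")
    return src_path, tgt_path


# Hypothetical toy post-edits.
records = [
    ("Save the file.", "Speichern Sie die Datei."),
    ("Save the file.", "Speichern Sie die Datei."),   # duplicate: dropped
    ("Open\nthe menu.", "Öffnen Sie das Menü."),      # newline collapsed
    ("", "Leer"),                                     # empty source: dropped
]
with tempfile.TemporaryDirectory() as tmp:
    src_file, tgt_file = write_parallel_corpus(curate_post_edits(records), tmp)
    print(src_file.read_text(encoding="utf-8").splitlines())
```

Keeping curation separate from corpus writing lets the filtering rules change (for example, adding a length-ratio check) without touching the file format the retraining workflow expects.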

The following demonstrators showcased a high degree of industrial engagement and impact:

• Visual Analytics for the Management of Online Communities [TCD/Symantec – SF]: John McAuley showed how visual analytics can make the analysis of online interactions accessible to all members of an online community. This actively supports online communities in discussing and planning the evolution of their policies and processes, thereby increasing member engagement. The approach has been trialled through a tool that enables members of Symantec’s customer support communities to observe and gain insight into the behaviour of key community members, or ‘gurus’.

• Multilingual User Modelling for Personalised Multilingual Information Retrieval [TCD/DCU/Microsoft – DCM]: M. Rami Ghorab demonstrated a framework for multilingual search personalisation. The system supports the delivery and evaluation of different combinations of the functional elements of a personalised, multilingual information retrieval system, such as user modelling, query adaptation, results adaptation and translation. This demonstrator was advanced through a placement at Microsoft Ireland’s offices.



• CMS-LION/SOLAS Integration: Full Content Lifecycle Metadata Interoperability TestBed [UL/TCD/MLW-LT – SF/LOC]: Dr. David Filip demonstrated a unique platform for testing complex metadata designs that span process areas over the full multilingual content life cycle. He showed how an RDF-based provenance store is used between a Web Content Management System (CMS) and XLIFF-based translation workflows. This demonstrates use cases for round-tripping Internationalisation Tag Set (ITS) metadata between content generation and publication in HTML5/XML, and localisation processes in XLIFF. It therefore provides directly testable input to the current standardisation working groups developing ITS, XLIFF and HTML5.
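The ITS round-tripping in the CMS-LION/SOLAS testbed can be illustrated with the ITS 2.0 Translate data category. HTML5 expresses it through its native `translate` attribute, and XLIFF 2.0 can carry it inline as `<mrk translate="no">`. The Python sketch below covers only the extraction half of such a round trip, turning an HTML fragment into an XLIFF-style `<source>` element. It makes several simplifications:

- It omits the surrounding `<xliff>`/`<file>`/`<unit>`/`<segment>` structure.
- It keeps only text, dropping the HTML elements themselves.
- It treats any `translate` value other than "no" as translatable.

The class and function names are hypothetical, not part of the demonstrator.

```python
from html import escape
from html.parser import HTMLParser


class TranslateRuns(HTMLParser):
    """Split an HTML fragment into text runs tagged with their ITS Translate value.

    HTML5's native `translate` attribute carries ITS 2.0 Translate and is
    inherited by descendants. Simplification: any value other than "no"
    (including a bare attribute) counts as translatable.
    """

    VOID = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self) -> None:
        super().__init__()
        self.stack = [True]   # document default: translatable
        self.runs = []        # list of (text, translatable) pairs

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return                                   # void elements have no content
        attributes = dict(attrs)
        if "translate" not in attributes:
            self.stack.append(self.stack[-1])        # inherit from the parent
        else:
            value = attributes["translate"] or ""    # bare attribute means "yes"
            self.stack.append(value.strip().lower() != "no")

    def handle_endtag(self, tag):
        if tag not in self.VOID and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data:
            self.runs.append((data, self.stack[-1]))


def to_xliff_source(html_fragment: str) -> str:
    """Render text runs as an XLIFF 2.0-style <source>, protecting translate="no" runs."""
    parser = TranslateRuns()
    parser.feed(html_fragment)
    parser.close()
    parts, marker = [], 0
    for text, translatable in parser.runs:
        if translatable:
            parts.append(escape(text, quote=False))
        else:
            marker += 1
            parts.append(f'<mrk id="m{marker}" translate="no">{escape(text, quote=False)}</mrk>')
    return "<source>" + "".join(parts) + "</source>"


print(to_xliff_source('<p>Press <code translate="no">Ctrl+S</code> to save.</p>'))
```

The reverse half of the round trip would map each `<mrk translate="no">` back to an element carrying `translate="no"` in the published HTML5.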

The final set of demonstrators highlights promising research directions that have influenced the focus of CNGLII:

• MOODfinger – An Affective Search Engine [UCD – DCM]: Alejandra López-Fernández and Yanfen Hao presented their initial prototype of a search engine. For a given query, it retrieves texts that express a certain mood and then ranks them by the degree to which they exhibit that mood. As part of this work, an affective lexicon is built that can help retrieve, filter and rank web content in the most emotionally useful ways. The affective qualities of content, especially user-generated content, underpin several research activities in CNGLII.

• WebWOZ – A Wizard of Oz Platform [SF/ILT – TCD/UCD]: Stephan Schlögl demonstrated a web-based system that lets designers intervene dynamically online while testing user interactions with application prototypes. These prototypes will later incorporate language processing components. The system provides a flexible tool for rapidly iterating low-fidelity application prototypes using Wizard-of-Oz techniques. Stephan also discussed how ILT researchers in UCD used WebWOZ for user evaluations of their MySpeech system. Released as an open-source system, WebWOZ forms a key platform for multimodal interaction and dialogue research in CNGLII.

• WinkTalk – Linking Facial Expressions to Expressive Synthetic Voices [UCD – ILT]: Éva Székely and Zeeshan Ahmed presented their work on using facial gestures to automatically select between expressive synthetic voice styles, for use in synthetic voices and speech-generating devices. The expressive features of the synthetic voices represent dimensions of emotional intensity rather than distinct emotions. This work shows the potential for supporting affect-driven dialogue systems.
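MOODfinger’s ranking step orders retrieved texts by how strongly they express a mood, and it can be sketched with a lexicon-based score. The Python below is a minimal illustration under stated assumptions. The tiny lexicon, its strength values and the averaging score are hypothetical. The query-retrieval step is assumed to have already produced the candidate texts. This is not the MOODfinger implementation or its affective lexicon.

```python
import re

# Hypothetical affective lexicon: word -> {mood: strength in [0, 1]}.
LEXICON = {
    "delighted": {"joy": 0.9},
    "happy": {"joy": 0.8},
    "sunny": {"joy": 0.4},
    "gloomy": {"sadness": 0.7},
    "tears": {"sadness": 0.6},
    "furious": {"anger": 0.9},
}


def mood_score(text: str, mood: str) -> float:
    """Average lexicon strength for `mood` over all tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    total = sum(LEXICON.get(tok, {}).get(mood, 0.0) for tok in tokens)
    return total / len(tokens)


def rank_by_mood(texts, mood):
    """Filter out texts that do not express `mood`, then rank the rest, strongest first."""
    scored = [(mood_score(t, mood), t) for t in texts]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [t for score, t in ranked if score > 0]


# Hypothetical retrieved results for some query.
docs = ["A sunny day at the beach", "Delighted and happy fans", "Gloomy weather again"]
print(rank_by_mood(docs, "joy"))
```

Averaging over all tokens favours short, densely emotional texts. A summed or length-damped score is an equally plausible design choice.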

Metadata Semantics for Next Generation Localisation

In addition to developing and showcasing a set of demonstrator systems, the Demonstrator Programme provides a basis for examining and modelling problems of interoperability across end-to-end content processing. The components used and integrated in the demonstrator systems come from a number of different research and industrial communities. In these communities, metadata typically either was not formally defined or was specified in a fragmented set of standards.

The Metadata Group (MDG) was established to concentrate and integrate the metadata knowledge from these different communities. These include statistical machine translation and text analytics research, adaptive content and personalisation research, and localisation workflow and interoperability expertise. The MDG used the standardised languages of the W3C Semantic Web initiative, specifically the Resource Description Framework (RDF). This choice addresses the universal trend towards web-based content and offers a well-supported, community-neutral approach to the semantic modelling of metadata. It allowed multiple existing metadata standards and component metadata requirements to be incorporated into a single model. This demonstrates the interrelation and utility of such interlinked metadata. It also provides a focus for wider consensus building on a semantic model that combines content management, localisation, natural language processing and content adaptation/personalisation. Such an approach allows existing service-oriented system integration to be enhanced through semantic annotation of differing interfaces, e.g. those used in SOLAS or for SMT integration. It also supports linked-data provenance annotation for the pull-based interoperability approach used in the CMS-LION system.
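As an illustration of linked-data provenance annotation in RDF, the Python sketch below emits N-Triples for a single translation step. It uses terms from the W3C PROV-O vocabulary: `prov:Entity`, `prov:Activity`, `prov:Agent`, `prov:used`, `prov:wasAssociatedWith`, `prov:wasGeneratedBy` and `prov:wasDerivedFrom`. The `http://example.org/l10n/` namespace, the identifiers and the function names are hypothetical. This is not the MDG’s actual model. A production system would more likely use an RDF library and check that identifiers are IRI-safe.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/l10n/"   # hypothetical namespace for this sketch


def iri(value: str) -> str:
    """Wrap an absolute IRI in N-Triples angle brackets (no escaping is done)."""
    return f"<{value}>"


def translation_provenance(source_id, target_id, activity_id, agent_id):
    """Return N-Triples describing one translation step with PROV-O terms."""
    s, t, a, g = (iri(EX + x) for x in (source_id, target_id, activity_id, agent_id))
    triples = [
        (s, iri(RDF_TYPE), iri(PROV + "Entity")),
        (t, iri(RDF_TYPE), iri(PROV + "Entity")),
        (a, iri(RDF_TYPE), iri(PROV + "Activity")),
        (g, iri(RDF_TYPE), iri(PROV + "Agent")),
        (a, iri(PROV + "used"), s),                 # the step read the source
        (a, iri(PROV + "wasAssociatedWith"), g),    # e.g. an MT engine or translator
        (t, iri(PROV + "wasGeneratedBy"), a),       # the step produced the target
        (t, iri(PROV + "wasDerivedFrom"), s),
    ]
    return "\n".join(f"{subj} {pred} {obj} ." for subj, pred, obj in triples)


print(translation_provenance("doc42-en", "doc42-fr", "mt-run-7", "smt-engine"))
```

Because each step is just a set of triples, a CMS and a translation workflow can append their own statements to a shared store and another component can later pull and query them. That is the pull-based pattern the text attributes to CMS-LION.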



Figure 8: WinkTalk Workflow<br />

With strong input from the demonstrator teams, the MDG has developed semantic models of content and process taxonomies that span the content processing scenarios (see Figure 7) supported by the Demonstrator Programme. This provides a broad and validated semantic model that will guide future interoperability solutions across the Global Intelligent Content space.

This broad view of interoperability, based on semantic metadata and its mapping to existing standards, has allowed CNGL to influence international standardisation efforts. UL already participates in the OASIS XLIFF Technical Committee. In addition, in 2012 UL, TCD, DCU, Microsoft and VistaTEC founded a new W3C working group on Multilingual Web – Language Technology. They did so together with several international academic and industrial collaborators and with the support of EU funding. This working group addresses the interoperability challenges of integrating content management systems, localisation systems and machine translation services. Interoperability use cases being addressed include: CMS-based content translation and quality assurance; CMS-LSP metadata round-tripping; and content metadata for machine translation training and on-demand content translation. The consortium is led by DFKI (Germany). It also includes other academic experts, a CMS vendor (Cocomore), and several LSPs and language technology providers (Moravia, Enlaso, LinguaServ, ]Init[, Logrus, Tilde and Lucy Software). It has attracted further participation from large localisation clients including Adobe, SAP, Intel and IBM.

Input from the Demonstrator Programme has taken the form of integration between CMS-LION, SOLAS, MaTrEx, PANACEA MT training services, and localisation quality assurance from Digital Linguistics and VistaTEC. UL (LOC) and TCD (SF) have also been instrumental in driving round-trip scenarios between ITS in HTML5/XML files and XLIFF-based workflows. This work helps harmonise parallel specification activities in the MLW-LT working group at the W3C and the XLIFF Technical Committee at OASIS. Both institutions also contribute to those groups individually as editors and co-chairs.



Figure 9: Overview of Semantic Modelling of the Metadata Group in influencing international standards<br />

The Metadata Group chaired and organised the inaugural FEISGILTT 2012 interoperability and standards harmonisation workshop, co-located with Localization World in Seattle in October. The workshop played a key role in this harmonisation and will be repeated at Localization World in London in 2013. In addition, the Metadata Group organised a workshop in the Multilingual Web series, in collaboration with the MLW-LT working group, on the role of Linked Open Data in the development of the multilingual web. The group also served on the committees of the Multilingual Semantic Web workshop in Boston and the Multilingual Linked Open Data for Enterprises workshop in Leipzig. Together, these activities show that the CNGL Metadata Group is playing a significant role in guiding the convergence of language and localisation technologies with the linked data cloud. This role will continue in CNGLII through the Interoperability and Analytics theme and through proposed new EU projects.


Industry Partnerships and Technology Transfer



Overview

For the last twenty years the localisation industry has focused on delivering valuable solutions that adapt content for specific geographic regions and cultures. The next twenty years will be about inventing real-time solutions that operate across the global content value chain. These solutions will transform content into small, actionable pieces of information personalised to specific individuals, regardless of their current location. To accommodate this industry shift, the CNGL research programme has expanded to address key aspects of the end-to-end content value chain.

Knowledge transfer within the Centre operates under an industry-standard Collaborative Research and IP Agreement. All parties signed the IP agreement in May 2008. The Collaborative Research Agreement was signed in May 2009 at an event held at the IBM campus in Dublin. It clearly defines how intellectual property generated by the Centre is managed and ultimately commercialised.

CNGL is working with our industrial partners to deliver a range of solutions across the global content value chain. These solutions provide consistently fine-grained analysis and services to an increasingly empowered and demanding group of global consumers.

As CNGL’s fifth year draws to a close, we can report progress on multiple fronts, particularly in our commercialisation and industry outreach efforts. During the past year the CNGL Centre Management team has placed significant emphasis on maturing and deepening relationships with our current industry partners, as well as engaging with the broader ecosystem. At the same time, our Intellectual Property portfolio and commercialisation pipeline have come together and are demonstrating significant market potential. To date, CNGL spinouts have raised in excess of €1.25M in venture capital funding and project the creation of 25+ private-sector jobs in the coming year.

In 2012 CNGL continued its successful Localisation Innovation Showcase series, which has drawn consistently strong attendance since its launch in 2009. The event attracts upwards of 100 attendees and is an opportunity to showcase emerging CNGL innovations. It has also become a catalyst for an expanding array of interactions between CNGL researchers and practitioners from the broader industrial ecosystem.

CNGL Spinout Showcase at Symantec’s offices in Ballycoolin, Dublin

As a commercially focused research centre, CNGL depends on its industrial partners to provide candid guidance on the research agenda and to continually assess our progress towards key project milestones. Industrial partners have representatives on every significant management committee within the CNGL organisational structure. This gives them formal top-down communication channels through which to influence the research agenda. Furthermore, our corporate engagement strategy emphasises one-on-one reciprocal relationships between CNGL academic researchers and their corporate counterparts. These relationships provide equally important and effective bottom-up communication channels.



Current Industrial Partnerships

CNGL currently has 10 diverse corporate partners who maintain a strong commitment to the long-term success of our research efforts. Our partners include multinational companies such as DNP, IBM, Microsoft and Symantec. They also include indigenous and regional SMEs such as Alchemy, SDL, SpeechStorm, Applied Language Solutions, Welocalize and VistaTEC.

The diversity of our partners reflects the challenges facing CNGL and the importance of our research to both the Irish economy and the global marketplace. Successfully realising CNGL’s objectives will help drive the development and productisation of novel early-stage technologies. It will also solidify Ireland’s position as the centre of excellence for multilingual localisation research and development.

Alchemy

Alchemy Software Development is one of the world’s foremost and most widely recognised localisation technology providers. The company was founded as an Irish SME in 2000. Following its rapid growth and success, it completed a merger in 2008 with Translations.com, a leading provider of software, website and enterprise-wide localisation services and localisation-related technology products.

Alchemy’s initial 5-year commitment to CNGL was valued at €630K, a combination of software licences and consulting expertise. The company has already contributed the full complement of software to our research tracks, valued at over €600K. In addition to software licences, Alchemy personnel have dedicated a significant number of hours to working directly with CNGL staff.

During 2012 Alchemy Software Development was particularly active in the use of Machine Translation technology. It was a key supporter of the successful SFI TIDA application that secured additional funding for research prototyping in rapid machine translation retraining.

TCD Winners of EI/SFI Technology Innovation Development Awards (TIDA) 2012 include Prof. Séamus Lawless, Dr. Alex O’Connor and Prof. Vincent Wade of CNGL

Capita Translation & Interpreting (Previously Applied Language Solutions)

Capita Translation & Interpreting became a full member of CNGL in January 2012 with its acquisition of Applied Language Solutions. Applied Language Solutions had in turn previously acquired original CNGL partner Traslán. The company employs more than 150 staff worldwide and provides language solutions to customers in over 90 countries, in more than 200 languages. Traslán’s initial 5-year commitment to CNGL was valued at €958K, a combination of software licences and consulting expertise. Applied Language Solutions has taken up the mantle and is already making significant contributions in the form of translation memories. One of the key benefits of CNGL membership is talent acquisition: access to highly skilled researchers. Applied Language Solutions has hired three CNGL researchers. They will help this fast-growing translation services provider further improve its industry-leading service by driving the development of its machine-assisted translation solution.



DNP

Founded in Japan in 1876, Dai Nippon Printing (DNP) has grown to become one of the world’s leading comprehensive printing companies. Drawing on its significant expertise in managing global multilingual content distribution, DNP has developed a unique vision of the future of multilingual, multimodal digital media. The company predicts that paper and digital media will coexist and that new forms of media will emerge. For this reason, DNP’s participation in CNGL is of particular strategic importance to its long-term objectives.

Despite the distance, DNP is actively involved in the strategic direction of the Centre. During 2012 DNP sent representatives to Dublin to discuss commercialisation strategy. It also hosted a CNGL delegation to discuss CNGL’s new research programme.

IBM

IBM is one of the world’s leading technology and service providers, dedicated to helping clients deliver business value by becoming more efficient and competitive through the use of business insight and information technology. As a multinational firm, IBM takes a globally integrated approach to innovation, with a network of more than 60 software development and research laboratories that explore, test and support a wide range of emerging technologies. IBM first set up operations in Ireland over 50 years ago, and since then the region has become the hub of worldwide research into linguistic technologies. Furthermore, the recently established IBM Dublin Centre for Advanced Studies (CAS) has made Human Language Technologies one of its core research priorities. IBM launched the LanguageWare project in 2001 with the vision of creating a componentised linguistic platform with applications across the company’s entire product portfolio. LanguageWare is now the most broadly used linguistic technology across IBM.

Over the initial five years of CNGL’s operation, IBM has committed a total of €8.65M in funding to the programme: €7.7M in the form of software licences and 1.75 FTEs valued at €950K. To date we have integrated €6.9M worth of IBM software licences.

Microsoft

Founded in 1975, Microsoft is the global leader in software, services and solutions that help people and businesses realise their full potential. The company first set up operations in Ireland in 1985 and has steadily expanded its base of activity, now employing almost 2,000 full-time and contract staff. For a company that localises products and services into more than 60 languages, the need for integrated enterprise and personalised localisation tools is one of the fundamental challenges stretching across each of Microsoft’s business units.

The company’s participation in CNGL provides our researchers with a unique industry perspective on the challenges of international product development. Microsoft has already contributed the full complement of its originally proposed contribution to the research tracks, valued at over €2M in translation memories, helping researchers in both Bulk Enterprise Localisation and Personalised Multilingual Customer Care. Microsoft filled two intern positions with CNGL researchers during 2012 and continues to be proactive on the industrial committee.

SDL

SDL was founded in 1992 and has since grown to become one of the world’s foremost localisation providers to businesses maintaining a global market presence. SDL is at the forefront of research and development in the fields of machine translation and global information management technologies. SDL’s industry-leading position in the translation supply chain offers CNGL researchers unparalleled access to the tools and expertise that are used to serve over 400 of the world’s leading enterprises.



INDUSTRY PARTNERSHIPS AND TECHNOLOGY TRANSFER<br />

SDL’s initial commitment to CNGL included a localisation management system (Idiom Worldserver) valued in excess of €300K over the life of the project. This software was delivered during the first year of the Centre’s operation and formed the backbone of the baseline CNGL Demonstrator System.

SpeechStorm

SpeechStorm is a solutions provider that specialises in integrating market-leading voice platforms and speech recognition software with in-house application development expertise. The company is an SME based in Northern Ireland and serves a range of customers, including multiple government agencies, utility providers and financial service firms. The company’s expertise in integrating multiple voice platforms and speech recognition systems is particularly relevant to the research work-packages on Speech Technology within the Integrated Language Technologies track.

SpeechStorm’s initial five-year commitment to CNGL was valued at €140K, comprising €80K worth of software services and 0.10 FTEs valued at €60K. To date, SpeechStorm has interacted primarily through direct research engagements with the Speech Technology groups at UCD and TCD.

Symantec

Symantec is a global leader in providing solutions that help individuals and enterprises assure the security, availability and integrity of their digital information. The Symantec Shared Engineering Services group is responsible for company-wide localisation management along with ongoing research and development efforts.

Symantec’s primary areas of localisation-related research are machine translation, MT customer satisfaction studies, and techniques to enhance Rule-Based MT (RBMT) performance. During 2012 Symantec funded additional PhD and postdoctoral research in the area of natural language parsing.

Symantec’s initial commitments to CNGL, valued at €2.25M and comprising €2.0M worth of multiple translation memories and 2.15 FTEs valued at €225K, have been exceeded. CNGL received additional commitments of content and translation memory resources from the company during 2012. Symantec has also helped the researchers with the specification of use scenarios for Demonstrator Systems and provided cash contributions to further research and development in the areas of Domain Adaptation and Personalised Multilingual Customer Care.

VistaTEC

VistaTEC is a supplier of premier-quality translation, linguistic review and other language-related business services to leading high-tech companies throughout the world. Its sophisticated service delivery platforms contribute significant value to customers by providing them with enterprise solutions that are scalable, time-efficient, cost-effective, synergistic and innovative.

As a prominent provider of language services, VistaTEC has committed to an extensive programme of research and development that keeps the firm at the forefront of the localisation industry and allows it to offer its customers the pinnacle of added value. VistaTEC is a founding Industrial Partner of the Centre for Next Generation Localisation. VistaTEC’s research activities during 2012 centred on text analytics for translation review, and the company contributed to this research by providing large test datasets and access to human translation quality review. This commitment from VistaTEC has resulted in a very successful commercialisation of the research.



New Industrial Partnerships

In the past year CNGL has continued its extensive programme of industry outreach, following a strategy of targeting specific industry verticals where CNGL has developed robust, rapidly transferable expertise. This has resulted in one-on-one discussions with an array of companies and a number of new industrial collaborations that extend the reach of our activities and help diversify funding to complement the initial investment made by Science Foundation Ireland. As a result of these activities, we were pleased to welcome Intel/McAfee to CNGL’s research consortium as our newest industrial partner, starting in 2013.

The industrial outreach efforts of CNGL emphasise two main pillars:

• Ireland as a centre of excellence for high-value R&D (top-20 globally), with a critical mass of industry participants and ancillary activities
• CNGL’s critical mass of applied academic research expertise in localisation and related industries, which is valuable for partners and collaborators.

Mr. Phil Ritchie presents Digital Linguistics at the CNGL Spinout Showcase in September 2012

Welocalize

Welocalize became the tenth industrial partner of CNGL during 2011. Founded in 1997, Welocalize is a privately held, venture-backed company with more than 500 employees in 11 offices located in the USA, UK, Ireland, Germany, China and Japan. Clients include eight of the world’s top ten software and hardware companies. Welocalize provides next-generation translation supply chain management that delivers market-ready, translated content when and where users demand it, at a higher output, a faster pace and an affordable price. Welocalize supports organisations throughout the entire global content lifecycle, from authoring and product development, through translation and quality assurance, to complete business process outsourcing and market validation.

Welocalize’s contribution to CNGL will take the form of software development resources and support for researchers through access to GlobalSight, a collaborative, flexible and sustainable translation management system. During 2012 Welocalize was a key supporter of the successful TIDA application, which secured €98K in funding for research in the area of rapid retraining of machine translation systems.

In conjunction with our industry outreach efforts, we have launched the CNGL Collaboration Framework, which provides mechanisms for new partners to engage with the Centre. The framework is designed to foster the flow of information among trusted partners while respecting the intellectual property obligations set forth by the CNGL Collaborative Research Agreement. It sets out three broad classes of collaboration: Full Members, Collaborators and Associates.

Figure 10: CNGL Collaboration Framework



Full Members

Full Members are industrial and academic partners who have agreed to be bound by the terms of the CNGL Collaborative Research and IP Agreements. Full Membership is available on a limited basis to third parties who have a long-term strategic interest in CNGL and the wherewithal to contribute substantial resources to ongoing research activities within the Centre. Full Membership provides preferential IP access, committee membership, and direct access to researchers and staff within CNGL.

Associate Members

Associate Membership provides a springboard for organisations that may be interested in establishing deeper ties with CNGL. In exchange for a small membership fee, Associates are granted an array of benefits, the most noteworthy being access to the pre-screened CNGL publication stream. While Associates are not granted preferential access to IP generated in CNGL, this group is expected to play a critical role in the commercialisation and licensing of emerging technologies.

Collaborators

Collaborators engage directly with CNGL on issues of strategic importance to them. Collaborators can be industrial or academic entities that are either 1) a CNGL Full Member who has sponsored a specific research project or 2) a legal entity not previously affiliated with CNGL. Collaborator projects are governed by separate, individual Collaborative Research, IP and Confidentiality Agreements, which provide a range of structural options. Although Collaborators operate under separate agreements, integrating them under the broader CNGL umbrella is beneficial because it facilitates valuable interactions and the sharing of expertise.

Commercialisation

CNGL is entering its fifth year with a rich pipeline of business opportunities. Previously, to support the maturation of our commercial pipeline, CNGL management placed significant emphasis on developing the Centre’s entrepreneurial ecosystem. In 2012, as part of our Commercialisation Strategy, CNGL initiated a comprehensive outbound effort to engage with the broader entrepreneurial ecosystem. This effort was made possible through the continued support of the Enterprise Ireland Commercial Development Manager (CDM) programme, which has provided CNGL with a full-time staff member who focuses specifically on partnering strategies, open innovation initiatives, fund-raising and business development activities within the Centre.

Mr. Tony O’Dowd of CNGL spinout Xcelerator Machine Translations discusses the KantanMT product with Mr. Steve Gotz, CNGL Commercial Development Manager

CNGL finished 2012 with two actively trading spinout companies: Xcelerator Machine Translation Solutions and Scream Technologies. To date these companies have raised a combined €1.25M in venture capital funding from an array of investors, including Delta Partners and Enterprise Ireland as well as two private family offices. The companies are projecting the creation of over 25 private-sector jobs in the coming year.

Scream Technologies

Scream Technologies is a CNGL spinout company that specialises in creating synthetic voices from human actors, enabling companies to create human-sounding synthetic speech and control how it sounds. The service, which can run as a standalone installation, embedded solution or web application, has valuable applications in areas as diverse as video games, customer support and advertising.



Scream Technologies is currently located in DogPatch Labs Dublin, a startup incubator funded by Polaris Venture Partners. The company is promoted by Dr. Peter Cahill, a CNGL-funded researcher who joined the company full-time in 2012. During 2012 Dr. Cahill was named one of Ireland’s top technology and startup leaders by well-known entrepreneurs Dylan Collins and Sean Blanchfield.

Xcelerator Machine Translation Solutions

The Localisation Service Market generates over US$20BN in annual revenues and has a robust and resilient annual growth rate of 7.5%. However, while demand for translation services is surging, there is downward pressure on prices and margins. This is coupled with customer demand for shorter turnaround times on translation projects. Essentially, clients want more for less, faster.

Professional translators need to explore new ways of improving productivity and reducing project turnaround times while maintaining exacting quality standards and linguistic consistency for their clients. The downward pressure on pricing and restraints on client budgets make this a daunting challenge. Xcelerator is a spinout, promoted by Tony O’Dowd, that is developing software solutions to help professional translators address these challenges head-on: improving quality and consistency, and reducing project turnaround times and costs.

SME Collaboration Spotlight

Reverbeo is a startup company using technology to help companies of all sizes grow their global audience. The company’s novel technology is able to harvest monolingual websites, translate them into multiple languages using a range of services including machine translation, crowd-sourcing and professional translators, and ultimately republish them with minimal effort.

During 2012, while the company participated in the NDRC Launchpad Programme, a team of CNGL researchers worked with the founders to help refine their minimum viable product and extend their product development roadmap.

During 2013 CNGL is expanding the collaboration with Reverbeo, supported by Enterprise Ireland, and applying CNGL expertise to the challenge of domain-tuned machine translation systems.

Beyond startups, CNGL research and expertise have helped a range of external companies that are launching new products and services. These collaborations have leveraged crucial Enterprise Ireland funding schemes (Innovation Partnerships, Innovation Vouchers, Commercialisation Fund) to bridge the gap between research and the market. During 2012 DigitalLinguistics, a CNGL licensee, launched its first product: ReviewSentinel. The product leverages core CNGL research in text analytics to automatically perform linguistic quality assurance testing in a scalable and cost-efficient manner.

Intellectual Property Management

Three agreements provide the legal framework within which CNGL operates. The Funding Agreement outlines the financial arrangements between SFI and the lead institution. The IP Agreement outlines how IP is managed within CNGL, and the Collaborative Research Agreement is the all-encompassing agreement on how the programme is governed and managed.

One of the core missions of CNGL is excellence in research, expanding the state of the art through the dissemination of research results. At the same time, CNGL is required to protect valuable IP and make it available for commercial exploitation. This requires careful management, and our researchers operate under a publication code of practice. Before a paper is submitted to a conference, it is uploaded to a publication tracking system, which automatically emails it to the CNGL IP Committee for review of any valuable IP. CNGL is a large research centre that generates over 100 publications each year, and this review is one of the ways in which all partners and PIs can identify IP across all research tracks.



Another way to identify IP is through relationship- and event-driven audits. Formal mechanisms are in place, such as software disclosures, invention disclosures and publication reviews. Nonetheless, one of the best ways to identify IP is through continuous engagement with the researchers in both informal and formal meetings across the four universities. This engagement enables the IP team to identify patterns of activity across the research streams before material is published. It also helps to promote awareness of IP at an early stage, before formal disclosures are made, and helps the commercialisation team to bridge any gaps between research and industry.

One of the mandates of CNGL is to diversify the funding base through affiliate collaborations. This presents certain challenges with regard to IP management. Nevertheless, a framework is in place that allows us to manage these collaborative projects in a way that protects the rights of CNGL members as well as our affiliated partners. This year saw the successful application of this framework across multiple EU FP7 projects, IRCSET- and EI-funded projects, and direct industry-funded engagements. The collaboration framework is designed to foster the flow and control of information between the affiliated project and the core CNGL, while respecting the IP obligations set forth by the original Collaborative Research Agreement (CRA).

Spinouts panel at the CNGL Spring Scientific Committee Meeting, which took place in Dublin in May

To facilitate successful implementation of our commercialisation strategy, we have been at the forefront of developing internal platforms that allow us to better collect, identify and manage all of the IP being generated by our researchers. This is evident in the roll-out of a new product called LabJam, which is currently in beta. The system is designed to provide a more detailed view into our research streams and activities, and to give our industry partners and SFI visibility into our innovation pipeline.


Management and Governance



Management Overview

The Centre for Next Generation Localisation believes that clear and simple Management and Governance structures are essential to ensure the scientific, commercial and operational success of the Centre. Our Management and Governance structures are designed to support a world-class research environment based on:

• simple, effective and efficient planning and decision-making
• clear responsibility
• open and transparent communication structures
• balanced and comprehensive representation and involvement of all partners and stakeholders
• provision of a point of contact and procedures for conflict resolution
• flexibility to respond quickly and appropriately to changing environments
• structures and support for Intellectual Property management, Technology Transfer and commercial exploitation
• regular appraisal of the scientific programme by international experts
• regular appraisal of management and governance structures
• reflection of best practice in the management and governance of large collaborative research centres.

The Centre Director, Prof. Josef van Genabith, provides overall scientific leadership and has responsibility for the running of the Centre. A number of boards and committees support the Director in the management, integration and oversight of the Centre’s research and operations, following the principles set out above. In particular, the research efforts of the Centre involve a considerable amount of cross-site collaboration and interdependency between our four academic and ten industrial partners, which requires a strong emphasis on cross-site coordination.

The overall management and governance of the Centre is organised as shown in Figure 11:

Figure 11: CNGL Governance and Management Structure



Research Co-ordination

The CNGL research programme is organised in a hierarchy of research tracks, work-packages and sub-work-packages. The four main research tracks relate to work in Integrated Language Technologies (ILT), Digital Content Management (DCM), Next Generation Localisation (LOC) and Systems Framework (SF). Within these four research tracks, the research programme is organised into 11 main work-packages, with individual research projects then organised in 50 sub-work-packages. Following this structure, co-ordination of CNGL research activities operates across four interrelated levels:

• CSET Coordination
• Research Track Coordination
• Main Work-package Coordination
• Sub-Work-package Coordination

Overall CSET Coordination is the responsibility of the Centre Director, Prof. Josef van Genabith. Research track coordination is the responsibility of the four Track Coordinators:

• Integrated Language Technologies (ILT): Prof. Nick Campbell, TCD
• Digital Content Management (DCM): Prof. Vincent Wade, TCD
• Next Generation Localisation (LOC): Mr. Reinhard Schäler, UL
• Systems Framework (SF): Dr. Saturnino Luz, TCD

Each of the 11 main work-packages within the four research tracks has a work-package coordinator who liaises with the relevant research track leader. The structure of the four research tracks, 11 main work-packages and 50 individual sub-work-packages is shown below:

Figure 12: CNGL Research Organisation



Integration Committee

The CNGL research programme is highly collaborative, with two basic (ILT and DCM) and two applied (LOC and SF) research tracks, and a demonstrator systems programme centred around shared use scenarios and demonstrator systems. Given the level of research co-ordination and integration across the four research tracks and main work-packages, and the level of integration involved in building demonstrator systems from research outputs, the CNGL Integration Committee is the main body dealing with the operations of CNGL, with particular emphasis on scientific matters. The Integration Committee is composed of the Centre Director (who chairs the committee), the Associate Director, all four track leaders, Prof. Julie Berndsen from UCD, and a representative of each industry partner to ensure maximum engagement of industry partners in oversight of the research programme. The Integration Committee meets on a bi-monthly schedule, with additional ad hoc meetings called when necessary.

Scientific Committee

The CNGL Scientific Committee comprises all members of the Centre across all levels and functions. The full Scientific Committee typically meets twice a year in a two- or three-day plenary session to review and share research progress and outcomes. The meetings of the Scientific Committee also provide an opportunity for engagement with our International Collaborators and External Scientific Advisory Board.

The inaugural CNGL Innovation Charette at the Spring Scientific Committee Meeting

The CNGL Spring Scientific Meeting was held over two days (17th–18th May) at Chartered Accountants House near Trinity College Dublin. With participation from across the entire CSET and its Industry Partners, the Meeting focused on the past and future of language and content research as well as ways to further catalyse collaboration with industry. The Meeting included presentations on key scientific areas, including rapid-prototyping tools, personalised search using social media, and open-source localisation frameworks. It also featured demonstrations by CNGL spinout companies, a hands-on session on CNGL’s LabJam research activity platform, and the inaugural CNGL Innovation Charette. A charrette is an intense collaborative session that allows participants to work together in a close setting to discuss real-world challenges and potential solutions. Following a vigorous period of interaction, each team presented a three-minute pitch, and the audience then had the opportunity to “invest” in the best ideas. The charrette encouraged participants to imagine inspirational products that the Centre’s members could create with their knowledge, and it proved an excellent vehicle for fostering imaginative thinking.

CNGL researchers and industrial collaborators share their research highlights at the CNGL Spring Scientific Committee Meeting

Due to significant Centre-wide planning for CNGL II and preparations for the CNGL Localisation Innovation Showcase in September, the Centre did not host an Autumn Scientific Committee Meeting in 2012.



Operational Management

Centre Operations Team

The day-to-day implementation of the Centre’s operational decisions and policies, financial management, activity co-ordination, tracking and reporting is carried out by the Centre Operations Team in close co-operation with the Centre Director. The Centre Operations Team is led by Dr. Páraic Sheridan and meets weekly with the Centre Director and Deputy Director to continually monitor and prioritise activities across all operational functions, including finance, human resources, reporting, system administration, and software and IP management.

The composition of the Centre Operations Team is as follows:

• Dr. Páraic Sheridan, Associate Director
• Ms. Hilary McDonald, Project Manager
• Mr. Steve Gotz, Commercial Development Manager
• Mr. Stephen Roantree, IP Manager (departed CNGL in Quarter 2 2012)
• Ms. Sophie Matabaro, Centre Administrator (on maternity leave from September 2012)
• Mr. Joachim Wagner, Systems Administrator
• Ms. Fiona Maguire, Financial Administrator
• Ms. Eithne McCann, Centre Secretary
• Ms. Cara Greene, Education and Outreach Manager
• Ms. Laura Grehan, Marketing and Communications Officer

Mr. Stephen Roantree departed from his position as <strong>CNGL</strong> Intellectual Property Manager during Quarter 2 to take up a senior management role with Lionbridge,<br />

based in Dublin. Stephen now leads Lionbridge’s Quality<br />

and Innovation, Engineering, Testing, DTP and Web<br />

Publishing Groups. He continues to engage with <strong>CNGL</strong>.<br />

In addition to the day-to-day work of the Centre<br />

Operations team in executing the operational policies<br />

and activities of the <strong>CNGL</strong>, several Management Boards<br />

and Committees provide direction and prioritisation of<br />

the Centre’s various activities.<br />

Management Committee<br />

The Management Committee is the <strong>CNGL</strong>’s decision-making body and provides leadership, policy, strategy,<br />

resource allocation, performance monitoring and<br />

review, management of CSET membership, and conflict<br />

resolution. The Management Committee meets quarterly<br />

and is chaired by the Centre Director. Its membership is<br />

made up of the Centre’s Co-Principal Investigators and,<br />

although Industry Partner representatives are invited to<br />

participate in Management Committee meetings, they<br />

do not hold a vote. The membership of the Management<br />

Committee for <strong>2012</strong> included:<br />

} Prof. Josef van Genabith, DCU (Director) [Chair]<br />

} Prof. Vincent Wade, TCD (Deputy Director)<br />

} Prof. Nick Campbell, TCD<br />

} Mr. Reinhard Schäler, UL<br />

} Dr. Saturnino Luz, TCD<br />

Education and Outreach Board<br />

The Education and Outreach Board provides leadership,<br />

policy and strategy, objectives and resource allocation<br />

for the Centre’s Education and Outreach Programme.<br />

The Education and Outreach Board meets quarterly<br />

and reports to the <strong>CNGL</strong> Management Committee.<br />

The Board is chaired by the Education and Outreach Manager and consists of representatives of the academic participants that have funded E&O Programmes (TCD and UL) and one nominee from the Industrial participants in the Centre. The membership of the Education and Outreach Board in <strong>2012</strong> included:<br />

} Ms. Cara Greene, DCU [Chair]<br />

} Dr. Páraic Sheridan, DCU<br />

} Mr. Karl Kelly, UL<br />

} Dr. Seamus Lawless, TCD<br />

} Ms. Laura Grehan, DCU<br />

} Dr. Fred Hollowood, Symantec



IP Management Board<br />

The IP Management Board manages the Intellectual<br />

Property of the Centre and facilitates Technology<br />

Transfer and commercial exploitation of IP generated<br />

by the Centre. The IP Management Board advises the<br />

Centre on all IP issues and, in particular, evaluates<br />

proposed publications and invention disclosures in<br />

accordance with the Centre’s IP agreement. The<br />

IP Management Board meets quarterly. The IP<br />

Management Board consists of nominees from each of<br />

the participating university and industrial partners plus<br />

the Centre’s Associate Director. It was chaired by the IP Manager, Mr. Stephen Roantree, until his departure from <strong>CNGL</strong> to Lionbridge in Quarter 2. The IP Management Board membership draws both on academic members from among the Research Leaders (Co-PIs) and on representatives from the respective University Technology Transfer Offices (TTOs). The IP Management Board reports to the Management Committee.<br />

Commercialisation Committee<br />

The Commercialisation Committee promotes and oversees the research commercialisation agenda, which is a core part of the Centre’s strategy. The Committee meets on a quarterly basis, and its meetings are co-located with meetings of the IP Management Board and the Industry Advisory Board at <strong>CNGL</strong> Industry Partner sites.<br />

External Oversight<br />

Following SFI guidelines and best practice for the oversight and governance of large research centres, <strong>CNGL</strong> has two external advisory and oversight boards that meet regularly to review the scientific and operational progress of the Centre.<br />

Mr. Steve Gotz, <strong>CNGL</strong> Commercial Development Manager, H.E. Mr. John Neary, Ambassador of Ireland to Japan, Dr. Páraic Sheridan, Associate Director, <strong>CNGL</strong>, and Ms. Diane Foley, IDA Ireland Deputy-Director Japan. <strong>CNGL</strong> delivered a seminar to Japanese businesses in Tokyo in April, which was hosted by the Irish Ambassador to Japan and facilitated by IDA Ireland’s Japan Office.<br />

Mr. Stephen Roantree, previously <strong>CNGL</strong> Intellectual Property Manager, now with Lionbridge Dublin<br />

External Scientific Advisory Board<br />

The External Scientific Advisory Board reviews the long-term scientific direction, impact and progress of the Centre. It advises, challenges and provides guidance to the Management Committee on both the overall scientific goals and objectives of the Centre and its ongoing management. The External<br />

Scientific Advisory Board aims to meet bi-annually and<br />

work in close co-operation with the Executive Committee<br />

and the Centre Director. The <strong>CNGL</strong> External Scientific<br />

Advisory Board consists of recognised world leaders from<br />

both academia and industry in the fields of Language<br />

Technology, Machine Translation, Speech, Adaptive<br />

Hypermedia, Information Retrieval, and Localisation.<br />

The External Scientific Advisory Board is chaired by an<br />

expert from the area of Localisation, Mr. Francis Tsang.<br />

Mr. Tsang is Director of Globalisation at Adobe Systems<br />

Inc. He is responsible for the strategy and delivery of all<br />

localised Adobe product releases and the development<br />

of tools and libraries in the internationalisation area. Mr. Tsang has spent the last twenty years building software<br />

for various international markets. He holds degrees in<br />

computing and business management.



The <strong>CNGL</strong> External Scientific Advisory Board actively<br />

participates in the bi-annual <strong>CNGL</strong> Scientific Committee<br />

meetings and reports back to the Centre Director<br />

and Management Committee. The board is currently<br />

composed of the following members:<br />

} Mr. Francis Tsang, Adobe Systems Inc., USA<br />

[Localisation] (Chair)<br />

} Dr. Andrew Bredenkamp, Acrolinx GmbH, Germany<br />

[Language Technology]<br />

} Prof. Lauri Karttunen, PARC, USA [Language<br />

Technology]<br />

} Prof. Makoto Nagao, President, NIST, Japan<br />

[Machine Translation]<br />

} Prof. Carol Espy-Wilson, University of Maryland, USA<br />

[Speech Technology]<br />

} Prof. Peter Brusilovsky, University of Pittsburgh, USA<br />

[Adaptive Hypermedia]<br />

} Prof. Elizabeth Liddy, Syracuse University, USA<br />

[Information Retrieval and NLP]<br />

} Dr. Mike Dillinger, Principal, TOPs Globalization<br />

Consulting<br />

External Oversight Board<br />

In accordance with SFI requirements, the President of<br />

DCU as the host institution has appointed an External<br />

Oversight Board to help with the oversight and<br />

assessment of the Centre’s progress. The Oversight Board<br />

reports to SFI on a quarterly basis. The Oversight Board<br />

is composed of members drawn from a mix of academic<br />

partners, a representative from the <strong>CNGL</strong> Industry<br />

Partners, and other external independent members.<br />

The board currently consists of the following members:<br />

} Mr. David MacDonald [Chair]<br />

} Prof. Josef van Genabith, Centre Director<br />

} Prof. Alan Harvey (VP Research, DCU)<br />

} Prof. Vinny Cahill (Dean of Research, TCD)<br />

} Mr. Gearóid Mooney (Enterprise Ireland)<br />

} Mr. Aidan Sweeney (IBEC)<br />

In addition to the full members of the External Oversight Board (which include Centre Director Prof. Josef van Genabith), <strong>CNGL</strong> is represented at quarterly meetings by:<br />

} Prof. Vincent Wade, Deputy Director<br />

} Dr. Páraic Sheridan, Associate Director<br />

The Oversight Board met quarterly during <strong>2012</strong> to review <strong>CNGL</strong> progress against its scientific and operational targets, review Key Performance Indicators (KPIs) and report back to SFI.<br />

<strong>2012</strong> Significant Accomplishments<br />

In the fifth year of the Centre for Next Generation<br />

Localisation, the following management and governance<br />

accomplishments have been recorded:<br />

} <strong>CNGL</strong> successfully passed its SFI Final Review and<br />

succeeded in its application for a second cycle of<br />

funding from SFI. The Review and funding application<br />

appraisal were conducted over two days in July at<br />

Trinity College Dublin. The review panel, which<br />

comprised senior figures from industry and academia,<br />

assessed the Centre’s performance and future<br />

potential on a range of criteria, including scientific<br />

excellence and social and economic impact. In its<br />

report the panel stated that “<strong>CNGL</strong> successfully built<br />

the infrastructure for a fully functioning, professional<br />

research centre, including strong capabilities in<br />

overall research direction, reporting, professional<br />

administration, outreach, budget allocation, and<br />

more.” The panel also acknowledged the Centre’s<br />

“mature change management approach” and its<br />

“forward-thinking, strong IP management and tech<br />

transfer capability”, and “was impressed by the<br />

educational outreach at all levels”.<br />

} The Centre Operations Team performed excellently in the challenging task of final reporting for <strong>CNGL</strong>I while also providing significant input into the preparation of the <strong>CNGL</strong>II proposal and coordinating the review panel’s Site Visit in July. The Site Visit included an<br />

exhibition of posters and demos of <strong>CNGL</strong> research<br />

to date. This substantial additional workload was<br />

managed while still maintaining quality delivery of the<br />

day-to-day operations of the Centre and roll-out of a<br />

number of new initiatives in the areas of education<br />

and outreach, commercialisation, reporting and<br />

finance.



Demonstrator showcase at the <strong>CNGL</strong> SFI Site Visit in July at Trinity College Dublin<br />

} The Centre Operations Team has continued to adapt<br />

to the evolving needs of the Centre with changes in<br />

<strong>2012</strong> reflecting in particular the greater emphasis on<br />

commercial engagement. The Associate Director and<br />

Commercial Development Manager spearheaded<br />

a coordinated campaign to attract new industrial<br />

collaborators. Supported by industry-facing Marketing<br />

and Communications resources, the campaign team<br />

stepped up the Centre’s presence at and input into<br />

key industry events in the Intelligent Content area<br />

and delivered pitches to high priority targets. The<br />

campaign has led to Intel signing up as an Industry Partner for <strong>CNGL</strong>II, and it has also generated a<br />

number of other promising active leads.<br />

} The ‘Localisation Innovation Showcase’ event, co-located in Limerick with the LRC <strong>Annual</strong> Conference in September, was a huge success, drawing more than 70 industry representatives from companies based in Ireland and abroad. The Showcase event included 10 individual stations of <strong>CNGL</strong> demonstrator systems as well as a multitude of research posters, industry partner booths, and a display of education and outreach activities.<br />

Operational and Management plans for the coming year focus on ensuring a smooth transition to <strong>CNGL</strong>’s second cycle of funding. Priorities include rollout of the Centre’s novel research programme centred on the Global Intelligent Content theme, attracting talented new recruits at all levels, establishing the Centre’s new Design and Innovation Lab, finalising and signing off on renewed IP and collaborative research agreements, and securing additional industry partners. There will also be significant input from the Centre Operations team into the running of SIGIR 2013 – The 36th <strong>Annual</strong> Conference of the ACM Special Interest Group on Information Retrieval. <strong>CNGL</strong> will co-host SIGIR 2013 in Dublin in July-August 2013.



Education and Outreach<br />

The <strong>CNGL</strong> Education and Outreach Programme<br />

encompasses a broad range of activities from internal<br />

communications and professional development, public<br />

relations and marketing to public-facing projects and<br />

education programmes to foster the next generation of<br />

professionals in content-related industries. We aim to<br />

raise the profile of scientific research within Ireland by<br />

highlighting education and career opportunities in key<br />

areas in the content field. By carrying out world-class research and commercialisation activities, <strong>CNGL</strong> is promoting Ireland as a global leader in the localisation industry. Below is an overview of activities under each Programme.<br />

Overview of <strong>CNGL</strong> Education and Outreach (Reach and Impact): Education and Human Capital Development; Strategic Marketing and Communications<br />

Education and Human Capital Development<br />

The aim of our Education Programme is to provide<br />

education and promote career opportunities in key areas<br />

of content intelligence, computer science and language<br />

technology. We aim to engage young people in these<br />

areas to build a strong Irish base of future computer scientists in content-related industries.<br />

<strong>CNGL</strong> offers a comprehensive set of education programmes aimed at all age groups, ranging from courses for primary school students and secondary school programmes, through undergraduate and postgraduate programmes, to internal professional development for our <strong>CNGL</strong> researchers and staff. The ‘Education Programme Aims and Targets’ overview in this section summarises the education programme’s aims for each target level.<br />

Education and Human Capital Development<br />

Highlights from <strong>2012</strong><br />

Fourth Level Education: <strong>CNGL</strong> supports a number of<br />

seminar series across individual component research<br />

disciplines, including a popular series with the National<br />

Centre for Language Technology, seminars hosted by<br />

each of the member research groups and the Dublin<br />

Computational Linguistics Research Seminars series.<br />

<strong>CNGL</strong> operates internal member-focused training<br />

programmes on presentation skills, Intellectual Property,<br />

commercialisation and entrepreneurship, and project<br />

management. <strong>CNGL</strong> also provides “101” sessions for all<br />

staff on key <strong>CNGL</strong> topics. PhD students are also given<br />

opportunities to undertake an internship with industry<br />

partners.<br />

Eleven visiting MSc and PhD interns joined ILT<br />

over a period of five months in <strong>2012</strong>, under <strong>CNGL</strong>’s<br />

postgraduate internship programme. The programme<br />

enables students to gain valuable experience as part of a<br />

highly-regarded and continually-growing research centre.<br />

This year’s programme attracted interns from institutions around the world, including in Italy, France, China and India.<br />

The internships covered a wide range of topics in Natural<br />

Language Processing and Machine Translation.<br />

Education Programme Aims and Targets<br />

Primary Level: Encourage ICT and Language awareness<br />

Second Level: Promote study of STEM disciplines<br />

Third Level: Promote focus on <strong>CNGL</strong> research topics<br />

Fourth Level: Preparing graduates for careers<br />

Career Opportunities<br />

In partnership with DCU and the National Centre for<br />

Language Technology, <strong>CNGL</strong> was successful in a Marie<br />

Curie Mobility grant application for the EXPERT PhD<br />

Graduate School with a total of 15 PhD Marie Curie<br />

fellowships (two of them at DCU) and three postdoctoral<br />

researchers. EXPERT comprises DCU and five other<br />

university partners and five industry partners. It focuses<br />

on empirical approaches to (machine) translation, and<br />

as part of their training PhD students will spend time at<br />

DCU’s EXPERT university and industry partners.



Table 1: <strong>2012</strong> Undergraduate Summer Internships<br />

Project name | Student | Supervisor | Track<br />

Using Biometric Response to Locate Personally Interesting Digital Content | Robert Lis | Liadh Kelly | DCM<br />

Implementing new methods for speech retrieval | Tom Mason | Liadh Kelly | DCM<br />

Exploring Personalised and Collaborative Information Retrieval | Paul Redmond | Liadh Kelly | DCM<br />

Visualisation of Topic Models | Conor O’Gorman | Liadh Kelly | DCM<br />

Crowd-sourcing for query development and relevance judgment | Ciaran Porter | Liadh Kelly | DCM<br />

Communications and Education | Siobhan O’Mara | Cara Greene/Laura Grehan | E&O<br />

Facial recognition for real-time content personalisation (Kinect) | Thomas Dunne | Steve Gotz | CM<br />

Facial recognition for real-time content personalisation (Kinect) | Emer Hedderman | Steve Gotz | CM<br />

Building ontology-based content management (OCM) system | James Mark Hender | Yalemisew Abgaz | DCM<br />

Generation of interactive infographics from Semantic and Open Data | Erika Duriakova | Alex O’Connor | DCM<br />

Yodle – Generating Presentations from Wikipedia | Alla Kovaleva | Alex O’Connor | DCM<br />

Query-biased summarization | Shane McQuillan | Gareth Jones | DCM<br />

Communications and Education | Siobhan Swords | Cara Greene/Laura Grehan | E&O<br />

Real-time Web Annotation | Kristo Mikkonen | Dominic Jones | E&O<br />

Another exciting development on the fourth level education front was the announcement in November that University of Limerick’s MSc in Multilingual Computing and Localisation is to be delivered through distance learning and co-hosted by the United Nations Economic Commission for Africa at its Information Training Centre for Africa in Addis Ababa, Ethiopia. The aim of the programme is to promote African languages in the Information Society.<br />

Ms. Aida Opoku-Mensah of the United Nations Economic Commission for Africa (UNECA) speaks at the launch of University of Limerick’s MSc in Multilingual Computing and Localisation to be co-hosted by UNECA in Ethiopia.<br />

Finally, the LRC Best Thesis Award <strong>2012</strong> was presented in September to former <strong>CNGL</strong> PhD student Ben Steichen for his thesis “Adaptive Retrieval, Composition & Presentation of Closed-Corpus and Open-Corpus Information”. Katrin Drescher of award sponsor Symantec praised the scientific excellence and industrial relevance of Ben’s work.<br />

Third Level Education: The <strong>CNGL</strong> Undergraduate<br />

Internship Programme continued to attract top students<br />

in <strong>2012</strong>. The primary aim of the <strong>CNGL</strong> undergraduate<br />

internship is to offer exceptional undergraduate students<br />

the opportunity to participate in and contribute to<br />

exciting research projects at <strong>CNGL</strong>. The programme<br />

enables interns to use leading research facilities and<br />

we aim to inspire these students to take the first step<br />

on a path to a research career. It is also an important<br />

opportunity to promote taught Masters programmes<br />

at <strong>CNGL</strong> universities and to host interns at our



industrial partners. The internships consist of six-month INTRA/Co-op placements and eight-week summer internships. Many of those interns go on to do <strong>CNGL</strong>-themed third and fourth year projects with <strong>CNGL</strong> supervisors.<br />

<strong>CNGL</strong> hosted ten undergraduate interns across a wide<br />

range of research areas. Table 1 shows the list of <strong>2012</strong><br />

Undergraduate Summer internships.<br />

<strong>CNGL</strong> is currently creating an online graduate brochure<br />

aimed at third level students with information on the<br />

Taught Masters and PhD programmes available in each<br />

of the <strong>CNGL</strong> universities. The brochure also includes<br />

profiles of our graduated PhD students and former<br />

postdoctoral researchers. The profiles detail the education and career paths of <strong>CNGL</strong> alumni since leaving <strong>CNGL</strong>.<br />

<strong>CNGL</strong>’s undergraduate interns showcase the outcomes of their work at a poster and demo display at DCU<br />

Second Level Education: Secondary school students are a key demographic for the education programme, with more than 1,500 secondary school students engaging with <strong>CNGL</strong> education programmes and competitions. <strong>CNGL</strong> aims to attract students to study fields related to content intelligence by running programmes that foster the key problem-solving skills needed in this industry.<br />

Some of the 450 second level students who completed the <strong>CNGL</strong>-supported ‘ComputeTY’ programme at DCU in January <strong>2012</strong><br />

<strong>CNGL</strong> continued to support the ComputeTY <strong>2012</strong> Programme at DCU. ComputeTY students select one of two streams: Web Design or Introduction to Programming. The overall content offers a broad range of computing skills, from the creative aspect of website design to the problem-solving challenges of the programming stream. A total of 450 students attended the course over four weeks in January <strong>2012</strong>, with the same number due to complete the course in January 2013. Since its launch in 2005, ComputeTY has been completed by almost 3,500 Transition Year students from Dublin schools. The programme has a strong track record of recruiting students to study computing at third level.<br />

The outstanding success of the Education Programme is<br />

the <strong>CNGL</strong> All Ireland Linguistics Olympiad (AILO). Over<br />

3,500 secondary school students from 167 schools in<br />

the Republic of Ireland and Northern Ireland have taken<br />

part in AILO since the first competition in 2009. The<br />

competition challenges secondary school students to<br />

apply logic and computational thinking to solve complex<br />

puzzles in unfamiliar languages. Past participants have<br />

gone on to pursue studies in computer science, maths<br />

and linguistics at third level, which suggests that the<br />

competition is meeting its goal of fostering the next<br />

generation of problem solvers.<br />

More than 400 students from 44 schools in 23 counties<br />

competed in the preliminary round of AILO <strong>2012</strong>. The top<br />

100 performers were allocated a <strong>CNGL</strong> researcher, who<br />

acted as a tutor for the national final in March at DCU.



The top four individual students went on to represent<br />

Ireland at the International Linguistics Olympiad (IOL) in<br />

Slovenia in July <strong>2012</strong>.<br />

Also targeting the second level market is <strong>CNGL</strong>’s<br />

Language Trap: An Adaptive Language Learning Video<br />

Game. The game was initially designed to aid students<br />

in preparing for the Leaving Certificate German Oral Examinations by means of an interactive dialogue system. The Irish language version of the game has been evaluated with schools. The German game is now available at http://seriousgames.cs.tcd.ie/.<br />

<strong>CNGL</strong> promoted its research and education programmes at the SFI<br />

booth at the BT Young Scientist & Technology Exhibition in January<br />

Education Programme plans for 2013 will focus on<br />

the transition to a second cycle of funding for <strong>CNGL</strong>,<br />

in which the Centre will pioneer the concept “Global<br />

Intelligent Content”. Opportunities in this area include<br />

app competitions for secondary school students, and<br />

establishing a Masters programme in Intelligent Content.<br />

Ms. Mary Mitchell-O’Connor, T.D. attended the national final of the All Ireland Linguistics Olympiad at DCU. Deputy Mitchell-O’Connor urged students to use their aptitude for problem-solving to pursue careers at the intersection of computing, language and linguistics<br />

The <strong>CNGL</strong> Education Programmes are complemented by Transition Year internships in the <strong>CNGL</strong> labs and by the high-quality Careers Brochure focused on commercial career opportunities at the intersection of Computing, Languages, Culture and Business. The guide was distributed to guidance counsellors in 729 secondary schools. <strong>CNGL</strong> exhibited at the BT Young Scientist <strong>2012</strong> competition at the RDS in January <strong>2012</strong>. Students got the chance to try out demos and to test their problem-solving skills with AILO puzzles.<br />

Strategic Marketing and Communications<br />

<strong>CNGL</strong>’s Outreach Programme aims to highlight <strong>CNGL</strong><br />

achievements, to engage with the public and to promote<br />

Ireland as a world leader in localisation. The programme spans public relations and marketing, hosting industry and academic events, publishing ‘Localisation Focus – the International Journal of Localisation’, and attending the BT Young Scientist and Technology Exhibition.<br />

Strategic Marketing and Communications<br />

Highlights from <strong>2012</strong><br />

<strong>CNGL</strong> has raised its media profile with 84 media<br />

mentions recorded in <strong>2012</strong>. The <strong>CNGL</strong> newsletter was published on a quarterly basis and has proved an<br />

effective means through which to communicate <strong>CNGL</strong><br />

news, events, success stories and researcher profiles to<br />

government, media, industry and academic stakeholders.



Front page of <strong>CNGL</strong>News, the quarterly newsletter of the Centre for Next Generation Localisation (Vol. 01, Issue 07, Quarter 2 <strong>2012</strong>), featuring the stories “Irish Ambassador to Japan hosts <strong>CNGL</strong> Seminar in Tokyo” and “<strong>CNGL</strong>: Contributing to a Strategic Research Agenda for Europe”<br />

The Centre’s international reach has been enhanced<br />

through closer engagement with international<br />

organisations including the Globalization and<br />

Localization Association (GALA). A new industry<br />

prospectus is in production, and this will support the<br />

Centre’s drive to attract additional industry partners and<br />

clients.<br />

The <strong>CNGL</strong> quarterly newsletter, available in both e-zine and print format<br />

The Marketing and Communications Officer has worked<br />

closely with the Centre’s Commercial Development<br />

Manager to strengthen industry outreach efforts.<br />

Significant progress has been made on the customer<br />

relationship management front, including further<br />

development of <strong>CNGL</strong>’s mailing list, which now includes<br />

over 2,000 subscribers. <strong>CNGL</strong> exhibited at a significant<br />

number of industry and commercialisation events,<br />

including Localization World (in Seattle, USA in October),<br />

DCU Tech Transfer Exhibition in June, and Enterprise<br />

Ireland’s ‘Big Ideas’ showcase in November. <strong>CNGL</strong> also<br />

presented a panel on Global Content Intelligence at<br />

the Gilbane Conference in Boston in November, and a<br />

seminar for Japanese businesses, which was hosted by the<br />

Ambassador of Ireland to Japan and supported by IDA<br />

Ireland Japan in April.<br />

<strong>CNGL</strong> booth at Localization World Seattle in October<br />

<strong>CNGL</strong> continued to host conferences and workshops<br />

for the international research community this year<br />

in the computational linguistics, digital content<br />

management and localisation areas. The 17th<br />

<strong>Annual</strong> LRC Internationalisation and Localisation<br />

Conference took place in Limerick in September<br />

with 70 participants from localisation companies and<br />

academia. The conference was co-located with the<br />

<strong>2012</strong> <strong>CNGL</strong> Localisation Innovation Showcase, which<br />

has now been established as a “must attend” event<br />

for professionals in Ireland involved in localisation and<br />

multilingual customer care. The keynote address was<br />

this year delivered by Dr. Thomas Arend, International<br />

Product Lead at Twitter.<br />

Irish Times coverage of <strong>CNGL</strong>’s work on sign language machine<br />

translation (left) and opinion piece by Prof. Josef van Genabith in the<br />

Irish Independent (right)



Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />

EDUCATION AND OUTREACH<br />

Feature on localisation careers in ‘Education’ magazine<br />

<strong>CNGL</strong>’s Localisation Innovation Showcase was co-located with the 17th<br />

<strong>Annual</strong> LRC Internationalisation and Localisation Conference in Limerick<br />

Other significant scientific events organised by <strong>CNGL</strong> in<br />

<strong>2012</strong> include the International Postgraduate Conference<br />

in Translating and Interpreting, the Workshop on<br />

Innovation and Applications in Speech Technology,<br />

and the Workshop on Best Practices in Post-editing<br />

(in association with the Translation Automation Users’<br />

Society) at Localization World in Paris. The Centre<br />

was successful in its bid to bring COLING 2014, one of<br />

the world’s largest and most influential computational<br />

linguistics conferences, to Dublin in 2014.<br />

<strong>CNGL</strong>’s second-level education programmes were<br />

significantly strengthened this year by the production<br />

of a guide to ‘Careers in Next Generation Localisation’.<br />

This high-quality brochure focuses on commercial<br />

career opportunities at the intersection of Computing,<br />

Languages, Culture and Business. The guide was<br />

distributed to guidance counsellors in 729 secondary<br />

schools and has generated over 1,400 unique views<br />

of our careers web page to date. The brochure was<br />

launched in February by Mr. Seán Sherlock, T.D., Minister<br />

for Research and Innovation, and generated substantial<br />

media interest including spreads in ‘Education’ magazine<br />

and ‘Guideline’ – the official magazine of the Institute of<br />

Guidance Counsellors.<br />

The social impact of <strong>CNGL</strong>’s research programmes<br />

is evident in its social spinout activity, The Rosetta<br />

Foundation. The Foundation now has more than 2,600<br />

registered volunteer translators and the number of NGO<br />

partners increased fourfold during <strong>2012</strong>, allowing it to<br />

further its goal of facilitating access to information and<br />

knowledge to those who really need it. The Rosetta<br />

Foundation’s first NGO partner, Special Olympics – the<br />

world’s largest sports organisation for children and adults<br />

with intellectual disabilities – remained the most active in<br />

<strong>2012</strong>, with over fifty translation projects submitted. Other<br />

partners benefitting from the work of the Foundation’s<br />

volunteers include Community Eye Journal, The World<br />

Association of Girl Guides and Girl Scouts, Ruhama and<br />

Trócaire.<br />

Mr. Seán Sherlock T.D., Minister for Research and Innovation, and Prof.<br />

Josef van Genabith, Director of <strong>CNGL</strong> pictured at the launch of <strong>CNGL</strong>’s<br />

Next Generation Localisation careers guide in February



<strong>CNGL</strong> coordinated Thesis in Three <strong>2012</strong> with the<br />

Systems Biology Ireland and CLARITY research centres.<br />

The aim of the competition is for PhD students to give<br />

an elevator pitch for their PhD thesis: three slides in<br />

just three minutes. On the night, centre directors and<br />

principal investigators also delivered elevator pitches for<br />

their research centres. The event, held in collaboration<br />

with Innovation Dublin, attracted an audience of more<br />

than two hundred. The night celebrated the best of Irish<br />

science and innovation in bite-sized chunks.<br />

Plans for 2013<br />

Strategic marketing and communication plans for 2013<br />

will focus on the transition to a second cycle of funding<br />

for <strong>CNGL</strong>, in which the Centre will pioneer the concept<br />

of “Global Intelligent Content”. Creation of a new brand<br />

for <strong>CNGL</strong> is already underway. This brand will reflect<br />

the broadening of the Centre’s research programme and<br />

the Centre’s greater emphasis on industrial<br />

engagement. A new website that communicates our<br />

vision of global intelligent content is in train, and the new<br />

branding will be rolled out across a suite of marketing<br />

materials designed to support business development<br />

efforts.<br />

Significant events planned for 2013 include SIGIR<br />

2013 – the 36th <strong>Annual</strong> ACM SIGIR Conference, which<br />

<strong>CNGL</strong> will co-host in July/August 2013, and Think Latin<br />

America, which will take place at Carton House, Kildare<br />

in April 2013.<br />

Jonathan McCrea, host of Newstalk’s ‘Futureproof’ show, is MC<br />

for Thesis in 3


Appendices



Appendix 1: People and Partnerships<br />

CSET RESEARCH TEAMS<br />

Team Members Associated with the CSET During the <strong>Report</strong>ing Period<br />

First Name Surname Type Institution Research Strand Highest Degree Gender Nationality CSET Funded Supervisor<br />

Yalemisew Abgaz PhD DCU DCM MSc M Ethiopian Yes Dr Claus Pahl<br />

Mohamed Abou-Zleikha PhD UCD ILT MSc M Syrian Yes Prof Julie Carson-Berndsen<br />

Zeeshan Ahmed PhD UCD ILT MSc M Pakistani Yes Prof Julie Carson-Berndsen<br />

Dimitra Anastasiou Postdoctoral Researcher UL LOC PhD F Greek Yes Mr Reinhard Schäler<br />

Lamine Aouad Postdoctoral Researcher UL LOC PhD M Algerian Yes Mr Reinhard Schäler<br />

Ruwan Asanka Wasala PhD UL LOC MSc M Sri Lankan No Mr Reinhard Schäler<br />

Akshat Bakliwal PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />

Renu Balyan PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />

Pratyush Banerjee PhD DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />

Jonathan Barr Graphics Designer DCU E&O BA M Irish Yes Prof Josef van Genabith<br />

Hanna Béchara PhD DCU ILT BA F Irish Yes Prof Josef van Genabith<br />

Urvesh Bhowan Research Assistant TCD DCM MSc M South African Yes Prof Vincent Wade<br />

Arianna Bisazza PhD Intern DCU ILT MSc M Italian Yes Prof Josef van Genabith<br />

Anton Bryl Postdoctoral Researcher DCU ILT PhD M Belarusian Yes Prof Josef van Genabith<br />

Jim Buckley Co-Supervisor UL LOC PhD M Irish No N/A<br />

Joao Cabral Postdoctoral Researcher UCD ILT PhD M Portuguese Yes Prof Julie Carson-Berndsen<br />

Nick Campbell Co-Principal Investigator TCD ILT PhD M British No N/A<br />

Julie Carson-Berndsen Co-Principal Investigator UCD ILT DPhil F Irish No N/A<br />

Özlem Çetinoglu Postdoctoral Researcher DCU ILT PhD F Turkish Yes Prof Josef van Genabith<br />

Yi Chen PhD DCU DCM MSc F Chinese Yes Dr Gareth Jones<br />

Yvonne Cleary Co-Supervisor UL LOC PhD F Irish No N/A<br />

JJ Collins Co-Supervisor UL LOC PhD M Irish No N/A<br />

Declan Dagger Postdoctoral Researcher TCD DCM PhD M Irish Yes Prof Vincent Wade<br />



Sandipan Dandapat Postdoctoral Researcher DCU ILT PhD M Indian Yes Prof Josef van Genabith<br />

Domenico De Feo Research Assistant TCD DCM MSc M Italian Yes Prof Vincent Wade<br />

Gavin Doherty Co-Principal Investigator TCD SF PhD M Irish No N/A<br />

Amelie Dorn PhD TCD ILT MSc F French Yes Prof Ailbhe Ní Chasaide<br />

Thomas Dunne Intern DCU CM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Erika Duriak Intern TCD DCM UnderGrad F Slovakian Yes Prof Josef van Genabith<br />

Mohammed Rami ElHussein Ghorab PhD TCD DCM MSc M Egyptian Yes Prof Vincent Wade<br />

Martin Emms Co-Principal Investigator TCD ILT PhD M Irish No N/A<br />

Maria Eskevich PhD DCU DCM MSc F Russian Yes Prof Gareth Jones<br />

Chris Exton Co-Supervisor UL LOC PhD M Australian/Irish No N/A<br />

David Filip Postdoctoral Researcher UL LOC PhD M Czech Yes Mr Reinhard Schäler<br />

Ríona Finn Administrative DCU CM MSc F Irish Yes N/A<br />

Hector Hugo Franco Penya PhD TCD ILT BSc M Spanish Yes Dr Martin Emms<br />

Brian Gallagher Technician TCD DCM MSc M Irish Yes Prof Vincent Wade<br />

Debasis Ganguly PhD DCU DCM MTech M Indian Yes Dr Gareth Jones<br />

Solomon Gizaw PhD UL LOC MSc M Ethiopian Yes Mr Reinhard Schäler<br />

Christer Gobl Co-Principal Investigator TCD ILT PhD M American No N/A<br />

Yvette Graham Postdoctoral Researcher DCU ILT PhD F Irish Yes Prof Josef van Genabith<br />

Cara Nicole Greene E&O Manager DCU E&O BSc F Irish Yes N/A<br />

Laura Grehan Marketing and Communications Officer DCU E&O MSc F Irish Yes N/A<br />

Alfredo Guerra Maldonado PhD TCD ILT BSc M Mexican/Irish Yes Dr Carl Vogel<br />

Rajat Gupta PhD UL LOC BSc M Indian Yes Mr Reinhard Schäler<br />

Yanfen Hao Postdoctoral Researcher UCD DCM PhD M Chinese Yes Dr Tony Veale<br />

Geraldine Harrahill Administrative UL CM FETAC F Irish Yes N/A<br />

Emer Hedderman Intern DCU CM UnderGrad F Irish Yes Prof Josef van Genabith<br />

James Mark Hender Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Yu Hui PhD Intern DCU ILT MSc M Chinese Yes Prof Josef van Genabith<br />

Muhammad Javed PhD DCU DCM MSc M Pakistani Yes Dr Claus Pahl<br />



Gareth Jones Co-Principal Investigator DCU DCM PhD M British No N/A<br />

Amir Kamran PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />

John Kane PhD TCD ILT MPhil M Irish Yes Prof Ailbhe Ní Chasaide<br />

Mark Kane PhD UCD ILT MSc M Irish Yes Prof Julie Carson-Berndsen<br />

Bridget Kane Postdoctoral Researcher TCD SF PhD F Irish Yes Dr Saturnino Luz<br />

Karl Kelly Administrative UL E&O Grad Dip M Irish Yes N/A<br />

Dorothy Kenny Co-Principal Investigator DCU ILT PhD F Irish No N/A<br />

Kevin Koidl PhD TCD DCM MSc M Irish Yes Prof Vincent Wade<br />

Alla Kovaleva Intern TCD DCM UnderGrad F Kazakhstani Yes Prof Josef van Genabith<br />

Ru Kuang PhD Intern DCU ILT MSc M Chinese Yes Prof Josef van Genabith<br />

Sudip Kumar Naskar Postdoctoral Researcher DCU ILT PhD M Indian Yes Prof Josef van Genabith<br />

Séamus Lawless Assistant Professor TCD DCM PhD M Irish Yes Prof Vincent Wade<br />

Madeleine Lenker PhD UL LOC MA F German Yes Mr Reinhard Schäler<br />

Killian Levacher PhD TCD DCM MSc M French/Irish Yes Prof Vincent Wade<br />

Johannes Leveling Postdoctoral Researcher DCU DCM PhD M German Yes Dr Gareth Jones<br />

David Lewis Funded Investigator TCD SF PhD M English Yes N/A<br />

Wei Li PhD DCU DCM MSc F Chinese Yes Dr Gareth Jones<br />

Junhui Li Postdoctoral Researcher DCU ILT PhD M Chinese Yes Prof Josef van Genabith<br />

Robert Lis Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Qun Liu Co-Principal Investigator DCU ILT PhD M Chinese No N/A<br />

Luca Longa Research Assistant TCD DCM MSc M Italian Yes Prof Vincent Wade<br />

Alejandra Lopez Fernandez PhD UCD DCM MSc F Mexican Yes Dr Tony Veale<br />

Juan Luo PhD Intern DCU ILT MSc M Chinese Yes Prof Josef van Genabith<br />

Saturnino Luz Co-Principal Investigator TCD SF PhD M Brazilian No N/A<br />

Gerard Lynch PhD TCD ILT MSc M Irish Yes Dr Carl Vogel<br />

Walid Magdy Postdoctoral Researcher DCU DCM PhD M Egyptian Yes Dr Gareth Jones<br />



Fiona Maguire Finance Administrative DCU CM CIMA F Irish Yes N/A<br />

Liliana Mamani-Sanchez PhD TCD ILT MSc F Peruvian Yes Dr Carl Vogel<br />

Tom Mason Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Sophie Matabaro Centre Administrative DCU CM F Irish Yes N/A<br />

John McAuley PhD TCD SF MPhil M Irish Yes Dr David Lewis<br />

Eithne McCann PA to Director DCU E&O National Cert F Irish Yes N/A<br />

Hilary McDonald Project Manager TCD CM MSc F Irish Yes N/A<br />

Shane McQuillan Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Kristos Mikkonen Intern TCD E&O UnderGrad M Finnish Yes Dr David Lewis<br />

Jinming Min PhD DCU DCM MSc M Chinese Yes Dr Gareth Jones<br />

Joss Moorkens Postdoctoral Researcher DCU LOC PhD M Irish Yes Dr Sharon O’Brien<br />

Lucía Morado Vásquez PhD UL LOC MSc F Spanish Yes Mr Reinhard Schäler<br />

John Moran PhD TCD SF BSc M Irish Yes Dr David Lewis<br />

Erwan Moreau Postdoctoral Researcher TCD ILT PhD M French Yes Dr Carl Vogel<br />

Aram Morera Mesa PhD UL LOC Grad Dip M Spanish Yes Mr Reinhard Schäler<br />

Sara Morrissey Postdoctoral Researcher DCU ILT PhD F Irish Yes Prof Josef van Genabith<br />

Catherine Mulwa PhD TCD DCM MSc F Kenyan Yes Prof Vincent Wade<br />

Dat Tien Nguyen PhD Intern DCU ILT MSc M Vietnamese Yes Prof Josef van Genabith<br />

Dat Quoc Nguyen PhD Intern DCU ILT MSc M Vietnamese Yes Prof Josef van Genabith<br />

Ailbhe Ní Chasaide Co-Principal Investigator TCD ILT PhD F Irish No N/A<br />

Neasa Ní Chiaráin PhD TCD ILT MSc F Irish Yes Prof Ailbhe Ní Chasaide<br />

Naoto Nishio PhD UL LOC Grad Dip M Japanese Yes Mr Reinhard Schäler<br />

Conor O Gorman Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Siobhan O Mara Research Assistant DCU E&O BA F Irish Yes Prof Josef van Genabith<br />

Sharon O’Brien Co-Principal Investigator DCU ILT PhD F Irish No N/A<br />

Eoin Ó’Conchuir Technician UL CM PhD M Irish Yes Mr Reinhard Schäler<br />

Alexander O’Connor Postdoctoral Researcher TCD DCM PhD M Irish Yes Prof Vincent Wade<br />



Udochukwu Kalu Ogbureke PhD UCD ILT MPhil M Nigerian Yes Prof Julie Carson-Berndsen<br />

Ian O’Keeffe Postdoctoral Researcher UL LOC PhD M Irish Yes Mr Reinhard Schäler<br />

Declan O’Sullivan Co-Principal Investigator TCD DCM PhD M Irish No N/A<br />

Claus Pahl Co-Principal Investigator DCU DCM PhD M German No N/A<br />

Santanu Pal PhD Intern DCU ILT MSc M Indian Yes Prof Josef van Genabith<br />

Ciaran Porter Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Enda Quigley PhD UL LOC BSc M Irish Yes Mr Reinhard Schäler<br />

Paul Redmond Intern DCU DCM UnderGrad M Irish Yes Prof Josef van Genabith<br />

Corentin Ribeyre PhD Intern DCU ILT MSc M French Yes Prof Josef van Genabith<br />

Stephen Roantree IP Manager DCU CM Grad Dip M Irish Yes N/A<br />

Ilana Rozanes PhD TCD SF MSc F American Yes Dr Saturnino Luz<br />

Lorcan Ryan PhD UL LOC MSc M Irish Yes Mr Reinhard Schäler<br />

Melike Sah Postdoctoral Researcher TCD DCM PhD F Cypriot Yes Prof Vincent Wade<br />

Reinhard Schäler Lead Principal Investigator UL LOC MSc M German No N/A<br />

Stephan Schlögl PhD TCD SF MSc M Austrian Yes Dr Saturnino Luz<br />

Anne Schneider PhD TCD SF BSc F German Yes Dr Saturnino Luz<br />

Mary Sharp Co-Principal Investigator TCD DCM BSc F Irish No N/A<br />

Páraic Sheridan Associate Director DCU CM PhD M Irish Yes N/A<br />

Harold Somers E&O DCU E&O PhD M British Yes N/A<br />

Brendan Spillane PhD TCD DCM MSc M Irish Yes Prof Vincent Wade<br />

Ben Steichen Postdoctoral Researcher TCD DCM PhD M Luxembourgish Yes Prof Vincent Wade<br />

Siobhan Swords Intern DCU E&O UnderGrad F Irish Yes Prof Josef van Genabith<br />

Eva Szekely PhD UCD ILT MA F Hungarian Yes Prof Julie Carson-Berndsen<br />

Josef van Genabith Lead Principal Investigator DCU CM PhD M German Yes N/A<br />

Tony Veale Co-Principal Investigator UCD DCM PhD M Irish No N/A<br />

Carl Vogel Co-Principal Investigator TCD ILT PhD M American No N/A<br />

Joris Vreeke Programmer DCU E&O M Dutch Yes Prof Josef van Genabith<br />



Vincent Wade Lead Principal Investigator TCD DCM PhD M Irish No N/A<br />

Joachim Wagner Systems Administrator DCU CM MA M German Yes Prof Josef van Genabith<br />

Andy Way Lead Principal Investigator DCU ILT PhD M British No N/A<br />

Xiaofeng Wu Postdoctoral Researcher DCU ILT PhD M Chinese Yes Prof Josef van Genabith<br />

Irena Yanushevskaya Postdoctoral Researcher TCD ILT PhD F Russian Yes Prof Ailbhe Ní Chasaide<br />

Amalia Zahra PhD UCD ILT BSc F Indonesian Yes Prof Julie Carson-Berndsen<br />

Dong Zhou Postdoctoral Researcher TCD DCM PhD M Chinese Yes Prof Vincent Wade<br />

Affiliated Members and Collaborators Not Receiving Funds<br />

First Name Surname Type Institution Research Strand Highest Degree Gender Nationality CSET Funded Supervisor<br />

Hala Al Maghout Postdoctoral Researcher DCU Affiliated PhD F Syrian No Prof Josef van Genabith<br />

Mohammed Attia Postdoctoral Researcher DCU Affiliated PhD M Egyptian No Prof Josef van Genabith<br />

Eoin Bailey PhD TCD Affiliated MSc M Irish No Prof Vincent Wade<br />

Ergun Biçici Postdoctoral Researcher DCU Affiliated PhD M Cypriot No Prof Josef van Genabith<br />

Peter Cahill Co-Principal Investigator UCD Affiliated PhD M Irish No Prof Julie Carson-Berndsen<br />

Oscar Cassetti PhD TCD Affiliated MSc M Italian No Dr Saturnino Luz<br />

Alexandru Ceausu Postdoctoral Researcher DCU Affiliated PhD M Romanian No Dr Páraic Sheridan<br />

Yi Chen PhD DCU Affiliated PhD F Chinese No Prof Gareth Jones<br />

Owen Conlan Assistant Professor TCD Affiliated PhD M Irish No N/A<br />

Seamus Coogan Marketing Lead TCD Affiliated BSc M Irish No Prof Vincent Wade<br />

Stephen Curran Research Assistant/Programmer TCD Affiliated MSc M Irish Yes Dr David Lewis<br />

Aswarth Dara Postdoctoral Researcher DCU Affiliated PhD M Indian No Prof Josef van Genabith<br />

Stephen Doherty Postdoctoral Researcher DCU Affiliated PhD M Irish No Prof Josef van Genabith<br />

David Faherty Research Assistant TCD Affiliated BSc M Irish No Prof Vincent Wade<br />

Leroy Finn Programmer TCD Affiliated MSc M Irish No Dr David Lewis<br />

Frank Fowley Research Assistant DCU Affiliated MSc M Irish No Dr Claus Pahl<br />

Manisha Ganguly Programmer DCU Affiliated F Indian No Prof Gareth Jones<br />



Federico Gaspari Postdoctoral Researcher DCU Affiliated PhD M Italian No Prof Josef van Genabith<br />

Anton Gerdelan Postdoctoral Researcher TCD Affiliated PhD M New Zealander No Dr David Lewis<br />

Lorraine Goeuriot Postdoctoral Researcher DCU Affiliated MSc F French No Dr Gareth Jones<br />

Steve Gotz Commercialisation Manager DCU Affiliated MSc M American No N/A<br />

Declan Groves Research Integration Officer DCU Affiliated PhD M Irish No N/A<br />

Cormac Hampson Postdoctoral Researcher TCD Affiliated PhD M No Prof Vincent Wade<br />

Deirdre Hogan Postdoctoral Researcher DCU Affiliated PhD F Irish No N/A<br />

Dominic Jones Postdoctoral Researcher TCD Affiliated MSc M British No Dr David Lewis<br />

John Judge Postdoctoral Researcher DCU Affiliated PhD M Irish No Prof Josef van Genabith<br />

Liadh Kelly Postdoctoral Researcher DCU Affiliated MSc F Irish No Dr Gareth Jones<br />

Alex Killen Programmer DCU Affiliated BSc M Irish No Prof Josef van Genabith<br />

Kris McGlinn Research Assistant TCD Affiliated MSc M Irish No Dr David Lewis<br />

Brenda McGuirk Project Co-ordinator TCD Affiliated F Irish No Prof Vincent Wade<br />

Gavin Mendel-Gleeson Postdoctoral Researcher DCU Affiliated PhD M Irish No Dr Deirdre Hogan<br />

Sebastian Molines Research Assistant TCD Affiliated MSc M French Yes Dr David Lewis<br />

Adam Moore Postdoctoral Researcher TCD Affiliated PhD M Irish No Prof Vincent Wade<br />

Lynda O Donovan Pedagogical Lead TCD Affiliated MSc F Irish No Prof Vincent Wade<br />

Ian O’Keeffe Postdoctoral Researcher TCD Affiliated PhD M Irish No Prof Vincent Wade<br />

Tsuyoshi Okita PhD DCU Affiliated MSc M Japanese No Prof Josef van Genabith<br />

Neil Peirce Research Assistant TCD Affiliated PhD M Irish No Prof Vincent Wade<br />

Raphael Rubino Postdoctoral Researcher DCU Affiliated PhD M French No Dr Jennifer Foster<br />

Rasoul Samad Zadeh Postdoctoral Researcher DCU Affiliated PhD M Iranian No Dr Jennifer Foster<br />

Eduardo Shanahan Programmer DCU Affiliated BSc M Argentinian No Prof Josef van Genabith<br />

Ankit Srivastava Postdoctoral Researcher DCU Affiliated PhD M Indian No Prof Josef van Genabith<br />



Thanos Staikopoulos Postdoctoral Researcher TCD Affiliated PhD M Greek No Prof Vincent Wade<br />

John Tinsley Project Coordinator DCU Affiliated PhD M Irish No Dr Páraic Sheridan<br />

Antonio Toral Postdoctoral Researcher DCU Affiliated PhD M Spanish No Prof Andy Way<br />

Lamia Tounsi Postdoctoral Researcher DCU Affiliated PhD F Algerian No Prof Josef van Genabith<br />

Eddie Walsh PhD TCD Affiliated MSc M Irish No Prof Vincent Wade<br />

Rachel Wrafter Postdoctoral Researcher TCD Affiliated PhD F Irish No Prof Vincent Wade<br />

Lei Xu Postdoctoral Researcher DCU Affiliated PhD M Chinese No Dr Claus Pahl<br />

Hong Yi Wang Research Assistant DCU Affiliated BSc F Chinese No Dr Deirdre Hogan<br />

Bilal Yousuf PhD TCD Affiliated MSc M No Prof Vincent Wade<br />

Jian Zhang Technician DCU Affiliated BSc M Chinese No Dr Páraic Sheridan<br />

Industry Partners and Contact Names<br />

Organisation Type Organisation Name Location Date Joined CSET Date Departed First Name Surname Position<br />

SME Alchemy Software Development Dublin, Ireland 04/12/2007 N/A Enda McDonnell Director of Engineering<br />

MNC Dai Nippon Printing Tokyo, Japan 04/12/2007 N/A Takeshi Fukunaga Advisor of Headquarters<br />

MNC IBM Dublin, Ireland 04/12/2007 N/A Brian O’Donovan Program Director, IBM Dublin Centre for Advanced Studies<br />

MNC Microsoft Dublin, Ireland 04/12/2007 N/A Dag Schmidtke Program Manager for Language Technology Strategy<br />

MNC SDL Wicklow, Ireland 04/12/2007 N/A Paul McManus General Manager<br />

SME SpeechStorm Belfast, Northern Ireland 04/12/2007 N/A Oliver Lennon Chief Executive Officer<br />

MNC Symantec Dublin, Ireland 04/12/2007 N/A Fred Hollowood Research Director, Shared Engineering Services<br />

MNC CAPITA (Formerly Applied Language Solutions) Manchester, U.K. 04/12/2007 N/A Gavin Wheeldon Chief Executive Officer<br />

SME VistaTEC Dublin, Ireland 04/12/2007 N/A Phil Ritchie Chief Technology Officer<br />

MNC Welocalize Dublin, Ireland 23/02/2011 N/A Derek Coffey Vice President, Technology and Professional Services<br />



Governance Committee Members<br />

Role First Name Surname Organisation Position<br />

Chair David MacDonald IMS Maxims Chairman<br />

Member Alan Harvey DCU Vice President for Research<br />

Member Vinny Cahill TCD Dean of Research<br />

Member Gearóid Mooney Enterprise Ireland Director, Informatics Research and Commercialisation<br />

Member Phil Ritchie VistaTEC Chief Technical Officer<br />

Member Aidan Sweeney IBEC R&D Policy Executive<br />

Member Josef van Genabith DCU <strong>CNGL</strong> Director<br />

In Attendance Páraic Sheridan DCU <strong>CNGL</strong> Associate Director<br />

In Attendance Vincent Wade TCD <strong>CNGL</strong> Deputy Director<br />

Scientific Advisory Board Members<br />

Role First Name Surname Organisation Position<br />

Chair Francis Tsang Adobe Systems Director of Globalisation<br />

Member Andrew Bredenkamp Acrolinx Chief Executive Officer<br />

Member Carol Espy-Wilson University of Maryland, Department of Electrical and Computer Engineering Professor<br />

Member Lauri Karttunen Palo Alto Research Center Computational Linguist<br />

Member Makoto Nagao NIST President



Appendix 2: Outputs<br />

PhDs Awarded<br />

Name Nationality Gender Institute<br />

Dominic Jones English M TCD<br />

Joachim Wagner German M DCU<br />

Ben Steichen Luxembourgish M TCD<br />

Lucia Morado Vazquez Spanish F UL<br />

Stephen Doherty Irish M UL<br />

Joss Moorkens Irish M DCU<br />

Ian O’Keeffe Irish M TCD<br />

Zohar Etzioni Israeli M TCD<br />

Walid Magdy Egyptian M DCU<br />

Sandipan Dandapat Indian M DCU<br />

Ankit Srivastava Indian M DCU<br />

Pratyush Banerjee Indian M DCU<br />

Hala Al Maghout Syrian F DCU<br />

All CSET Publications<br />

All <strong>CNGL</strong> publications are stored in a central document management system and are available through the institutions' Open Access repositories.<br />

Refereed Conference and Workshop Papers<br />

Abgaz, Y., Javed, M., Pahl, C. (<strong>2012</strong>). Dependency Analysis in Ontology-driven Content-based Systems. In 12th International Conference on Artificial<br />

Intelligence and Soft Computing (ICAISC <strong>2012</strong>), Zakopane, Poland

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Pitch Recovery of Missing Syllables Using Sparse Representation in Exemplar-based Pitch<br />

Generation. In Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications, Montreal, Canada<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Exemplar-based pitch contour generation using DOP for syntactic tree decomposition.<br />

In Proceedings of the 37th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP <strong>2012</strong>), Kyoto, Japan<br />

Abou-Zleikha, M., Szekely, E., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Multi-level Exemplar-based Duration Generation for Expressive Speech Synthesis.<br />

In Proceedings of the 6th International Conference on Speech Prosody <strong>2012</strong>, Shanghai, China<br />

Almaghout, H., Jiang, J., Way, A. (<strong>2012</strong>). Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT. In Proceedings of the 16th <strong>Annual</strong><br />

Conference of the European Association for Machine Translation (EAMT-<strong>2012</strong>), Trento, Italy<br />

Wasala, A., Schaler, R., Weerasinghe, R., Exton, C. (<strong>2012</strong>). Collaboratively Building Language Resources while Localising the Web. In Proceedings<br />

of ACL <strong>2012</strong>: 3rd workshop on the People’s Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, Jeju,<br />

Republic of Korea<br />

Attia, M., Pecina, P., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). Improved Spelling Error Detection and Correction for Arabic, COLING <strong>2012</strong>, Mumbai, India<br />

Attia, M., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the<br />

detection and lemmatization of the Unknown Word. In The International Conference on Computational Linguistics (COLING), December <strong>2012</strong>, Mumbai, India<br />

Banerjee, P., Naskar, S., Way, A., van Genabith, J., Roturier, J. (<strong>2012</strong>). Supplementary Data Selection by Incremental Update of Translation Models. In the<br />

24th International Conference on Computational Linguistics, Mumbai, India



Banerjee, P., Naskar, S., Roturier, J., Way, A., van Genabith, J. (<strong>2012</strong>). Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word<br />

Reduction: Normalization and/or Supplementary Data. In Proceedings of the 16th <strong>Annual</strong> Conference of the European Association for Machine Translation<br />

(EAMT-<strong>2012</strong>), Trento, Italy<br />

Cabral, J. P., Kane, M., Ahmed, Z., Abou-Zleikha, M., Szekely, E., Zahra, A., Ogbureke, K. U., Cahill, P., Carson-Berndsen, J., Schlogl, S. (<strong>2012</strong>). Rapidly<br />

Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz. In Proceedings of the LREC International Conference on Language<br />

Resources and Evaluation (LREC), Istanbul, Turkey<br />

Cabral, J. P. and Carson-Berndsen, J. (<strong>2012</strong>). Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices. In Listening Talker<br />

(LISTA) Workshop, Edinburgh, UK<br />

Dandapat, S., Morrissey, S., Way, A., van Genabith, J. (<strong>2012</strong>). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the<br />

Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation<br />

(HyTra), a workshop in EACL <strong>2012</strong>, Avignon, France<br />

Doherty, S., Kenny, D., Way, A. (<strong>2012</strong>). Taking Statistical Machine Translation to the Student Translator, AMTA <strong>2012</strong>, San Diego, USA<br />

Doherty, S. and Moorkens, J. (<strong>2012</strong>). An Experiential Analysis of Translation Technology Labs. 2nd <strong>Annual</strong> Conference of Education and Humanities,<br />

30 March <strong>2012</strong>, St. Patrick’s College, Ireland<br />

Doherty, S. and O’Brien, S. (<strong>2012</strong>). A User-Based Usability Assessment of Raw Machine Translated Technical Instructions. Conference of the Association<br />

for Machine Translation in the Americas (AMTA <strong>2012</strong>), San Diego, USA<br />

Drugman, T., Kane, J., Gobl, C. (<strong>2012</strong>). Resonator-based creaky voice detection. In Proceedings of Interspeech <strong>2012</strong>, Portland, Oregon, USA<br />

Emms, M. (<strong>2012</strong>). On Stochastic Tree Distances and their training via Expectation-Maximisation. In Proceedings of ICPRAM <strong>2012</strong> International Conference<br />

on Pattern Recognition Applications and Methods, Portugal<br />

Eskevich, M., Magdy, W., Jones, G.J.F. (<strong>2012</strong>). New Metrics for Meaningful Evaluation of Informally Structured Speech Retrieval. ECIR <strong>2012</strong>, pages 170-181<br />

Filip, D. (<strong>2012</strong>). Managing Industry Wisdom as a Portfolio of Technical Standards, in Management Re-Imagined. Presented at the International Federation<br />

of Scholarly Associations of Management (IFSAM <strong>2012</strong>), Limerick, Ireland.<br />

Filip, D. (<strong>2012</strong>). Using Business Process Management and Modelling to Analyse the Role of Human Translators and Reviewers in Bitext Management<br />

Workflows. Presented at the International Association for Translation and Intercultural Studies (IATIS <strong>2012</strong>), Belfast, UK<br />

Filip, D., Lewis, D., Sasaki, F. (<strong>2012</strong>). The Multilingual Web. In Proceedings of the 21st World Wide Web Conference WWW<strong>2012</strong>, April 16-20, <strong>2012</strong>, Lyon,<br />

France, ACM proceedings 978-1-4503-1229-5/11/04<br />

Ganguly, D., Jones, G. (<strong>2012</strong>). Cross-Lingual Topical Relevance Models. The 24th International Conference on Computational Linguistics (COLING <strong>2012</strong>),<br />

Mumbai, India<br />

Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Topical Relevance Models, CIKM <strong>2012</strong>, Hawaii, USA<br />

Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation. The 24th<br />

International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Ganguly, D., Leveling, J., Jones, G.J.F. (<strong>2012</strong>). DCU@FIRE <strong>2012</strong>: Rule-based stemmers for Bengali and Hindi. In FIRE <strong>2012</strong>, Fourth Workshop of the Forum<br />

for Information Retrieval Evaluation, pages 37-42, Kolkata, India, <strong>2012</strong>. ISI.<br />

Ganguly, D., Leveling, J., Jones, G.J.F. (<strong>2012</strong>). DCU@INEX-<strong>2012</strong>: Exploring sentence retrieval for tweet contextualization. In Pamela Forner, Jussi Karlgren,<br />

and Christa Womser-Hacker, editors, CLEF <strong>2012</strong> Evaluation Labs and Workshop, Online Working Notes, 17-20 September, Rome, Italy<br />

Ganguly, D., Leveling, J., Jones, G.J.F. (<strong>2012</strong>). Technical challenges and design issues in Bangla language processing, chapter Bengali (Bangla) Information<br />

Retrieval. IGI Global, <strong>2012</strong>. (to appear)<br />

Ghorab, M. R., Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Multilingual User Modeling for Personalized Re-ranking of Multilingual Web Search Results.<br />

In Conference on User Modeling, Adaptation, and Personalization (UMAP <strong>2012</strong>), Montreal, Canada<br />

Graham, Y. (<strong>2012</strong>). Deep Syntax in Statistical Machine Translation. Lexical Functional Grammar Conference, Udayana University, Bali, Indonesia<br />

Javed, M., Abgaz, Y., Pahl, C. (<strong>2012</strong>). Composite Ontology Change Operators and their Customizable Evolution Strategies, 2nd Joint Workshop on<br />

Knowledge Evolution and Ontology Dynamics, Boston, USA<br />

Ogbureke, K. U., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in<br />

HMM-based Speech Synthesis<br />

Kane, B., Toussaint, P., Luz, S. Shared decision making needs a communication record. To appear in Proceedings of the 16th ACM Conference on Computer<br />

Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, Texas



Scherer, S., Kane, J., Gobl, C., Schwenker, F. (<strong>2012</strong>). The Effect of Fuzzy Training Targets on Voice Quality Classification, Interspeech <strong>2012</strong>,<br />

Portland, USA<br />

Kane, J. and Gobl, C. (<strong>2012</strong>). Identifying regions of non-modal phonation using features of the wavelet transform. In Proceedings of Interspeech <strong>2012</strong>,<br />

Florence, Italy<br />

Kane, J., Papay, K., Hunyadi, L., Gobl, C. (<strong>2012</strong>). On the use of creak in Hungarian spontaneous speech. In Proceedings of ICPhS 2011, Hong Kong, China<br />

Kane, J., Scherer, S., Layher, G., Neumann, H. (<strong>2012</strong>). An audiovisual political speech analysis incorporating eye-tracking and perception data. The eighth<br />

international conference on Language Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />

Kane, J., Yanushevskaya, I., Ní Chasaide, A., Gobl, C. (<strong>2012</strong>). Exploiting time and frequency domain measures for precise voice source parameterisation. In<br />

Proceedings of Speech Prosody <strong>2012</strong>, Shanghai, China<br />

Kane, M., Ahmed, Z., Carson-Berndsen, J. (<strong>2012</strong>). Underspecification in Pronunciation Variation. In Proceedings of the International Symposium on<br />

Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Kane, J., Oertel, C. (<strong>2012</strong>). Conversational involvement and multimodal cues: summary and outlook. Fonetik <strong>2012</strong>, Gothenburg, Sweden<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Towards Long Tail Resource Production through Open Corpus Reuse. In Proceedings of the International<br />

Conference on Web-based Learning (ICWL <strong>2012</strong>), Sinaia, Romania<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Automating the Production of Learning Objects from Open Corpus Content. In Proceedings of<br />

The European Conference on Technology Enhanced Learning (EC-TEL), September <strong>2012</strong>, Paphos, Cyprus<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Providing Customized Reuse of Open-Web Resources for Adaptive Hypermedia. In Proceedings<br />

of the 23rd ACM Conference on Hypertext and Social Media (HT ‘12), Milwaukee, USA<br />

Leveling, J. (<strong>2012</strong>). DCU@FIRE <strong>2012</strong>: Monolingual and crosslingual SMS-based FAQ retrieval. In FIRE <strong>2012</strong>, Fourth Workshop of the Forum for Information<br />

Retrieval Evaluation, pages 37-42, Kolkata, India, <strong>2012</strong>. ISI.<br />

Leveling, J. (<strong>2012</strong>). On the effect of stopword removal for SMS-based FAQ retrieval. In Gosse Bouma, Ashwin Ittoo, Elisabeth Métais, and Hans Wortmann,<br />

editors, Natural Language Processing and Information Systems – 17th International Conference on Applications of Natural Language to Information<br />

Systems, NLDB <strong>2012</strong>, 26-28 June, Groningen, The Netherlands, Proceedings, volume 7337 of Lecture Notes in Computer Science (LNCS), pages 128-139.<br />

Springer, <strong>2012</strong>.<br />

Leveling, J., Goeuriot, L., Kelly, L., Jones, G.J.F. (<strong>2012</strong>). DCU@TRECMed <strong>2012</strong>: Using ad-hoc baselines for domain-specific retrieval. In Proceedings of<br />

TREC <strong>2012</strong>. NIST, <strong>2012</strong>.<br />

Leveling, J., Jones, G., Ganguly, D. (<strong>2012</strong>). Topical Relevance Models. In Proceedings of the Eighth Asia Information Retrieval Societies Conference<br />

(AIRS <strong>2012</strong>), December <strong>2012</strong>, Tianjin, China<br />

Leveling, J., Jones, G.J.F. (<strong>2012</strong>). Making Results Fit Into 40 Characters: A Study in Document Rewriting. In Proceedings of the Thirty-Fifth <strong>Annual</strong><br />

International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA

Lewis, D., O’Connor, A., Zydroń, A., Sjögren, G., Choudhury (<strong>2012</strong>). On Using Linked Data for Language Resource Sharing in the Long Tail of the<br />

Localisation Market. In Proceedings of Language Resources and Evaluation Conference (LREC), May <strong>2012</strong>, Istanbul, Turkey<br />

Lewis, D., O’Connor, A., Molines, S., Finn, L., Jones, D., Curran, S. and Lawless, S. (<strong>2012</strong>). Linking localisation and language resources, Linked Data in<br />

Linguistics. Lecture Notes in Computer Science (LNCS), 7-9 March <strong>2012</strong>, Frankfurt/Main, Germany, Springer-Verlag<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Head-Driven Hierarchical Phrase-based Translation. In Proceedings of the 50th <strong>Annual</strong> Meeting of the<br />

Association for Computational Linguistics (ACL-<strong>2012</strong>), Jeju, Korea, Association for Computational Linguistics<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Using Syntactic Head Information in Hierarchical Phrase-based Translation. In Proceedings of the Seventh<br />

Workshop on Statistical Machine Translation (WMT <strong>2012</strong>), Montreal, Canada<br />

Lynch, G., Moreau, E., Vogel, C. (<strong>2012</strong>). A Naïve Bayes classifier for automatic correction of preposition and determiner errors in ESL text. In Proceedings<br />

of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications, June <strong>2012</strong>, Montreal, Canada<br />

Lynch, G., Vogel, C. (<strong>2012</strong>). Towards the Automatic Detection of the Source Language of a Literary Translation. In Proceedings of the 24th International<br />

Conference on Computational Linguistics (Coling <strong>2012</strong>), Mumbai, India<br />

Maldonado-Guerra, A., and Emms, M. (<strong>2012</strong>). First-order and second-order context representations: geometrical considerations and performance in<br />

word-sense disambiguation and discrimination. In Proceedings of the 11es Journées internationales d’Analyse statistique des Données Textuelles<br />

(JADT <strong>2012</strong>), Liège.<br />

Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Emoticons Signal Expertise in Technical Web Fora. Special Session: Computational Intelligence in Emotional<br />

or Affective Systems. In Proceedings of the 22nd Italian Workshop on Neural Networks. Smart Innovation, Systems and Technologies, Salerno, Italy



Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Epistemic Signals and Emoticons Affect Kudos. In 3rd IEEE International Conference on Cognitive<br />

Infocommunications, Kosice, Slovakia<br />

McAuley, J., Lewis, D., O’Connor, A. (<strong>2012</strong>). Exploring reflection in online communities. In Learning Analytics and Knowledge (LAK12), Vancouver,<br />

Canada: ACM<br />

Min, J., Lopes, C., Leveling, J., Schmidtke, D., Jones, G.J.F. (<strong>2012</strong>). Multi-Platform Image Search using Tag Enrichment. In Proceedings of the Thirty-Fifth<br />

<strong>Annual</strong> International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />

Moreau, E. (<strong>2012</strong>). Quality Estimation: an experimental study using unsupervised similarity measures. In Proceedings of the Seventh Workshop on Statistical<br />

Machine Translation, Montreal, Canada<br />

Mulwa, C., Lawless, S., Sharp, M., Wade, V. (<strong>2012</strong>). The Evaluation of Adaptive Technology Enhanced Learning Systems, E-LEARN <strong>2012</strong> – World Conference<br />

on E-Learning in Corporate, Government, Healthcare and Higher Education, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov<br />

Model. In The 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA <strong>2012</strong>), 3-5 July <strong>2012</strong>, Montreal,<br />

Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Multilayer Perceptron for Voicing Strength Estimation in HMM-based Speech Synthesis.<br />

In The 11th International Conference on Information Sciences, Signal Processing and their Applications, 3-5 July <strong>2012</strong>, Montreal, Canada<br />

O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for XLIFF. In The 2nd International XLIFF Symposium, Warsaw, Poland<br />

O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for the Adaptation of Multimedia Content. In Proceedings of the 4th Conference of the<br />

International Association for Translation and Intercultural Studies, Queen’s University Belfast, Northern Ireland, UK<br />

O’Keeffe, I. (<strong>2012</strong>). A Mechanism for Facilitating Emotional Regulation through Music. <strong>2012</strong> CUES <strong>Annual</strong> Conference – Regulating Emotions:<br />

Contemporary Understandings and Interdisciplinary Perspective, Limerick, Ireland<br />

O’Keeffe, I., O’Connor, A., Lawless, S., Wade, V. (<strong>2012</strong>). Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia.<br />

In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT <strong>2012</strong>, Milwaukee, USA<br />

Pecina, P., Toral, A., van Genabith, J. (<strong>2012</strong>). Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation, COLING<br />

<strong>2012</strong>, Mumbai, India<br />

Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data using UMBEL and a Fuzzy Retrieval Model. In Proceedings of 9th<br />

Extended Semantic Web Conference (ESWC12), May <strong>2012</strong>, Crete, Greece<br />

Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data. In Proceedings of the 8th International I-SEMANTICS Conference Posters<br />

& Demonstrations Track, Graz, Austria<br />

Schneider, A., Luz, S. (<strong>2012</strong>). Speaker alignment in synthesised, machine translated communication. In International Workshop on Spoken Language<br />

Translation, December 2011, San Francisco, USA<br />

Szekely, E., Ahmed, Z., Steiner, I., Carson-Berndsen, J. (<strong>2012</strong>). Facial expressions as an input annotation modality for affective speech-to-speech<br />

translation, Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, Santa Cruz, USA<br />

Szekely, E., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to<br />

expressive synthetic voices. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada<br />

Szekely, E. (<strong>2012</strong>). Detecting a Targeted Voice Style in an Audiobook using Voice Quality Features. In Proceedings of IEEE International Conference on<br />

Acoustics, Speech and Signal Processing (ICASSP <strong>2012</strong>), March <strong>2012</strong>, Kyoto, Japan<br />

Szekely, E., Abou-Zleikha, M., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Evaluating expressive speech synthesis from audiobooks in conversational phrases,<br />

LREC <strong>2012</strong>, Istanbul, Turkey<br />

Szekely, E., Csapó, T., Toth, B., Mihajlik, P., Carson-Berndsen, J. (<strong>2012</strong>). Synthesizing expressive speech from amateur audiobook recordings.<br />

In Proceedings of IEEE Workshop on Spoken Language Technology, December <strong>2012</strong>, Florida, USA<br />

Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J. (<strong>2012</strong>). Detecting a targeted voice style in an audiobook using voice quality features.<br />

In Proceedings of ICASSP, Kyoto, Japan<br />

Truran, M., Georg, G., Cavazza, M., Zhou, D. (<strong>2012</strong>). A Section Title Authoring Tool for Clinical Guidelines. In Proceedings of 12th ACM Symposium<br />

on Document Engineering (DocEng <strong>2012</strong>), 4-7 September, Paris, France, 41-44.<br />

Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q. and Lin, S. (<strong>2012</strong>). Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level<br />

Sentiment Classification. In Proceedings of the 50th <strong>Annual</strong> Meeting of the Association for Computational Linguistics, July <strong>2012</strong>, Jeju, Republic of Korea



Tu, Z., Liu, Y., He, Y., van Genabith, J., Liu, Q., Lin, S. (<strong>2012</strong>). Combining Multiple Alignments to Improve Machine Translation, COLING <strong>2012</strong>, Mumbai,<br />

India<br />

Veale, T. (<strong>2012</strong>). Detecting and Generating Ironic Comparisons: An Application of Creative Information Retrieval. AAAI Fall Symposium Series <strong>2012</strong><br />

Veale, T. and Hao, Y. (<strong>2012</strong>). In the Mood for Affective Search. In Proceedings of WWW’<strong>2012</strong>, the 21st World-Wide-Web conference, Lyon, France<br />

Veale, T. and Li, G. (<strong>2012</strong>). Specifying Viewpoint and Information Need with Affective Metaphors: A System Demonstration of Metaphor Magnet.<br />

In Proceedings of ACL’<strong>2012</strong>, the 50th <strong>Annual</strong> Conference of the Association for Computational Linguistics, Jeju, South Korea<br />

Wagner, J., Foster, J., Cetinoglu, O., Nivre, J., Hogan, D., Le Roux, J., van Genabith, J. (<strong>2012</strong>). From News to Comment: Resources and Benchmarks for<br />

Parsing the Language of Web 2.0. In 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand<br />

Wagner, J., Bryl, A., Foster, J., Le Roux, J., Kaljahi, R. (<strong>2012</strong>). DCU-Paris13 Systems for the SANCL <strong>2012</strong> Shared Task, First Workshop on Syntactic Analysis of<br />

Non-Canonical Language (SANCL), Montreal, Canada<br />

Wagner, J., Cetinoglu, O., Foster, J., Hogan, S., Le Roux, J. (<strong>2012</strong>). #hardtoparse: POS Tagging and Parsing the Twitterverse. In Workshop on Analyzing<br />

Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), San Francisco, USA<br />

Wagner, J., Cetinoglu, O., van Genabith, J., Foster, J. (<strong>2012</strong>). Comparing the use of edited and unedited text in parser self-training. The 12th International<br />

Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland<br />

Zahra, A., Carson-Berndsen, J. (<strong>2012</strong>). English to Indonesian Transliteration to Support English Pronunciation Practice. In Proceedings of the eighth<br />

international conference on Language Resources and Evaluation (LREC), Istanbul, Turkey<br />

Zahra, A., Cabral, J., Carson-Berndsen, J., Kane, M. (<strong>2012</strong>). Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition<br />

Technology. In Proceedings of International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Ahmed, Z., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Phonetically aided Syntactic Parsing of Spoken Language. In Proceedings of KONVENS, the 11th<br />

Conference on Natural Language Processing, Vienna, Austria<br />

Ahmed, Z., Cahill, P., Carson-Berndsen, J., Jiang, J., Way, A. (<strong>2012</strong>). Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech<br />

Translation. In Proceedings The Tenth Biennial Conference of the Association for Machine Translation in the Americas, San Diego, California<br />

Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Web Search Personalization Using Social Data. In Proceedings of the 16th International Conference on Theory<br />

and Practice of Digital Libraries (TPDL <strong>2012</strong>), Pafos, Cyprus<br />

Not Recorded in 2011 <strong>Annual</strong> <strong>Report</strong><br />

Refereed Conference and Workshop Papers<br />

Doherty, S. Exploring the Cognitive Elements of Think-Aloud Protocols. Show and Tell: Proceedings of the 2011 SALIS Postgraduate Showcase<br />

Abgaz, Y., Javed, M., Pahl, C. A Framework for Change Impact Analysis of Ontology-driven Content-based Systems. In Proceedings of On the Move to<br />

Meaningful Internet Systems: OTM Workshops. 7th International IFIP Workshop on Semantic Web and Web Semantics (SWWS), October, 2011, Crete,<br />

Greece<br />

Book Chapters<br />

Wasala, A., Buckley, J., Exton, C., Schaler, R., Weerasinghe, A. R. (<strong>2012</strong>). Building Multilingual Language Resources in Web Localisation:<br />

A Crowdsourcing Approach. In I. Gurevych and J. Kim (Eds.), The People’s Web Meets NLP: Collaboratively Constructed Language Resources,<br />

Springer Verlag Berlin Heidelberg [In Press]<br />

Banerjee, P. (<strong>2012</strong>). In Alexander Clark, Chris Fox and Shalom Lappin (eds.): Handbook of computational linguistics and natural language processing.<br />

Machine Translation. 10.1007/s10590-012-9124-2 (OnlineFirst)<br />

Kane, M., Mauclair, J. and Carson-Berndsen, J. (2011). Automatic Identification of Phonetic Similarity based on Underspecification. Human Language<br />

Technology: Challenges for Computer Science and Linguistics. Lecture Notes in Computer Science (LNCS 6562) Poznan, pp.47-58<br />

Morera Mesa, A., Collins, J.J., Aouad, L. (<strong>2012</strong>). Assessing Support for Community Workflows in Localisation. In Florian Daniel, Kamel Barkaoui and<br />

Schahram Dustdar (Eds.) Business Process Management Workshops, BPM 2011 International Workshops Clermont-Ferrand, France, August 29, 2011,<br />

Revised Selected Papers, Part I, Lecture Notes in Business Information Processing (LNBIP) volume 99, part 3, pp 195-206, Springer Berlin Heidelberg<br />

O’Keeffe, I., Aouad, L., Collins, J.J., Asanka Wasala, R., Nishio, N., Morera Mesa, A., Morado Vázquez, L., Ryan, L., Gupta, R., Schaler, R. (<strong>2012</strong>).<br />

A View of Future Technologies and Challenges for the Automation of Localisation Processes: Visions and Scenarios. ICHIT (2) 2011: 371-382



Refereed Original Articles<br />

Kane, J. and Gobl, C. Wavelet maxima dispersion for breathy to tense voice discrimination, IEEE Transactions on Audio, Speech and Language Processing<br />

[In Press]<br />

Doherty, G., Karamanis, N., Luz, S. (<strong>2012</strong>). Collaboration in translation: The impact of increased reach on cross-organisational work. Computer Supported<br />

Cooperative Work (CSCW), August <strong>2012</strong><br />

Ghorab, R.M., Zhou, D., O’Connor, A., Wade, V. (<strong>2012</strong>). Personalised Information Retrieval: survey and classification. User Modeling and User-Adapted<br />

Interaction (UMUAI), <strong>2012</strong>.<br />

Kane, J., Drugman, T., Gobl, C. Improved automatic detection of creak, Computer Speech and Language [In Press]<br />

Karamanis, N., Doherty, G., Luz, S. (<strong>2012</strong>). Collaboration in translation: The impact of increased reach on cross-organisational work, Computer Supported<br />

Cooperative Work (CSCW), August <strong>2012</strong><br />

Lambert, P., Petitrenaud, S., Ma, Y., Way, A. (<strong>2012</strong>). What types of word alignment improve statistical machine translation? Machine Translation, Volume<br />

26(4), Springer, pp. 289-323<br />

Moorkens, J. (<strong>2012</strong>). A mixed-methods study of consistency in Translation Memories. Localisation Focus, Volume 11(1)<br />

Morera Mesa, A. (<strong>2012</strong>). Translation and localization project management: the art of the possible. In Keiran J. Dunne and Elena S. Dunne (eds.).<br />

Translation and localization project management: the art of the possible<br />

Mulwa, C., Lawless, S., O’Keeffe, I., Sharp, M., Wade, V. (<strong>2012</strong>). A recommender Framework for the Evaluation of End User Experience in Adaptive<br />

Technology enhanced Learning Systems. International Journal of Technology Enhanced Learning, IJTEL, Special Issue on “Datasets and Data Supported<br />

Learning in Technology-Enhanced Learning”, Vol. 4, pp. 67-84, Nos. 1/2, <strong>2012</strong><br />

O’Keeffe, I. (<strong>2012</strong>). Soundtrack Localisation: Culturally Adaptive Music Content for Computer Games, Journal of Internationalisation and Localisation<br />

Rami Ghorab, M., Zhou, D., O’Connor, A., Wade, V. (<strong>2012</strong>). Personalised Information Retrieval: Survey and Classification, In User Modeling and User<br />

Adapted Interaction (UMUAI) Journal, 1-63 (Published Online First: http://dx.doi.org/10.1007/s11257-012-9124-1), Springer.<br />

Ryan, L. (<strong>2012</strong>). Global Authoring Resources. Communicator, Spring <strong>2012</strong>, ISTC<br />

Ryan, L. (<strong>2012</strong>). Global Authoring Techniques. Communicator, Autumn <strong>2012</strong>, ISTC<br />

Ryan, L. (<strong>2012</strong>). Global Diversity and Localisation Issues. Communicator, Summer <strong>2012</strong>, ISTC<br />

Sah, M. and Wade, V. (<strong>2012</strong>). Automatic Metadata Mining from Multilingual Enterprise Content. In Web Semantics: Science, Services and Agents on the<br />

World Wide Web, Volume 11 (March <strong>2012</strong>), pp. 41-62<br />


Van der Sluis, I., Luz, S., Breitfuß, W., Ishizuka, M., Prendinger, H. (<strong>2012</strong>). Cross-cultural assessment of automatically generated multimodal referring<br />

expressions in a virtual world. International Journal of Human-Computer Studies, 70(9):611-629, <strong>2012</strong>.<br />

Wasala, A., Schmidtke, D., Schäler, R. (<strong>2012</strong>). XLIFF and LCS Format: A Comparison. Localisation Focus, Volume 11(1)<br />

Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Improving search via personalized query expansion using social media, Information Retrieval, 15(3-4), 218-242<br />

Zhou, D., Truran, M., Brailsford, T., Wade, V., Ashman, H. (<strong>2012</strong>). Translation Techniques in Cross-Language Information Retrieval. ACM Computing<br />

Surveys (CSUR), 45(1), Article 1, 1-44. <strong>2012</strong><br />

Conference Presentations<br />

Abgaz, Y., Javed, M., Pahl, C. (<strong>2012</strong>). Dependency Analysis in Ontology-driven Content-based Systems. In 12th International Conference on Artificial<br />

Intelligence and Soft Computing (ICAISC<strong>2012</strong>), Zakopane, Poland<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Pitch Recovery of Missing Syllables Using Sparse Representation in Exemplar-based Pitch<br />

Generation. In Proceedings of the 11th International Conference on Information Sciences, Signal Processing and their Applications, Montreal, Canada<br />

Abou-Zleikha, M., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Exemplar-based pitch contour generation using DOP for syntactic tree decomposition.<br />

In Proceedings 37th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP <strong>2012</strong>, Kyoto, Japan<br />

Abou-Zleikha, M., Szekely, E., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Multi-level Exemplar-based Duration Generation for Expressive Speech Synthesis.<br />

In Proceedings 6th International Conference on Speech Prosody <strong>2012</strong>, Shanghai, China<br />

Almaghout, H., Jiang, J., Way, A. (<strong>2012</strong>). Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT. In Proceedings of the 16th <strong>Annual</strong><br />

Conference of the European Association for Machine Translation (EAMT-<strong>2012</strong>), Trento, Italy



Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />

APPENDIX 2: OUTPUTS<br />


Wasala, A., Schäler, R., Weerasinghe, R., Exton, C. (<strong>2012</strong>). Collaboratively Building Language Resources while Localising the Web. In Proceedings<br />

of ACL <strong>2012</strong>: 3rd workshop on the People’s Web meets NLP: Collaboratively Constructed Semantic Resources and their Applications to NLP, Jeju,<br />

Republic of Korea<br />

Attia, M., Pecina, P., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). Improved Spelling Error Detection and Correction for Arabic, COLING <strong>2012</strong>, Mumbai, India<br />

Attia, M., Samih, Y., Shaalan, K., van Genabith, J. (<strong>2012</strong>). The Floating Arabic Dictionary: An Automatic Method for Updating a Lexical Database through the<br />

detection and lemmatization of the Unknown Word. In The International Conference on Computational Linguistics (COLING), December <strong>2012</strong>, Mumbai,<br />

India<br />

Banerjee, P., Naskar, S., Way, A., van Genabith, J., Roturier, J. (<strong>2012</strong>). Supplementary Data Selection by Incremental Update of Translation Models. In Proceedings of the<br />

24th International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Banerjee, P., Naskar, S., Roturier, J., Way, A., van Genabith, J. (<strong>2012</strong>). Domain Adaptation in SMT of User-Generated Forum Content Guided by OOV Word<br />

Reduction: Normalization and/or Supplementary Data. In Proceedings of the 16th <strong>Annual</strong> Conference of the European Association for Machine Translation<br />

(EAMT-<strong>2012</strong>), Trento, Italy<br />

Cabral, J. P., Kane, M., Ahmed, Z., Abou-Zleikha, M., Szekely, E., Zahra, A., Ogbureke, U. K., Cahill, P., Carson-Berndsen, J., Schlogl, S. (<strong>2012</strong>). Rapidly<br />

Testing the Interaction Model of a Pronunciation Training System via Wizard-of-Oz. In Proceedings of the International Conference on Language<br />

Resources and Evaluation (LREC), Istanbul, Turkey<br />

Cabral, J. P. and Carson-Berndsen, J. (<strong>2012</strong>). Controlling Voice Source Parameters to Transform Characteristics of Synthetic Voices. In Listening Talker<br />

(LISTA) Workshop, Edinburgh, UK<br />

Dandapat, S., Morrissey, S., Way, A., van Genabith, J. (<strong>2012</strong>). Combining EBMT, SMT, TM and IR Technologies for Quality and Scale. In Proceedings of the<br />

Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation<br />

(HyTra), a workshop in EACL <strong>2012</strong>, Avignon, France<br />

Doherty, S., Kenny, D., Way, A. (<strong>2012</strong>). Taking Statistical Machine Translation to the Student Translator, AMTA <strong>2012</strong>, San Diego, USA<br />

Doherty, S. and Moorkens, J. (<strong>2012</strong>). An Experiential Analysis of Translation Technology Labs. 2nd <strong>Annual</strong> Conference of Education and Humanities,<br />

30 March <strong>2012</strong>, St. Patrick’s College, Ireland<br />

Doherty, S. and O’Brien, S. (<strong>2012</strong>). A User-Based Usability Assessment of Raw Machine Translated Technical Instructions. Conference of the Association<br />

for Machine Translation in the Americas (AMTA <strong>2012</strong>), San Diego, USA<br />

Drugman, T., Kane, J., Gobl, C. (<strong>2012</strong>). Resonator-based creaky voice detection. In Proceedings of Interspeech <strong>2012</strong>, Portland, USA<br />

Emms, M. (<strong>2012</strong>). On Stochastic Tree Distances and their training via Expectation-Maximisation. In Proceedings of ICPRAM <strong>2012</strong> International Conference<br />

on Pattern Recognition Applications and Methods, Portugal<br />

Filip, D. (<strong>2012</strong>). Managing Industry Wisdom as a Portfolio of Technical Standards, in Management Re-Imagined. Presented at the International Federation<br />

of Scholarly Associations of Management (IFSAM <strong>2012</strong>), Limerick, Ireland.<br />

Filip, D. (<strong>2012</strong>). Using Business Process Management and Modelling to Analyse the Role of Human Translators and Reviewers in Bitext Management<br />

Workflows. Presented at the Conference of the International Association for Translation and Intercultural Studies (IATIS <strong>2012</strong>), Belfast, UK<br />

Filip, D., Lewis, D., Sasaki, F. (<strong>2012</strong>). The Multilingual Web. In Proceedings of the 21st World Wide Web Conference WWW<strong>2012</strong>, April 16-20, <strong>2012</strong>, Lyon,<br />

France, ACM proceedings 978-1-4503-1229-5/11/04<br />

Filip, D., Lewis, D., Wasala, A., Jones, D., Finn, L. (<strong>2012</strong>). CMSL10n SOLAS Integration as an ITS 2.0 XLIFF test bed. Paper presented at the W3C<br />

MultilingualWeb (ITS 2.0) Track, FEISGILTT <strong>2012</strong> (collocated with Localization World <strong>2012</strong>), Seattle, USA.<br />

Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Topical Relevance Models, CIKM <strong>2012</strong>, Hawaii, USA<br />

Ganguly, D., Leveling, J., Jones, G. (<strong>2012</strong>). Approximate Sentence Retrieval for Scalable and Efficient Example-based Machine Translation. The 24th<br />

International Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Ganguly, D., Jones, G. (<strong>2012</strong>). Cross-Lingual Topical Relevance Models. The 24th International Conference on Computational Linguistics (COLING <strong>2012</strong>),<br />

Mumbai, India<br />

Ghorab, M. R., Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Multilingual User Modeling for Personalized Re-ranking of Multilingual Web Search Results.<br />

In Conference on User Modeling, Adaptation, and Personalization (UMAP <strong>2012</strong>), Montreal, Canada<br />

Graham, Y. (<strong>2012</strong>). Deep Syntax in Statistical Machine Translation. Lexical Functional Grammar Conference, Udayana University, Bali, Indonesia<br />

Javed, M., Abgaz, Y., Pahl, C. (<strong>2012</strong>). Composite Ontology Change Operators and their Customizable Evolution Strategies, 2nd Joint Workshop on<br />

Knowledge Evolution and Ontology Dynamics, Boston, USA<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Noisy Speech to Study the Robustness of a Continuous F0 Modelling Method in HMM-based<br />

Speech Synthesis<br />

Scherer, S., Kane, J., Gobl, C., Schwenker, F. (<strong>2012</strong>). The Effect of Fuzzy Training Targets on Voice Quality Classification, Interspeech <strong>2012</strong>, Portland, USA



Kane, B., Toussaint, P., Luz, S. (2013). Shared decision making needs a communication record. To appear in Proceedings of the 16th ACM Conference on Computer<br />

Supported Cooperative Work and Social Computing (CSCW 2013), San Antonio, Texas<br />

Kane, J. and Gobl, C. (<strong>2012</strong>). Identifying regions of non-modal phonation using features of the wavelet transform. In Proceedings of Interspeech <strong>2012</strong>,<br />

Florence, Italy<br />

Kane, J., Papay, K., Hunyadi, L., Gobl, C. (<strong>2012</strong>). On the use of creak in Hungarian spontaneous speech. In Proceedings of ICPhS 2011, Hong Kong, China<br />

Kane, J., Scherer, S., Layher, G., Neumann, H. (<strong>2012</strong>). An audiovisual political speech analysis incorporating eye-tracking and perception data. The Eighth<br />

International Conference on Language Resources and Evaluation (LREC <strong>2012</strong>), Istanbul, Turkey<br />

Kane, J., Yanushevskaya, I., Ní Chasaide, A., Gobl, C. (<strong>2012</strong>). Exploiting time and frequency domain measures for precise voice source parameterisation.<br />

In Proceedings of Speech Prosody <strong>2012</strong>, Shanghai, China<br />

Kane, M., Ahmed, Z., Carson-Berndsen, J. (<strong>2012</strong>). Underspecification in Pronunciation Variation. In Proceedings of the International Symposium on<br />

Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Kane, J., Oertel, C. (<strong>2012</strong>). Conversational involvement and multimodal cues: summary and outlook. Fonetik <strong>2012</strong>, Gothenburg, Sweden<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Towards Long Tail Resource Production through Open Corpus Reuse. In Proceedings of International<br />

Conference on Web-based Learning (ICWL <strong>2012</strong>), Sinaia, Romania<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Automating the Production of Learning Objects from Open Corpus Content. In Proceedings of The<br />

European Conference on Technology Enhanced Learning (EC-TEL), September <strong>2012</strong>, Paphos, Cyprus<br />

Levacher, K., Lawless, S., Wade, V. (<strong>2012</strong>). Slicepedia: Providing Customized Reuse of Open-Web Resources for Adaptive Hypermedia. In Proceedings<br />

of the 23rd ACM Conference on Hypertext and Social Media (HT ’12), Milwaukee, USA<br />

Leveling, J., Jones, G., Ganguly, D. (<strong>2012</strong>). Topical Relevance Models. In Proceedings of the Eighth Asia Information Retrieval Societies Conference (AIRS<br />

<strong>2012</strong>), December <strong>2012</strong>, Tianjin, China<br />

Leveling, J., Jones, G. J. F. (<strong>2012</strong>). Making Results Fit Into 40 Characters: A Study in Document Rewriting. In Proceedings of the Thirty-Fifth <strong>Annual</strong><br />

International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />

Lewis, D., O’Connor, A., Zydroń, A., Sjögren, G., Choudhury (<strong>2012</strong>). On Using Linked Data for Language Resource Sharing in the Long Tail of the<br />

Localisation Market. In Proceedings of Language Resources and Evaluation Conference (LREC), May <strong>2012</strong>, Istanbul, Turkey<br />

Lewis, D., O’Connor, A., Molines, S., Finn, L., Jones, D., Curran, S. and Lawless, S. (<strong>2012</strong>). Linking localisation and language resources, Linked Data in<br />

Linguistics. Lecture Notes in Computer Science (LNCS), 7-9 March <strong>2012</strong>, Frankfurt/Main, Germany, Springer-Verlag<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Head-Driven Hierarchical Phrase-based Translation. In Proceedings of the 50th <strong>Annual</strong> Meeting of the<br />

Association for Computational Linguistics (ACL-<strong>2012</strong>), Jeju, Korea, Association for Computational Linguistics<br />

Li, J., Tu, Z., Zhou, G., van Genabith, J. (<strong>2012</strong>). Using Syntactic Head Information in Hierarchical Phrase-based Translation. In Proceedings of the Seventh<br />

Workshop on Statistical Machine Translation (WMT <strong>2012</strong>), Montreal, Canada<br />

Lynch, G., Moreau, E., Vogel, C. (<strong>2012</strong>). A Naïve Bayes classifier for automatic correction of preposition and determiner errors in ESL text. In Proceedings<br />

of the Seventh Workshop on Innovative Use of NLP for Building Educational Applications, June <strong>2012</strong>, Montreal, Canada<br />

Lynch, G., Vogel, C. (<strong>2012</strong>). Towards the Automatic Detection of the Source Language of a Literary Translation. In Proceedings of the 24th International<br />

Conference on Computational Linguistics (COLING <strong>2012</strong>), Mumbai, India<br />

Maldonado-Guerra, A., and Emms, M. (<strong>2012</strong>). First-order and second-order context representations: geometrical considerations and performance in<br />

word-sense disambiguation and discrimination. In Proceedings of the 11es Journées internationales d’Analyse statistique des Données Textuelles (JADT<br />

<strong>2012</strong>), Liège.<br />

Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Emoticons Signal Expertise in Technical Web Fora. Special Session: Computational Intelligence in Emotional<br />

or Affective Systems. In Proceedings of the 22nd Italian Workshop on Neural Networks. Smart Innovation, Systems and Technologies, Salerno, Italy<br />

Mamani Sanchez, L. and Vogel, C. (<strong>2012</strong>). Epistemic Signals and Emoticons Affect Kudos. In 3rd IEEE International Conference on Cognitive<br />

Infocommunications, Kosice, Slovakia<br />

McAuley, J., Lewis, D., O’Connor, A. (<strong>2012</strong>). Exploring reflection in online communities. In Learning Analytics and Knowledge (LAK12), Vancouver,<br />

Canada: ACM<br />

Min, J., Lopes, C., Leveling, J., Schmidtke, D., Jones, G.J.F. (<strong>2012</strong>). Multi-Platform Image Search using Tag Enrichment. In Proceedings of the Thirty-Fifth<br />

<strong>Annual</strong> International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR <strong>2012</strong>), August <strong>2012</strong>, Portland, USA<br />

Moorkens, J., Doherty, S., Kenny, D., O’Brien, S. (2013, forthcoming). A Virtuous Circle: Laundering Translation Memory Data using Statistical Machine<br />

Translation. Tralogy Conference, January 2013, Paris, France<br />

Moreau, E. (<strong>2012</strong>). Quality Estimation: an experimental study using unsupervised similarity measures. In Proceedings of the Seventh Workshop on Statistical<br />

Machine Translation, Montreal, Canada



Mulwa, C., Lawless, S., Sharp, M., Wade, V. (<strong>2012</strong>). The Evaluation of Adaptive Technology Enhanced Learning Systems, E-LEARN <strong>2012</strong> – World Conference<br />

on E-Learning in Corporate, Government, Healthcare and Higher Education, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Explicit Duration Modelling in HMM-based Speech Synthesis Using Continuous Hidden Markov<br />

Model. In The 11th International Conference on Information Sciences, Signal Processing and their Applications (ISSPA <strong>2012</strong>), 3-5 July <strong>2012</strong>, Montreal, Canada<br />

Ogbureke, U. K., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Using Multilayer Perceptron for Voicing Strength Estimation in HMM-based Speech Synthesis.<br />

In The 11th International Conference on Information Sciences, Signal Processing and their Applications, 3-5 July <strong>2012</strong>, Montreal, Canada<br />

O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for XLIFF. In The 2nd International XLIFF Symposium, Warsaw, Poland<br />

O’Keeffe, I. (<strong>2012</strong>). Multimedia Localisation: Cultural Implications for the Adaptation of Multimedia Content. In Proceedings of 4th Conference of the<br />

International Association for Translation and Intercultural Studies, Queen’s University Belfast, Northern Ireland, UK<br />

O’Keeffe, I. (<strong>2012</strong>). A Mechanism for Facilitating Emotional Regulation through Music. <strong>2012</strong> CUES <strong>Annual</strong> Conference – Regulating Emotions:<br />

Contemporary Understandings and Interdisciplinary Perspective, Limerick, Ireland<br />

O’Keeffe, I., O’Connor, A., Lawless, S., Wade, V. (<strong>2012</strong>). Linked Open Corpus Models, Leveraging the Semantic Web for Adaptive Hypermedia.<br />

In Proceedings of the 23rd ACM Conference on Hypertext and Social Media, HT <strong>2012</strong>, Milwaukee, USA<br />

Pecina, P., Toral, A., van Genabith, J. (<strong>2012</strong>). Simple and Effective Parameter Tuning for Domain Adaptation of Statistical Machine Translation,<br />

COLING <strong>2012</strong>, Mumbai, India<br />

Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data using UMBEL and a Fuzzy Retrieval Model. In Proceedings of 9th<br />

Extended Semantic Web Conference (ESWC12), May <strong>2012</strong>, Crete, Greece<br />

Sah, M. and Wade, V. (<strong>2012</strong>). A Novel Concept-based Search for the Web of Data. In Proceedings of the 8th International I-SEMANTICS Conference Posters<br />

& Demonstrations Track, Graz, Austria<br />

Schneider, A., Luz, S. (<strong>2012</strong>). Speaker alignment in synthesised, machine translated communication. In International Workshop on Spoken Language<br />

Translation, December 2011, San Francisco, USA<br />

Szekely, E., Ahmed, Z., Steiner, I., Carson-Berndsen, J. (<strong>2012</strong>). Facial expressions as an input annotation modality for affective speech-to-speech<br />

translation, Multimodal Analyses enabling Artificial Agents in Human-Machine Interaction, Santa Cruz, USA<br />

Szekely, E., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). WinkTalk: a demonstration of a multimodal speech synthesis platform linking facial expressions to<br />

expressive synthetic voices. In Third Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), Montreal, Canada<br />


Szekely, E., Abou-Zleikha, M., Cabral, J., Carson-Berndsen, J. (<strong>2012</strong>). Evaluating expressive speech synthesis from audiobooks in conversational phrases,<br />

LREC <strong>2012</strong>, Istanbul, Turkey<br />

Szekely, E., Csapo, T., Toth, B., Mihajlik, P., Carson-Berndsen, J. (<strong>2012</strong>). Synthesizing expressive speech from amateur audiobook recordings.<br />

In Proceedings of IEEE Workshop on Spoken Language Technology, December <strong>2012</strong>, Florida, USA<br />

Szekely, E., Kane, J., Scherer, S., Gobl, C., Carson-Berndsen, J. (<strong>2012</strong>). Detecting a targeted voice style in an audiobook using voice quality features.<br />

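In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP <strong>2012</strong>), March <strong>2012</strong>, Kyoto, Japan<br />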

Tu, Z., He, Y., Foster, J., van Genabith, J., Liu, Q. and Lin, S. (<strong>2012</strong>). Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level<br />

Sentiment Classification. In Proceedings of the 50th <strong>Annual</strong> Meeting of the Association for Computational Linguistics, July <strong>2012</strong>, Jeju, Republic of Korea<br />

Tu, Z., Liu, Y., He, Y., van Genabith, J., Liu, Q., Lin, S. (<strong>2012</strong>). Combining Multiple Alignments to Improve Machine Translation, COLING <strong>2012</strong>, Mumbai, India<br />

Veale, T. and Hao, Y. (<strong>2012</strong>). In the Mood for Affective Search. In Proceedings of WWW’<strong>2012</strong>, the 21st World-Wide-Web conference, Lyon, France<br />

Veale, T. and Li, G. (<strong>2012</strong>). Specifying Viewpoint and Information Need with Affective Metaphors: A System Demonstration of Metaphor Magnet.<br />

In Proceedings of ACL’<strong>2012</strong>, the 50th <strong>Annual</strong> Conference of the Association for Computational Linguistics, Jeju, South Korea<br />

Wagner, J., Foster, J., Cetinoglu, O., Nivre, J., Hogan, D., Le Roux, J., van Genabith, J. (<strong>2012</strong>). From News to Comment: Resources and Benchmarks for<br />

Parsing the Language of Web 2.0. In 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand<br />

Wagner, J., Bryl, A., Foster, J., Le Roux, J., Kaljahi, R. (<strong>2012</strong>). DCU-Paris13 Systems for the SANCL <strong>2012</strong> Shared Task, First Workshop on Syntactic Analysis<br />

of Non-Canonical Language (SANCL), Montreal, Canada<br />

Wagner, J., Cetinoglu, O., Foster, J., Hogan, S., Le Roux, J. (<strong>2012</strong>). #hardtoparse: POS Tagging and Parsing the Twitterverse. In Workshop on Analyzing<br />

Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), San Francisco, USA<br />

Wagner, J., Cetinoglu, O., van Genabith, J., Foster, J. (<strong>2012</strong>). Comparing the use of edited and unedited text in parser self-training. The 12th International<br />

Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland<br />

Wasala, A., Filip, D., Exton, C., Schäler, R. (<strong>2012</strong>). Making Data Mining of XLIFF Artefacts Relevant for the Ongoing Development of the XLIFF Standard.<br />

Paper presented at the 3rd International XLIFF Symposium, FEISGILTT <strong>2012</strong> (collocated with Localization World <strong>2012</strong>), Seattle, USA.



Zahra, A., Carson-Berndsen, J. (<strong>2012</strong>). English to Indonesian Transliteration to Support English Pronunciation Practice. In Proceedings of the Eighth<br />

International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey<br />

Zahra, A., Cabral, J., Carson-Berndsen, J., Kane, M. (<strong>2012</strong>). Automatic Classification of Pronunciation Errors Using Decision Trees and Speech Recognition<br />

Technology. In Proceedings of International Symposium on Automatic Detection of Errors in Pronunciation Training (IS ADEPT), Stockholm, Sweden<br />

Ahmed, Z., Cahill, P., Carson-Berndsen, J. (<strong>2012</strong>). Phonetically aided Syntactic Parsing of Spoken Language. In Proceedings of KONVENS, the 11th<br />

Conference on Natural Language Processing, Vienna, Austria<br />

Ahmed, Z., Cahill, P., Carson-Berndsen, J., Jiang, J., Way, A. (<strong>2012</strong>). Hierarchical Phrase-Based MT for Phonetic Representation-Based Speech<br />

Translation. In Proceedings of the Tenth Biennial Conference of the Association for Machine Translation in the Americas, San Diego, California<br />

Zhou, D., Lawless, S., Wade, V. (<strong>2012</strong>). Web Search Personalization Using Social Data. In Proceedings of the 16th International Conference on Theory and<br />

Practice of Digital Libraries (TPDL <strong>2012</strong>), Pafos, Cyprus<br />

Workshops and Conferences Hosted<br />

Date Event Title Location<br />

09/03/<strong>2012</strong> - 10/03/<strong>2012</strong> Workshop on Innovation and Applications in Speech Technology (IAST) University College Dublin<br />

11/03/<strong>2012</strong> <strong>CNGL</strong> Hadoop Hackathon Dublin City University<br />

16/05/<strong>2012</strong> - 17/05/<strong>2012</strong> <strong>CNGL</strong> Spring Scientific Committee Meeting (incorporating inaugural Innovation Charette) Chartered Accountants House, Dublin<br />

30/05/<strong>2012</strong> - 01/06/<strong>2012</strong> International Conference on Computational Creativity (ICCC) University College Dublin<br />

04/06/<strong>2012</strong> - 06/06/<strong>2012</strong> Workshop on Best Practices in Post-editing (in association with TAUS) at Localization World Paris, France<br />

13/06/<strong>2012</strong> - 15/06/<strong>2012</strong> LRC Summer School <strong>2012</strong> – Mobile App Localisation Carlton Castletroy Park Hotel, Limerick<br />

25/06/<strong>2012</strong> Trinity Access Programme ‘Editing Wikipedia’ Workshop Trinity College Dublin<br />

20/09/<strong>2012</strong> - 21/09/<strong>2012</strong> 17th <strong>Annual</strong> Localisation & Internationalisation Conference (LRC XVII) Carlton Castletroy Park Hotel, Limerick<br />

11/06/<strong>2012</strong> - 13/06/<strong>2012</strong> W3C Multilingual Web Workshop Trinity College Dublin<br />

08/10/<strong>2012</strong> - 09/10/<strong>2012</strong> International Workshop on Intelligent Exploration of Semantic Data (IESD) <strong>2012</strong> at 18th International Conference on Knowledge Engineering and Knowledge Management (EKAW<strong>2012</strong>) Galway, Ireland<br />

16/10/<strong>2012</strong> - 17/10/<strong>2012</strong> FEISGILTT <strong>2012</strong> (Federated Event for Interoperability Standardization in Globalization, Internationalization, Localization, and Translation Technologies) Seattle, USA<br />

01/11/<strong>2012</strong> Workshop on Monolingual Translation at AMTA <strong>2012</strong> San Diego, USA<br />

28/10/<strong>2012</strong> Workshop on Post-editing Technology and Practice (WPTP-12) at AMTA <strong>2012</strong> San Diego, USA<br />

08/11/<strong>2012</strong> - 11/11/<strong>2012</strong> International Postgraduate Conference in Translating and Interpreting (IPCITI) Dublin City University<br />

24/11/<strong>2012</strong> The Multimodality and Cyberpsychology Conference Dublin City University<br />

09/12/<strong>2012</strong> Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12 WS and Shared Task) [<strong>CNGL</strong> co-organiser] Mumbai, India<br />

08/12/<strong>2012</strong> - 15/12/<strong>2012</strong> Machine Translation and Parsing in Indian Languages (MTPIL-<strong>2012</strong>) [<strong>CNGL</strong> co-organiser] Mumbai, India


134<br />

Centre for Next Generation Localisation <strong>Annual</strong> <strong>Report</strong> <strong>2012</strong><br />

APPENDIX 2: OUTPUTS<br />

Equipment Valued at Over €50,000 Funded by <strong>CNGL</strong><br />

Item Price Description<br />

No equipment valued over €50,000 was purchased by <strong>CNGL</strong> in <strong>2012</strong><br />

Invention and Software Disclosures Filed<br />

Date Track Title<br />

25/01/<strong>2012</strong> ILT A Measurement method for detecting changes in tone-of-voice (voice quality) from recorded speech signals<br />

25/01/<strong>2012</strong> ILT AlignRank: An Evidence Propagation Algorithm for Word Alignment<br />

12/03/<strong>2012</strong> LOC WorkFlow Recommender<br />

12/03/<strong>2012</strong> LOC LocConnect – Localisation Orchestration Framework<br />

12/03/<strong>2012</strong> LOC Localisation Knowledge Repository<br />

12/03/<strong>2012</strong> LOC XLIFF Phoenix<br />

12/03/<strong>2012</strong> LOC MT Mapper<br />

19/04/<strong>2012</strong> ILT IR Retrieval Model which combines the integrated Recommendation Results with IR retrieval results<br />

30/05/<strong>2012</strong> SF CAT Tool Instrumentation<br />

08/10/<strong>2012</strong> ILT Machine Translation Performance Predictor<br />

Patent Applications Submitted or Granted, and Licence Agreements Signed<br />

Date Title Application number Inventor Track Status<br />

30/05/<strong>2012</strong> Automatic Metadata Extraction from Multilingual Enterprise Content 61/656,499 Melike Sah DCM US Provisional<br />

Licensed Technologies<br />

Licensed To Technology Track<br />

Xcelerator Data Visualisation Dashboard ILT<br />

Xcelerator Data Health Estimator for Machine Translation ILT<br />

Xcelerator Predictive Performance Estimator for Machine Translation ILT<br />

Welocalize A System for Tracking and Analysing Translator Behaviour in an Instrumented Post-editing Environment ILT<br />

Spinout Companies Created<br />

Company Incorporation Date Registration # Website<br />

Emizar Customer Solutions Ltd. 8th November 2011 505776 www.emizar.com



Awards and Honours Received<br />

Name Award Body Award Type Date<br />

Prof. Josef van Genabith Dublin City University DCU President’s Research Award for Science and Engineering February <strong>2012</strong><br />

Dr. Martin Emms, Hector Franco-Penya International Conference on Pattern Recognition Applications and Methods (ICPRAM <strong>2012</strong>) Best Paper Award February <strong>2012</strong><br />

Joachim Wagner (<strong>CNGL</strong>/DCU), Dr. Jennifer Foster (NCLT/DCU), Rasul Samad Zadeh Kaljahi (NCLT/Symantec), Dr. Anton Bryl (Systransis AG, formerly <strong>CNGL</strong>), Dr. Joseph Le Roux (Université Paris 13, formerly NCLT/DCU) First Workshop on Syntactic Analysis of Non-Canonical Language (SANCL) Shared Task Win June <strong>2012</strong><br />

Ben Steichen Localisation Research Centre Best Thesis Award September <strong>2012</strong><br />

Liliana Mamani Sanchez, Dr. Carl Vogel 3rd IEEE International Conference on Cognitive Infocommunications Best Paper Award December <strong>2012</strong><br />

Debasis Ganguly Morpheme Extraction Task (MET) of FIRE <strong>2012</strong> Bengali (Best), Hindi (Second Best) December <strong>2012</strong><br />

Debasis Ganguly, Dr. Johannes Leveling, Dr. Gareth Jones Eighth Asia Information Retrieval Societies Conference (AIRS <strong>2012</strong>) Steering Committee AIRS’12 Best Poster Paper Award December <strong>2012</strong><br />

Media Coverage<br />

Date Media Outlet Event Headline Link<br />

05/01/<strong>2012</strong> Techcentral.ie Influence of LRC in attracting Cetra European base to Limerick<br />

Cetra to grow Limerick Operation (translation services company attracted by third level research)<br />

http://www.techcentral.ie/article.aspx?id=18040<br />

05/01/<strong>2012</strong> Siliconrepublic.com Influence of LRC in attracting Cetra European base to Limerick<br />

20 new jobs as Cetra rolls into town<br />

http://www.siliconrepublic.com/careers-centre/item/25210-20-new-jobs-for-limerick-as/<br />

05/01/<strong>2012</strong> Businessandfinance.ie Influence of LRC in attracting Cetra European base to Limerick<br />

120 new jobs for Dublin and Limerick<br />

http://www.businessandfinance.ie/news/120newjobsfordublinandlimerick<br />

05/01/<strong>2012</strong> Businessandleadership.com Influence of LRC in attracting Cetra European base to Limerick<br />

Cetra to locate European centre in Limerick creating 20 jobs<br />

http://www.businessandleadership.com/business/item/33541-cetra-to-locate-european/<br />

14/01/<strong>2012</strong> Limerick Leader Influence of LRC in attracting Cetra European base to Limerick<br />

New Limerick firm translates into 20 positions<br />

Page 6<br />

14/01/<strong>2012</strong> Limerick Post Influence of LRC in attracting Cetra European base to Limerick<br />

Translation services company to create 20 jobs<br />

Page 14<br />

24/01/<strong>2012</strong> Siliconrepublic.com Launch of <strong>CNGL</strong> careers guide<br />

Students urged to consider a career in localisation<br />

http://www.siliconrepublic.com/careers-centre/item/25466-students-urged-to-consider/<br />

24/01/<strong>2012</strong> Education Matters (www.educationmatters.ie) Launch of <strong>CNGL</strong> careers guide<br />

High demand for graduates in localisation<br />

http://www.educationmatters.ie/<strong>2012</strong>/01/24/high-demand-for-graduates-in-localisation/<br />

25/01/<strong>2012</strong> Scoop It! Language Blog (www.scoop.it) Launch of <strong>CNGL</strong> careers guide<br />

<strong>CNGL</strong> Localisation Careers<br />

http://www.scoop.it/t/translation-and-localization/p/1050392099/cngl-localisation-careers



APPENDIX 2: OUTPUTS<br />


25/01/<strong>2012</strong> www.science.ie Launch of <strong>CNGL</strong> careers guide<br />

A world of opportunities in localisation<br />

http://www.science.ie/science-news/opportunities-in-localisation.html<br />

25/01/<strong>2012</strong> Waterford Institute of Technology website (www.wit.ie) Launch of <strong>CNGL</strong> careers guide<br />

Graduates in high demand in Ireland’s 16,000-job localisation sector<br />

http://www.wit.ie/News/News/MainBody,48355,en.html<br />

26/01/<strong>2012</strong> Lingport i18nblog Launch of <strong>CNGL</strong> careers guide<br />

<strong>CNGL</strong> Launches Localization Careers Guide<br />

http://i18nblog.com/<strong>2012</strong>/01/26/cngl-launches-localization-careers-guide/<br />

26/01/<strong>2012</strong> Gradireland.com (Ireland’s official graduate jobs and careers website) Launch of <strong>CNGL</strong> careers guide<br />

Localisation – a growth area and a career opportunity<br />

http://gradireland.wordpress.com/<strong>2012</strong>/01/26/localisation-a-growth-area-and-a-career-opportunity/<br />

26/01/<strong>2012</strong> www.mysciencecareer.ie Launch of <strong>CNGL</strong> careers guide<br />

Ireland becoming a global expert in localisation<br />

http://www.mysciencecareer.ie/resources/news-and-events/localisation-in-ireland<br />

01/02/<strong>2012</strong> Education Matters ezine Launch of <strong>CNGL</strong> careers guide<br />

High demand for graduates in localisation area<br />

01/02/<strong>2012</strong> Siliconrepublic.com All Ireland Linguistics Olympiad<br />

AILO fosters next generation of Irish computational linguists<br />

http://www.siliconrepublic.com/innovation/item/25584-skillsfeb/<br />

01/02/<strong>2012</strong> egovmonitor.com Launch of <strong>CNGL</strong> careers guide<br />

“Ireland Is Recognised As A Leader In The Localisation And Global Services Sector But We Need To Do More” – Sherlock<br />

http://www.egovmonitor.com/node/46072<br />

06/02/<strong>2012</strong> Evening Echo Launch of <strong>CNGL</strong> careers guide<br />

With technology and a second language you will be a professional in demand<br />

Page 33<br />

08/02/<strong>2012</strong> Irish Independent Fostering foreign language skills<br />

Teaching languages at primary level will be a key to our economic future<br />

Page 15<br />

08/02/<strong>2012</strong> Irish Independent website (www.independent.ie) Fostering foreign language skills<br />

Teaching languages at primary level will be a key to our economic future<br />

http://www.independent.ie/lifestyle/education/features/in-my-opinion-teaching-languages-at-primary-level-will-be-a-key-to-our-economic-future-3012676.html<br />

09/02/<strong>2012</strong> Tipperary Star All Ireland Linguistics Olympiad<br />

All Ireland Linguistics Olympiad<br />

Page 15<br />

14/02/<strong>2012</strong> Roscommon Herald All Ireland Linguistics Olympiad<br />

Budding Strokestown linguists seek to decode the languages of the world<br />

Page 53<br />

07/03/<strong>2012</strong> Dublin City of Science website (www.dublinscience<strong>2012</strong>.ie) All Ireland Linguistics Olympiad<br />

All Ireland Linguistics Olympiad<br />

http://www.dublinscience<strong>2012</strong>.ie/<strong>2012</strong>/03/all-ireland-linguistics-olympiad/<br />

07/03/<strong>2012</strong> Evening Echo All Ireland Linguistics Olympiad<br />

Students have strategy to solve problems<br />

09/03/<strong>2012</strong> Céist website (www.ceist.ie) All Ireland Linguistics Olympiad<br />

All Ireland Linguistics Olympiad (AILO)<br />

http://www.ceist.ie/news_events/view_article.cfm?loadref=2&id=595<br />

13/02/<strong>2012</strong> Techcentral.ie ComputeTY transition year programme<br />

Transition year students decode Web design<br />

http://www.techcentral.ie/article.aspx?id=18301&utm_source=TechCentral+newsletter&utm_campaign=4755350324-13_022_13_<strong>2012</strong>&utm_medium=email#ixzz1mGomGpL0



14/03/<strong>2012</strong> Sunday Business Post website (www.businesspost.ie) Xcelerator/DCU licence<br />

Startup of the day: Xcelerator<br />

http://www.businesspost.ie/#!story/Home/News/Startup+of+the+day%3A+Xcelerator/id/86478410-84f6-07f6-0e90-4e777652148<br />

18/03/<strong>2012</strong> Sunday Business Post Xcelerator/DCU licence<br />

Startup of the day: Xcelerator<br />

22/03/<strong>2012</strong> Guideline Magazine Next Generation<br />

Localisation Careers<br />

21/03/<strong>2012</strong> Irish Examiner All Ireland Linguistics<br />

Olympiad<br />

21/03/<strong>2012</strong> Irish Examiner Online All Ireland Linguistics<br />

Olympiad<br />

Graduates in high demand<br />

in Ireland’s localisation<br />

sector<br />

Pupils pit wits against<br />

language puzzles<br />

Pupils pit wits against<br />

language puzzles<br />

Cover & Page 3<br />

http://www.irishexaminer.com/ireland/pupilspit-wits-against-language-puzzles-187781.html<br />

27/03/<strong>2012</strong> Roscommon Herald All Ireland Linguistics Olympiad<br />

AILO Olympiad<br />

Page 55<br />

28/03/<strong>2012</strong> Irish Independent Language Advocacy<br />

Adios Espanol – Quinn dumps languages in primary schools (Cara Greene comment)<br />

Page 17<br />

29/03/<strong>2012</strong> South Tipp Today All Ireland Linguistics Olympiad<br />

Scoil Ruain student in Linguistics Olympiad<br />

Page 31<br />

29/03/<strong>2012</strong> Tipperary Star All Ireland Linguistics Olympiad<br />

Scoil Ruain Student in Linguistics Olympiad<br />

Page SS 3<br />

30/03/<strong>2012</strong> www.sam-xlation.de SAM Xlation GmbH tests KantanMT product of <strong>CNGL</strong> spinout Xcelerator<br />

Machine Translation Testing<br />

http://www.sam-xlation.de/index.php/de/aktuelles#MachineTranslationTesting<br />

02/04/<strong>2012</strong> Education Magazine Next Generation Localisation Careers<br />

Graduates in high demand in Ireland’s localisation sector<br />

Pages 12-13<br />

22/04/<strong>2012</strong> LANGTECHNEWS Innovation Voucher collaboration with Cipherion Translations<br />

Irish localisation company to add MT, crowd-sourcing and gamification<br />

http://langtechnews.hivefire.com/articles/146423/irish-localisation-company-to-add-mt-crowd-sourcin/<br />

24/04/<strong>2012</strong> Irish Times Insight supplement Sign Language Machine Translation<br />

Lost in translation<br />

Page 13<br />

30/04/<strong>2012</strong> Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) wripl winning pitch at Get Started Technology Venture Programme<br />

SFI-funded scientists head to Silicon Valley<br />

http://www.enterprise.gov.ie/News/Irish_researchers_secure_coveted_prize_of_trip_to_Silicon_Valley_.html<br />

30/04/<strong>2012</strong> Techcentral.ie wripl winning pitch at Get Started Technology Venture Programme<br />

Irish researchers secure trip to Silicon Valley<br />

http://www.techcentral.ie/article.aspx?id=18832<br />

30/04/<strong>2012</strong> TechCentral ezine wripl winning pitch at Get Started Technology Venture Programme<br />

Irish researchers secure trip to Silicon Valley<br />

30/04/<strong>2012</strong> www.studentnews.ie wripl winning pitch at Get Started Technology Venture Programme<br />

Irish science researchers land key trip to Silicon Valley to meet technology chiefs<br />

http://www.studentnews.ie/irish-science-researchers-land-key-trip-to-silicon-valley-to-meet-technology-chiefs-5724<br />

April/May <strong>2012</strong> edition Multilingual Magazine Localisation standards<br />

The localization standards ecosystem (article by Dr David Filip, <strong>CNGL</strong> at UL)



03/05/<strong>2012</strong> Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) All Ireland Linguistics Olympiad<br />

Double Gold for Belfast Schools in All Ireland Linguistics Olympiad<br />

http://www.enterprise.gov.ie/News/Double_Gold_for_Belfast_Schools_in_All_Ireland_Linguistics_Olympiad.html<br />

03/05/<strong>2012</strong> Roscommon Herald All Ireland Linguistics Olympiad<br />

All-Ireland Linguistics Final<br />

Page SS 6<br />

12/05/<strong>2012</strong> South Belfast News All Ireland Linguistics Olympiad<br />

Olympiad gold for Wellington team<br />

Page 17<br />

17/05/<strong>2012</strong> Northern Standard All Ireland Linguistics Olympiad<br />

Photo: Mrs Geraldine Kelly making a presentation to Zoe Vance for her success in the Linguistics Olympiad © Rory Geary/Northern Standard<br />

Page 24<br />

24/05/<strong>2012</strong> Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) Google parsing challenge<br />

DCU-Paris 13 Team excels in Google parsing challenge<br />

http://www.enterprise.gov.ie/News/DCU-Paris_13_Team_excels_in_Google_Parsing_Challenge.html<br />

31/05/<strong>2012</strong> Ballincollig Today All Ireland Linguistics Olympiad<br />

Photo: Among the winners at the Ballincollig Community School’s <strong>Annual</strong> Awards Night was Grainne Hutchinson (Ovens), bronze award at All Ireland Linguistics Olympiad<br />

Page 8<br />

31/05/<strong>2012</strong> Mid Cork Today All Ireland Linguistics Olympiad<br />

Photo: Among the winners at the Ballincollig Community School’s <strong>Annual</strong> Awards Night was Grainne Hutchinson (Ovens), bronze award at All Ireland Linguistics Olympiad<br />

Page 8<br />

13/06/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com) LRC Summer School<br />

Irish mobile app developers urged to localise their apps<br />

http://www.siliconrepublic.com/new-media/item/27729-irish-mobile-app-developers/<br />

13/06/<strong>2012</strong> Techcentral.ie LRC Summer School<br />

Irish mobile app developers must think global, says LRC<br />

http://www.techcentral.ie/19122/irish-mobile-app-developers-must-think-global-says-lrc#ixzz1xfU9YK00<br />

13/06/<strong>2012</strong> TechCentral ezine LRC Summer School<br />

Irish mobile app developers must think global, says LRC<br />

13/06/<strong>2012</strong> Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) LRC Summer School<br />

Irish mobile app developers must think global, says Localisation Research Centre<br />

http://www.enterprise.gov.ie/News/Irish_mobile_app_developers_must_think_global_says_Localisation_Research_Centre.html<br />

13/06/<strong>2012</strong> Polish Interpreting (www.polish-interpreting.co.uk) LRC Summer School<br />

Irish mobile app developers urged to localise …<br />

http://polish-interpreting.co.uk/<strong>2012</strong>/06/13/irish-mobile-app-developers-urged-to-localise/<br />

14/06/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com) W3C Multilingual Web Workshop<br />

Internet experts in Dublin to talk about multilingual web<br />

http://www.siliconrepublic.com/innovation/item/27759-internet-experts-in-dublin/<br />

14/06/<strong>2012</strong> Department of Jobs, Enterprise & Innovation website (http://www.enterprise.gov.ie) W3C Multilingual Web Workshop<br />

<strong>CNGL</strong> researchers at heart of efforts to facilitate Internationalisation of Web<br />

http://www.enterprise.gov.ie/News/<strong>CNGL</strong>_researchers_at_heart_of_efforts_to_facilitate_Internationalisation_of_Web.html<br />

17/06/<strong>2012</strong> Sunday Business Post Xcelerator/DCU collaboration<br />

Translation is finally brought up to speed



30/06/<strong>2012</strong> Limerick Leader – County Edition Influence of LRC in attracting Cetra European base to Limerick<br />

Cetra Ireland’s new office<br />

Page 20<br />

30/06/<strong>2012</strong> Limerick Leader Influence of LRC in attracting Cetra European base to Limerick<br />

Cetra Ireland’s new office<br />

Page 20<br />

30/06/<strong>2012</strong> Limerick Leader West Edition Influence of LRC in attracting Cetra European base to Limerick<br />

Cetra Ireland’s new office<br />

Page 20<br />

11/07/<strong>2012</strong> Multilingual E-Zine Xcelerator/DCU collaboration<br />

Commercialisation Fund Project<br />

http://www.multilingual.com/mlNewsArchiveDetail.php?id=2521<br />

02/08/<strong>2012</strong> East Cork Journal International Linguistics Olympiad<br />

Cork participating in International Linguistics Olympiad<br />

Page 16<br />

03/08/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com) International Linguistics Olympiad<br />

Four Irish students in Slovenia to battle it out in Linguistics Olympiad<br />

http://www.siliconrepublic.com/innovation/item/28660-four-irish-students-in/<br />

03/08/<strong>2012</strong> World Irish (www.worldirish.com) International Linguistics Olympiad<br />

Four Irish Students Compete in International Linguistics Olympiad in Slovenia<br />

http://m.worldirish.com/listening-post/view/four-irish-students-compete-in-international-linguistics-olympiad-in-slovenia-1641<br />

10/09/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com) Innovation Showcase<br />

<strong>CNGL</strong> Localisation Innovation Showcase <strong>2012</strong><br />

http://www.siliconrepublic.com/events/event/2859-cngl-localisation-in<br />

11/09/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com) LRC Conference<br />

Localisation conference in Limerick to focus on social trends<br />

http://www.siliconrepublic.com/innovation/item/29199-localisation-conference-in/<br />

13/09/<strong>2012</strong> www.newswhip.com LRC Conference<br />

Localisation conference in Limerick to focus on social trends<br />

http://www.newswhip.com/MoreInfo/Localisation-conference-in-Limerick-to-f/7480567<br />

21/09/<strong>2012</strong> Irish Independent Language Advocacy<br />

Only one in 25 primary pupils learn a language<br />

11<br />

21/09/<strong>2012</strong> Limerick Post LRC Conference<br />

Twitter trends to aid translation<br />

Page 86<br />

21/09/<strong>2012</strong> Galway City Tribune KantanMT spinout recruitment drive<br />

Cloud-based operation seeks people ‘hungry for a challenge’<br />

Page 10<br />

25/09/<strong>2012</strong> Tech Central (www.techcentral.ie) META-NET White Paper<br />

Most European languages not ready for ‘digital age’<br />

http://www.techcentral.ie/article.aspx?id=19947<br />

25/09/<strong>2012</strong> Silicon Republic (www.siliconrepublic.com) Qun Liu joins <strong>CNGL</strong><br />

Prof Qun Liu, Professor Of Machine Translation<br />

http://www.siliconrepublic.com/careers/appointments/984-prof-qun-liu-centre-for<br />

26/09/<strong>2012</strong> The Sociable (http://sociable.co) META-NET White Paper<br />

Most European languages “unlikely to survive in the digital age”<br />

http://sociable.co/technology/most-european-languages-unlikely-to-survive-in-the-digital-age/<br />

26/09/<strong>2012</strong> Multilingual E-Zine Qun Liu joins <strong>CNGL</strong><br />

Centre for Next Generation Localisation appoints Professor of Machine Translation<br />

http://www.multilingual.com/mlNewsArchiveDetail.php?id=2526#8441<br />

26/09/<strong>2012</strong> Gaelport META-NET White Paper<br />

Bagairt don Ghaeilge sa ré dhigiteach (A threat to Irish in the digital age)<br />

http://www.gaelport.com/nuacht?NewsItemID=8677



27/09/<strong>2012</strong> Radio na Gaeltachta META-NET White Paper<br />

Cormac ag a cuig<br />

http://www.rte.ie/radio/radioplayer/rteradioweb.html#!rii=17%3A3402740%3A11598%3A27%2D09%2D<strong>2012</strong>%3A<br />

28/09/<strong>2012</strong> Newstalk – Splanc META-NET White Paper<br />

Agallamh le Ailbhe Ní Chasaide (Interview with Ailbhe Ní Chasaide)<br />

http://www.newstalk.ie/programmes/all/splanc/<br />

30/09/<strong>2012</strong> The Sunday Times META-NET White Paper<br />

Briefing Digital Irish: Lost for Words<br />

Page 16<br />

October/November <strong>2012</strong> Issue Multilingual Magazine Localisation<br />

Localization for the long tail: Part 1 (article by Dr David Filip, <strong>CNGL</strong> at UL)<br />

03/10/<strong>2012</strong> Siliconrepublic.com META-NET White Paper<br />

Irish language at risk of digital extinction, research shows<br />

http://www.siliconrepublic.com/innovation/item/29483-irish-language-at-risk-of/<br />

19/11/<strong>2012</strong> South East Radio Cipherion Translations<br />

Mark Rodgers of Cipherion Translations on fruits of collaboration with <strong>CNGL</strong> at DCU (17 mins 45 secs)<br />

https://www.youtube.com/watch?v=zEhEPaPzZXU<br />

December <strong>2012</strong> Issue Multilingual Magazine Localisation<br />

Localization for the long tail: Part 2 (article by Dr David Filip, <strong>CNGL</strong> at UL)<br />

02/12/<strong>2012</strong> Sunday Business Post Emizar<br />

Emizar<br />

19/12/<strong>2012</strong> Multilingual E-Zine LORG parser<br />

LORG natural language parser<br />

http://www.multilingual.com/mlNewsArchiveDetail.php?id=2532#8521<br />

25/12/<strong>2012</strong> Antrim Times All Ireland Linguistics Olympiad<br />

Successful year for Antrim Grammar<br />

Page 6


Centre for Next Generation Localisation<br />

Dublin City University<br />

Dublin 9, Ireland<br />

Tel: +353-1-700 6700<br />

Fax: +353-1-700 6702<br />

Email: info@cngl.ie<br />

www.cngl.ie
