Dan Cristea
Research Activity

My main research topics are: workflows for natural language processing, discourse structure, incremental discourse parsing, the relationship between structure and reference, anaphora resolution, computational lexicography, annotated textual resources and applications involving language processing.

In the past I have also focused on problems related to lexical semantics, WordNet and applications of wordnets, and aspects of language evolution.

The group of people I have worked with over time is known as the NLP-Group@UAIC-FII. It is constantly renewed, since my students move on quickly once they graduate or obtain a master's or PhD degree. I am glad to say that in recent years a number of experienced researchers from other institutions have also joined our group. Its pages can be accessed here.

The Consortium for the Romanian Language: Resources & Tools (in Romanian: Consortiul de Informatizare pentru Limba Romana - ConsILR) is an initiative that aims to facilitate and augment the efforts of linguists and computer scientists working on the Romanian language by promoting software tools and resources for linguistic processing. The ConsILR conferences, a series of events organised yearly since 2001, aim to promote research on resources and tools for natural language processing, with a special emphasis on Romanian.

Research projects

The following is a selection of my most representative projects.

Colour code:

  • Magenta - projects coordinated by me
  • Blue - projects in which I was responsible for the participation of UAIC
  • Black - projects in which I was a regular member

  • 2014-2018 (ongoing): E-READ: Evolution of Reading in the Age of Digitization - official site - COST Action IS1404, aiming to improve the scientific understanding of the implications of digitization for reading and to help individuals, disciplines, societies and sectors across Europe cope optimally with these effects.

  • 2013-2017 (ongoing): PARSEME: PARSing Multi-word Expressions - official site - a COST action gathering interdisciplinary experts (linguists, computational linguists, computer scientists, psycholinguists and industry representatives) from 30 countries, representing 29 languages and 6 dialects belonging to 10 language families, with the goal of studying the role of multi-word expressions in different parsing frameworks: CCG (Combinatory Categorial Grammar), DG (Dependency Grammar), GG (Generative Grammar), HPSG (Head-driven Phrase Structure Grammar), LFG (Lexical Functional Grammar), TAG (Tree Adjoining Grammar), etc. It addresses different methodologies (symbolic, probabilistic and hybrid parsing) and language technology applications (machine translation, information retrieval).

  • 2014-2016 (ongoing): MappingBooks - Let's jump in the book! - A project financed by the Romanian Ministry of Education and Research (UEFISCDI) under the Partnerships Programme (PN II Parteneriate, competition PCCA 2013), project code: PN-II-PT-PCCA-2013-4-1878. The project develops a new type of electronic product with a potentially high impact on education. The technology makes heavy use of natural language processing, web cartography, web mapping, mixed reality techniques and ambient intelligence. Toponyms and other mentions in the book that are of interest to the reader are supplemented with various types of information: diagrams, graphical data and links to virtual sites.

  • Partners:
    • "Alexandru Ioan Cuza" University of Iasi
    • SIVECO S.A. Bucharest
    • "Stefan cel Mare" University of Suceava
    Value for UAIC: 569,170 RON (129,357 EUR).

  • 2013-2017 (ongoing): ENeL - European Network of e-Lexicography - official site - a COST action aiming to establish a European network of lexicographers and computer scientists with three goals: to give users easier access to dictionaries in electronic form, to organise a systematic exchange of expertise and of common standards and solutions for the representation of lexicographic resources, and to develop a common approach to e-lexicography that fully embraces the pan-European nature of much of the vocabulary of the languages spoken in Europe.

  • 2010-2013: ATLAS - Applied Technology for Language-Aided CMS - official site. A project funded by the European Commission under the ICT Policy Support Programme, Grant Agreement 250467. ATLAS' goals were to establish an innovative software platform providing three online services for heterogeneous multilingual content management, equipped with natural language processing capabilities, including automatic annotation, summarisation, categorisation and machine translation. The collaborative, user-oriented, shared and interoperable services are:
    • i-Publisher: Automatic processing of the Web content (categorisation, summarisation, annotation etc.);
    • i-Librarian: The ability to easily create, organise and publish various types of documents;
    • EUDocLib: A publicly accessible repository of EU documents, providing enhanced navigation and easier access to relevant documents in the user's language.
    The technology was developed for 6 languages: Bulgarian, English, German, Greek, Polish and Romanian. Partners:
    • Tetracom Interactive Solutions, Sofia - coordinator
    • German Institute for Artificial Intelligence, Saarbruecken
    • Atlantis Consulting SA, Athens
    • Institute for Bulgarian Language, Sofia
    • Institute of Computer Science, Polish Academy of Sciences, Warsaw
    • University of Hamburg
    • "Alexandru Ioan Cuza" University of Iasi
    • University of Zagreb
    • Institute of Technologies and Development Foundation, Sofia
    Value for UAIC: 98,341 EUR.

  • 2011-2013: METANET4U - official site - an ICT-PSP project, Grant Agreement 270893. METANET4U was part of the META-NET Network of Excellence, a cluster of projects aiming to foster the mission of META (the Multilingual Europe Technology Alliance), dedicated to building the technological foundations of a multilingual European information society. The goal of METANET4U was to contribute to the establishment of a pan-European digital platform that makes available language resources and services, datasets and software tools for speech and language processing, and to support a new generation of exchange facilities for them. All resources and tools gathered during the project were documented, updated and delivered through META-SHARE, the network of open digital exchange platforms. Partners:
    • Faculty of Sciences, University of Lisbon - coordinator
    • Instituto Superior Tecnico, Lisbon
    • University of Manchester
    • "Alexandru Ioan Cuza" University of Iasi
    • Research Institute for Artificial Intelligence, Romanian Academy, Bucharest
    • University of Malta
    • Technical University of Catalonia
    • Universitat Pompeu Fabra, Barcelona
    Value for UAIC: 242,338 EUR.

  • 2008-2011: CLARIN - Common Language Resources and Technology Infrastructure - official site. A European initiative committed to establishing an integrated and interoperable research infrastructure of language resources (all knowledge sources based on language, written or spoken) and its technology (tools to carry out operations on such language material). The main features of this infrastructure are:
    • integration: the resource and service centres are connected via Grid technology and form a virtually integrated domain;
    • interoperability: the resources and services will be based on Semantic Web technologies to overcome format, structure and terminological differences;
    • stability: the resources and services are offered with a high availability;
    • persistency: the resources and services are planned to be accessible for many years so that researchers can rely on them;
    • accessibility: the resources and services are accessible via the web; different access methods and training possibilities are offered tailored to the needs of the communities making use of them;
    • extendability: the infrastructure is open so that new resources and services can be added easily.
    Value for UAIC: 77,961 EUR.

  • 2008-2011: ALEAR - Artificial Language Evolution on Autonomous Robots - official site. An FP7 project aiming to achieve open-ended cognitive development and open-ended verbal dialogues among fully embodied situated agents (humanoid robots, mechanisms which include sensori-motor intelligence, scripts for establishing turn-taking interaction among them, perceptual processes, processes that conceptualise what to say, the expression of these conceptualisations in language, and processes that parse sentences and interpret them in sensori-motor experience). ALEAR showed that humanoid robots can evolve their own artificial languages, adapted to the environment and task settings in which they are placed. Partners:
    • Humboldt University, Berlin - coordinator
    • SONY CSL - Paris
    • Osnabruck University
    • Autonomous University of Barcelona
    • Vrije Universiteit Brussel - Brussels
    • "Alexandru Ioan Cuza" University of Iasi
    Value for UAIC: 222,781 EUR.

  • 2009-2011: ALEAR-RO (Artificial Language Evolution on Autonomous Robots) - a mirror project sponsored by the Romanian Ministry of Research.

    Value for UAIC: 278,155 RON (57,477 EUR).

  • Sept. 2007 - Dec. 2010: eDTLR - The Thesaurus Dictionary of Romanian Language in Digital Form - official site: building the electronic version of the biggest Romanian dictionary, edited and printed by the Romanian Academy between 1913 and 2010. The two series of the Dictionary, the Dictionary of the Academy (DA) and the Dictionary of the Romanian Language (DLR), include 36 volumes, more than 15,000 pages, about 175,000 entries and more than 1,300,000 examples. The creation of the electronic version went through the following steps: scanning, OCR, proofreading (first by volunteers in a collaborative effort, then by experts), parsing of the dictionary entries, building the database (as XML TEI-P5 files), scanning of sources (over 2,500 volumes) and building indexing and browsing mechanisms. Partners:
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi - coordinator
    • Institute of Linguistics "Iorgu Iordan - Alexandru Rosetti", Romanian Academy, Bucharest
    • Institute of Romanian Philology "Alexandru Philippide", Romanian Academy, Iasi
    • Institute of Literary History "Sextil Puscariu", Romanian Academy, Cluj-Napoca
    • Research Institute for Artificial Intelligence, Romanian Academy, Bucharest
    • Research Institute of Computer Science, Romanian Academy, Iasi
    • Faculty of Letters of the "Alexandru Ioan Cuza" University, Iasi
    Value for UAIC: 346,065 RON (80,480 EUR).

  • 2007-2010: SIR-RESDEC - Open Domain Question Answering System for Romanian and English. The project developed an advanced, interlingual and parametrisable system for interpreting questions expressed in Romanian and answering them in English and Romanian. Questions refer to a dynamic collection of documents containing an arbitrary number of texts. The domains chosen as case studies were the legislative domain and the bioinformatics domain. Partners:
    • Research Institute for Artificial Intelligence, Romanian Academy, Bucharest - coordinator
    • Central Institute for Informatics, Bucharest
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi
    Value for UAIC: 246,520 RON (57,330 EUR).

  • Since 2008: Institutional Member of FLaReNet - Fostering Language Resources Network - official site: a network of excellence (in the eContentPlus Programme, grant agreement no. ECP-2007-LANG-617001) with the mission of identifying priorities and short-, medium- and long-term strategic objectives, and of providing consensual recommendations in the form of a plan of action for the EC, national organisations and industry.

    At the end of the project (31 August 2011) FLaReNet counted 38 partners, 99 institutional members, 25 support groups and 400 individual subscribers from all over the world.

  • 2007-2010: COST A31 - Stability and Adaptation of Classification Systems in a Cross-Cultural Perspective - official site. Coordinator: CNRS Paris.

  • 2006-2008: LT4eL - Language Technology for eLearning - official site: an FP6 project that aimed to apply multilingual language technology tools and semantic web techniques to improve the retrieval of learning material. The developed technology facilitates personalised access to knowledge within learning management systems and supports decentralisation and co-operation in content management. The LT4eL technology has been developed for 9 languages: Bulgarian, Czech, Dutch, English, German, Maltese, Portuguese, Polish and Romanian. Partners:
    • University of Hamburg - coordinator
    • "Alexandru Ioan Cuza" University of Iasi
    • University of Lisbon
    • Charles University, Prague
    • Institute of Parallel Processing of Information, Bulgarian Academy of Sciences, Sofia
    • Eberhard Karls University, Tuebingen
    • Institute of Computer Science, Polish Academy of Sciences, Warsaw
    • School of Communication, Winterthur, Switzerland
    • University of Malta
    • University of Koeln
    • The Open University, Milton Keynes, UK
    Value for UAIC: 116,155 EUR.

  • 2006-2008: RolTech - Romanian Language Technologies - official site: an INTAS project aimed at acquiring electronic resources for the Romanian language, developing Romanian language processing tools, and creating applications based on these resources. Partners:
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi - coordinator
    • Institute for Computer Science of the Moldavian Academy of Sciences, Chisinau
    • University of Sheffield
    Value for UAIC: 7,572 EUR.

  • 2006-2008: ROTEL - Intelligent Systems for Semantic Web, Based on Ontology Logics and Natural Language Technologies. Applications for Romanian - official site. A project financed by the Romanian Ministry of Education and Research. Partners:
    • Central Institute for Informatics, Bucharest - coordinator
    • Research Institute for Artificial Intelligence, Romanian Academy, Bucharest
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi
    Value for UAIC: 330,339 RON (76,823 EUR).

  • 2006-2008: InterOb - creation of a three-dimensional model of the human head, capable of expressing emotions. The chosen approach was to simulate the physical properties of the anatomical components that are part of the human head: skeleton, muscles, skin. Partners:
    • "Stefan cel Mare" University of Suceava - coordinator
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi
    Value for UAIC: 192,666 RON (44,806 EUR).

  • 2006-2008: e-MANAGE - Establishing and adapting classification systems from a cross-cultural perspective (in Romanian: Stabilirea si adaptarea sistemelor de clasificare dintr-o perspectiva cros-culturala)

    Value for UAIC: 90,000 RON (20,930 EUR).

  • 2004-2007: Knowledge Web - a network of excellence aiming to foster the creation of the Semantic Web (UAIC-FII was invited as a non-financed member).

  • 2004-2005: Enhancing prosodic aspects in Romanian Text-To-Speech Synthesis (a CNCSIS grant). Partners:
    • Romanian National Institute of Inventics, Iasi - coordinator
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi - coordinator
  • 2004-2005: Studies regarding the acquisition of the Dictionary of Romanian Language in electronic form (a CNCSIS grant). Partners:
    • Institute of Philology and Folklore "Alexandru Philippide", Iasi - coordinator
    • Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi
  • 2001-2004: IST-2000 29388 Balkanet - official site: the project that built a network of wordnets for 5 Balkan languages (Bulgarian, Greek, Romanian, Serbian and Turkish), aligned with the English WordNet and the Czech WordNet. At the end of the project, the Romanian WordNet included 25,000 synsets (sets of synonym senses of words). The partner RACAI (Romanian Academy) continued to finance the development of the Ro-WN after the end of the project (it now includes almost 60,000 synsets; visit the RACAI online browser). Partners:
    • Databases Laboratory, University of Patras - coordinator
    • Computer Technology Institute, Athens
    • "Alexandru Ioan Cuza" University of Iasi
    • Research Institute for Artificial Intelligence, Romanian Academy, Bucharest
    • Institute of Bulgarian language, Sofia
    • Sabanci University, Istanbul
    • Faculty of Informatics, Masaryk University, Brno
    • Memodata, Caen
    • University of Plovdiv
    • University of Athens
    Value for UAIC: approx. 200,000 EUR.

  • 2001-2004: BALKANET-CORINT Creation and development of a multilingual wordnet of the Balkan languages (a mirror project sponsored by the Romanian Ministry of Research). Partners:
    • RACAI, Bucharest
    • UAIC-FII, Iasi
  • 1998-2001: TELRI II - official site. Objectives (extracts from the official TELRI II web page): to strengthen the pan-European infrastructure for the multilingual language research and development community; to collect, promote, and make available monolingual and multilingual language resources and tools for the extraction of language data and linguistic knowledge; to offer a customised comprehensive service to academic and industrial users; to prepare and organise research and development projects focusing on translation aids, multilingual authoring systems, information retrieval, etc. Coordinator: University of Mannheim.

  • 1998-1999: ELAN (an inception of TELRI) - an early initiative aiming to create or reinforce international standards by converting a significant part of its members' data to a common format, to design a common query language and, on this basis, to operate an experimental service network that would make a large stock of electronic resources accessible and follow awareness-raising policies. Among the partners: University of Liege, Institut für Deutsche Sprache - Mannheim, etc.

  • 1995-1998: TELRI I - Trans-European Language Resources Infrastructure - official site. Objectives (extracts from the official web page): an EC-funded initiative for creating a viable infrastructure involving European language and language technology centres, to provide a platform for industry, research institutes and universities, and to supply the NLP community with public domain monolingual and multilingual language resources, such as: corpora, machine readable dictionaries and lexica, lexical databases, and software tools for the creation, re-use, maintenance, valorisation and exploitation of linguistic data. Coordinator: University of Mannheim.

  • 1995: PROSODICS - development of software for the analysis and visualisation of the prosody of spoken utterances in exercises assisting foreign language learning. A project sponsored by the University of Venice.

  • 1987-1989: QUERNAL (QUERy by NAtural Language) - development of an interactive and configurable dialogue system able to answer questions in Romanian addressed to a database. Based on QUERNAL, a number of applications have been built:
    • For the Institute of Metallurgy, Bucharest: a dialogue system for their metallurgy database. Co-partner: ICI-Bucharest.
    • For the "Flamura Rosie" enterprise Sibiu: a dialogue system for their personnel and salaries database. Co-partner: ICI-Bucharest.
    • For the Moinesti Drilling-Production Trust: a dialogue system for their database of petroleum drilling and exploitation records.
  • 1984-1985: IURES (I Understand and Reply Eliminating Syntax) - a natural language dialogue system acting as a configurable interface to any semantic content expressed as a semantic network. Based on IURES, a number of applications have been built:
    • For ICI-Bucharest: a dialogue system accessing the National Software Library.
    • For the Research Institute on Hydrology Iasi: a dialogue system on Geography of Romania.
  • 1984: Contract 4709/27.04.1984: Human-computer communication intermediated by natural language. Financed by ICI Bucharest.

  • 1983: Contract 1906/22.02.1983: Human-computer communication intermediated by natural language. Financed by ICI Bucharest.

  • 1982: Contract 4774/28.04.1982: Human-computer communication intermediated by natural language. Financed by ICI Bucharest.

  • Last update: August 2013