Dan Cristea
Research Activity


My main research topics are: workflows for natural language processing, computational lexicography, discourse structure and incremental discourse parsing, anaphora, the relationship between structure and reference, and cognitive aspects of language.

I have also worked on problems related to lexical semantics, WordNet and applications of wordnets, and the evolution of language.

The group of people at my University with whom I have worked over time is known as the Natural Language Processing Group. Its membership is constantly renewed, since my students move on quickly once they graduate or obtain a master's or PhD degree.

The Consortium for the Romanian Language: Resources & Tools (in Romanian: Consortiul de Informatizare pentru Limba Romana) is an initiative that aims to facilitate and augment the efforts of linguists and computer science researchers who work on the Romanian language by promoting software tools and resources for linguistic processing.

Active research projects


  • Projects financed by the EC:


    • CLARIN (Common Language Resources and Technology Infrastructure): a European initiative committed to establishing an integrated and interoperable research infrastructure of language resources (all knowledge sources based on language, written or spoken) and their technology (tools to carry out operations on such language material). Features of this technology are:
      • integration: the resource and service centres are connected via Grid technology and form a virtually integrated domain;
      • interoperability: the resources and services will be based on Semantic Web technologies to overcome format, structure and terminological differences;
      • stability: the resources and services are offered with high availability;
      • persistency: the resources and services are planned to be accessible for many years so that researchers can rely on them;
      • accessibility: the resources and services are accessible via the web; different access methods and training possibilities are offered, tailored to the needs of the communities that use them;
      • extendability: the infrastructure is open so that new resources and services can be added easily.
      The overall costs of the CLARIN Research Infrastructure are estimated at 165 million euros, covering all European countries. The first phase of CLARIN extends over 36 months, starting on January 1, 2008. Within CLARIN, UAIC is responsible for WP6.

    • ALEAR (Artificial Language Evolution on Autonomous Robots): an FP7 project aiming to achieve open-ended cognitive development and open-ended verbal dialogues among fully embodied, situated agents (i.e. humanoid robots). The robots will include the necessary sensori-motor intelligence, scripts for establishing turn-taking interaction among them, perceptual processes, processes that conceptualise what to say and express these conceptualisations in language, and processes that parse sentences and interpret them in terms of sensori-motor experience. They are expected to evolve their own artificial languages, adapted to the environments and task settings in which they are placed. Partners of ALEAR are:
      • Humboldt University - Berlin (coordinator)
      • SONY CSL - Paris
      • Osnabruck University
      • Autonomous University of Barcelona
      • Vrije Universiteit Brussel - Brussels
      • University Alexandru Ioan Cuza of Iasi
      The project starts on February 1, 2008 and runs for 36 months.

  • Projects financed by the Romanian Ministry of Education, Research and Youth:


    • eDTLR (The Thesaurus Dictionary of Romanian Language in Digital Form): a project aiming to create the electronic version of the dictionary that the Romanian Academy has been editing and printing since 1913. Planned to be completed in its original paper format in 2008 by the three institutes of the Academy, the two series of the Dictionary, the Dictionary of the Academy (DA) and the Dictionary of the Romanian Language (DLR), will comprise 33 volumes, more than 15,000 pages, about 175,000 entries and more than 1,300,000 examples. Creating the electronic version requires: scanning, OCR, proofreading (by volunteers, then by experts), parsing of the dictionary entries, building the database format (XML TEI-P5), scanning of the sources (over 2,500 volumes) and building indexing and browsing mechanisms.
      The partners in the project are:
      • The Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi (coordinator)
      • The Institute of Linguistics "Iorgu Iordan - Alexandru Rosetti" of the Romanian Academy - Bucharest
      • The Institute of Romanian Philology "Alexandru Philippide" of the Romanian Academy - Iasi
      • The Institute of Literary History "Sextil Puscariu" of the Romanian Academy - Cluj-Napoca
      • The Research Institute of Artificial Intelligence of the Romanian Academy - Bucharest
      • The Research Institute of Computer Science of the Romanian Academy - Iasi
      • The Faculty of Letters of the "Alexandru Ioan Cuza" University of Iasi
      The project will run for 36 months, starting in September 2007. The on-line interface for collaborative proofreading, used in the acquisition of the large electronic dictionary, can be accessed here.

    • SIR-RESDEC (Open Domain Question Answering System for Romanian and English): The project will develop an advanced, interlingual and parametrisable system that takes questions in Romanian and returns answers in English and Romanian, drawn from a dynamic collection of documents containing an arbitrary number of texts. The domains chosen as case studies are the legislative domain and the bioinformatics domain.
      The other partners in the project are:
      • The Research Institute for Artificial Intelligence, Romanian Academy - Bucharest (coordinator)
      • The Central Institute for Informatics - Bucharest
      The project will run for 36 months, starting in September 2007.

Past research projects

  • 2006-2008 LT4eL (Language Technology for eLearning): an FP6 project that aimed to apply multilingual language technology tools and semantic web techniques to improve the retrieval of learning material. The technology developed facilitates personalized access to knowledge within learning management systems and supports decentralisation and co-operation in content management. The LT4eL technology has been developed for 9 languages: Bulgarian, Czech, Dutch, English, German, Maltese, Portuguese, Polish, and Romanian.
  • 2006-2008 RolTech (Romanian Language Technologies): an INTAS project that aimed to acquire electronic resources for the Romanian language, to develop Romanian language processing tools, and to create applications based on these resources. The partners in the project were:
    • The Faculty of Computer Science of the "Alexandru Ioan Cuza" University of Iasi (coordinator)
    • The Institute for Computer Science of the Moldovan Academy of Sciences, Chisinau, Republic of Moldova
    • The University of Sheffield
  • 2006-2008 ROTEL (Intelligent Systems for Semantic Web, Based on Ontology Logics and Natural Language Technologies. Applications for Romanian): a project financed by the Romanian Ministry of Education and Research. The other partners in the project were:
    • The Central Institute for Informatics - Bucharest
    • The Research Institute for Artificial Intelligence, Romanian Academy - Bucharest
  • 2006-2008 InterOb: a project intended to create a three-dimensional model of the human head, capable of expressing emotions. We chose an approach based on simulating the physical properties of the anatomical components of the human head: skeleton, muscles, skin.
  • 2004-2005 (grant CNCSIS): Improving the prosodic aspects of Text-to-Speech synthesis for the Romanian language (in Romanian: Imbunatatirea aspectelor prosodice in sinteza Text-to-Speech pentru limba romana). Coordinator: Romanian National Institute of Inventics - Iasi.
  • 2004-2005 (grant CNCSIS): Studies on the acquisition of the Dictionary of the Romanian Language in electronic format (in Romanian: Studii privind achizitionarea Dictionarului Limbii Romane in format electronic). Coordinator: Institute of Philology and Folklore "Alexandru Philippide" Iasi.
  • 2001-2004: Balkanet. Coordinator: University of Patras.
  • 1998-1999: ELRA
  • 1994-1995: PROSODICS - with the University of Venice
  • 1985-1989: QUERNAL - with the Research Institute for Computer Science - Bucharest
  • 1981-1983: IURES - with the Research Institute for Computer Science - Bucharest

Last update: February 2007