My PhD Thesis
I created a page with more details about my PhD.
My Publications
Here is the list of my publications from recent years.
- Diana Trandabat, Daniela Gifu, Plesescu Adrian (2022) Detecting Offensive Language in Romanian Social Media, in Procedia Computer Science, Volume 207, 2022, Pages 2883-2890
- Manoleasa, T., Sandu, I., Gifu, D., Trandabat, D. (2022) FII UAIC at SemEval-2022 Task 6: iSarcasmEval - Intended Sarcasm Detection in English and Arabic, in SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop, pp. 970-977
- Rares, Arvinte, Trandabat, D. (2022) Predictability of Kidney Dialysis Survivability, in CEUR Workshop Proceedings, 3302, pp. 192-201
- Bodnar, C., Tapuc, A., Pintilie, C., Gifu, D., Trandabat, D. (2021) FII_CROSS at SemEval-2021 Task 2: Multilingual and Cross-lingual Word-in-Context Disambiguation, in SemEval 2021 - 15th International Workshop on Semantic Evaluation, Proceedings of the Workshop, pp. 787-792
- Dan Cristea, Ionut Pistol, Serban Boghiu, Anca Bibiri, Daniela Gifu, Andrei Scutelnicu, Mihaela Onofrei, Diana Trandabat, George Bugeag (2020) CoBiLiRo: a Research Platform for Bimodal Corpora, in the 1st International Workshop on Language Technology Platforms held at the 12th International Scientific Conference eLearning and Software for Education, IWLTP, 16 May 2020, Marseille, France.
- Iftene, A., Trandabat, D., Radulescu, V. (2020) Eye and voice control for an augmented reality cooking experience, Procedia Computer Science, 176, pp. 1469-1478
- Onofrei M., Trandabat, D. (2020) Author confidence as a predictor of the acceptance of scientific papers, in Proceedings of the 15th International Conference Linguistic Resources and Tools for Natural Language Processing, pp. 131-140
- Dan Cristea, Cristian Padurariu, Serban Boghiu, Daniela Gifu, Mihaela Onofrei, Diana Trandabat, Ionut Cristian Pistol, Anca-Diana Bibiri, Andrei Scutelnicu (2019) The COBILIRO Project: Building and Distributing a Bimodal Corpus for Romanian Language, at the 14th International Conference Linguistic Resources and Tools for Processing The Romanian Language, ConsILR-2019 Program, 18-20 Nov. 2019, Cluj-Napoca, Romania.
- Filimon, M., Iftene, A., Trandabat, D. (2019) Bob - A general culture game with voice interaction, in Procedia Computer Science, 159, pp. 323-332.
- Filimon, M., Iftene, A., Trandabat, D. (2019) Using games and smart devices to enhance learning geography and music history, in Proceedings of the 28th International Conference on Information Systems Development: Information Systems Beyond 2020, ISD 2019
- Patras, G.-F., Lungu, D.-F., Gifu, D., Trandabat, D. (2019) Hope at SemEval-2019 task 6: Mining social media language to discover offensive language, in NAACL HLT 2019 - International Workshop on Semantic Evaluation, SemEval 2019, Proceedings of the 13th Workshop, pp. 635-638
- Plamada, M.O., Trandabat, D., Gifu, D. (2019) Towards identifying author confidence in biomedical articles, in Data, 4(1), 18
- Gifu, D., Trandabat, D., Cohen, K., Xia, J. (2019) Special issue on the curative power of medical data, in Data, 4(2), 85
- Ionuţ Pistol, Diana Trandabăţ, Mădălina Răschip(2018) Medi-Test: Generating Tests from Medical Reference Texts, in Data Journal, Volume 3, issue 4, ISSN: 2306-5729, DOI 10.3390/data3040070.
The Medi-test system we developed was motivated by the large number of resources available for the medical domain, as well as the number of tests needed in this field (during and after medical school) for evaluation, promotion, certification, etc. Generating questions to support learning and user interactivity has been an interesting and dynamic topic in NLP since the availability of e-book curricula and e-learning platforms. Current e-learning platforms offer increased support for student evaluation, with an emphasis on exploiting automation in both test generation and evaluation. In this context, our system is able to evaluate a student's academic performance for the medical domain. Using medical reference texts as input and supported by a specially designed medical ontology, Medi-test generates different types of questionnaires for the Romanian language. The evaluation includes 4 types of questions (multiple-choice, fill in the blanks, true/false, and match), can have customizable length and difficulty, and can be automatically graded. A recent extension of our system also allows for the generation of tests which include images. We evaluated our system with a local testing team, but also with a set of medical students, and user satisfaction questionnaires showed that the system can be used to enhance learning.
- Daniela Gifu, Diana Trandabăţ, Kevin Bretonnel Cohen, Jingbo Xia (2018) The Curative Power of Medical Data, in Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, 431-432, ACM, ISBN: 978-1-4503-5178-2, DOI 10.1145/3197026.3200210.
In an era when massive amounts of medical data became available, researchers working in biological, biomedical and clinical domains have increasingly started to require the help of language engineers to process large quantities of biomedical and molecular biology literature, patient data or health records. With such a huge amount of reports, evaluating their impact has long ceased to be a trivial task. Linking the contents of these documents to each other, as well as to specialized ontologies, could enable access to and discovery of structured clinical information and foster a major leap in natural language processing and health research.
- Petronela Savin and Diana Trandabăţ (2018) Ethnolinguistic Audio-visual Atlas of the Cultural Food Heritage of Bacau County - Elements of methodology, in Journal BRAIN. Broad Research in Artificial Intelligence and Neuroscience, Volume 9, Issue 1, pages 125-131.
The paper aims to present the methodology of the platform Ethnolinguistic audio-visual atlas of the cultural food heritage of Bacau County - eCULTFOOD Atlas, the main product of the project "The Digitization of the Cultural Food Heritage. The Region of Bacau" eCULTFOOD (PNIII-P2-2.1-BG-2016-0390). The platform eCULTFOOD Atlas is a comprehensive database containing the results of field research and scientific documentation on local cultural food traditions. It includes a representative corpus of audio-visual documents recording the traditional food cultural heritage based on surveys involving the older generation from the rural county of Bacau, Romania. The eCULTFOOD Atlas meets the aspirations of EU policies that regard the digitization of cultural resources as a key factor that would contribute to improving accessibility and undivided flow of information in a knowledge economy. Once transposed into electronic format, the cultural food heritage of Bacau County may become a resource for a broad spectrum of activities impacting sectors such as education, economy and tourism.
- Onofrei, M., I. Hulub, D. Trandabat and D. Gifu (2018) Apollo at SemEval-2018 Task 9: Detecting Hypernymy Relations Using Syntactic Dependencies, In: Proceedings of The 12th International Workshop on Semantic Evaluation, workshop at ACL2018, pp. 898-902.
This paper presents the participation of Apollo's team in the SemEval-2018 Task 9 "Hypernym Discovery", Subtask 1: "General-Purpose Hypernym Discovery", which tries to produce a ranked list of hypernyms for a specific term. We propose a novel approach for automatic extraction of hypernymy relations from a corpus by using dependency patterns. The results show that the application of these patterns leads to a higher score than using the traditional lexical patterns.
- Sandra Maria Amarandei, Iuliana-Alexandra Flescan-Lovin-Arseni, Ramona-Andreea Turcu, Daniela Gifu, Diana Trandabat (2018) EmoIntens Tracker at SemEval-2018 Task 1: Emotional Intensity Levels in #Tweets. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2018), New Orleans, Louisiana, United States, 2018, pp. 177-180, ISBN 978-1-948087-20-9.
The "Affect in Tweets" task is centered on emotion categorization and evaluation matrix using multi-language tweets (English and Spanish). In this research, the SemEval Affect dataset was preprocessed, categorized, and evaluated accordingly (precision, recall, and accuracy). The system described in this paper is based on the implementation of supervised machine learning (Naive Bayes, KNN and SVM), deep learning (NN TensorFlow model), and decision tree algorithms.
- Larisa Alexa, Alina Lorenţ, Daniela Gifu, Diana Trandabăţ (2018) The Dabblers at SemEval-2018 Task 2: Multilingual Emoji Prediction. In Proceedings of the 12th International Workshop on Semantic Evaluation (SemEval-2018), Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2018), New Orleans, Louisiana, United States, 2018, pp. 405-409, ISBN 978-1-948087-20-9.
The "Multilingual Emoji Prediction" task focuses on the ability of predicting the correspondent emoji for a certain tweet. In this paper, we investigate the relation between words and emojis. In order to do that, we used supervised machine learning (Naive Bayes) and deep learning (Recursive Neural Network).
- Iftene, A. and Trandabăţ, D. (2018) Enhancing the Attractiveness of Learning through Augmented Reality in Procedia Computer Science, 126, 166-175, International Conference on Knowledge Based and Intelligent Information and Engineering Systems, KES2018, 3-5 September 2018, Belgrade, Serbia
Over the last years, augmented reality was used in various domains, from medical, industrial design, modeling and production, robot teleoperation, military, entertainment, leisure activities to translation, facial recognition, assistance while driving, interior and exterior design, virtual friends, internet of things and eLearning. In eLearning, the combination between classical and augmented content (the latter coming with 3D models, images, sounds, animations, Internet browsing, etc.) can help the teacher to better explain the content of the courses. In this paper, we present four augmented reality applications, created with the aim to improve communication and collaboration skills (two of them) and to ease the learning of biology and geography (the other two). The motivation behind these applications is to enhance the attractiveness of the classes, allow students to retain new information more easily and reduce the stress behind tests when presented as games.
- Diana Trandabăţ, Daniela Gifu, Dan Cristea (2017) Mining Social Media to Extract Structured Knowledge through Semantic Roles, in Paspallis, N., Raspopoulos, M. Barry, M. Lang, H. Linger, & C. Schneider (Eds.), Information Systems Development: Advances in Methods, Tools and Management (ISD2017 Proceedings). Larnaca, Cyprus: University of Central Lancashire Cyprus. ISBN: 978-9963-2288-3-6.
We use semantics in our daily communications without giving it too much attention. However, things are not so trivial when computers try to incorporate semantic knowledge. In an attempt to enhance machines with human-like behavior and understanding, computer scientists and linguists have joined efforts in making the language easier to be understood. Language models need to be derived from large knowledge bases, hence this paper presents a platform able to extract user generated content for social media websites, analyze it and generate a structured knowledge base, in an attempt to discover the crowd intelligence hidden within.
- Andreea Macovei, Diana Trandabăţ (2017) How can we reconstruct stories based on memories?, in Proceedings of Futurity-2017 Workshop on Modeling Societal Future, held at TPDL2017, Greece, Thessaloniki.
Memories in literary texts reveal past events which complete the itinerary of a character throughout the book. Whether they are exposed from the point of view of that character or from the perspective of the other characters joining the action, the memories bring novelties about the distant or the recent past to the actual action. This explorative study intends to identify memories, with an ulterior aim of temporally ordering narrative sequences for reconstructing the story of each character. Starting from the premise that memories contain temporal information, the goal is to see if this information can be used to restructure a discourse so that the entire action gets to be chronologically ordered according to the order of the events' deployment. The described model is tested on a novel of Tash Aw (Map of the Invisible World), with possible applications to non-literary texts such as newspaper articles, social media posts, recipes, political texts.
- Daniela Gifu, Diana Trandabăţ (2017) FUTURITY 2017 - Workshop on Modeling Societal Future, in Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450, Springer, Cham, pp. 675-676, 2017, DOI: 10.1007/978-3-319-67008-9.
People seek and share ideas, information, experiences, expertise, opinions, and emotion with both acquaintances and strangers on the Internet, based on the effect of the Wisdom of Crowds. Over the last few years, the use of Social Media has increased tremendously all over the world. The huge popularity of social networks provides an ideal environment for scientists to test and simulate new models, algorithms and methods to process knowledge. Structured social knowledge can be used by different actors (companies, public institutions, researchers and scholars interested in formal and empirical analysis of social trends) to understand the behaviors of users or groups.
- Diana Trandabăţ (2017) Enriching learning materials with semantic roles, in Proceedings of KES 2017, Marseille, September 2017.
One of the most challenging tasks in human-computer communication is the decomposition of meaning. The theory of semantic frames allows for the identification of the roles that various constituents have in an event: the doer of the action, the receiver of the action, the person towards whom the action is directed, the means and purposes of an action, etc. Through this paper, we propose to introduce semantic frames in eLearning contexts, with the conviction that users may find it easier to learn concepts if they are offered in a semantically related manner. In order to achieve this, we propose a system that, for every concept searched by the user, offers a network of concepts, by analyzing the semantic relations which appear between concepts. In other words, the proposed system starts with a concept, retrieves sentences containing it from the collection of learning materials and identifies the semantic relations between the considered concept and the ones found in their neighborhood using semantic role labeling. Additional information is completed using DBpedia's knowledge base before establishing the final network of relations.
- Diana Trandabăţ (2017) Towards building knowledge resources from social media using semantic roles, In: Kamps J., Tsakonas G., Manolopoulos Y., Iliadis L., Karydis I. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science, vol 10450, Springer, Cham, DOI https://doi.org/10.1007/978-3-319-67008-9_50, Print ISBN 978-3-319-67007-2, Online ISBN 978-3-319-67008-9.
Text semantics is a well-hidden treasure, whose deciphering requires deep understanding. Artificial Intelligence enhances computers with human-like judgments, thus decoding the covered message and sharing it between machines is one of the main challenges that the computational linguistics domain faces nowadays. In an attempt to learn how humans communicate, computers use language models derived from human knowledge. While still far from completely understanding insinuated messages in political discourses, computer scientists and linguists have joined efforts in modeling a human-like linguistic behavior. This paper aims to introduce the VoxPopuli platform, an instrument to collect user generated content, to analyze it and to generate a map of semantically-related concepts to capture crowd intelligence.
- Andreea Macovei, Diana Trandabăţ (2017) Towards an analysis of remembered events in texts, in Proceedings of the 3rd International Workshop on Social media and the Web of Linked Data RUMOUR-2017, Workshop at ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2017, Toronto, Ontario, Canada, June 19-23, 2017, pages 33-37, ISBN 978-973-0-24730-5.
This study is a work in progress that proposes a way of reordering discourse starting from the natural course of action and not from the order of events displayed in a text, with a focus on remembered events. Although this aspect is common in belletristic texts where memories are inserted in a random manner in order to give the reader more details about the characters, the remembrance or resumption of an event or events can also be observed in direct speech and non-literary texts such as journalistic texts, culinary recipes and social media posts in a form more or less similar to literary texts.
- Diana Trandabăţ (2017) Semantic Role Annotation for eLearning, in Proceedings of 13th International Scientific Conference eLearning and Software for Education eLSE2017, Bucharest, April 2017, pp 41-47, DOI 10.12753/2066-026X-17-179.
The analysis of semantic roles reveals the hidden structure of a sentence and contributes to the construction of meaning, identifying specific roles that entities play in various contexts and actors involved in an event. The semantic role expresses the correlation between a predicate and its arguments. This paper describes a preliminary study about the impact semantic roles could have in eLearning contexts. The goal of our application is to identify, from a collection of learning materials, all contexts referring to a specific entity, in order to analyse relations between the entity and words with which it frequently co-occurs. Thus, through semantic role analysis, we intend to determine temporal, spatial or modal constraints which determine or restrict a concept. More concretely, the system we propose starts from an input concept, searches learning materials for that particular concept, selects the snippets that contain it, and applies semantic role labelling. At a further step, we extract semantic relations between the entity and neighbouring words, resulting in a list of binary relations. Finally, the program uses WordNet hypernyms trying to generalize over all extracted snippets. Thus, the program may facilitate the understanding of the concept through its neighbours, by creating a map of structured data related to a target concept, where each related entity is marked with its corresponding role (which can be of type Agent, Patient, Effect, Location, Cause, Time, etc.).
- Diana Trandabăţ and Daniela Gifu (2017) Social Media and the Web of Linked Data, in Proceedings of 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Toronto, June 2017, pp 350-352, IEEE Catalog Number: CFP17JCD-ART (Xplore), ISBN: 978-1-5386-3861-3 (Xplore).
Written texts have perhaps never been so widely used as they are in today's social media context, with people constantly writing, sharing, commenting, getting involved. At the same time, Linked Data is emerging as an increasingly important topic, and research in this area has resulted in massive amounts of structured linguistic data. In this climate, we intend to analyze how linked data can help to structure and extract meaning from social media's short, informal and context dependent texts, with an emphasis on real-life applications.
- Diana Trandabăţ (2017) Segmentation-Cohesion-Dependency Parsing Strategy, presentation at the BAEKTEL Workshop, Iasi, February 8-9, 2017.
- Flescan-Lovin-Arseni Iuliana Alexandra, Turcu Ramona Andreea, Sirbu Cristina, Alexa Larisa, Amarandei Sandra Maria, Herciu Nichita, Scutaru Constantin, Diana Trandabăţ, Iftene Adrian (2017) #WarTeam at SemEval-2017 Task 6: Using Neural Networks for Discovering Humorous Tweets, in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for Computational Linguistics, pp. 398-401.
This paper presents the participation of #WarTeam in Task 6 of SemEval2017 with a system classifying humor by comparing and ranking tweets. The training data consists of annotated tweets from the @midnight TV show. #WarTeam's system uses a neural network (TensorFlow) having inputs from a Naive Bayes humor classifier and a sentiment analyzer.
- Rotari Razvan-Gabriel, Hulub Ionut, Oprea Stefan, Plamada-Onofrei Mihaela, Lorent Alina Beatrice, Preisler Raluca, Iftene Adrian, Diana Trandabăţ (2017) Wild Devs' at SemEval-2017 Task 2: Using Neural Networks to Discover Word Similarity, in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Association for Computational Linguistics, pp. 258-261.
This paper presents Wild Devs' participation in the SemEval-2017 Task 2 "Multilingual and Cross-lingual Semantic Word Similarity", which tries to automatically measure the semantic similarity between two words. The system was built using neural networks, having as input a collection of word pairs, whereas the output consists of a list of scores, from 0 to 4, corresponding to the degree of similarity between the word pairs.
- Calin Ciubotariu, Marius Hrisca, Mihai Gliga, Diana Darabana, Diana Trandabăţ and Adrian Iftene (2016) How to build a sentiment analyzer using off-the-shelf resources, in BringITOn!2016 Catalogue, ISSN 2285-0929, Iasi, 18-19 nov. 2016.
Minions, a team formed of first year students in the Master of Computational Linguistics, started the participation at Semeval-2016 as a semester project, aiming to build a model for analyzing and classifying "tweets" into positive, neutral and negative, according to the evoked sentiment, while getting familiar with Natural Language Processing tools and methods. Therefore, the backbone of our sentiment analyzer consists in several off-the-shelf, freely available resources, enhanced with a classifier trained on the SemEval-2016 data.
- Octavian Ciobanu Apostol, Cosmin Florean, Oana Bejenaru and Diana Trandabăţ (2016) Sentiment Analysis as a marketing tool, in BringITOn!2016 Catalogue, ISSN 2285-0929, Iasi, 18-19 nov. 2016
Social media has replaced the traditional sources of information: people's need to be constantly updated changed our behaviour from buying a newspaper or watching TV, to using a Facebook or Twitter account to visualize the day's hottest news, with the bonus of being able to also comment on them. Considering this, our goal is to build a tool that is able to analyse and classify the information provided by social media.
- Diana Trandabăţ, Petronela Savin, Daniela Gifu, Andreea Macovei (2016) eCULTFOOD Project, in Proceedings of ConsILR2016, the 12th International Conference "Linguistic Resources and Tools for Processing the Romanian Language", ISSN 1843-911X, Malini, 27-29 Oct. 2016
The eCULTFOOD project has as main objective the creation of an "Ethnolinguistic audio-visual atlas of the cultural food heritage of Bacau County" as a comprehensive database containing the results of field research and scientific documentation on cultural food traditions in the region. The project's area of intervention is the intangible cultural heritage of traditional food, fulfilling the function of protection (research, promotion), dissemination (diffusion, including via new models developed in the on-line environment) and supporting, first and foremost, education as cultural intervention. Its main aim is to preserve, in cartographic and computerized form, a representative corpus of audio-visual documents recording the traditional food cultural heritage based on surveys involving the older generation from the rural county of Bacau.
- Diana Trandabăţ and Oana Gagea (2016) A new perspective on reusing semantic resources, in J. Goluchowski, M. Pankowska, C. Barry, M. Lang, H. Linger, & C. Schneider (Eds.), Information Systems Development: Context, Creativity and Cognition in Information Systems Development (ISD2016 Proceedings). University of Economics in Katowice. ISBN: 978-83-7875-307-0. http://aisel.aisnet.org/isd2014/proceedings2016/CogScience/1.
Well trained linguists manage to capture semantic behavior of words in various annotated corpora. Using them as training data, semantic relations can be discovered by intelligent systems using supervised machine learning techniques. What if we have short deadlines and limited human and financial possibilities that prevent us from building such a valuable training corpus for our own language? If such a corpus already exists for any other language, we could make use of this treasure and reproduce it for the language we need.
This paper proposes an import method, which transfers semantic annotation (which could be semantic roles, named entity, sentiments, etc.) from an annotated resource to another language, using comparable texts. The case of semantic role annotation transfer from English to Romanian is discussed.
- Diana Trandabăţ (2016) Enhancing social media analysis with semantic roles, in Proceedings of EMCIS, ISBN 9789-6068-970-92, Krakow, 2016, pp. 149-162.
Semantic role analysis establishes the roles that entities have in different contexts, and what are the temporal, modal or local constraints that determine or restrict an event to take place.
This paper aims to merge two directions, by adding semantic roles to social media analysis. Thus, our system creates a contextual map, by identifying the role an entity plays in different contexts, as well as the roles played by words frequently co-occurring with the input entity.
- Diana Trandabăţ and Adrian Iftene (2016) Complementing Tweets Sentiment Analysis with Semantic Roles, in Proceedings of the Conference on Mathematical Foundations of Informatics MFOI2016, July 25-29, 2016, Chisinau, Republic of Moldova
Slowly but surely, social media replaced the traditional sources of information: people's need to be constantly updated changed our behavior from buying a newspaper or watching TV, to using a Facebook or Twitter account to visualize, in a customizable manner, the day's hottest news, with the bonus of being able to also comment on them. This paper presents a method to identify a tweet's polarity (negative, positive, neutral) using SentiFrameNet, a naive Bayes classifier and an off-the-shelf semantic role labeling API.
- Andreea Macovei, Oana-Maria Gagea and Diana Trandabăţ (2016) Towards creating an ontology of social media texts, in Proceedings of RUMOUR2015, Springer CCIS/LNCS
Texts live around us just as we live around them. At any instant, there are texts that people write, share, use to get informed, etc. (starting with an advertisement heard on the radio every morning and finishing with the contract of sale signed before a notary). Combining this with the concept of economy in language (or the principle of least effort) – a tendency shared by all humans – consisting in minimizing the amount of effort necessary to achieve the maximum result, it is no wonder that social media, with its short, informal and context dependent texts, achieved such a high popularity. Even though texts are so constantly present in our lives (or precisely because of that), the linguistic classification of texts is still debated, and no clear visualization of text types is yet available. Going beyond the classification of texts in species and genres, this paper proposes an ontology which discusses the various text types, focusing on social media texts, and offering a set of properties to describe them.
- Cosmin Florean, Oana Bejenaru, Eduard Apostol, Octavian Ciobanu, Adrian Iftene, Diana Trandabăţ (2016) SentimentalITsts at SemEval-2016 Task 4: building a Twitter sentiment analyzer in your backyard, in Proceedings of SemEval-2016 - International Workshop on Semantic Evaluation, held at NAACL HLT 2016, 12-17 June, San Diego, USA
The paper presents the system developed by the SentimentalITsts team for the participation in Semeval-2016 task 4, in the subtasks A, B and C. The developed system uses off the shelf solutions for the development of a quick sentiment analyzer for tweets. However, the lack of any syntactic or semantic information resulted in performances lower than those of other teams.
- Călin-Cristian Ciubotariu, Marius-Valentin Hrisca, Mihail Gliga, Diana Darabană, Diana Trandabăţ, Adrian Iftene (2016) Minions at SemEval-2016 Task 4: or how to build a sentiment analyzer using off-the-shelf resources?, in Proceedings of SemEval-2016 - International Workshop on Semantic Evaluation, held at NAACL HLT 2016, 12-17 June, San Diego, USA
Minions, a team formed of first year students in the Master of Computational Linguistics, started the participation at Semeval-2016 as a semester project, aiming to build a model for analyzing and classifying tweets into positive, neutral and negative, according to the evoked sentiment, while getting familiar with Natural Language Processing tools and methods. Therefore, the backbone of our sentiment analyzer consists in several off-the-shelf, freely available resources, enhanced with a classifier trained on the SemEval-2016 data.
- Diana Trandabăţ (2015) Identifying semantic events in unstructured text, in Proceedings of MIKE 2015, LNAI 9468, India
Semantics has always been considered the hidden treasure of texts, accessible only to humans. Artificial intelligence struggles to enrich machines with human features, therefore accessing this treasure and sharing it with computers is one of the main challenges that the natural language domain faces nowadays. This paper represents a further step in this direction, by proposing an automatic approach to extract information about events from unstructured texts by using semantic role labeling.
- Andreea Macovei, Diana Trandabăţ and Dan Cristea (2015) How to make Romanian legal texts easier to understand?, in Multilingualism in Specialized Communication: Challenges and Opportunities in the Digital Age, Book of Abstracts of the 20th European Symposium on Languages for Special Purposes, (Eds.) Vesna Luvsicky and Gerhard Budin, 8-10 July 2015, Vienna, Austria, ISBN: 978-3-200-04186-8
A democratic society is governed by laws, and the people of a state, as citizens with rights and obligations, must know and respect them. However, the legislative terminology enabling the expression of the rule of law is a highly specialized language which may pose problems in understanding for those who do not have knowledge of the judicial field.
Aiming to facilitate access to the legislative texts for all citizens, we conducted a study based on Romanian legal language, which involved a semi-automatic method for extracting several features from legislative texts: (1) interpretable structures (which can have different meanings, depending on the reader’s interpretation), (2) mistakes (spelling mistakes, punctuation mistakes and grammar mistakes), (3) contradictions, (4) long phrases (leading to disruption of the message to be sent) and (5) definitions (both positive and negative definitions) used to identify undefined concepts in laws.
- Andreea Macovei, Oana-Maria Gagea and Diana Trandabăţ (2015) Automatic Identification Of Literary And Non-Literary Texts, Proceedings of Consilr 2015, Iasi, Romania
Classifying text types is an important challenge of the natural language processing field: as there are many texts published and shared every second (literary works, newspapers, blogs, news, laws, etc.), a possible sorting and classification of those texts becomes necessary. This paper proposes an analysis regarding automatic identification of several text types: we have implemented a tool that can be used to automatically identify literary and non-literary texts. If the determination of text type is made possible, many customized applications can be performed in order to extract information, to analyse the content of the text, to offer suggestions, to summarize a text, to identify a text according to its format, etc.
- Gagea Oana-Maria, Gagea Andreea, Trandabăţ D., Cristea D. (2014) How to read and understand a Romanian legislative text?, in BringITOn! 2014 Catalogue, Al. I. Cuza University Press, ISSN 2285-0929, pp 44-45.
[bib] [abstract]
In a democratic society, laws govern the proper functioning of a state. People, as citizens of that state, must get to know those laws and rely on them whenever necessary. To avoid situations that lead to different interpretations, we present an analysis of Romanian legal language and several methods to semi-automatically identify ambiguous words, errors, long phrases, definitions and semantic relations. This research supports linguists, legal specialists and citizens in contact with the law.
- Trandabăţ A., Pislaru M., Trandabăţ D. (2013) SiadEnv safety and communication features in real life scenarios, in Interdisciplinary Research in Engineering: Steps towards breakthrough innovation for sustainable development; Advanced Engineering Forum, Volume 8-9, Pages 195-204.
[bib] [abstract] [pdf] [isi]
The SiadEnv system was designed to keep track of energy consumption in residential and industrial buildings. It analyzes and computes the real energy consumption needs in various scenarios. The main objective of SiadEnv is to reduce energy losses by taking action and modifying the room settings. Thus, SiadEnv computes the difference between outdoor and indoor temperature and adjusts the heating or cooling management in order to maintain the comfort index and to reduce energy consumption. Moreover, it contains indoor safety modules that prevent or reduce the impact of unwanted events such as flood, fire, or motion control events (thief entry, etc.). Due to SiadEnv's modular design based on wireless sensor networks, the fire monitoring safety module can be easily reconfigured in order to extend its applications. As further work, the SiadEnv safety module will be redesigned into a new application with important social, economic and environmental impact, which will monitor forest fires and predict their dynamics, in order to provide crucial data for forest salvation.
- Trandabăţ D. (2013) Towards extracting relations from unstructured data through natural language semantics, Proceedings of the 2nd Workshop on New Frontiers in Mining Complex Patterns (NFMCP 2013), Prague, 23-27 September 2013, pp. 225-236.
[bib] [abstract] [pdf]
Semantics has always been considered the hidden treasure of texts, accessible only to humans. Artificial intelligence struggles to enrich machines with human features, therefore accessing this treasure and sharing it with computers is one of the main challenges that the natural language domain faces nowadays. This paper represents a further step in this direction, by proposing an automatic approach to extract information from texts on the web by using semantic role labeling.
- Trandabăţ D. (2013) Natural Language Processing using Semantic Roles, presentation at BONSAI: Bridging Organizations and National Societies in Artificial Intelligence, 24-31 July 2013, Bran, Romania.
- Pislaru M., Trandabăţ D., Trandabat A. (2013) Assessment of Corporate Environmental Performance Based on Fuzzy Approach, in APCBEE Procedia 5 (2013), Elsevier, pages 368-372.
[bib] [abstract] [scopus]
Environmental proactivity is determined by several drivers, each of them being able to model the degree of corporate response to environmental challenges. For more than twenty years, climate change issues have been on both political and economic agendas. The increasing attention to environmental problems calls on companies to react and adapt their strategy to this new issue. Although in the near future the eco revolution will affect every business activity, some industries with high environmental impact are more liable to such change. The aim of the paper is to present a fuzzy rule-based model to assess corporate environmental performance. To address this challenge, the article illustrates an example from the food industry.
- Trandabăţ D., Irimia E., Barbu Mititelu V., Cristea D., Tufis D. (2012) The Romanian Language in the Digital Age, in White Paper Series, Eds. Georg Rehm and Hans Uszkoreit, Berlin, Springer, ISBN 978-3-642-30702-7, 2012, 87 p.
[bib] [abstract] [springer] [e-book]
During the last 60 years, Europe has become a distinct political and economic structure. Culturally and linguistically it is rich and diverse. However, from Portuguese to Polish and Italian to Icelandic, everyday communication between Europe's citizens, within business and among politicians is inevitably confronted with language barriers. The EU's institutions spend about a billion euros a year on maintaining their policy of multilingualism, i.e., translating texts and interpreting spoken communication. Does this have to be such a burden? Language technology and linguistic research can make a significant contribution to removing the linguistic borders. Combined with intelligent devices and applications, language technology will help Europeans talk and do business together even if they do not speak a common language.
Information technology changes our everyday lives. We typically use computers for writing, editing, calculating, and information searching, and increasingly for reading, listening to music, viewing photos and watching movies. We carry small computers in our pockets and use them to make phone calls, write emails, get information and entertain ourselves, wherever we are. How does this massive digitization of information, knowledge and everyday communication affect our language? Will our language change or even disappear?
- Iftene A., Gînscă A.-L., Moruz M. A., Trandabăţ D., Husarciuc M., Boroș E. (2012) Enhancing a Question Answering System with Textual Entailment for Machine Reading Evaluation, in Pamela Forner, Jussi Karlgren, Christa Womser-Hacker (Eds.): CLEF 2012 Evaluation Labs and Workshop, Online Working Notes, Rome, Italy, September 17-20, 2012, ISBN 978-88-904810-3-1
[bib] [abstract] [dblp] [pdf]
This paper describes UAIC's Question Answering for Machine Reading Evaluation systems participating in the QA4MRE 2012 evaluation task. We submitted two types of runs: the first type was based on our system from the 2011 edition of QA4MRE, and the second type was based on a Textual Entailment system. For the second type of runs, we constructed the Text and the Hypothesis required by the Textual Entailment system from the initial test data (the <documents> tag was used to build the Text, and the <question> and <answer> tags were used to build the Hypothesis). The results provided by the organizers showed that the second type of runs performed better than the first type for English.
- Trandabăţ D. (2011) Mining Romanian texts for semantic knowledge, in Proceedings of Intelligent Systems and Design Application Conference, ISDA2011, Cordoba, Spain, ISSN: 2164-7143, ISBN: 978-1-4577-1676-8, DOI: 10.1109/ISDA.2011.6121799, pp. 1062-1066
[bib] [abstract] [ieee]
This paper presents a semantic role labeling system for Romanian texts. The semantic labeling system was developed using PASRL, a platform for supervised learning techniques. The developed platform tests several classifiers on different sub-problems of the SRL task (Predicate Identification, Predicate Sense Identification, Sense Identification, Argument Identification), chooses the ones with the greatest performance and returns a Semantic Role Labeling System (a sequence of trained models to run on new data).
- Trandabăţ D. and Trandabăţ A. (2011) Extracting Semantic Role Information from Unstructured Texts, in Proceedings of 6th International Workshop in Semantic Media Adaptation and Personalization - SMAP2011, ISBN: 978-1-4577-1372-9, DOI: 10.1109/SMAP.2011.20, pp. 62-67, Vigo, Spain.
[bib] [abstract] [ieee]
Shallow semantic parsing is an important component in all kinds of NLP applications, and Semantic Role Labeling in particular is an active research topic. This paper describes a rule-based Semantic Role Labeling system aimed at extracting semantic information from texts. The input text is processed by exploiting part of speech information and syntactic dependencies in order to identify semantic roles. The system's architecture is presented and the results and further developments are discussed.
- Trandabăţ D. (2011) Using Semantic Roles to Understand the Web, in Proceedings of the Workshop "Language Resources and Tools with Industrial Applications", ISSN: 1843-911X, pp. 69-82, Cluj-Napoca, Romania.
[bib] [abstract]
This paper presents a semantic role labeling system acting as the backbone of an application that establishes the roles that entities have in different contexts, and the temporal, modal or local constraints that determine or restrict whether an event takes place. A semantic role represents the relationship between a predicate and an argument. Semantic parsing, by identifying and classifying the semantic entities in context and the relations between them, has great potential for downstream applications, such as text summarization or machine translation.
- Trandabăţ D. (2011) Improving Metadata by Filtering Contextual Semantic Role Information, in Communications in Computer and Information Science, Volume 240, 2011, DOI: 10.1007/978-3-642-24731-6, Metadata and Semantic Research, MTSR2011, ISSN 1865-0929, Springer, pp. 201-208.
[bib] [abstract] [pdf] [scopus]
This paper proposes a method to automatically improve a web page's metadata using the semantic content of the page. Thus, using a semantic role labeling system, the web page content is parsed and the entities that frequently play core semantic roles are considered for addition to the web page's list of metadata. Semantic role analysis answers questions such as: "What role has an entity in a specific context?" or "When, why, where or how an event takes place?".
- Trandabăţ D. (2011) Semantic role labeling for structured information extraction, in Proceedings of the fourth workshop on Exploiting semantic annotations in information retrieval, ISBN: 978-1-4503-0958-5, DOI: 10.1145/2064713.2064728, pp. 25-26, Glasgow, UK.
[bib] [abstract] [pdf] [scopus]
This paper presents a semantic role labeling system acting as the backbone of an application that monitors the contexts and relations in which a specified entity appears in written texts. The main goal is to identify the role this entity plays, as well as the roles played by words frequently co-occurring with the input entity.
- Trandabăţ D. (2011) Towards automatic cross-lingual transfer of semantic annotation, in 6e Rencontres Jeunes Chercheurs en Recherche d'Information (RJCRI) 2011, ISBN 978-2-35768-024-1, pp. 403-408, 16-18 March, Avignon, France.
[bib] [abstract] [pdf] [dblp]
In order to develop a semantic labeling system, the most common methods use supervised learning from an annotated corpus. What if we have short deadlines and limited human and financial resources that prevent us from building such a training corpus for our language? If such a corpus already exists for any other language, this paper proposes a method to automatically import the existing corpus for the language we need. The transfer method is based on translating the existing corpus (or using annotated versions of existing parallel texts), aligning it at word level, and applying a set of mapping functions to import the annotation from one language to another. An import validation interface is also offered for the manual validation of the resulting resource. As an example, the case of semantic role import from the English FrameNet to Romanian is discussed.
- Trandabăţ D. (2011) Extracting Semantic Information from Texts, in Proceedings of the 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC2011, Timisoara, Romania, ISBN 978-0-7695-4630-8, pp. 199-206.
[bib] [abstract] [ieee]
The system presented in this paper creates a map of semantic information about a specific named entity (Person, Organization, etc.). Thus, after the user specifies a named entity, the system searches on the web and returns the first 200 web pages containing the specified entity, applies semantic roles on the returned paragraphs, and extracts a map of related actions involving the searched entity. This map of actions can then be chronologically ordered, thus illustrating the actions a certain entity has performed in a specific time frame (or at least the way it is reflected by the online world).
- Trandabăţ D. (2011) Using semantic roles to improve summaries, in Proceedings of the 13th European Workshop on Natural Language Generation ENLG2011, Nancy, France, pp. 164-169.
[bib] [abstract]
This paper describes a preliminary analysis of the influence of semantic roles in summary generation. The proposed method involves three steps: first, the named entities in the original text are identified using a named entity recognizer; second, the sentences are parsed and semantic roles are extracted; third, the sentences containing specific semantic roles for the most relevant entities in the text are selected. Although the method is language independent, in order to check its viability, we tested the proposed approach for Romanian summaries.
- Curteanu, N., Trandabăţ D., Bolea, C. (2011) Integrating Contrastive Focus with Givenness and Topic-Comment: A Hierarchical Focus Architecture of the Romanian Discourse-Prosody Interface, in Proceedings of the 6th International Conference Speech Technology and Human-Computer Dialogue "SpeD 2011" Brasov, Romania, May 18-21, 2011, ISBN 978-1-4577-0441-3, IEEE Catalog Number CFP1155H-ART.
[bib] [abstract] [ieee] [scopus]
This paper presents the following results: (a) On the basis of an extensive overview of the current Information Structure (IS) theories, the first goal of our paper is to update the IS terminology for the three important IS dimensions: Givenness, Background-Focus (also referred to as Topic-Focus or Background-Kontrast), and Topic-Comment (also Theme-Rheme). (b) We propose an intonational discourse-level hierarchy among the Contrastive Focus (First Occurrence Focus), Second Occurrence Focus, Informational (Discourse-New) Focus, and Deaccented (Discourse-Given) Focus, while the phonetic properties of the considered intonational inequalities remain to be statistically established and weighted through speech analysis for Romanian. (c) This discourse-level prosodic hierarchy is combined, in a separate and independent way, with the clause- and phrase-level intonational hierarchies driven by Sentence Accent Assignment Rules, Nuclear Stress Rule, and the more recent Sentence Break Assignment Rules. (d) Based on the intonational focus hierarchies at points (b) and (c) above, a new architecture for the Discourse-Prosody interface is outlined, aiming to replace the previous approaches of Topic-Focus and Theme-Rheme algorithms (which can provide only incomplete information) for prosody prediction of Romanian. (e) The notions of explicit and implicit contrastive focus are defined, and the meaningful relevance of the contrastive intonation for the Romanian finite-clauses is pointed out by significant percentages of the phenomena on George Orwell's "1984" corpus. (f) Classes of examples illustrate and evaluate, for Romanian, the intonational-prosodic patterns of the contrastive and non-contrastive focus markers, categories, and domains.
- Curteanu, N., Trandabăţ D., Bolea, C. (2011) Hierarchical Contrastive Focus Applied on the Romanian Discourse-Prosody Interface, in Proceedings of IIS 2011 - International Workshop on Intelligent Information Systems, September 13-14, 2011, Chisinau, Republic of Moldova.
[bib] [abstract]
Classical methods of Information Structure computing, such as Topic-Focus Articulation, Theme-Rheme algorithms, or Segmented Discourse Representation Theory, have been considered for the intonational focus assignment on the Romanian discourse-prosody interface. The present paper proposes a more general, contrastivity-oriented architecture: language-dependent, specifically weighted, pre-established hierarchies of contrastive and non-contrastive intonational foci and breaks are coupled on discourse and clause-level textual structures and entities, aiming to merge and compete the classical approaches for an improved prosody prediction of Romanian.
- Iftene, A., Trandabăţ, D., Toader M., Corîci, M. (2011) Named Entity Recognition for Romanian in Studia Universitatis, Babes Bolyai University Publishing House, Volume LVI, Number 2, pp. 19-24, Proceedings of KEPT2011, Cluj-Napoca, Romania, July, 4-6, 2011.
[bib] [abstract] [pdf]
This paper presents a Named Entity Recognition system for Romanian, created using linguistic grammar-based techniques and a set of resources. Our system's architecture is based on two modules, the named entity identification module and the named entity classification module. After the named entity candidates are marked for each input text, each candidate is classified into one of the considered categories, such as Person, Organization, Place, Country, etc. The system's Upper Bound and its performance in real context are evaluated for each of the two modules (identification and classification) and for each named entity type. The evaluation shows promising results, our system being comparable with the existing systems for Romanian, and even better for Person recognition.
- Iftene, A., Gînsca, A. L., Moruz A., Trandabăţ, D., Husarciuc M. (2011) Question Answering for Machine Reading Evaluation on Romanian and English Languages in Notebook Paper for the CLEF 2011 LABs Workshop, Amsterdam, Netherlands, 19-22 September 2011, ISBN 978-88-904810-1-7, ISSN 2038-4726.
[bib] [abstract] [pdf] [dblp]
This paper describes UAIC's Question Answering for Machine Reading Evaluation systems participating in the QA4MRE 2011 evaluation task. The system is designed to extract knowledge from large volumes of text and to use this knowledge to answer questions in Romanian and English monolingual tasks. Our systems were built on the architecture of a Question Answering system, customized for this new task. Thus, the new system reused the question processing and information retrieval components of our previous question answering systems, adapted for the new requests. Additionally, a new component was added in order to detect the most probable answer to a question from a list of possible answers.
- Gînsca, A. L., Boroş, E., Iftene, A., Trandabăţ, D., Toader, M., Corîci, M., Perez, C. A., Cristea, D. (2011) Sentimatrix - Multilingual Sentiment Analysis Service, In Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (ACL-WASSA2011), Portland, Oregon, USA, June 19-24, 2011, ISBN-13 9781937284060, pp. 189-195.
[bib] [abstract] [pdf]
This paper describes the preliminary results of a system for extracting sentiment and named entities. It also combines rule-based classification, statistics and machine learning in a new method. The accuracy and speed of extraction and classification are crucial. The service-oriented architecture permits the end-user to work with a flexible interface in order to produce applications that range from aggregating consumer feedback on commercial products to measuring public opinion on political issues, from blogs and forums. The experiment has two versions available for testing, one with concrete extraction results and sentiment calculus and one with internal metrics validation results.
- Trandabăţ D. Natural Language Processing Using Semantic Frames, PhD Thesis, University Al. I. Cuza Iasi, Romania.
- Trandabăţ D., Cristea D. (2010) Developing a semantic role parser for Romanian (in Romanian), Adrian Iftene, Dan Cristea and Dan Tufiș (Eds.), Proceedings of ConsILR 2010, Univ. Al. I. Cuza Press, Bucharest, Romania, May 2010, ISSN 1843-911X, pp. 203-212.
[bib] [abstract]
Semantic parsing, by identifying and classifying semantic entities in context, as well as the relations between them, has a great potential for applications such as text summarization, question-answering or machine translation. Thus, by developing a system that automatically annotates semantic roles for the Romanian language, this paper represents an important intermediate step towards automatic natural language understanding. For the creation of the semantic role parser for Romanian, it was first necessary to develop a training corpus, annotated with semantic roles. This annotated resource of semantic roles for the Romanian language was created using as starting point a resource developed for English (FrameNet), and applying an automatic transfer method. Subsequently, using a platform for developing supervised semantic role labeling systems (PASRL), several learning algorithms were trained on the developed corpus and the best obtained model was saved. We discuss the results of applying this technique for Romanian, with the conviction that other languages could also benefit from using the same approach.
- Trandabăţ D. (2010) Telling Computers When and How to Adapt Processes, Annals of DAAAM for 2010 & Proceedings of the 21st International DAAAM Symposium, Katalinic, B. (Ed.), Published by DAAAM International Vienna, 20-23rd October 2010, Zadar, Croatia, ISSN 1726-9679, ISBN 978-3-901509-73-5, pp. 1483-1484.
[bib] [abstract]
The next step in human-computer interaction is the use of human languages instead of some pre-defined commands. This paper proposes the use of natural language technologies in the management of industrial process flow. In order to make computers understand humans, language models need to be created from knowledge bases. We used such a knowledge base for the creation of a tool designed to analyse natural language semantics and determine who must perform what task, where, when, etc., and we have the conviction that it can be used to make machines understand humans when commanding the pipelining of processes.
- Iftene A., Trandabăţ D. (2010) Question-Answering System (in Romanian), presentation at "Bring IT On!", Workshop for prospecting the connections between computer science research and industry, December 10th, 2010.
- Iftene A., Trandabăţ D., Husarciuc M., Moruz A. (2010) Question Answering on Romanian, English and French Languages, in CLEF 2010 LABs and Workshops, Notebook Papers, 22-23 September 2010, Padua, Italy, ISBN 978-88-904810-0-0.
[bib] [abstract] [pdf] [dblp]
This paper describes UAIC's Question Answering systems participating in the ResPubliQA 2010 competition, designed to answer questions on a juridical corpus in Romanian, English and French monolingual tasks. Our systems adhere to the classical architecture of a Question Answering system, with an emphasis on simplicity and real time answers: only shallow parsing was used for question processing, the indexes for the retrieval module were built at coarse-grained paragraph level, and the answer extraction component used simple pattern-based rules and lexical similarity metrics for candidate answer ranking.
- Iftene A., Trandabăţ D., Moruz A., Pistol I., Husarciuc M., Cristea D. (2010) Question Answering on English and Romanian Languages, in Multilingual Information Access Evaluation Vol. I Text Retrieval Experiments, LNCS Series, Eds. Carol Peters, Giorgio Di Nunzio, Mikko Kurimo, Thomas Mandl, Djamel Mostefa, Anselmo Penas, Giovanna Roda, LNCS 6241, pp. 229-236, Springer, 2010.
[bib] [abstract] [pdf] [dblp] [scopus]
2009 marked UAIC's fourth consecutive participation at the QA@CLEF competition, with continually improving results. This paper describes UAIC's QA systems participating in the Ro-Ro and En-En tasks. Both systems adhered to the classical QA architecture, with an emphasis on simplicity and real time answers: only shallow parsing was used for question processing, the indexes used by the retrieval module were at coarse-grained paragraph and document levels, and the answer extraction component used simple pattern-based rules and lexical similarity metrics for candidate answer ranking. The results obtained for this year's participation were greatly improved from those of our team's previous participations, with an accuracy of 54% on the EN-EN task and 47% on the RO-RO task.
- Curteanu, N., Moruz, A., Trandabăţ D. (2010) An Optimal and Portable Parsing Method for Romanian, French, and German Large Dictionaries, Proceedings of II Workshop CogAlex Cognitive Aspects of the Lexicon: Enhancing the Structure, Indexes and Entry Points of Electronic Dictionaries, COLING 2010, Beijing, China, August 2010.
[bib] [abstract] [pdf]
This paper presents a cross-linguistic analysis of the largest dictionaries currently existing for Romanian, French, and German, and a new, robust and portable method for Dictionary Entry Parsing (DEP), based on Segmentation-Cohesion-Dependency (SCD) configurations. The SCD configurations are applied successively on each dictionary entry to identify its lexicographic segments (the first SCD configuration), to extract its sense tree (the second configuration), and to parse its atomic sense definitions (the third one). Using previous results on DLR (The Romanian Thesaurus - new format), the present paper adapts and applies the SCD-based technology to four other large and complex thesauri: DAR (The Romanian Thesaurus - old format), TLF (Le Trésor de la Langue Française), DWB (Deutsches Woerterbuch - GRIMM), and GWB (Goethe-Woerterbuch). This experiment is illustrated on significantly large parsed entries of these thesauri, and demonstrated the following features: (1) The SCD-based method is a completely formal grammar-free approach for dictionary parsing, with efficient (weeks-time adaptable) modeling through sense hierarchies and parsing portability for a new dictionary. (2) SCD-configurations separate and run sequentially and independently the processes of lexicographic segment recognition, sense tree extraction, and atomic definition parsing. (3) The whole DEP process with SCD-configurations is optimal. (4) SCD-configurations, through sense marker classes and their dependency hypergraphs, offer a unique instrument for lexicon construction comparison, sense concept design and DEP standardization.
- Curteanu, N., Trandabăţ D., Moruz, M. A. (2010) Lexical semantics modeling and robust parsing for Romanian, French and German Thesauri (in Romanian), Adrian Iftene, Dan Cristea and Dan Tufiș (Eds.), Proceedings of ConsILR 2010, Univ. Al. I. Cuza Press, Bucharest, Romania, May 2010, ISSN 1843-911X, pp. 113-122.
[bib] [abstract]
This paper presents a lexical-semantics cross-linguistic analysis of the largest thesauri currently existing for Romanian, French, and German, and a new, robust and portable method for their entry parsing, based on the technique of Segmentation-Cohesion-Dependency (SCD) configurations. The general idea behind parsing a thesaurus or dictionary is to transform a raw text dictionary entry into an indexable linguistic database of lexical-semantics (sense) trees. The SCD parsing configurations are applied successively on a thesaurus entry in order to identify its lexicographic segments (the first SCD configuration), to extract the tree of its senses and subsenses (the second one), and to parse its atomic and non-atomic sense definitions (the third one). Using previous results on DLR (The Romanian Thesaurus - new format), the present paper adapts and applies the SCD-based technology to four other large and complex thesauri: DAR (The Romanian Thesaurus - old format), TLF (Le Trésor de la Langue Française), DWB (Deutsches Wörterbuch - GRIMM), and GWB (Goethe-Wörterbuch). This experiment, which was illustrated on significantly large parsed entries of the mentioned thesauri, proved the main achievements of the parsing technique based on SCD-configurations: efficiency, robustness, portability. These qualities, when compared with the classical methods for dictionary entry parsing, derive mainly from at least two SCD-specific facts: the sense tree extraction is performed on the sense marker sequences exclusively, while the processes of sense tree extraction and (atomic) sense definition parsing are shown to be completely separable.
- Curteanu, N., Trandabăţ, D., Moruz, A. (2009) Expanding Topic-Focus Articulation with Boundary and Accent Assignment Rules for Romanian Sentence, Vaclav Matousek, Pavel Mautner et al. (Eds.), Text, Speech and Dialogue. Proceedings of the 12th International Conference TSD 2009, Plzen, Czech Republic, September 2009, Lecture Notes in Computer Science, vol. 5729, pp. 226-233, ISBN 978-3-540-74627-0, ISSN 0302-9743 (Print) 1611-3349 (Online).
[bib] [abstract] [dblp] [scopus]
The present paper, maintaining the interest for applying Prague School's Topic-Focus Articulation (TFA) algorithm to Romanian, takes advantage of an experiment investigating the intonational focus assignment to the Romanian sentence. Using two lines of research in a previous study, it has been shown that TFA behaves better than an inter-clausal selection procedure for assigning pitch accents to the Background-Kontrast (Topic-Focus) entities, while the Inference Boundary algorithm for computing the Theme-Rheme is more reliable and more easily extendable towards boundary and contour tone assignment rules, leading to our novel proposal of Sentence Boundary Assignment Rules (SBAR). The main contributions of this paper are: (a) The TFA algorithm applied for Romanian is extended to inter-clause level, and embedded into a discursive approach for computing the Background-Kontrast entities. (b) The Inference Boundary algorithm, applied to Romanian for clause-level Theme-Rheme span computing, is extended with a set of SBAR rules, which rely on different Communicative Dynamism (CD) degrees of the clause constituents. (c) On each intonational unit, the extended TFA algorithm is further refined, in order to eliminate ambiguities, with a filter derived from Gussenhoven's SAAR (Sentence Accent Assignment Rule). Remarkably, TFA and CD degrees proved again to be resourceful, especially as technical procedures for a new and systematic development of an Information Structure Discourse Theory (ISDT).
- Curteanu, N., Trandabăţ, D., Moruz, A. (2009) Discourse Theories vs. Topic-Focus Articulation Applied to Prosodic Focus Assignment in Romanian, Speech Technology and Human-Computer Dialogue 2009, SpeD '09, Proceedings of the 5th Conference Speech Technology and Human Computer Dialogue "SpeD 2009", Corneliu Burileanu and Horia-Nicolai Teodorescu Eds., Publishing House of the Romanian Academy, Constanta, Romania, June, 18-21, 2009, pp. 135-148, IEEE catalog number is CFP0955H-CDR.
[bib] [pdf] [abstract] [scopus] [ieee]
"...it is not the argument structure that triggers the intonational phrasing, but the [discourse, subclausal n.b.] relation of backgrounding." (K. von Heusinger, 2007) Is it? If yes, how? The present paper, maintaining our attempts for applying Prague School's Topic-Focus Articulation (TFA) algorithm on the syntax-prosody interface of Romanian, proposes two comparative lines of investigation for the intonational focus assignment: (A) The TFA algorithm is improved at clause level with hints from Gussenhoven's SAAR (Sentence Accent Assignment Rule), then extended to inter-clause level, i.e. complex sentences. The new shape of the TFA algorithm is applied to compute the Topic-Focus values in the discursive context, while the information-structural (IS) spans of Theme(s)-Rheme(s) are detached, at clause level, as lowest-highest degrees of Communicative Dynamism (CD) vs. Systemic Order (SO). (B) The second approach we experiment for assigning intonational focus and phrasing is based on the combined and intensive use of discourse theories for computing the IS categories and structures: the Background-Kontrast entities (associated with the Prague School's Topic-Focus) are obtained with Asher's (1993) Segmented Discourse Representation Theory (SDRT) analysis, while Theme-Rheme structures within the finite clause are computed with Leong's (2004) Inference-Boundary (IB) algorithm (of Hallidayan inspiration), applied for the first time to Romanian. Furthermore, this second direction is inspired from and joins the IS-discourse theory proposed by Heusinger (2007), which relies on SDRT inter-clausal evolution of discourse variables for computing the Background-Kontrast. While maintaining the classical SDRT (including rhetorical) discourse relations at the inter-clause level, Heusinger introduces a set of IS-semantics relations, and hands down at sub-clause level the rhetorical and focus particle relations with significant role in intonational-prosodic phrasing. 
Examples of these two types of research are compared to a gold, intonationally annotated set of Romanian sentences, the proposed theoretical and procedural techniques aiming to balance the pessimistic-realistic view on prosody prediction that it is the speaker-presupposition (and hearer-accommodation) which determines the IS focal scopes, rather than the bare text.
- Iftene, A., Trandabăţ, D., Pistol, I., Moruz, A., Husarciuc, M., Cristea, D. (2009) UAIC Participation at QA@CLEF2008, Evaluating Systems for Multilingual and Multimodal Information Access, Lecture Notes in Computer Science, vol. 5706/2009, pp. 385-392, ISBN 978-3-540-74998-1, ISSN 0302-9743 (Print) 1611-3349 (Online).
[bib] [pdf] [abstract] [dblp] [scopus] [isi]
2008 marked UAIC's third consecutive participation at the QA@CLEF competition, with continually improving results. The most significant change to our system with regards to last year is the partial transition to a real-time QA system, as a consequence of the simplification or elimination of the main time-consuming tasks such as linguistic pre-processing. A brief description of our system and an analysis of the errors introduced by each module are given in this paper.
- Iftene, A., Trandabăţ, D. (2009) Recovering Diacritics using Wikipedia and Google, in Studia Universitatis Babes Bolyai University Publishing House, Volume LIV, Special Issue KEPT-2009: Knowledge Engineering: Principles and Techniques, July, 2-3, 2009, pp 37-40.
[bib] [abstract]
The paper presents a method to restore diacritics using web contexts. The system receives one or more sentences in one language and uses the Google engine to recover diacritics for the sentence words. The system accuracy is similar to the accuracy of existing systems, but the main advantage comes from the fact that it uses resources and tools that are available for free or that are easy to obtain for other languages, leading us to believe that this approach could be valid for more languages.
- Iftene, A., Trandabăţ, D., Pistol, I., Moruz, A.M., Husarciuc, M., Sterpu, M. and Turliuc, C. (2009) Question Answering on English and Romanian Languages, Working Notes for the CLEF 2009 Workshop, Corfu, Greece, 30 Sept. - 2 Oct. 2009.
[bib] [pdf] [abstract]
This year marked UAIC's fourth consecutive participation at the QA@CLEF competition, with continually improving results. This year we participated successfully in both the Ro-Ro and En-En tasks of the ResPubliQA track. A brief description of our system is given in this paper.
- Curteanu, N., Moruz, A., Trandabăţ, D., Bolea, C., Spătaru, M., Husarciuc, M. (2009) Sense tree parsing and definition segmentation in eDTLR Thesaurus, (in Romanian), Trandabăţ, D., Tufis, D., Cristea, D. (Eds.), Proceedings of the Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, Romania, November, 19-21, 2008, "Al.I.Cuza" University Publishing House, ISSN 1843-911X, pp. 65-74.
- Curteanu, N., Moruz, A., Trandabăţ, D. (2008) Extracting Sense Trees from the Romanian Thesaurus by Sense Segmentation & Dependency Parsing, Proceedings of CogAlex Cognitive Aspects of the Lexicon: Enhancing the Structure, Indexes and Entry Points of Electronic Dictionaries, COLING 2008 Workshop endorsed by SIGLEX, pp. 55 - 63, ISBN 978-1-905593-56-9.
[bib] [pdf] [abstract]
This paper aims to introduce a new parsing strategy for large dictionary (thesauri) parsing, called Dictionary Sense Segmentation and Dependency (DSSD), devoted to obtaining the sense tree, i.e. the hierarchy of the defined meanings, for a dictionary entry. The real novelty of the proposed approach is that, contrary to dictionary 'standard' parsing, DSSD looks for and succeeds in separating the two essential processes within a dictionary entry parsing: sense tree construction and sense definition parsing. The key tools to accomplish the task of (autonomous) sense tree building consist in defining the dictionary sense marker classes, establishing a tree-like hierarchy of these classes, and using a proper searching procedure of sense markers within the DSSD parsing algorithm. A similar but more general approach, using the same techniques and data structures for (Romanian) free text parsing, is SCD (Segmentation-Cohesion-Dependency), from which DSSD is inspired. A DSSD-based parser is implemented in Java, currently building 91% correct sense trees from DTLR (Dicţionarul Tezaur al Limbii Române - Romanian Language Thesaurus) entries, with significant resources to improve and enlarge the DTLR lexical semantics analysis.
- Curteanu, N., Trandabăţ, D., Moruz, A., Pavel, G., Verestiuc, C. (2008) Dictionary Sense Segmentation & Dependency - A Flexible Strategy for Thesauri Parsing, Proceedings of 5th European Conference on Intelligent Systems and Technologies ECIT 2008, CD edition.
[bib] [The COLING workshop paper is an improved version of this paper. See above.]
- Iftene, A., Pistol, I., Trandabăţ, D. (2008) Grammar-based Automatic Extraction of Definitions, Proceedings of 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC 2008), Timisoara, România, September, 26-29, 2008.
[bib] [pdf] [abstract] [scopus] [ieee] [dblp]
The paper describes the creation of a grammar developed to automatically extract definitions from web documents. Three evaluation scenarios were run, the results of these experiments being the main focus of the paper. The results are convincing, so that further development as well as further integration of the definition extraction system in various related applications are already under way.
- Iftene, A., Pistol, I., Trandabăţ, D. (2008) UAIC Participation at QA@CLEF2008, Working Notes for the CLEF 2008 Workshop, Aarhus, Denmark, September, 17-19, 2008.
[bib] [pdf] [abstract]
This year marked UAIC's third consecutive participation at the QA@CLEF competition, with continually improving results. The most significant change to our system with regard to last year is the partial transition to a real-time QA system, a consequence of the simplification or elimination of the main time-consuming tasks such as linguistic pre-processing. A brief description of our system and an analysis of the errors introduced by each module are given in this paper.
- Iftene, A., Trandabăţ, D., Pistol, I., Moruz, A., Balahur-Dobrescu, A., Cotelea, D., Dornescu, I., Draghici, I., Cristea, D. (2008) UAIC Romanian Question Answering system for QA@CLEF, Advances in Multilingual and Multimodal Information Retrieval, Lecture Notes in Computer Science, 5152/2008, pp. 336-343, ISBN 978-3-540-74627-0, ISSN 0302-9743 (Print) 1611-3349 (Online).
[bib] [pdf] [abstract] [dblp] [scopus] [isi]
This paper briefly describes UAIC's participation in this year's CLEF question answering competition, focusing on the main challenges and changes compared to last year's participation. An analysis of the errors introduced by each module is also provided.
- Teodorescu, H., Trandabăţ, D. (2008) The Prosody of the Double-Subject Sentences in Romanian, Revue Roumaine de Linguistique, no. 4/2008, ISSN: 0035-3957.
[bib] [pdf] [abstract] [isi]
The double subject in Romanian sentences is a controversial linguistic phenomenon. While some researchers accept it as a language 'curiosity', others consider it apposition, in order to embody its behaviour in the already existing theories. We present the first study in the international literature on the phonetic analysis of double-subject sentences.
- Trandabăţ, D., Husarciuc, M. (2008) Romanian Semantic Role Resource, Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Marrakech, Morocco, May, 28 - 30, 2008, ISBN 2-9517408-4-0, pp. 2806 - 2810.
[bib] [pdf] [abstract] [dblp]
Semantic databases are a stable starting point in developing knowledge based systems. Since creating language resources demands many temporal, financial and human resources, a possible solution could be the import of a resource annotation from one language to another. This paper presents the creation of a semantic role database for Romanian, starting from the English FrameNet semantic resource. The intuition behind the import program is that most of the frames defined in the English FN are likely to be valid cross-lingually, since semantic frames express conceptual structures, language independent at the deep structure level. The surface level is realized according to each language's syntactic constraints. In the paper we present the advantages of choosing to import the English FrameNet annotation, instead of annotating a new corpus. We also take into account the mismatches encountered in the validation process. The rules created to manage particular situations are used to improve the import program. We believe the information and arguments in this paper could be of interest for those who wish to develop FrameNet-like systems for other languages.
- Curteanu, N., Trandabăţ, D., Moruz, M. A. (2007) Functional FX bar Projections of the (Romanian) Verbal Group and Sub-Groups on the Syntactic-Semantic Interface, Journal of Computer Science of Moldova, Academy of Science of Moldova, Institute of Mathematics and Computer Science, volume 15, no. 2(44)/2007, pp. 123 - 152.
[bib] [pdf] [abstract] [dblp]
The aim of this paper is to investigate the syntactic / semantic substructures (called subgroups) of the Romanian verbal group (VG), or verbal complex, starting with the achievements in the literature, and melted into the device of direct and inverse functional projection within FX-bar theory. The paper examines several problems and their solutions for the syntactic-semantic theories of VG, as discussed in some fundamental papers, and we offer our explanation on the involved syntactic phenomena, the emphasis falling on the VG substructures (verbal subgroups, VSGs), VSG boundaries and composition within VG, direct and inverse FX-bar projections of VG, VG parsing, lexical semantics and intensional / extensional logic representations of the Romanian (verbal or nominal) predicate.
- Curteanu, N., Trandabăţ, D. (2007) Functional (F)X bar Projections for Local and Global Text Structures. The Anatomy of Predication, Revue Roumaine de Linguistique, no. 1-2 / 2007, ISSN: 0035-3957.
[bib] [pdf] [abstract]
This paper proposes and discusses issues that are connected to local and global text structures, all of them being associated in one way or another with the concept of predication. Outcome of various stages of evolution (including our own) of X bar theory, the present work comprises the following topics: (a) A novel functional X bar (FX bar) scheme is proposed, aiming to reveal, model and relate the local, clause-level markers and text structures. (b) At global level, two FX bar schemes are proposed, one pursuing the inter-clause level relations (of logical, syntactic, and second-order theta-semantics nature), and the other being of discursive, rhetorical nature. (c) Local and global classes of markers are described, to be incorporated on the projection levels of FX bar schemes and within SCD (Segmentation-Cohesion-Dependency) linguistic strategy algorithms, together with the graph-based hierarchy among these classes. (d) The concept of functional generativity is discussed, with implications on parsing algorithm classification and FX bar projection mechanism. (e) Local FX bar projection functions have at their core the notion of lexical predication. Direct (towards clause) and inverse (towards lexicon) FX bar projections of the verbal group (verbal complex) are shown to balance between syntactic and semantic diatheses, taking steps for a better understanding of predicate and predication anatomies. (f) Finally, direct and inverse global FX bar projections mediate between larger text spans and inter-clause vs. discourse trees, the intricate relationship between the finite clause and (sub-clause and multi-clause) discourse segment being highlighted.
- Curteanu, N., Trandabăţ, D., Moruz, M. A. (2007) Syntax-Prosody Interface for Romanian within Information Structure Theories, in Burileanu, C., Teodorescu, H-N (Eds.) Advances in Spoken Language Technology, Romanian Academy Publishing House, Iasi, Romania, 2007, pp. 217-228, ISBN 978-973-27-1516-1
[bib] [abstract]
The following main ideas have been pointed out and put to work within our paper: (a) Information Structure (IS) theories (topic-focus, theme-rheme, background-contrast, informational and contrastive focus, focus projection rules, etc.) on text are shown to behave currently as a consistent linguistic tool that can stand behind a correct, language- and context-dependent mapping of text into speech discourse, on the basis of local-global, textual-intonational structures and markers. (b) Functional and logical-semantics (including pragmatics) approaches for IS modelling of the local and global syntax and discourse, coming from the text side, and reliable intonational units within coherent theories on spoken discourse, coming from the prosody side, are needed. We outline here the SCD (Segmentation-Cohesion-Dependency) and FDG (Functional Dependency Grammar) parsing strategies and linguistic theories. (c) Concerning the IS semantics, this paper takes a first (experimental) step to design an adequate syntax-prosody interface for Romanian, by adapting and applying the Prague School's TFA (Topic-Focus Articulation) algorithm. The contributions we consider worth mentioning are: (d1) theoretical and computational local-global parsing frameworks of Romanian, emphasizing the functional and hierarchical organization of linguistic categories and markers: SCD and FDG; (d2) outline of the relationship between textual local-global syntactic-discourse structures and prosodic intonational units; (d3) implementing for the first time (to our knowledge) the TFA algorithm for Romanian.
- Curteanu, N., Trandabăţ, D., Moruz, M. A. (2007) Topic-Focus Articulation Algorithm on the Syntax-Prosody Interface of Romanian. 10th International Conference on Text, Speech and Dialogue, TSD 2007, Pilsen, Czech Republic, September, 3-7, 2007, Lecture Notes in Computer Science, vol. 4629/2007, pp. 516-523, ISBN 978-3-540-74627-0, ISSN 0302-9743 (Print) 1611-3349 (Online).
[bib] [pdf] [abstract] [dblp] [scopus] [isi]
We propose in this paper an implementation of the Prague School's TFA (Topic-Focus Articulation) algorithm to support the Romanian prosody design, relying on the experience with FDG (Functional Dependency Grammar) and SCD (Segmentation-Cohesion-Dependency) parsing strategies for the classical, i.e. predication-driven, but Information Structure (IS) non-dependent, syntax. The contributions worth mentioning are: (a) Outlining the functional and hierarchical organization of linguistic markers and structures within SCD and FDG local-global parsing, on both sides of the syntax-prosody interface of Romanian. (b) Pointing out the relationship between classical (IS-free) syntactic structures, IS-dependent (topic-focus, communicative dynamism) textual spans, and the corresponding prosodic intonational units. (c) Adapting and implementing the TFA algorithm for the first time for Romanian prosodic structures, to be continued with TFA sentence-level refinements, its rhetorical-level extension, and embedding into local-global linking algorithms.
- Iftene, A., Trandabăţ, D., Pistol, I. (2007) Grammar-based Automatic Extraction of Definitions and Applications for Romanian, Proceedings of the Intl. Workshop on Natural Language Processing and Knowledge Representation for eLearning Environments held in conjunction with the Intl. Conf. RANLP '2007, September, 26, 2007 Borovets, Bulgaria, pp. 19-26, ISBN 978-954-452-002-1.
[bib] [pdf] [abstract]
This paper presents part of our work in the LT4eL project regarding the grammar developed by the Romanian team in order to extract definitions from texts. Some qualitative results come in order to evaluate our grammar rules. Among the applications of this kind of grammar we will discuss the possible inclusion of the grammar rules into a question answering system in order to extract answers for definition type questions. Another possible usage of those rules envisages the extraction of supplementary knowledge from linguistic resources like Wikipedia. The benefits of such an extra-knowledge resource are evident in textual entailment systems, where some resources like WordNet, Acronyms database or Dirt cannot cover all the requirements of the system.
- Puşcasu, G., Iftene, A., Pistol, I., Trandabăţ, D., Tufiş, D., Ceauşu, A., Ştefanescu, D., Ion, R., Dornescu, I., Moruz, A., Cristea, D. (2007) Cross-Lingual Romanian to English Question Answering at CLEF 2006. Evaluation of Multilingual and Multi-modal Information Retrieval, Lecture Notes in Computer Science vol. 4730/2007, pp. 385-394, ISBN 978-3-540-74998-1, ISSN 0302-9743 (Print) 1611-3349 (Online).
[bib] [pdf] [abstract] [dblp] [scopus] [isi]
This paper describes the development of a Question Answering (QA) system and its evaluation results in the Romanian-English cross-lingual track organized as part of the CLEF 2006 campaign. The development stages of the cross-lingual Question Answering system are described incrementally throughout the paper, at the same time pinpointing the problems that occurred and the way they were addressed. The system adheres to the classical architecture for QA systems, debuting with question processing followed, after term translation, by information retrieval and answer extraction. Besides the common QA difficulties, the track posed some specific problems, such as the lack of a reliable translation engine from Romanian into English, and the need to evaluate each module individually for a better insight into the system's failures.
- Teodorescu, H.-N., Trandabăţ, D. (2007) Appositions Versus Double Subject Sentences - What Information the Speech Analysis Brings to a Grammar Debate. 10th International Conference on Text, Speech and Dialogue, TSD 2007, Pilsen, Czech Republic, September, 3-7, 2007, Lecture Notes in Computer Science, vol. 4629/2007, pp. 286-293, ISBN 978-3-540-74627-0, ISSN 0302-9743 (Print) 1611-3349 (Online).
[bib] [pdf] [abstract] [dblp] [scopus] [isi]
We propose a method based on spoken language analysis to deal with controversial syntactic issues; we apply the method to the problem of the double subject sentences in the Romanian language. The double subject construction is a controversial linguistic phenomenon in Romanian. While some researchers accept it as a language 'curiosity' (specific only to the Asian languages, but not to the European ones), others consider it apposition-type structure, in order to embody its behaviour in the already existing theories. This paper brings a fresh gleam of light over the debate, by presenting what we believe to be the first study on the phonetic analysis of double-subject sentences in order to account for its difference vs. the appositional constructions.
- Teodorescu, H.N., Trandabăţ, D., Feraru, M., Zbancioc, M., Luca, R. (2007) A Corpus of the Sounds in the Romanian Spoken Language for Language-Related Education, International Conference on Human and Material Resources in Foreign Language Learning - HMRFLL 2006, Murcia, Spain, Carlos Perinan Pasqual (Eds.), Revisiting Language Learning Resources, Cambridge Scholars Publishing (CSP), UK, ISBN 1-84718-156-2, Chapter Six, pp. 73-89, 2007.
[bib] [abstract]
We present an initiative to develop an educational and scientific, Internet-based, freely accessible spoken language resource for the sounds and words of the Romanian language. The database contains several versions of all the basic sounds of the Romanian language, as well as sets of words. This language resource is suitable for education, as well as for statistical research on the Romanian spoken language, and for testing and benchmark purposes. The corpus is extensively documented, in order to allow both scientific research and informed education based on it. This corpus, currently located at the address, can be used for purposes such as analysis of sounds, analysis of specificities of the Romanian language pronunciation compared to other languages, and Romanian language learning aided by computer. The resource was created by the cooperation of teachers, researchers and Ph.D. students from three academic institutions.
- Teodorescu, H-N, Feraru, M., Trandabăţ, D. (2007) Studies on the Prosody of the Romanian Language: The Emotional Prosody and the Prosody of Double-Subject Sentences, Burileanu, C., Teodorescu, H-N (Eds.) Advances in Spoken Language Technology, The Publishing House of the Romanian Academy, Bucharest, Romania, 2007, ISBN 978-973-27-1516-1, pp. 171-182.
[bib] [abstract]
We present a study of the prosody - seen in a broader sense - that supports the theory of the interrelationship function of speech. Both "pure emotions" and specific syntactical constructions, like the double-subject propositions, are meant to show a relationship of the speaker with the general context. The analysis goes beyond the basic prosody, as related to pitch values, pitch trajectory, sound duration and pauses; the analysis also aims to determine the change in higher formants. The refinement of the analysis asks for finer tools. Methodological aspects are discussed, including limitations of the currently available tools.
- Trandabăţ, D., Moruz, M., Dornescu, I., Curteanu, N., Bolea, C. (2007) Topic-Focus Subjectivity within Verbal Complex Substructures on the Romanian Syntax-Prosody Interface, Proceedings of the Workshop on Applications of Semantics, Opinion and Sentiments at the 8th EUROLAN Summer School, July 23 - 27, 2007, Iasi, Romania.
[bib] [pdf] [abstract]
The main contributions enclosed within this paper refer to: (i) A general view on the syntax-prosody interface, the non-isomorphism between the two facets of the language being conjectured as a proper subsumption mapping between syntax and prosody; (ii) Implementation of the Topic-Focus Articulation (TFA) algorithm for the Romanian sentence, with novelties on Intonational TFA and interrogative sentences; (iii) Devising the verbal group (VG, viz. verbal complex) into verbal subgroup syntactic structures; (iv) TFA algorithm refinement on complex Romanian VGs; (v) Subjectivity / Objectivity discovery based on TFA and communicative dynamism disordering, with consequences for text analysis and prosody design in e-learning systems.
- Trandabăţ, D. (2007) Semantic Frames in Romanian Natural Language Processing, Proceedings of the North American Chapter of the Association for Computational Linguistics NAACL-HLT 2007, Companion Volume: Doctoral Consortium, ACL, April 2007, Rochester, New York, USA, pp. 29-32, ISBN 1-932432-92-2.
[bib] [pdf] [abstract] [dblp]
Interest in building semantic frame databases as a stable starting point in developing semantic knowledge based systems exists for languages such as German (the Salsa project), English (the PropBank project, the FrameNet project), Spanish, Japanese, etc. I thus propose to create a semantic frame database for Romanian, similar to the FrameNet database. Since creating language resources demands many temporal, financial and human resources, a possible solution could be the import of standardized annotation of a resource developed for a specific language to other languages. This paper presents such a method for the import of the FrameNet annotation from English to Romanian.
- Curteanu, N., Moruz, M., Trandabăţ, D., Bolea, C., Dornescu I. (2006) The Structure and Parsing of Romanian Verbal Group and Predicate, 4th European Conference on Intelligent Systems and Technologies, ECIT 2006, Iasi, Romania, 2006, ISBN 973-730-265-6.
[bib] [abstract]
The aim of this paper is two-fold: (a) to discuss the structure (and substructures) of the verbal complex, or verbal group (VG), investigated within the framework of functional FX bar (projection) theory, with the semantic instruments of intensional / extensional logic, and (b) to expose results on the VG parsing task, embodied into a more general FDG clause-level parser. We provide a unitary taxonomy of the verbal and nominal predicate, based on intensional logic. Both constructions rely on verbal subgroups (VSGs) that a VG is decomposed in, with tense auxiliary, copulative, modal, semi-auxiliary, restructuring head verbs. VSG substructures are (recursively) composing one another, using the predication feature (with valence-arity, argument type-sort, and polymorphism, if necessary) of their lexical and grammatical heads, to obtain complex FX bar projections representing the VG. The general description of the VG parsing program is given, its purpose being to extract from Romanian texts, VGs (or verbal predicates), together with their additional properties such as tense and voice (syntactic diathesis). The program is written in Java, and uses regular expressions for determining the basic and compositional properties of VGs.
- Curteanu, N., Trandabăţ, D., Moruz M. (2006) Substructures of the (Romanian) Predicate and Predication using FX-bar Projection Functions on the Syntactic Interface, 4th European Conference on Intelligent Systems and Technologies, ECIT 2006, Iasi, Romania, 2006, ISBN 973-730-265-6.
[bib] [abstract]
The aim of this paper is to investigate the syntactic / semantic substructures of the Romanian verbal group (VG), or verbal complex, starting from the instruments and arguments in the literature, and melted into the device of (direct and inverse) FX bar projections and theory. The paper will examine several problems and their solutions for the syntactic theories of VG, as discussed in some fundamental papers, and we shall offer our explanation on the involved syntactic phenomena, the emphasis falling on the lexical semantics and intensional / extensional logic representations, our interests being mainly oriented towards VG parsing, VG substructures (verbal subgroups, VSGs), VSG composition and their direct and inverse FX bar projections.
- Curteanu, N., Trandabăţ, D., Moruz, M. (2006) The Structure of the Verbal Group, the Lexical Predication and the Logical Representation of the Predicate in Romanian. (in Romanian), Forascu, C., Tufis, D., Cristea, D. (Eds.), Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, November, 3-4, 2006, "Al.I.Cuza" University Publishing House, ISBN 978-973-703-208-9, pp. 143-148.
- Feraru, M., Trandabăţ, D. (2006) Towards the Emotional Annotation of a Corpus for the Romanian Spoken Language, 4th European Conference on Intelligent Systems and Technologies, ECIT 2006, Iasi, Romania, 2006, ISBN 973-730-265-6.
[bib] [abstract]
We present the steps of the annotation process for some phrases of the Romanian Spoken Language. The recordings have been made from a set of young subjects and the results are presented.
- Iftene, A., Pistol, I., Trandabăţ, D., Puscasu, G., Forăscu, C., Cristea, D. (2006) Question-Answering Systems for Romanian, (in Romanian), Forascu, C., Tufis, D., Cristea, D. (Eds.), Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, November, 3-4, 2006, "Al.I.Cuza" University Publishing House, ISBN 978-973-703-208-9, pp. 83-88, Iasi, Romania 2006.
- Moruz, M., Curteanu, N., Trandabăţ, D., Dornescu, I., Bolea, C. (2006) Parsing the verbal/nominal predicate and the finite/non-finite clause in Romanian. FDG parsing. (in Romanian), Forascu, C., Tufis, D., Cristea, D. (Eds.), Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, November, 3-4, 2006, "Al.I.Cuza" University Publishing House, ISBN 978-973-703-208-9, pp. 123-128, Iasi, Romania 2006.
- Pistol, I., Iftene, A., Trandabăţ, D., Cristea, D., Forascu, C. (2006) Processing Romanian Resources within the LT4eL Project, (in Romanian), Forascu, C., Tufis, D., Cristea, D. (Eds.), Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, November, 3-4, 2006, "Al.I.Cuza" University Publishing House, ISBN 978-973-703-208-9, pp. 129-134, Iasi, Romania 2006.
- Puşcasu, G., Iftene, A., Pistol, I., Trandabăţ, D., Tufiş, D., Ceauşu, Al., Stefanescu, D., Ion, R., Orasan, C., Dornescu, I., Moruz, A., Cristea, D. (2006) Developing a Question Answering System for the Romanian-English Track at CLEF 2006, Working Notes for the Cross Language Evaluation Forum (CLEF 2006), Alicante, Spain
[bib] [pdf] [abstract]
This paper describes the development of a question answering system for the Romanian-English cross-lingual track organized within the Cross Lingual Evaluation Forum (CLEF) 2006 campaign. The development stages of our cross-lingual Question Answering (QA) system are described incrementally throughout the paper, at the same time pinpointing the problems that occurred and the way they were addressed. Our system adheres to the classical architecture for QA systems, debuting with question processing followed, after term translation, by information retrieval and answer extraction. Besides the common QA difficulties, the track posed some specific problems, such as the lack of a reliable translation engine from Romanian to English, and the need to evaluate each module individually for a better insight into the system's failures.
- Teodorescu, H.N., Feraru, M., Trandabăţ, D. (2006) Nonlinear Assessment of the Professional Voice 'Pleasantness', Proceedings of the 18th Biennial International EURASIP Conference BIOSIGNAL 2006, Brno, Czech Republic, J.Jan, J.Kozumplik, I. Provaynik (Eds.), Analysis of biomedical signals and images, Vutium Press, pp. 63-66, ISSN 1211 - 412X, ISBN 80-214-3152-0, 2006.
[bib] [abstract]
The aim of this paper is to deepen the discussion of the role of shimmer and jitter in the natural speech and to assess the importance of shimmer and jitter for the synthesized speech. Toward this goal, nonlinear analysis methods are applied.
- Teodorescu, H.N., Feraru, M., Trandabăţ, D. (2006) "Spoken Romanian Language" Archive, (in Romanian), Forascu, C., Tufis, D., Cristea, D. (Eds.), Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, November, 3-4, 2006, "Al.I.Cuza" University Publishing House, ISBN 978-973-703-208-9, pp. 3-8, Iasi, Romania 2006.
- Trandabăţ, D., Iftene, A., Pistol, I., Forascu, C., Cristea, D. (2006) Romanian Resources developed in the LT4eL Project, (in Romanian), Forascu, C., Tufis, D., Cristea, D. (Eds.), Workshop "Linguistic Resources and Instruments for Romanian Language Processing", Iasi, November, 3-4, 2006, "Al.I.Cuza" University Publishing House, ISBN 978-973-703-208-9, pp. 51-56, Iasi, Romania 2006.
- Husarciuc, M., Trandabăţ, D., Lupu, M. (2005) Inferring Rules In Importing Semantic Frames From English FrameNet Onto Romanian FrameNet. 1st ROMANCE FrameNet Workshop, International Workshop held at EUROLAN 2005 Summer School, July, 26 - 28, 2005, University Babes-Bolyai, Cluj-Napoca, Romania, pp. 44-49.
- Lupu, M., Trandabăţ, D., Husarciuc M. (2005) A Romanian SemCor Aligned to the English and Italian MultiSemCor, 1st ROMANCE FrameNet Workshop, International Workshop held at EUROLAN 2005 Summer School, July, 26 - 28, 2005, University Babes-Bolyai, Cluj-Napoca, Romania, pp. 20-27.
- Trandabăţ, D., Husarciuc, M., Lupu, M. (2005) Towards an automatic import of English FrameNet frames into the Romanian language, 1st ROMANCE FrameNet Workshop, International Workshop held at EUROLAN 2005 Summer School, July, 26 - 28, 2005, University Babes-Bolyai, Cluj-Napoca, Romania, pp. 28-36.
- Cristea, D., Mihaila, C., Forascu, C., Trandabăţ, D., Husarciuc, M., Haja, G., Postolache, O. (2004) Mapping Princeton WordNet synsets onto Romanian WordNet synsets, Romanian Journal on Information Science and Technology, Special Issue on BalkaNet, Romanian Academy, volume 7, Numbers 1-2, 2004, ISSN: 1453-8245.
[bib] [pdf] [abstract]
The paper reports difficulties encountered during the alignment of synsets between English and Romanian. Reasons for these difficulties are inconsistencies found both in Princeton WordNet (we will refer to it from now on with PWN), on one part, and in our sources, on the other part, the difference in criteria based on which senses were recorded in PWN and in our sources, and the inherent differences in the lexicalization of concepts in the two languages.
See more about the projects I've been involved in on my Research page!