13 December – ConsILR Day I (all hours are EET = CET+1)
8:30 – 9:00 Connections
9:00 – 9:20 Welcome message
Participants: the ConsILR-2021 organizing team and all attendees
9:20 – 10:20 Keynote speech session 1 – chaired by Dan Cristea
Dima Turchyn: From research to applied use cases: how recent developments in NLP and Speech are changing AI services
Abstract: In this session, you will learn how the new generation of ML models is changing what’s possible with AI, and how it transforms into applied services that can be used by organizations to transform their products and processes. You’ll see a demo of some of the state-of-the-art AI services, as well as learn about practical use cases being solved with Speech and Language AI services, based on examples of challenges solved by organizations in our region.
Bio note: 15+ years of experience in the technology industry and business development, from setting up company operations and a distribution/partner network in the country, to building brand perception in the market, to managing cross-group, cross-border sales and marketing strategy and execution. Experienced public speaker, technology expert and enthusiast, focusing especially on Artificial Intelligence.
Session 1: Romanian Linguistic Resources – chaired by Diana Trandabăț
10:20 – 10:40 Elena Isabelle Tamba: Romanian Linguistic Research in the Digital Age and European Language Policies
10:40 – 11:00 Verginica Mititelu, Elena Irimia, Vasile Păiș, Maria Mitrofan, Andrei Avram and Eric Curea: Linked Open Data Resources for Romanian
11:00 – 11:20 Oana Niculescu: Developing Linguistic Resources for Romanian Written and Spoken Language
11:20 – 11:40 Cosmina-Maria Berindei: Digital Resources Developed in the Project BIBLIO-MLRom
11:40 – 12:00 Coffee break
Session 2: Curation of Digital Heritage – chaired by Roxana Vieru
12:00 – 12:20 Gabriela Haja and Elena Isabelle Tamba: The Role of the Philologist in the Project Artificial Intelligence Models (Deep Learning) Applied in the Analysis of Old Romanian Language (DeLORo)
12:20 – 12:40 Dan Cristea, Cristian Pădurariu, Andrei Scutelnicu, Petru Rebeja and Mihaela Onofrei: Data Structure and Acquisition in DeLORo – a Technology to Help Researchers Decipher Old Cyrillic-Romanian Documents
12:40 – 13:00 Silviu Ioan Bejinariu, Florin Iftene, Manuela Nevaci and Carmen Irina Floarea: Preservation of Romanian Linguistic Heritage. Framework for Dialectal Data Management
13:00 – 13:20 Mihai Alex Moruz and Mădălina Ungureanu: Old Romanian Lexicons and their Representation in DLR
13:20 – 15:00 Lunch break
15:00 – 16:00 Keynote speech session 2 – chaired by Dan Tufiș
Abstract: Europe is a multilingual society, in which dozens of languages are spoken. The only option to enable and to benefit from multilingualism is through Language Technologies (LT), i.e., Natural Language Processing and Speech Technologies. The presentation will provide an overview of the European Language Grid (ELG), which aims to evolve into the primary platform and marketplace for LT in Europe by providing one umbrella platform for the European LT landscape, including research and industry, enabling all stakeholders to upload, share and distribute their services, products and resources. At the end of our EU project, which will establish a legal entity in 2022, the ELG will provide access to more than 750 services for all European languages as well as thousands of data sets; the ELG platform already contains more than 5000 resources. Furthermore, the presentation will provide an overview of ELG’s sister project, European Language Equality (ELE), as well as preliminary results. This EU project develops a strategic research, innovation and implementation agenda as well as a roadmap for achieving full digital language equality in Europe by 2030.
Bio note: Prof. Dr. Georg Rehm holds an M.A. in Computational Linguistics and Artificial Intelligence, Linguistics and Computer Science from the University of Osnabrück and a PhD in Computational Linguistics from the University of Gießen. He is currently a Principal Researcher in the Speech and Language Technology Lab at the German Research Center for Artificial Intelligence (DFKI) in Berlin, the Coordinator of QURATOR and European Language Grid, the Co-coordinator of European Language Equality, and the General Secretary of META-NET, an EU-funded Network of Excellence consisting of 60 research centers from 34 countries, dedicated to building the technological foundations of a multilingual European information society. In April 2021, Georg Rehm was appointed honorary professor for outstanding achievements in research and education at Humboldt-Universität zu Berlin, where he is affiliated with the Institut für deutsche Sprache und Linguistik. He has been the Head of the German/Austrian Chapter of the World Wide Web Consortium (W3C). In the 2021/2022 term, Georg Rehm serves as the Secretary of the EACL (European Chapter of the Association for Computational Linguistics).
16:00 – 17:00 Keynote speech session 3 – chaired by Verginica Barbu Mititelu
Abstract: Big corpus data are notoriously difficult to use linguistically because of their high dimensionality, their opaque structure, and IPR and license restrictions. In my talk, I will introduce a multi-level approach used by the corpus linguistics group at IDS Mannheim in the context of KorAP, DeReKo and the European Reference Corpus initiative EuReCo to make corpora as usable as possible, at feasible cost and despite these challenges. I will use examples from the Reference Corpus of the Contemporary Romanian Language (CoRoLa) and report on preliminary results.
Bio note: Marc Kupietz completed his master’s degree in linguistics at the University of Bielefeld. Since then he has worked in various research projects in the fields of psycholinguistics, cognitive science and neural network modelling, as well as in the areas of text technology and information management. After receiving his Ph.D. in 2003, he began working at the Leibniz Institute for the German Language in Mannheim, where he is responsible for the German Reference Corpus DeReKo and has been head of the Corpus Linguistics programme area since 2012. Marc teaches at the Universities of Mannheim and Heidelberg and is co-editor of the series Corpus Linguistics and Interdisciplinary Perspectives on Language (CLIP). His research interests include empirically grounded linguistics, language modelling and research tools.
Session 3: Datasets and Corpora – chaired by Radu Ion
17:00 – 17:20 Constantin Nicolae and Dan Tufiș: RoITD: Romanian IT Question Answering Dataset
17:20 – 17:40 Roxana Szabo and Adrian Groza: A Puzzle-Based Dataset for Natural Language Inference
17:40 – 18:00 Ioana Buhnilă: Building a Corpus of Medical Paraphrases in Romanian
18:00 – 18:20 Coffee break
Session 4: Speech and Text Models for Romanian language – chaired by Horia Nicolai Teodorescu
18:20 – 18:40 Marius Dan Zbancioc and Silvia Monica Feraru: A New Method of Emphasizing the Formants of the Vocal Spectrum
18:40 – 19:00 Vasile Păiș, Elena Irimia, Radu Ion, Dan Tufiș, Maria Mitrofan, Verginica Barbu Mititelu, Andrei-Marius Avram and Eric Curea: Romanian Text Anonymization Experiments from the CURLICAT Project
14 December – ConsILR Day II & DeLORo workshop
8:30 – 9:00 Connections
9:00 – 10:00 Keynote speech session 4 – chaired by Eugen Munteanu
Milena Dobreva: Language technologies and large-scale digital heritage resources: friends or foes?
Abstract: The talk will explore the relationship between large-scale digital heritage resources and language technologies. It will discuss the different roles and expectations of researchers, as end-users of digital resources, and of the creators of these resources, and will focus mostly on the applications of data science for innovative research in digital cultural heritage collections and the emerging innovation labs in libraries. The talk will provide examples from the use of digitised collections from Europeana, the Library of Congress, the British Library, the Royal Library of Denmark, Miguel de Cervantes Virtual Library, and the National Library Ivan Vazov in Plovdiv. Finally, the talk will also look into the expectations and challenges around the much-discussed application of artificial technologies in the domain of digital cultural heritage.
Bio note: Milena Dobreva works in the domains of digital transformation, user experiences and innovation in big digital cultural collections. After 13 years of academic experience in Scotland, Malta and Qatar, she is now reintegrating in her native Bulgaria with a research grant, DISTILL, which explores disruptive technologies in innovation labs in GLAM institutions. In 2019 she was instrumental in hosting the booksprint which delivered the first book exploring innovation labs in the cultural and scientific heritage sectors, ‘Open a GLAM Lab’, and she was also honoured to receive the Europeana Unsung Hero award. She is a member of the ENA Management Board and of the DARIAH Scientific Board.
Session 5: Language Varieties and Social Media – chaired by Corina Forăscu
10:00 – 10:20 Anca-Diana Bibiri, Mihaela Colhon and Mihaela Mocanu: Statistics of Some Regional Varieties of Romanian Greetings
10:20 – 10:40 Mihaela Onofrei, Diana Trandabăț and Bogdan-Andrei Donu: Offensive Language Identification in Social Media for Romanian Language
10:40 – 11:00 Radu A. Ciora, Marius Cioca and Daniela Gifu: A Survey on Fake News Detection Techniques
11:00 – 11:10 ConsILR Closing session
11:10 – 12:00 Coffee break
The conference will continue with the DeLORo workshop, to which you are also invited (More details here).
The organizers will try to follow the program announced below closely. If small changes occur, they will be announced during the conference and, as far as possible, on these pages.