DeLORo workshop I
12:00 – 12:15 Opening session – Dan Cristea: Presentation of the DeLORo project
12:15 – 13:15 Keynote speech session 1 – chaired by Marius Popescu
Ivan Koychev, Angel Beshirov and Suzan Hadzhieva: DuoSearch – A Novel Search Engine for Bulgarian Historical Documents
Abstract: Search in collections of digitized historical documents is hindered by a two-pronged problem: orthographic variety and optical character recognition (OCR) errors. We present DuoSearch, a new search engine for historical documents that addresses this problem by combining ElasticSearch with machine learning methods based on deep neural networks. It was tested on a collection of Bulgarian historical newspapers from the mid-19th to the mid-20th century. The system provides an interactive and intuitive interface that lets end users enter search terms in modern Bulgarian and retrieve matches written in historical spellings. This is the first solution facilitating the use of digitized historical documents in Bulgarian.
Bio notes: Ivan Koychev is a full professor at the Faculty of Mathematics and Informatics at the University of Sofia “St Kliment Ohridski”. He graduated from the same university and obtained his PhD from the Bulgarian Academy of Sciences. He previously held a three-year post-doctoral fellowship at the German National Research Centre for Information Technologies in Sankt Augustin and was a research fellow in the Smart Web Technologies Centre at Robert Gordon University, Aberdeen, UK. His main research interests are Machine Learning, Information Retrieval, Data/Text Mining and Natural Language Processing. He has worked on numerous academic and industrial projects funded nationally and by the European Commission.
Angel Beshirov is a second-year Artificial Intelligence graduate student at Sofia University “St. Kliment Ohridski”. He received his bachelor’s degree in Software Engineering from the same university. He is interested in information retrieval, natural language processing and reinforcement learning.
Suzan Hadzhieva began studying Artificial Intelligence at Sofia University “St. Kliment Ohridski” last year. She received her bachelor’s degree in Software Engineering from the same university. She enjoys experimenting and is passionate about natural language processing and information retrieval.
Session 1: Acquisition of Data – chaired by Gabriela Haja
13:15 – 13:30 Roxana Vieru, Isabelle Tamba & Mihaela Onofrei: DeLORo’s primary documents
13:30 – 13:45 Daniela Gîfu & Paula Crucianu: The OOCIAT annotation frontend – functionality
13:45 – 14:00 Gabriela Haja & Isabelle Tamba: The current state of object annotation
14:00 – 15:30 Lunch break
15:30 – 16:30 Keynote speech session 2 – chaired by Isabelle Tamba
Ioannis Pratikakis: Accessing Greek historical handwritten documents using the μDoc.tS platform
Abstract: The prospect of accessing our written past, which interests not only researchers but also the general public, makes Handwritten Text Recognition (HTR) and KeyWord Spotting (KWS) among the most appealing technologies in the document image analysis research area. This talk will present the HTR and KWS technologies developed within the research project ‘A platform for the transcription of historical handwritten documents – μDoc.tS’. Their performance will be illustrated on a set of handwritten document collections from the Stavronikita Monastery on Mount Athos, which have recently been made publicly available for research purposes.
Bio note: Ioannis Pratikakis received his Ph.D. degree in 3D image analysis from the Department of Electronics Engineering and Informatics at Vrije Universiteit Brussel, Belgium. He is a professor in the Department of Electrical and Computer Engineering at Democritus University of Thrace in Xanthi, Greece, and the head of the Visual Computing Group (https://vc.ee.duth.gr/). Prof. Pratikakis’ research interests lie in machine learning, vision, image processing and graphics, more specifically in document image analysis and recognition, medical image analysis, and 3D shape analysis, search and retrieval. He has served as a guest editor for several journals, including the International Journal of Computer Vision and The Visual Computer, and as an associate editor for the MDPI Journal of Imaging and the Springer Nature Computer Science journal. He has participated in more than 20 national and international projects, including major EU projects related to document analysis, and was a co-organizer of the 14th International Conference on Frontiers in Handwriting Recognition (ICFHR 2014) and of several competitions related to document image analysis and handwritten keyword spotting. He has been a Senior Member of the IEEE since 2012.
Session 2: Architectures and Processing Models – chaired by Adrian Iftene
16:30 – 16:45 Cristian Pădurariu: The OOCIAT backend and the image2text alignment algorithm
16:45 – 17:00 Petru Rebeja & Andrei Scutelnicu: The internal structure of the database and access facilities
17:00 – 17:15 Mihaela Găman, Marius Popescu and Radu Ionescu: A neural network approach to the recognition of lines and characters
17:15 – 17:30 Mihaela Onofrei & Cecilia Bolea: Yet other similar approaches
17:30 – 17:45 Dan Cristea: What next?
17:45 Closing of the workshop
The organizers will try to follow the announced program as closely as possible. If minor changes occur, they will be announced during the conference and, as far as possible, on these pages.