IYaD: A Dataset for Arabic Printed Text Recognition in Natural Scene Videos
Author : Oualid KHIAL, Fatma BOUFERA, Rochdi BACHIR BOUIADJRA
Abstract : In this paper, we present "IYaD", a new dataset for Arabic text recognition. It aims to facilitate and promote advances in Arabic text extraction and recognition from scene videos. As of the submission of this paper, the IYaD dataset contains 1,400,000 images of a single font and the same number for 16 other fonts. Each image is provided in three distinct versions and is accompanied by its corresponding Arabic labels, Latin transcriptions, and content. It is one of the few datasets dedicated to this subject and the largest in terms of image count and diversity. Although the dataset is artificially created, a complexity study and comparison are provided as evidence that it closely simulates real-world images. It can serve as a data source for almost all known deep learning methods, and it is published with a set of tools that support its continued use and make processing and label transformation easy. To motivate this work, we also highlight the role that data plays in artificial neural networks (ANNs).
Keywords : Dataset, Arabic, text recognition, AI, neural networks, deep learning, machine learning.
Conference Name : International Conference on Computer Science, Technology and Artificial Intelligence (ICCSTAI-25)
Conference Place : Toronto, Canada
Conference Date : 20th Aug 2025