Implementation of Transformer Architecture Through Visual Learning
Author : Harshika Dehariya
Abstract : Transformer architectures have emerged as a cornerstone of artificial intelligence research in recent years, extending from natural language processing to visual learning. The Transformer's self-attention mechanism makes it possible to model long-range dependencies in visual data, which improves feature representation and contextual understanding. This literature review thoroughly examines the use of Transformer architectures in visual learning applications. It traces how early Vision Transformers (ViT) evolved into sophisticated hierarchical, hybrid, and multimodal architectures such as the Swin Transformer, DETR, and CLIP. The review also covers widely used datasets, training approaches, evaluation benchmarks, and challenges in computational and data efficiency. It further surveys real-world applications across computer vision domains, including autonomous systems, multimodal understanding, and medical imaging. The findings indicate that Vision Transformers represent a paradigm shift in computer vision research, outperforming conventional convolutional models in flexibility and scalability.
Keywords : Computer Vision, Deep Learning, Image Recognition, Multimodal Systems, Self-Attention, Vision Transformer, Visual Learning.
Conference Name : International Conference on Engineering & Technology (ICET - 25)
Conference Place : Bangalore, India
Conference Date : 20th Dec 2025
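The self-attention mechanism that the abstract credits with modeling long-range dependencies can be sketched in a few lines. The following is a minimal illustrative example of scaled dot-product self-attention over a handful of image-patch embeddings; all shapes, variable names, and the `self_attention` helper are illustrative assumptions, not taken from any specific architecture surveyed above.

```python
# Minimal sketch of scaled dot-product self-attention, the core
# operation behind Vision Transformers. Shapes and names here are
# illustrative only.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model) patch embeddings; w_*: (d_model, d_model) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # queries, keys, values
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # pairwise similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v                              # each patch aggregates all others

rng = np.random.default_rng(0)
d = 8
patches = rng.normal(size=(4, d))                   # 4 patches, d-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(patches, w_q, w_k, w_v)
print(out.shape)                                    # (4, 8)
```

Because every patch attends to every other patch in a single step, dependencies between distant image regions are captured directly, rather than through the stacked local receptive fields of a convolutional network.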