Classification of ISL using Pose and Object Detection based Techniques

Accepted in SmartCom 2023.

Download available soon.

Hearing-impaired individuals face numerous obstacles across many facets of life, and the unemployment rate among the deaf is staggering. The COVID-19 pandemic has introduced a new degree of hardship, particularly as work environments have shifted entirely to online platforms and many companies lack the tools needed to include the hearing-impaired in this change. Societal inclusion and acceptance of differently-abled people depend on the ecosystem created for them, which involves training and skilling them as well as understanding and seamlessly interacting with them, thereby increasing employability. With this project, we aim to bridge the communication gap for the hearing-impaired and thereby contribute to building more inclusive environments for them. In this paper, we present two approaches for the classification of Indian Sign Language: (a) an object detection-based approach that uses a model built on the Scaled-YOLOv4 architecture to perform frame-by-frame inference, and (b) a pose-based approach that uses an LSTM model which takes skeletal pose landmarks extracted by Mediapipe over a sequence of frames as input to predict the action.
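
As a sketch of the pose-based pipeline, the snippet below extracts Mediapipe pose landmarks frame by frame and feeds the resulting sequence to an LSTM classifier. The sequence length, layer sizes, and class count here are illustrative assumptions, not the exact configuration used in the paper.

```python
# Minimal sketch of the pose-based pipeline: Mediapipe pose landmarks are
# extracted per frame and the resulting sequence is classified by an LSTM.
# SEQ_LEN, NUM_CLASSES, and the layer sizes are illustrative assumptions.
import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf

SEQ_LEN = 30                  # assumed frames per sign clip
NUM_CLASSES = 10              # assumed number of sign classes
POSE_FEATURES = 33 * 4        # 33 pose landmarks x (x, y, z, visibility)

def extract_pose_sequence(video_path):
    """Run Mediapipe Pose on each frame and stack the flattened landmarks."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.pose.Pose(static_image_mode=False) as pose:
        while len(frames) < SEQ_LEN:
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                frames.append(np.array(
                    [[lm.x, lm.y, lm.z, lm.visibility]
                     for lm in results.pose_landmarks.landmark]).flatten())
            else:
                frames.append(np.zeros(POSE_FEATURES))  # no person detected
    cap.release()
    while len(frames) < SEQ_LEN:                # zero-pad short clips
        frames.append(np.zeros(POSE_FEATURES))
    return np.stack(frames)                     # shape: (SEQ_LEN, POSE_FEATURES)

# LSTM classifier over landmark sequences
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(SEQ_LEN, POSE_FEATURES)),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Zero-padding short clips keeps every input at a fixed length, which the LSTM's fixed input shape requires.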

This paper proposed two approaches to sign language translation. Using Mediapipe to collect landmarks improved the LSTM model's accuracy and latency; findings of around 98% accuracy support this claim. The model was deployed as a prototype using Streamlit and produced good real-time results. However, experimentation showed that this approach does not scale well, as model performance drops drastically as the number of classes grows. The object detection route allows us to train far more classes (about three times as many) on the Scaled-YOLOv4 architecture, with only a small impact on performance as the number of classes rises. This provides a better and more scalable solution for sign-to-text translation, with the ability to infer on images, videos, and a live camera feed. The final accuracy of our YOLO model was 95.9% across 25 classes.
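
As a companion sketch for the object detection route, the loop below runs a detector frame by frame on a video file or live camera feed and overlays the predicted sign labels. The `model.detect` call is a hypothetical stand-in for the Scaled-YOLOv4 forward pass; the real loading and inference code is repository-specific.

```python
# Frame-by-frame inference sketch for the object detection route.
# `model.detect` is a hypothetical stand-in for the Scaled-YOLOv4 forward
# pass; the real loading/inference code depends on the training repository.
import cv2

def run_inference(model, source=0):
    """Detect signs on every frame of a video file or live camera (source=0)."""
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # hypothetical API: yields (box, label, score) per detected sign
        for box, label, score in model.detect(frame):
            x1, y1, x2, y2 = map(int, box)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(frame, f"{label} {score:.2f}", (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
        cv2.imshow("ISL detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):   # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```

Because the detector classifies each frame independently, the same loop works unchanged for images, videos, and live camera input, which is the scalability advantage noted above.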

V. Mittal, P. Patil, A. Upadhyay, K. Madhwani and N. Giri, “Classification of ISL using Pose and Object Detection based Techniques,” SmartCom 2023.