Classification of ISL using Pose and Object Detection based Techniques

According to the 2018-19 Annual Report of the Indian Sign Language Research and Training Centre (ISLRTC), there are only 325 certified sign language interpreters in India. This is far fewer than required to serve the 50,71,007 hearing-impaired and 19,98,535 speech-impaired persons in India (Census 2011).

During the pandemic, many organizations required students and employees to work from home; this arrangement also saves organizations money on various expenditures and is more time-efficient. The change has affected deaf and hard-of-hearing students and job candidates negatively. Hiring any candidate remotely requires a good internet connection and a quiet environment at home, but for hard-of-hearing candidates, additional requirements such as the availability of an interpreter become a further hurdle. Hard-of-hearing students face similar issues in online lectures.

Thus, we present two approaches for the classification of Indian Sign Language:

  • The pose-based approach uses an LSTM model that takes the skeletal pose landmarks extracted by MediaPipe over a sequence of frames as input to infer and predict the action.
  • The object detection-based approach uses a model built on the Scaled-YOLOv4 architecture, which performs frame-by-frame inference.

Pose-based Estimation using LSTM

In this initial approach, Sign-to-Text translation is performed by acquiring the pose skeleton of the signer from the input video, which is achieved with the help of the Holistic model from the MediaPipe Python library. This serves as a preprocessing step: the extracted landmarks are passed to the LSTM model, which then provides the meaning of the sign in the form of a classification result.
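
A minimal sketch of this preprocessing step is given below. The feature layout (pose plus both hands, omitting face landmarks) and the input file name are illustrative assumptions rather than details taken from the experiments; the MediaPipe Holistic API itself provides the landmark sets shown.

    import cv2
    import mediapipe as mp
    import numpy as np

    mp_holistic = mp.solutions.holistic

    def extract_landmarks(results):
        # 33 pose landmarks with (x, y, z, visibility) -> 132 values
        pose = (np.array([[p.x, p.y, p.z, p.visibility]
                          for p in results.pose_landmarks.landmark]).flatten()
                if results.pose_landmarks else np.zeros(33 * 4))
        # 21 landmarks with (x, y, z) per hand -> 63 values each
        lh = (np.array([[p.x, p.y, p.z]
                        for p in results.left_hand_landmarks.landmark]).flatten()
              if results.left_hand_landmarks else np.zeros(21 * 3))
        rh = (np.array([[p.x, p.y, p.z]
                        for p in results.right_hand_landmarks.landmark]).flatten()
              if results.right_hand_landmarks else np.zeros(21 * 3))
        return np.concatenate([pose, lh, rh])

    cap = cv2.VideoCapture("sign_clip.mp4")  # hypothetical input file
    sequence = []
    with mp_holistic.Holistic(min_detection_confidence=0.5,
                              min_tracking_confidence=0.5) as holistic:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads frames as BGR
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            sequence.append(extract_landmarks(results))
    cap.release()  # `sequence` now holds one feature vector per frame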


The input takes the form of a video or a stream of images. Once the landmarks have been extracted by the MediaPipe Holistic model, the resulting sequence of frames is given to the LSTM model, which infers the appropriate text for the given action.
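
The classifier itself can be sketched as follows. The layer sizes, sequence length, and feature dimension are illustrative assumptions; the class list matches the eight signs reported in the tables below.

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    SEQ_LEN = 30      # frames per clip (assumed)
    N_FEATURES = 258  # 33*4 pose + 2*21*3 hand values (assumed layout)
    CLASSES = ["Hello", "I", "Thank You", "Deaf",
               "Work", "Study", "Good", "Bad"]

    model = Sequential([
        LSTM(64, return_sequences=True, input_shape=(SEQ_LEN, N_FEATURES)),
        LSTM(128),
        Dense(64, activation="relu"),
        Dense(len(CLASSES), activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["categorical_accuracy"])

    # Inference on one preprocessed clip (shape: SEQ_LEN x N_FEATURES)
    clip = np.random.rand(SEQ_LEN, N_FEATURES)  # stand-in for real landmarks
    probs = model.predict(clip[np.newaxis, ...])[0]
    print(CLASSES[int(np.argmax(probs))])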


Classification Report for LSTM

    Class          Precision   Recall   F1-score   Support
    Hello          1.00        1.00     1.00       17
    I              1.00        0.91     0.95       11
    Thank You      0.95        1.00     0.97       18
    Deaf           0.94        1.00     0.97       15
    Work           1.00        0.93     0.97       15
    Study          1.00        0.95     0.97       20
    Good           0.96        0.98     0.97       16
    Bad            1.00        1.00     0.95       16
    Accuracy                            0.98       128
    Macro avg      0.98        0.98     0.98       128
    Weighted avg   0.98        0.97     0.97       128

Confusion Matrix for LSTM

    Class        True Positive   False Positive   True Negative   False Negative
    Hello        17              0                111             0
    I            10              0                117             1
    Thank You    18              1                109             0
    Deaf         15              1                112             0
    Work         14              0                113             1
    Study        20              0                108             0
    Good         16              0                112             0
    Bad          15              0                112             1
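
These per-class counts are one-vs-rest tallies derived from the multiclass confusion matrix over the 128 test clips. A small sketch of that derivation using scikit-learn is shown below; the label arrays here are hypothetical stand-ins for the actual test data.

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Hypothetical stand-ins for the true and predicted class indices
    y_true = np.array([0, 0, 1, 2, 2, 1])
    y_pred = np.array([0, 0, 1, 2, 1, 1])

    cm = confusion_matrix(y_true, y_pred)
    tp = np.diag(cm)                # correctly detected instances per class
    fp = cm.sum(axis=0) - tp        # other classes predicted as this class
    fn = cm.sum(axis=1) - tp        # instances of this class that were missed
    tn = cm.sum() - (tp + fp + fn)  # everything else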

Accuracy Graph for LSTM


Object detection-based Classification using Scaled YOLO-v4

The initial approach, i.e., the skeleton-based approach, has a set of drawbacks when it comes to real-time deployment in a dynamic environment. The performance of the LSTM model developed for the pose-based approach degraded rapidly as the number of classes increased. With a clear need for a newer approach, the idea of treating the hand-signs as objects was brought up, and Scaled-YOLOv4 was utilized for the detection of these hand-sign objects.


The input takes the form of a video or images. The input video frames are given to the YOLO model once each frame has been converted to grayscale; output is then generated for each frame in the form of bounding-box coordinates and a class label.
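
A minimal frame-by-frame inference loop in this style is sketched below using OpenCV's DNN module rather than the original training repository; the weight and config file names, input size, and thresholds are placeholder assumptions. The grayscale frame is replicated back to three channels so the network still receives its expected input shape.

    import cv2

    # Placeholder artifact names for the trained Scaled-YOLOv4 sign model
    net = cv2.dnn.readNet("signs-scaled-yolov4.weights",
                          "signs-scaled-yolov4.cfg")
    model = cv2.dnn_DetectionModel(net)
    model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

    cap = cv2.VideoCapture("sign_clip.mp4")  # hypothetical input file
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Grayscale preprocessing as described above, replicated to 3 channels
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame3 = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
        class_ids, scores, boxes = model.detect(frame3, confThreshold=0.4,
                                                nmsThreshold=0.5)
        for cid, score, (x, y, w, h) in zip(class_ids, scores, boxes):
            print(f"class {cid}: {score:.2f} at x={x} y={y} w={w} h={h}")
    cap.release()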


Mean Average Precision (0.5:0.95) Graph for Scaled-YOLOv4


Precision for Scaled-YOLOv4


Recall for Scaled-YOLOv4