A Full-Stack Multimodal Assistive Communication System

Authors

  • Shivangi Jindal, B.Tech Final Year Student, Sunder Deep Engineering College, Ghaziabad, UP, India
  • Kajal Kori, Lecturer, Sunder Deep Engineering College, Ghaziabad, UP, India

Keywords:

Assistive Communication Systems, Gesture Recognition using Computer Vision, Speech-to-Text Conversion

Abstract

MANUSCRIPT is a full-stack, web-based assistive communication platform designed to bridge the communication gap between users with hearing, speech, or visual impairments and the general population. The system converts hand gestures and spoken language into readable text in real time using computer vision and speech recognition. It integrates MediaPipe Hands for gesture detection and the Web Speech API for speech-to-text conversion, supported by a Flask backend that provides secure data storage, authentication, and scalability. The primary problem addressed is the lack of accessible, scalable, and affordable communication tools for individuals with hearing, visual, or speech impairments: existing solutions are either limited to a single modality or rely heavily on server-side processing, leading to latency and privacy concerns. The objective of this project is to develop a hybrid system that combines real-time client-side processing with backend support to enhance usability, scalability, and data persistence. The system achieves an average response time of 0.17 s for gesture recognition and 0.8 s for speech-to-text conversion, enabling near real-time communication. The expected outcome is a robust, user-friendly platform that facilitates seamless communication through gestures and speech, with potential applications in education, healthcare, and everyday interactions.
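To illustrate the gesture-recognition path described in the abstract, the sketch below shows how MediaPipe-style hand landmarks can feed a simple rule-based classifier on the client side. MediaPipe Hands emits 21 normalized (x, y) landmarks per detected hand; everything else here (the function name `count_extended_fingers`, the upright-right-hand assumption, and the tip-above-joint rule) is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of a gesture-classification step over MediaPipe-style
# landmarks. MediaPipe Hands outputs 21 (x, y) points in normalized image
# coordinates, where y grows downward; landmark indices follow its hand model.

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # the corresponding PIP joints

def count_extended_fingers(landmarks):
    """Count raised fingers for an upright right hand.

    `landmarks` is a list of 21 (x, y) pairs. A finger counts as extended
    when its tip sits above (smaller y than) its PIP joint; the thumb,
    which extends sideways, is compared on x instead.
    """
    count = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if landmarks[tip][1] < landmarks[pip][1]:  # tip above joint
            count += 1
    if landmarks[4][0] > landmarks[3][0]:  # thumb tip right of IP joint
        count += 1
    return count
```

A real deployment would map such per-frame counts (or a learned classifier over the full landmark vector) to gesture labels before sending text to the Flask backend; this fragment only shows the landmark-to-feature step.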


References

[1] T. Brown et al., “Language Models are Few-Shot Learners,” 2020.

[2] T. Althoff et al., “Counselling Conversations Analysis,” 2017.

[3] B. Liu et al., “Sentiment Analysis,” 2021.

[4] E. Bender et al., “On the Dangers of AI,” 2021.

[5] Google, “MediaPipe Hands,” 2020.

[6] OpenAI, “ChatGPT,” 2023.

[7] A. Vaswani et al., “Attention is All You Need,” 2017.

[8] F. Zhang, X. Zhu, and M. Ye, “Hand Gesture Recognition Using Deep Learning: A Review,” IEEE Access, vol. 8, pp. 208980–209012, 2020.

[9] S. Mitra and T. Acharya, “Gesture Recognition: A Survey,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 37, no. 3, pp. 311–324, 2007.

[10] B. Kataria and H. B. Jethva, “Optical character recognition of Sanskrit manuscripts using convolutional neural networks,” Webology, vol. 18, no. 5, pp. 403–424, 2021. [Online]. Available: https://www.webology.org/abstract.php?id=1681

[11] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.


Published

22-04-2026

Issue

Section

Research Articles

How to Cite

[1]
Shivangi Jindal and Kajal Kori, “A Full-Stack Multimodal Assistive Communication System”, Int J Sci Res Sci & Technol, vol. 13, no. 2, pp. 871–876, Apr. 2026, Accessed: Apr. 29, 2026. [Online]. Available: https://mail.ijsrst.com/index.php/home/article/view/IJSRST2613366