A Full-Stack Multimodal Assistive Communication System

Authors

  • Shivangi Jindal, B.Tech Final Year Student, Sunder Deep Engineering College, Ghaziabad, UP, India
  • Kajal Kori, Lecturer, Sunder Deep Engineering College, Ghaziabad, UP, India

Keywords:

Assistive Communication Systems, Gesture Recognition using Computer Vision, Speech-to-Text Conversion

Abstract

MANUSCRIPT is a full-stack, web-based assistive communication platform designed to bridge the communication gap between users with hearing, speech, or visual impairments and the general population. The system converts hand gestures and spoken language into readable text in real time using computer vision and speech recognition. It integrates MediaPipe Hands for gesture detection and the Web Speech API for speech-to-text conversion, supported by a Flask backend that provides secure data storage, authentication, and scalability. The primary problem addressed is the lack of accessible, scalable, and affordable communication tools for individuals with hearing, visual, or speech impairments: existing solutions are either limited to a single modality or rely heavily on server-side processing, leading to latency and privacy concerns. The objective of this project is to develop a hybrid system that combines real-time client-side processing with backend support to enhance usability, scalability, and data persistence. The system achieves an average response time of 0.17 s for gesture recognition and 0.8 s for speech-to-text conversion, enabling near real-time communication. The expected outcome is a robust, user-friendly platform that facilitates seamless communication through gestures and speech, with potential applications in education, healthcare, and everyday interactions.
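To illustrate the gesture-recognition path described in the abstract, the sketch below shows how MediaPipe-style hand landmarks can feed a simple rule-based classifier on the client side. MediaPipe Hands emits 21 normalized (x, y) landmarks per detected hand; everything else here (the function name `count_extended_fingers`, the upright-right-hand assumption, and the tip-above-joint rule) is an illustrative assumption, not the paper's implementation.

```python
# Hypothetical sketch of a gesture-classification step over MediaPipe-style
# landmarks. MediaPipe Hands outputs 21 (x, y) points in normalized image
# coordinates, where y grows downward; landmark indices follow its hand model.

FINGER_TIPS = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
FINGER_PIPS = [6, 10, 14, 18]   # the corresponding PIP joints

def count_extended_fingers(landmarks):
    """Count raised fingers for an upright right hand.

    `landmarks` is a list of 21 (x, y) pairs. A finger counts as extended
    when its tip sits above (smaller y than) its PIP joint; the thumb,
    which extends sideways, is compared on x instead.
    """
    count = 0
    for tip, pip in zip(FINGER_TIPS, FINGER_PIPS):
        if landmarks[tip][1] < landmarks[pip][1]:  # tip above joint
            count += 1
    if landmarks[4][0] > landmarks[3][0]:  # thumb tip right of IP joint
        count += 1
    return count
```

A real deployment would map such per-frame counts (or a learned classifier over the full landmark vector) to gesture labels before sending text to the Flask backend; this fragment only shows the landmark-to-feature step.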


References

[1] T. Brown et al., “Language Models are Few-Shot Learners,” 2020.

[2] T. Althoff et al., “Counselling Conversations Analysis,” 2017.

[3] B. Liu et al., “Sentiment Analysis,” 2021.

[4] E. Bender et al., “On the Dangers of AI,” 2021.

[5] Google, “MediaPipe Hands,” 2020.

[6] OpenAI, “ChatGPT,” 2023.

[7] A. Vaswani et al., “Attention is All You Need,” 2017.

[8] F. Zhang, X. Zhu, and M. Ye, “Hand Gesture Recognition Using Deep Learning: A Review,” IEEE Access, vol. 8, pp. 208980–209012, 2020.

[9] S. Mitra and T. Acharya, “Gesture Recognition: A Survey,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 37, no. 3, pp. 311–324, 2007.

[10] B. Kataria and H. B. Jethva, “Optical character recognition of Sanskrit manuscripts using convolutional neural networks,” Webology, vol. 18, no. 5, pp. 403–424, 2021. [Online]. Available: https://www.webology.org/abstract.php?id=1681

[11] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv preprint arXiv:1409.1556, 2014.


Published

22-04-2026

Issue

Section

Research Articles

How to Cite

[1]
Shivangi Jindal and Kajal Kori, “A Full-Stack Multimodal Assistive Communication System”, Int J Sci Res Sci & Technol, vol. 13, no. 2, pp. 871–876, Apr. 2026, Accessed: Apr. 29, 2026. [Online]. Available: https://mail.ijsrst.com/index.php/home/article/view/IJSRST2613366