AI Object Detection and Smart Response Humanoid Robot

Authors

  • Dr. T. Swarna Latha, Department of ECE, Sri Venkateswara College of Engineering, Tirupati, Andhra Pradesh, India
  • Ediga Indu, Department of ECE, Sri Venkateswara College of Engineering, Tirupati, Andhra Pradesh, India
  • Ganaparthi Tribhuvana, Department of ECE, Sri Venkateswara College of Engineering, Tirupati, Andhra Pradesh, India
  • Chintha Vijaya Varshitha, Department of ECE, Sri Venkateswara College of Engineering, Tirupati, Andhra Pradesh, India
  • Edambaku Sai Varshini, Department of ECE, Sri Venkateswara College of Engineering, Tirupati, Andhra Pradesh, India
  • Mohammad Rahath, Department of ECE, Sri Venkateswara College of Engineering, Tirupati, Andhra Pradesh, India

Keywords:

Humanoid Robot, Object Detection, Computer Vision, Embedded Artificial Intelligence, Vision-Based Perception, Real-Time Object Tracking, Smart Response System, Human–Robot Interaction, Servo Motion Control, Text-to-Speech Interaction

Abstract

Vision-based object perception has become a fundamental capability for intelligent humanoid robots, enabling autonomous awareness, adaptive behavior, and natural interaction with dynamic environments. Conventional humanoid robots often rely on predefined scripts, limited sensor-based detection, or external computation, which restrict real-time responsiveness and reduce interaction realism. This paper presents the design, implementation, and evaluation of an AI-based object detection and smart response humanoid robot for autonomous, interactive operation using embedded vision intelligence. The proposed system utilizes an onboard camera to acquire real-time visual data, which is processed using deep learning–based computer vision algorithms to detect, classify, and localize objects in the robot’s surroundings. Vision-based preprocessing and AI inference modules operate entirely on an embedded processing platform, enabling low-latency object recognition without external sensors or cloud-based computation. Detected object information, including spatial position and movement, is mapped to intelligent decision logic that generates coordinated humanoid responses. Humanoid motion is achieved through a servo-driven control architecture that enables smooth head and upper-body tracking of detected objects, enhancing visual focus and human-like behavior. In addition to physical response, the system incorporates an integrated text-to-speech mechanism that provides real-time audio feedback by announcing recognized objects, creating a multimodal interaction experience. The complete system is implemented on a humanoid robotic platform and evaluated under various indoor conditions to measure detection accuracy, response latency, and motion stability. Experimental results demonstrate reliable real-time performance, accurate object detection, smooth tracking behavior, and effective synchronization between vision, motion, and voice feedback. The proposed architecture is modular and scalable, supporting future extensions such as multi-object interaction, face and gesture recognition, emotion-aware responses, and intelligent adaptive learning. The system is well suited for applications in robotics education, interactive demonstrations, public exhibitions, and vision-based human–robot interaction research.
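As an illustration of the pipeline described above, the following minimal sketch shows one way the vision–decision–motion–speech loop could be wired together on an embedded board. The specific library choices (OpenCV's DNN module with a pre-trained MobileNet-SSD Caffe model, pyttsx3 for offline text-to-speech) and the set_pan_servo() helper are assumptions made for illustration only and are not details taken from the paper.

# Hypothetical sketch of the camera -> detection -> decision -> servo/TTS loop.
# Model files, thresholds, and the servo interface are placeholders.
import cv2
import pyttsx3

# The 20 PASCAL VOC classes (plus background) that MobileNet-SSD predicts.
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
           "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
           "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

def set_pan_servo(angle_deg):
    # Placeholder for the servo-driven head control; replace with the actual
    # PWM/driver call on the target platform.
    print(f"pan servo -> {angle_deg:.1f} deg")

# Load the pre-trained detector and the TTS engine once at start-up.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")
tts = pyttsx3.init()

cap = cv2.VideoCapture(0)          # onboard camera
last_announced = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Preprocess the frame and run one forward pass of the detector.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()

    # Keep only the most confident detection above a threshold.
    best = None
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf > 0.5 and (best is None or conf > best[0]):
            best = (conf, int(detections[0, 0, i, 1]), detections[0, 0, i, 3:7])

    if best is not None:
        conf, class_id, box = best
        x_center = (box[0] + box[2]) / 2.0     # normalised horizontal position, 0..1
        # Map the object's horizontal position to a pan angle so the head
        # turns toward the detected object.
        set_pan_servo(x_center * 180.0)

        label = CLASSES[class_id]
        if label != last_announced:            # announce only newly seen objects
            tts.say(f"I can see a {label}")
            tts.runAndWait()
            last_announced = label

cap.release()

In a real deployment the pan (and tilt) mapping would be calibrated to the robot's servo range and smoothed over successive frames to avoid jitter; the detector could equally be a YOLO-family model if the embedded platform supports it.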




Published

25-03-2026

Section

Research Articles

How to Cite

[1]
Dr. T. Swarna Latha, Ediga Indu, Ganaparthi Tribhuvana, Chintha Vijaya Varshitha, Edambaku Sai Varshini, and Mohammad Rahath, “AI Object Detection and Smart Response Humanoid Robot”, Int J Sci Res Sci & Technol, vol. 13, no. 2, pp. 430–443, Mar. 2026, Accessed: Apr. 29, 2026. [Online]. Available: https://mail.ijsrst.com/index.php/home/article/view/IJSRST2613314