AI Integration Engine
Computer vision, gesture interpretation, and voice intent pipelines integrated with real-time hardware control.
OpenCV + MediaPipe Gesture Stack
Camera frames are captured and preprocessed with OpenCV, then passed to MediaPipe Hands, which returns 21 landmarks per detected hand. Gesture logic maps the landmark geometry into motion intents.
Palm up = Forward | Fist = Stop | Left tilt = Left | Right tilt = Right
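The mapping above can be sketched as a small classifier over MediaPipe-style landmark geometry. This is an illustrative sketch, not the project's actual logic: the landmark indices follow MediaPipe Hands conventions (0 = wrist, 9 = middle-finger MCP, tips at 8/12/16/20), but the tilt threshold and the curled-finger test are assumptions.

```python
# Hypothetical gesture classifier over MediaPipe-style hand landmarks:
# a list of 21 (x, y) points normalized to [0, 1], with y increasing downward.

FINGER_TIPS = (8, 12, 16, 20)   # index, middle, ring, pinky fingertips
FINGER_PIPS = (6, 10, 14, 18)   # matching PIP joints
WRIST, MIDDLE_MCP = 0, 9

def classify_gesture(lm):
    """Map landmark geometry to a motion intent string."""
    # A finger counts as extended when its tip sits above its PIP joint.
    extended = sum(1 for tip, pip in zip(FINGER_TIPS, FINGER_PIPS)
                   if lm[tip][1] < lm[pip][1])
    if extended == 0:
        return "STOP"                      # fist: every finger curled
    # Horizontal wrist-to-knuckle offset approximates hand tilt.
    # Note: a mirrored camera preview flips left and right.
    dx = lm[MIDDLE_MCP][0] - lm[WRIST][0]
    if dx < -0.15:
        return "LEFT"
    if dx > 0.15:
        return "RIGHT"
    if extended == 4:
        return "FORWARD"                   # open upright palm
    return "STOP"                          # ambiguous pose -> fail safe
```

Defaulting ambiguous poses to STOP is a deliberate safety choice for a moving robot.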
Speech Recognition Command Layer
Microphone audio is converted to text and normalized into action commands before being transmitted over Bluetooth to the firmware execution layer.
Capture -> Recognize -> Map Command -> Send -> Execute
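The Map Command step of the pipeline above can be sketched as a normalizer from recognized text to single-character motor commands. The vocabulary and command codes here are illustrative assumptions, not the project's actual protocol:

```python
# Hypothetical normalizer for the "Map Command" stage: recognized
# transcript in, single-character command out. Phrases and codes are
# illustrative; the real firmware may expect a different command set.

COMMAND_MAP = {
    "forward": "F", "go": "F", "move forward": "F",
    "back": "B", "backward": "B", "reverse": "B",
    "left": "L", "turn left": "L",
    "right": "R", "turn right": "R",
    "stop": "S", "halt": "S",
}

def normalize(transcript):
    """Map a recognized utterance to a motor command character."""
    text = transcript.lower().strip()
    if text in COMMAND_MAP:          # exact phrase match first
        return COMMAND_MAP[text]
    for word in text.split():        # fall back to a keyword scan
        if word in COMMAND_MAP:
            return COMMAND_MAP[word]
    return "S"                       # unrecognized speech -> stop
```

In the full pipeline, the transcript would come from a speech-to-text engine (for example Python's SpeechRecognition package); normalization keeps the Bluetooth payload tiny and the firmware parser trivial.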
Software-Hardware Communication Bridge
The Python GUI serializes control events, Bluetooth transports the packets, and the Arduino applies deterministic motor/sensor logic, driving all four DC motors through an L298N driver.
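The serialization side of this bridge can be sketched as simple packet framing. The format here (start byte, command byte, XOR checksum) is an assumption for illustration; the actual Arduino parser may expect a different layout:

```python
# Hypothetical packet framing for the Python -> Bluetooth -> Arduino path.
# Format (assumed): [START, command byte, XOR checksum] = 3 bytes.

START = 0x02

def frame(command):
    """Serialize a one-character command into a checksummed packet."""
    cmd = ord(command)
    checksum = START ^ cmd                  # XOR over header + payload
    return bytes([START, cmd, checksum])

def parse(packet):
    """What the firmware side would do on receive: validate, then decode."""
    if len(packet) != 3 or packet[0] != START:
        return None                         # wrong length or missing header
    if packet[0] ^ packet[1] != packet[2]:
        return None                         # checksum mismatch: drop packet
    return chr(packet[1])
```

In practice the framed bytes would be written to the Bluetooth serial port (for example via pyserial), and the Arduino would run the equivalent of parse() in C before switching the L298N outputs; the checksum lets the firmware discard bytes corrupted in transit instead of executing a wrong motion.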
High-level AI decisions, low-level embedded execution.