FUSION X

AI Integration Engine

Computer vision, gesture interpretation, and voice intent pipelines integrated with real-time hardware control.

OpenCV + MediaPipe Gesture Stack

Camera frames are captured and preprocessed with OpenCV, then fed to MediaPipe Hands for hand-landmark detection. Gesture logic maps the landmark geometry to motion intents.

Palm up = Forward | Fist = Stop | Left tilt = Left | Right tilt = Right
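The mapping above can be sketched as a pure classification step over the landmark coordinates. In the running pipeline the coordinates come from MediaPipe Hands (frames read via OpenCV); here the landmark indices follow MediaPipe's 21-point hand model, and the geometric tests and thresholds are illustrative assumptions, not the project's tuned values.

```python
# Sketch of gesture-to-intent mapping, assuming MediaPipe's 21-point hand
# model: index 0 = wrist, 9 = middle-finger knuckle, tips at 8/12/16/20.
# Landmarks are (x, y) in normalized image coordinates (y grows downward).
# The geometric rules below are illustrative, not the project's tuned logic.

WRIST, MIDDLE_BASE = 0, 9
FINGERTIPS = (8, 12, 16, 20)     # index..pinky fingertips
FINGER_BASES = (5, 9, 13, 17)    # corresponding knuckles

def classify_gesture(landmarks):
    """Map 21 (x, y) hand landmarks to a motion intent string."""
    wx, wy = landmarks[WRIST]
    mx, my = landmarks[MIDDLE_BASE]

    # Fist: every fingertip curled below (image-down) its knuckle -> Stop.
    curled = all(landmarks[tip][1] > landmarks[base][1]
                 for tip, base in zip(FINGERTIPS, FINGER_BASES))
    if curled:
        return "STOP"

    # Tilt: wrist-to-palm vector leaning more sideways than upright.
    dx, dy = mx - wx, my - wy
    if abs(dx) > abs(dy):
        return "LEFT" if dx < 0 else "RIGHT"

    # Open palm, roughly upright -> Forward.
    return "FORWARD"
```

In the real loop this function would run on `results.multi_hand_landmarks` from each processed frame, and its output would feed the command layer below.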

Speech Recognition Command Layer

Microphone audio is converted to text and normalized into action commands before being transmitted over Bluetooth to the firmware execution layer.

Capture -> Recognize -> Map Command -> Send -> Execute
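The Map Command step above can be sketched as a normalization table from free-form recognized speech to compact command codes. The phrase list and the single-character codes here are illustrative assumptions; the recognizer itself (the Capture and Recognize steps) would sit in front of this function.

```python
# Normalize free-form recognized speech into single-character action
# commands before they are sent over Bluetooth. The keyword table and the
# codes ('F', 'B', 'L', 'R', 'S') are illustrative assumptions.

COMMAND_MAP = {
    "forward": "F", "ahead": "F", "go": "F",
    "back": "B", "backward": "B", "reverse": "B",
    "left": "L",
    "right": "R",
    "stop": "S", "halt": "S",
}

def normalize_command(text):
    """Return the command code for the first keyword found, else None."""
    for word in text.lower().split():
        word = word.strip(".,!?")
        if word in COMMAND_MAP:
            return COMMAND_MAP[word]
    return None
```

Returning `None` for unrecognized phrases lets the sender drop the utterance instead of forwarding garbage to the firmware.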

Software-Hardware Communication Bridge

The Python GUI serializes events, Bluetooth transports the packets, and the Arduino applies deterministic motor/sensor logic, driving all four DC motors through the L298N.

High-level AI decisions, low-level embedded execution.
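The serialization side of the bridge could be sketched as below. The frame layout (start byte + command byte + XOR checksum) is an illustrative assumption, not the project's actual wire format; in the running system the frame would be written to the Bluetooth serial port (e.g., with pyserial) and parsed by the Arduino before it drives the L298N.

```python
# Sketch of event serialization for the Bluetooth bridge. The framing
# scheme (start byte, command byte, XOR checksum) is an assumption made
# for illustration; parse_frame mirrors what the Arduino-side parser
# would do before applying motor logic.

START_BYTE = 0xAA  # assumed frame delimiter

def frame_command(cmd: str) -> bytes:
    """Serialize a one-character command into a 3-byte checksummed frame."""
    payload = cmd.encode("ascii")
    if len(payload) != 1:
        raise ValueError("expected a single-character command")
    checksum = START_BYTE ^ payload[0]
    return bytes([START_BYTE, payload[0], checksum])

def parse_frame(frame: bytes):
    """Validate a frame and return its command character, or None."""
    if len(frame) != 3 or frame[0] != START_BYTE:
        return None
    if frame[0] ^ frame[1] != frame[2]:
        return None
    return chr(frame[1])
```

A checksummed, fixed-length frame keeps the embedded parser deterministic: the Arduino can reject corrupted or misaligned bytes without buffering arbitrary input.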