Abstract: A portable remote control device enables user interaction with an appliance by detecting user gestures made in a hover zone and converting the gestures to commands that are wirelessly transmitted to the appliance. The remote control device includes at least two cameras whose intersecting fields of view (FOVs) define a three-dimensional hover zone within which user interactions are imaged. Image data from the cameras is analyzed, separately and collectively, to identify a relatively small number of user landmarks. Substantially unambiguous correspondence is established between the same landmark in each acquired image, and a three-dimensional reconstruction is made in a common coordinate system. Preferably, the cameras are modeled as having the characteristics of pinhole cameras, enabling rectified epipolar geometric analysis to facilitate more rapid disambiguation among potential landmark points. As a result, processing overhead and latency are substantially reduced.
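The rectified epipolar step can be illustrated with a minimal sketch. It is not the method claimed here, only one common way such a step may be realized, and it assumes two identical pinhole cameras whose images are already rectified, so that a landmark seen in the left image must appear on the same pixel row in the right image. The names `RectifiedStereo`, `match_on_epipolar_row`, and `triangulate`, and every numeric value, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point2D = Tuple[float, float]  # (u, v) pixel coordinates


@dataclass
class RectifiedStereo:
    """Hypothetical model of two identical, rectified pinhole cameras."""
    focal_px: float    # focal length in pixels (assumed equal for both cameras)
    baseline_m: float  # distance between the two camera centers, in meters
    cx: float          # principal point, x (pixels)
    cy: float          # principal point, y (pixels)


def match_on_epipolar_row(left: Point2D,
                          right_candidates: Sequence[Point2D],
                          row_tol: float = 1.0) -> Optional[Point2D]:
    """Return the single right-image candidate consistent with `left`.

    In a rectified pair the epipolar line is the pixel row, so only
    candidates within `row_tol` rows and with positive disparity are kept.
    Returns None when zero or several candidates survive (still ambiguous).
    """
    u_l, v_l = left
    survivors = [
        (u_r, v_r) for (u_r, v_r) in right_candidates
        if abs(v_r - v_l) <= row_tol and (u_l - u_r) > 0
    ]
    return survivors[0] if len(survivors) == 1 else None


def triangulate(rig: RectifiedStereo, left: Point2D,
                right: Point2D) -> Tuple[float, float, float]:
    """Recover (X, Y, Z) in the left camera's coordinate system, in meters."""
    disparity = left[0] - right[0]
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = rig.focal_px * rig.baseline_m / disparity
    x = (left[0] - rig.cx) * z / rig.focal_px
    y = (left[1] - rig.cy) * z / rig.focal_px
    return (x, y, z)


if __name__ == "__main__":
    # Illustrative values only: 500 px focal length, 6 cm baseline.
    rig = RectifiedStereo(focal_px=500.0, baseline_m=0.06, cx=320.0, cy=240.0)
    fingertip_left = (400.0, 300.0)
    candidates_right = [(350.0, 300.4), (360.0, 180.0), (420.0, 300.0)]
    match = match_on_epipolar_row(fingertip_left, candidates_right)
    if match is not None:
        print(triangulate(rig, fingertip_left, match))
```

With these illustrative numbers, the second candidate is rejected for lying on a different row and the third for having negative disparity, leaving one match at a depth of roughly 0.6 m. Restricting the search to a single row is what makes disambiguation cheap: candidates are discarded with one comparison each rather than a two-dimensional search.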