Quasi-three-dimensional method and apparatus to detect and localize interaction of user-object and virtual transfer device
A system used with a virtual device inputs or transfers information to a companion device, and includes two optical systems OS1, OS2. In a structured-light embodiment, OS1 emits a fan beam plane of optical energy parallel to and above the virtual device. When a user-object penetrates the beam plane of interest, OS2 registers the event. Triangulation methods can locate the virtual contact, and transfer user-intended information to the companion system. In a non-structured active light embodiment, OS1 is preferably a digital camera whose field of view defines the plane of interest, which is illuminated by an active source of optical energy. Preferably the active source, OS1, and OS2 operate synchronously to reduce effects of ambient light. A non-structured passive light embodiment is similar except the source of optical energy is ambient light. A subtraction technique preferably enhances the signal/noise ratio. The companion device may in fact house the present invention.
Priority is claimed from applicants' co-pending U.S. provisional patent application Ser. No. 60/287,115 filed on 27 Apr. 2001 entitled “Input Methods Using Planar Range Sensors”, from co-pending U.S. Provisional patent application Ser. No. 60/272,120 filed on 27 Feb. 2001 entitled “Vertical Triangulation System for a Virtual Touch-Sensitive Surface”, and from co-pending U.S. provisional patent application Ser. No. 60/231,184 filed on 7 Sep. 2000 entitled “Application of Image Processing Techniques for a Virtual Keyboard System”. Further, this application is a continuation-in-part from co-pending U.S. patent application Ser. No. 09/502,499 filed on 11 Feb. 2000 entitled “Method And Apparatus for Entering Data Using A Virtual Input Device”. Each of said applications is incorporated herein by reference.
FIELD OF THE INVENTION
The invention relates generally to sensing proximity of a stylus or user finger relative to a device to input or transfer commands and/or data to a system, and more particularly to such sensing relative to a virtual device used to input or transfer commands and/or data and/or other information to a system.
BACKGROUND OF THE INVENTION
It is often desirable to use virtual input devices to input commands and/or data and/or transfer other information to electronic systems, for example a computer system, a musical instrument, even telephones. For example, although computers can now be implemented in almost pocket-size, inputting data or commands on a mini-keyboard can be time consuming and error prone. While many cellular telephones can today handle e-mail communication, actually inputting messages using the small telephone touch pad can be difficult. For example, a PDA has much of the functionality of a computer but suffers from a tiny or non-existent keyboard. If a system could be used to determine when a user's fingers or stylus contacted a virtual keyboard, and what fingers contacted what virtual keys thereon, the output of the system could perhaps be input to the PDA in lieu of keyboard information. (The terms “finger” or “fingers”, and “stylus” are used interchangeably herein.) In this example a virtual keyboard might be a piece of paper, perhaps that unfolds to the size of a keyboard, with keys printed thereon, to guide the user's hands. It is understood that the virtual keyboard or other input device is simply a work surface and has no sensors or mechanical or electronic components. The paper and keys would not actually input information, but the interaction or interface between the user's fingers and portions of the paper, or if not paper, portions of a work surface, whereon keys would exist, could be used to input information to the PDA. A similar virtual device and system might be useful to input e-mail to a cellular telephone. A virtual piano-type keyboard might be used to play a real musical instrument. The challenge is how to detect or sense where the user's fingers or a stylus are relative to the virtual device.
U.S. Pat. No. 5,767,848 to Korth (1998) entitled “Method and Device For Optical Input of Commands or Data” attempts to implement virtual devices using a two-dimensional TV video camera. Such optical systems rely upon luminance data and require a stable source of ambient light, but unfortunately luminance data can confuse an imaging system. For example, a user's finger in the image foreground may be indistinguishable from regions of the background. Further, shadows and other image-blocking phenomena resulting from a user's hands obstructing the virtual device would seem to make implementing a Korth system somewhat imprecise in operation. Korth would also require examination of the contour of a user's fingers, finger position relative to the virtual device, and a determination of finger movement.
U.S. Pat. No.______ to Bamji et al. (2001) entitled “CMOS-Compatible Three-Dimensional Image Sensor IC”, application Ser. No. 09/406,059, filed 22 Sep. 1999, discloses a sophisticated three-dimensional imaging system usable with virtual devices to input commands and data to electronic systems. In that patent, various range finding systems were disclosed, which systems could be used to determine the interface between a user's fingertip and a virtual input device, e.g., a keyboard. Imaging was determined in three-dimensions using time-of-flight measurements. A light source emitted optical energy towards a target object, e.g., a virtual device, and energy reflected by portions of the object within the imaging path was detected by an array of photodiodes. Using various sophisticated techniques, the actual time-of-flight between emission of the optical energy and its detection by the photodiode array was determined. This measurement permitted calculating the vector distance to the point on the target object in three-dimensions, e.g., (x,y,z). The described system examined reflected emitted energy, and could function without ambient light. If for example the target object were a layout of a computer keyboard, perhaps a piece of paper with printed keys thereon, the system could determine which user finger touched what portion of the target, e.g., which virtual key, in what order. Of course the piece of paper would be optional and would be used to guide the user's fingers.
Three-dimensional data obtained with the Bamji invention could be software-processed to localize user fingers as they come in contact with a touch surface, e.g., a virtual input device. The software could identify finger contact with a location on the surface as a request to input a keyboard event to an application executed by an associated electronic device or system (e.g., a computer, PDA, cell phone, kiosk device, point of sale device, etc.). While the Bamji system worked and could be used to input commands and/or data to a computer system using three-dimensional imaging to analyze the interface of a user's fingers and a virtual input device, a less complex and perhaps less sophisticated system is desirable. Like the Bamji system, such new system should be relatively inexpensive to mass produce and should consume relatively little operating power such that battery operation is feasible.
The present invention provides such a system.
SUMMARY OF THE PRESENT INVENTION
The present invention localizes interaction between a user finger or stylus and a passive touch surface (e.g., virtual input device), defined above a work surface, using planar quasi-three-dimensional sensing. Quasi-three-dimensional sensing implies that determination of an interaction point can be made essentially in three dimensions, using as a reference a two-dimensional surface that is arbitrarily oriented in three-dimensional space. Once a touch has been detected, the invention localizes the touch region to determine where on a virtual input device the touching occurred, and what data or command keystroke, corresponding to the localized region that was touched, is to be generated in response to the touch. Alternatively, the virtual input device might include a virtual mouse or trackball. In such an embodiment, the present invention would detect and report coordinates of the point of contact with the virtual input device, which coordinates would be coupled to an application, perhaps to move a cursor on a display (in a virtual mouse or trackball implementation) and/or to lay so-called digital ink for a drawing or writing application (virtual pen or stylus implementation). In the various embodiments, triangulation analysis methods preferably are used to determine where user-object “contact” with the virtual input device occurs.
In a so-called structured-light embodiment, the invention includes a first optical system (OS1) that generates a plane of optical energy defining a fan-beam of beam angle φ parallel to and a small stand-off distance ΔY above the work surface whereon the virtual input device may be defined. In this embodiment, the plane of interest is the plane of light produced by OS1, typically a laser or LED light generator. The two parallel planes may typically be horizontal, but they may be disposed vertically or at any other angle that may be convenient. The invention further includes a second optical system (OS2) that is responsive to optical energy of the same wavelength as emitted by OS1. Preferably OS2 is disposed above OS1 and angled with offset θ, relative to the fan-beam plane, toward the region where the virtual input device is defined. OS2 is responsive to energy emitted by OS1, but the wavelength of the optical energy need not be visible to humans. The invention may also be implemented using non-structured-light configurations that may be active or passive. In a passive triangulation embodiment, OS1 is a camera rather than an active source of optical energy, and OS2 is a camera responsive to the same optical energy as OS1, and preferably disposed as described above. In such embodiment, the plane of interest is the projection plane of a scan line of the OS1 camera. In a non-structured-light embodiment such as an active triangulation embodiment, OS1 and OS2 are cameras and the invention further includes an active light source that emits optical energy having wavelengths to which OS1 and OS2 respond. Optionally in such embodiment, OS1 and OS2 can each include a shutter mechanism synchronized to output from the active light source, such that shutters in OS1 and OS2 are open when optical energy is emitted, and are otherwise closed.
An advantage of a non-structured light configuration using two cameras is that bumps or irregularities in the work surface are better tolerated. The plane defined by OS1 may be selected by choosing an appropriate row of OS1 sensing pixel elements to conform to the highest y-dimension point (e.g., bump) of the work surface.
In the structured-light embodiment, OS2 will not detect optical energy until an object, e.g., a user finger or stylus, begins to touch the work surface region whereon the virtual input device is defined. However, as soon as the object penetrates the plane of optical energy emitted by OS1, the portion of the finger or stylus intersecting the plane will be illuminated (visibly or invisibly to a user). OS2 senses the intersection with the plane of interest by detecting optical energy reflected towards OS2 by the illuminated object region. Essentially only one plane is of interest to the present invention, as determined by configuration of OS1, and all other planes definable in three-dimensional space parallel to the virtual input device can be ignored as irrelevant. Thus, a planar three-dimensional sensor system senses user interactions with a virtual input device occurring on the emitted fan-beam plane, and ignores any interactions on other planes.
In this fashion, the present invention detects that an object has touched the virtual input device. Having sensed that a relevant touch-intersection is occurring, the invention then localizes in two-dimensions the location of the touch upon the plane of the virtual device. In the preferred implementation, localized events can include identifying which virtual keys on a virtual computer keyboard or musical keyboard are touched by the user. The user may touch more than one virtual key at a time, for example the “shift” key and another key. Note too that the time order of the touchings is determined by the present invention. Thus, if the user touches virtual keys for “shift” and “t”, and then for the letters “h” and then “e”, the present invention will recognize what is being input as “T” then “h” and then “e”, or “The”. It will be appreciated that the present invention does not rely upon ambient light, and thus can be fully operative even absent ambient light, assuming that the user knows the location of the virtual input device.
Structured-light and/or non-structured light passive triangulation methods may be used to determine a point of contact (x,z) between a user's hand and the sense plane. Since the baseline distance B between OS1 and OS2 is known, a triangle is formed between OS1, OS2 and point (x,z), whose sides are B, and projection rays R1 and R2 from OS1, OS2 to (x,z). OS1 and OS2 allow determination of triangle angular distance from a reference plane, as well as angles α1 and α2 formed by the projection rays, and trigonometry yields distance z to the surface point (x,z), as well as projection ray lengths.
A processor unit associated with the present invention executes software to identify each intersection of a user-controlled object with the virtual input device and determines therefrom the appropriate user-intended input data and/or command, preferably using triangulation analysis. The data and/or commands can then be output by the present invention as input to a device or system for which the virtual input device is used. If desired the present invention may be implemented within the companion device or system, especially for PDAs, cellular telephones, and other small form factor devices or systems that often lack a large user input device such as a keyboard.
Other features and advantages of the invention will appear from the following description in which the preferred embodiments have been set forth in detail, in conjunction with their accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
By “virtual input device” it is meant that an image of an input device may be present on work surface 40, perhaps by placing a paper bearing a printed image, or perhaps system 10 projects a visible image of the input device onto the work surface, or there literally may be no image whatsoever visible upon work surface 40. As such, virtual input device 50, 50′, 50″ requires no mechanical parts such as working keys, and need not be sensitive to a touch by a finger or stylus; in short, the virtual input device preferably is passive.
System 10 further includes a second optical system (OS2) 60, typically a camera with a planar sensor, that is preferably spaced apart from and above OS1 20, and inclined toward work surface 40 and plane 30 at an angle θ, about 10° to about 90°, and preferably about 25°. System 10 further includes an electronic processing system 70 that, among other tasks, supervises OS1 and OS2. System 70 preferably includes at least a central processor unit (CPU) and associated memory that can include read-only-memory (ROM) and random access memory (RAM).
In a structured-light embodiment, OS1 20 emits optical energy in fan-beam 30, parallel to the x-z plane 30. OS1 may include a laser line generator or an LED line generator, although other optical energy sources could be used to emit plane 30. A line generator OS1 is so called because it emits a plane of light that when intersected by a second plane illuminates what OS2 would view as a line on the second plane. For example if a cylindrical object intersected plane 30, OS2 would see the event as an illuminated portion of an elliptical arc whose aspect ratio would depend upon distance of OS2 above plane 30 and surface 40. Thus, excluding ambient light, detection by OS2 of an elliptical arc on plane 30 denotes a touching event, e.g., that an object such as 120R has contacted or penetrated plane 30. Although a variety of optical emitters may be used, a laser diode outputting perhaps 3 mW average power at a wavelength of between about 300 nm and perhaps 1,000 nm could be used. While ambient light wavelengths (e.g., perhaps 350 nm to 700 nm) could be used, the effects of ambient light may be minimized without filtering or shutters if such wavelengths are avoided. Thus, wavelengths of about 600 nm (visible red) up to perhaps 1,000 nm (deep infrared) could be used. A laser diode outputting 850 nm wavelength optical energy would represent an economical emitter, although OS2 would preferably include a filter to reduce the effects of ambient light.
While OS1 preferably is stationary in a structured light embodiment, it is understood that a fan-beam 30 could be generated by mechanically sweeping a single emitted line of optical energy to define the fan-beam plane 30.
If desired, OS2 could be made responsive substantially solely to optical energy emitted from OS1 by synchronously switching OS1 and OS2 on and off at the same time, e.g., under control of unit 70. OS1 and OS2 preferably would include shutter mechanisms, depicted as elements 22, that would functionally open and close in synchronized fashion. For example, electronic processing system 70 could synchronously switch-on OS1, OS2, or shutter mechanisms 22 for a time period t1 with a desired duty cycle, where t1 is perhaps in the range of about 0.1 ms to about 35 ms, and then switch-off OS1 and OS2. If desired, OS1 could be operated at all times, where plane 30 is permitted to radiate only when shutter 22 in front of OS1 20 is open. In the various shutter configurations, repetition rate of the synchronous switching is preferably in the range of 20 Hz to perhaps 300 Hz to promote an adequate rate of frame data acquisition. To conserve operating power and reduce computational overhead, a repetition rate of perhaps 30 Hz to 100 Hz represents an acceptable rate. Of course other devices and methods for ensuring that OS2 responds substantially only to optical energy emitted by OS1 may also be used. For ease of illustration shutters 22 are depicted as mechanical elements, but in practice the concept of shutters 22 is understood to include turning on and off light sources and cameras in any of a variety of ways.
If desired, source(s) of optical energy used with the present invention could be made to carry a so-called signature to better enable such energy to be discerned from ambient light energy. For example and without limitation, such sources might be modulated at a fixed frequency such that cameras or other sensor units used with the present invention can more readily recognize such energy while ambient light energy would, by virtue of lacking such signature, be substantially rejected. In short, signature techniques such as selecting wavelengths for optical energy that differ from ambient light, techniques that involve synchronized operation of light sources and camera sensors, and modulating or otherwise tagging light source energy can all improve the signal/noise ratio of information acquired by the present invention.
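By way of illustration only, the simplest form of such a signature may be sketched as on/off switching of the source in step with frame capture, followed by a per-pixel subtraction. The sketch below is an assumption rather than the disclosed implementation: the function names `mean_frame` and `source_only`, and the representation of a frame as a list of rows of intensity values, are hypothetical.

```python
def mean_frame(frames):
    """Per-pixel mean over a list of equally sized 2-D intensity frames."""
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / len(frames) for c in range(cols)]
            for r in range(rows)]


def source_only(frames_on, frames_off):
    """Estimate the light contributed by the modulated source alone.

    frames_on  : frames captured while the source is emitting
    frames_off : frames captured while the source is switched off
    Ambient light appears in both sets and cancels in the difference;
    negative differences (noise) are clamped to zero.
    """
    on = mean_frame(frames_on)
    off = mean_frame(frames_off)
    return [[max(0.0, a - b) for a, b in zip(row_on, row_off)]
            for row_on, row_off in zip(on, off)]


# Hypothetical 2x2 frames: uniform ambient level 10, one pixel lit by the source.
ambient = [[10, 10], [10, 10]]
lit = [[10, 60], [10, 10]]
difference = source_only([lit], [ambient])
```

In this sketch only the pixel illuminated by the source survives the subtraction; a source modulated at a fixed frequency and demodulated over many frames would follow the same principle.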
Note that there is no requirement that work surface 40 be reflective or nonreflective with respect to the wavelength emitted by OS1 since the fan-beam or other emission of optical energy does not reach the surface per se. Note too that preferably the virtual input device is entirely passive. Since device 50 is passive, it can be scaled to be smaller than a full-sized device, if necessary. Further, the cost of a passive virtual input device can be nil, especially if the “device” is simply a piece of paper bearing a printed graphic image of an actual input device.
Thus, until an object such as a portion of a user's hand or perhaps of a stylus intersects the optical energy plane 30 emitted by OS1 20, there will be no reflected optical energy 130 for OS2 60 to detect. Under such conditions, system 10 knows that no user input is being made. However as soon as the optical energy plane is penetrated, the intersection of the penetrating object (e.g., fingertip, stylus tip, etc.) is detected by OS2 60, and the location (x,z) of the penetration can be determined by processor unit 70 associated with system 10.
In the embodiment shown, system 10 can generate and input to system 80 or 90 keystrokes representing data and/or commands that a user would have entered on an actual keyboard. Such input to system 80 or 90 can be used to show information 140 on display 150, as the information is entered by the user on virtual input device 50. If desired, an enlarged cursor region 160 could be implemented to provide additional visual input to aid the user who is inputting information. If desired, processor unit 70 could cause system 80 and/or 90 to emit audible feedback to help the user, e.g., electronic keyclick sounds 170 corresponding with the “pressing” of a virtual key on virtual input device 50. It is understood that if system 80 or 90 were a musical instrument rather than a computer or PDA or cellular telephone, musical sounds 170 would be emitted, and virtual input-device 50 could instead have the configuration similar to a piano keyboard or keyboards associated with synthetic music generators.
As used herein, triangulation helps determine the shape of surfaces in a field of view of interest by geometric analysis of triangles formed by the projection rays, e.g., R1, R2 of two optical systems, e.g., OS1 20, OS2 60. A baseline B represents the known length of the line that connects the centers of projection of the two optical systems, OS1, OS2. For a point (x,z) on a visible surface in the field of view of interest, a triangle may be defined by the location of the point and by locations of OS1, and OS2. The three sides of the triangle are B, R1, and R2. OS1 and OS2 can determine the angular distance of the triangle from a reference plane, as well as the angles α1 and α2 formed by the projection rays that connect the surface point with the centers of projection of the two optical systems. Angles α1 and α2 and baseline B completely determine the shape of the triangle. Simple trigonometry can be used to yield the distance to the surface point (x,z), as well as length of projection ray R1 and/or R2.
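The trigonometry referred to above may be sketched as follows, using the law of sines. This is a minimal illustration under stated assumptions: angles α1 and α2 are taken to be the interior angles of the triangle at OS1 and OS2, measured from baseline B, and the function name `triangulate` is hypothetical.

```python
import math


def triangulate(baseline, alpha1, alpha2):
    """Solve the triangle formed by OS1, OS2 and the surface point.

    baseline : known distance B between the two centers of projection
    alpha1   : interior angle at OS1 between the baseline and ray R1 (radians)
    alpha2   : interior angle at OS2 between the baseline and ray R2 (radians)
    Returns (r1, r2, depth), where depth is the perpendicular distance
    from the baseline to the surface point.
    """
    apex = math.pi - alpha1 - alpha2  # angle at the surface point
    if apex <= 0:
        raise ValueError("rays do not converge: alpha1 + alpha2 must be < pi")
    # Law of sines: R1 / sin(alpha2) = R2 / sin(alpha1) = B / sin(apex)
    r1 = baseline * math.sin(alpha2) / math.sin(apex)
    r2 = baseline * math.sin(alpha1) / math.sin(apex)
    depth = r1 * math.sin(alpha1)
    return r1, r2, depth


# Symmetric example: B = 2 and both angles 45 degrees.
r1, r2, depth = triangulate(2.0, math.radians(45), math.radians(45))
```

For the symmetric example the surface point lies one unit from the baseline and both projection rays have length equal to the square root of two.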
It is not required that OS1 20 be implemented as a single unit.
Triangulation according to the present invention preferably uses a standard camera with a planar sensor as OS2 60. The nature of OS1 20 distinguishes between two rather broad classes of triangulation. In a structured-light triangulation, OS1 20 is typically a laser or the like whose beam may be shaped as a single line that is moved to project a moving point onto a surface. Alternatively the laser beam may be planar and moved to project a planar curve. As noted, another class of triangulation system may be termed passive triangulation in which a camera is used as OS1 20. Structured-light systems tend to be more complex to build and consume more operating power, due to the need to project a plane of light. Passive systems are less expensive, and consume less power. However, a passive system must solve the so-called correspondence problem, i.e., determine which pairs of points in the two images are projections of the same point in the real world. As will be described, passive non-structured-light triangulation embodiments may be used, according to the present invention.
Whether system 10 is implemented as a structured-light system in which OS1 actively emits light and OS2 is a camera, or as a passive system in which OS1 and OS2 are both cameras, information from OS2 and OS1 will be coupled to a processing unit, e.g., 70, that can determine what events are occurring. In either embodiment, when an object such as 120R intersects the projection plane 30 associated with OS1 20, the intersection is detectable. In a structured-light embodiment in which OS1 emits optical energy, the intersection is noted by optical energy reflected from the intersected object 120R and detected by OS2, typically a camera. In a passive light embodiment, the intersection is seen by OS1, a camera, and also by OS2, a camera. In each embodiment, the intersection with plane 30 is detected as though the region of surface 40 underlying the (x,z) plane intersection were touched by object 120R. System 10 preferably includes a computing system 70 that receives data from OS1, OS2 and uses geometry to determine the plane intersection position (x,z) from reflected image coordinates in a structured-light embodiment, or from camera image coordinates in a passive system. As such, the dual tasks of detecting initial and continuing contact and penetration of plane 30 (e.g., touch events), and determining intersection coordinate positions on the plane may be thus accomplished.
To summarize thus far, touch events are detected and declared when OS1 recognizes the intersection of plane 30 with an intruding object such as 120R. In a two-camera system, a correspondence is established between points in the perceived image from OS1 and from those in OS2. Thereafter, OS2 camera coordinates are transformed into touch-area (x-axis, z-axis) coordinates to locate the (x,z) coordinate position of the event within the area of interest in plane 30. Preferably such transformations are carried out by processor unit 70, which executes algorithms to compute intersect positions in plane 30 from image coordinates of points visible to OS2. Further, a passive light system must distinguish intruding objects from background in images from OS1 and OS2. Where system 10 is a passive light system, correspondence needs to be established between the images from camera OS1 and from camera OS2. Where system 10 is a structured-light system, it is desired to minimize interference from ambient light.
Consider now computation of the (X,Z) intersection or tip position on plane 30. In perspective projection, a plane in the world and its image are related by a transformation called a homography. Let a point (X,Z) on such plane be represented by the column vector P=(X, Z, 1)^T, where the superscript T denotes transposition. Similarly, let the corresponding image point be represented by p=(x, z, 1)^T.
A homography then is a linear transformation P=Hp, where H is a 3×3 matrix.
This homography matrix may be found using a calibration procedure. Since the sensor rests on the surface, sensor position relative to the surface is constant, and the calibration procedure need be executed only once. For calibration, a grid of known pitch is placed on the flat surface on which the sensor is resting. The coordinates pi of the image points corresponding to the grid vertices Pi are measured in the image. A direct linear transform (DLT) algorithm can be used to determine the homography matrix H. Such DLT transform is known in the art; see for example Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision, Cambridge University Press, Cambridge, UK, 2000.
Once H is known, the surface point P corresponding to a point p in the image is immediately computed by the matrix-vector multiplication above. Preferably such computations are executed by system 70.
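The calibration and mapping steps may be illustrated by the sketch below. It is not the DLT algorithm of the cited reference: it assumes a simplified inhomogeneous variant in which the last entry of H is fixed at 1 and the remaining eight entries are found by least squares, and it omits the coordinate normalization that is advisable with raw pixel values. The function names `solve`, `estimate_homography` and `image_to_surface` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a.h = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def estimate_homography(image_pts, surface_pts):
    """Estimate H (3x3, H[2][2] = 1) from four or more point correspondences."""
    rows, rhs = [], []
    for (x, z), (X, Z) in zip(image_pts, surface_pts):
        rows.append([x, z, 1, 0, 0, 0, -X * x, -X * z])
        rhs.append(X)
        rows.append([0, 0, 0, x, z, 1, -Z * x, -Z * z])
        rhs.append(Z)
    # Least squares via the normal equations (A^T A) h = A^T b.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(8)]
    h = solve(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def image_to_surface(H, x, z):
    """Map image point p = (x, z, 1)^T to surface point P = Hp, then dehomogenize."""
    X, Z, W = (row[0] * x + row[1] * z + row[2] for row in H)
    return X / W, Z / W


# Synthetic check: recover a known (hypothetical) homography from five grid points.
H_true = [[2.0, 0.1, 0.5], [0.2, 3.0, -1.0], [0.1, 0.05, 1.0]]
grid_img = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.25)]
grid_surf = [image_to_surface(H_true, x, z) for x, z in grid_img]
H_est = estimate_homography(grid_img, grid_surf)
```

Because a homography is defined only up to scale, the product Hp is divided by its third component before the (X, Z) position is read off, as `image_to_surface` does.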
Image correspondence for passive light embodiments will now be described. Cameras OS1 20 and OS2 60 see the same plane in space. As a consequence, mapping between the line-scan camera image from OS1 and the camera image from OS2 will itself be a homography. This is similar to mapping between the OS2 camera image and the plane 30 touch surface described above with respect to computation of the tip intercept position. Thus a similar procedure can be used to compute this mapping.
Note that since line scan camera OS1 20 essentially sees or grazes the touch surface collapsed to a single line, homography between the two images is degenerate. For each OS2 camera point there is one OS1 line-scan image point, but for each OS1 line-scan image point there is an entire line of OS2 camera points. Because of this degeneracy, the above-described DLT algorithm will be (trivially) modified to yield a point-to-line correspondence. By definition, a passive light embodiment of the present invention has no control over ambient lighting, and it can be challenging to distinguish intruding intersecting objects or tips from the general background. In short, the question is how to tell whether a particular image pixel in an OS1 image or OS2 image represents the image of a point on an object such as 120R, or is a point in the general background. An algorithm executable by system 70 will now be described.
Initially, assume one or more background images I1, . . . , In with only the touch surface portion of plane 30 in view. Assume that cameras OS1 and OS2 can respond to color, and let Rbi(x, z), Gbi(x, z), Bbi(x, z) be the red, green, and blue components of the background image intensity Ii at pixel position (x, z). Let sb(x, z) be a summary of Rbi(x, z), Gbi(x, z), Bbi(x, z) over all images. For instance, sb(x, z) can be a three-vector with the averages, medians, or other statistics of Rbi(x, z), Gbi(x, z), Bbi(x, z) at pixel position (x, z) over all background images I1, . . . , In, possibly normalized to de-emphasize variations in image brightness.
Next, collect a similar summary st for tip pixels over a new sequence of images J1, . . . , Jm. This second summary is a single vector, rather than an image of vectors as for sb(x, z). In other words, st does not depend on the pixel position (x, z). This new summary can be computed, for instance, by asking a user to place finger tips or stylus in the sensitive area of the surface, and recording values only at pixel positions (x, z) whose color is very different from the background summary sb(x, z) at (x, z), and computing statistics over all values of j, x, z.
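The two summaries may be sketched as follows, using per-pixel averages for sb(x, z) and a single average for st. The sketch rests on assumptions not stated above: an image is represented as rows of (R, G, B) tuples, Euclidean distance decides what counts as "very different", and the threshold `min_dist` and the function names are hypothetical.

```python
import math


def dist(c1, c2):
    """Euclidean distance between two color three-vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def background_summary(images):
    """sb(x, z): per-pixel mean color over the background images I1..In."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[tuple(sum(img[r][c][k] for img in images) / n for k in range(3))
             for c in range(cols)] for r in range(rows)]


def tip_summary(images, sb, min_dist=60.0):
    """st: mean color over all pixels of J1..Jm that differ strongly from sb."""
    samples = [img[r][c]
               for img in images
               for r in range(len(sb))
               for c in range(len(sb[0]))
               if dist(img[r][c], sb[r][c]) > min_dist]
    if not samples:
        raise ValueError("no tip pixels found; lower min_dist or recapture")
    return tuple(sum(s[k] for s in samples) / len(samples) for k in range(3))


# Hypothetical 1x2 images: a dark background, then a tip covering one pixel.
backgrounds = [[[(10, 10, 10), (10, 10, 10)]], [[(10, 10, 10), (10, 10, 10)]]]
tip_images = [[[(10, 10, 10), (200, 150, 120)]]]
sb = background_summary(backgrounds)
st = tip_summary(tip_images, sb)
```

Medians or brightness-normalized statistics could be substituted for the means without changing the structure of the sketch.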
Then, given a new image with color components c(x, z)=(R(x, z), G(x, z), B(x, z)), a particular pixel at (x, z) is attributed to either tip or background by a suitable discrimination rule. For instance, a distance d(c1, c2) can be defined between three-vectors (Euclidean distance is one example), and pixels are assigned based on the following exemplary rule:
- Background if d(c(x,z), sb(x, z))<<d(c(x,z), st).
- Tip if d(c(x,z), sb(x, z))>>d(c(x,z), st).
- Unknown otherwise.
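The exemplary rule may be sketched as follows. The symbols "<<" and ">>" are not quantified above, so the sketch assumes that one distance must be smaller than the other by a hypothetical factor `ratio`; the function names are likewise illustrative.

```python
import math


def dist(c1, c2):
    """Euclidean distance between two color three-vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def classify(c, sb_pixel, st, ratio=3.0):
    """Attribute pixel color c to background, tip, or unknown.

    sb_pixel : background summary sb(x, z) at this pixel
    st       : tip summary (independent of pixel position)
    ratio    : assumed factor standing in for "much less than"
    """
    d_background = dist(c, sb_pixel)
    d_tip = dist(c, st)
    if d_background * ratio < d_tip:
        return "background"
    if d_tip * ratio < d_background:
        return "tip"
    return "unknown"


# Hypothetical summaries and three sample pixel colors.
sb_pixel = (10, 10, 10)
st = (200, 150, 120)
labels = [classify(c, sb_pixel, st)
          for c in [(12, 11, 10), (195, 150, 118), (105, 80, 65)]]
```

The third sample lies midway between the two summaries and is therefore left unassigned.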
Techniques for reducing ambient light interference, especially for a structured-light triangulation embodiment will now be described. In such embodiment, OS2 needs to distinguish between ambient light and light produced by the line generator and reflected back by an intruding object.
Using a first method, OS1 emits energy in a region of the light spectrum where ambient light has little power, for instance, in the near infrared. An infrared filter on camera OS2 can ensure that the light detected by the OS2 sensor is primarily reflected from the object (e.g., 120R) into the lens of camera OS2.
In a second method, OS1 operates in the visible part of the spectrum, but is substantially brighter than ambient light. Although this can be achieved in principle with any color of the light source, for indoor applications it may be useful to use a blue-green light source for OS1 (500 nm to 550 nm) because standard fluorescent lights have relatively low emission in this band. Preferably OS2 will include a matched filter to ensure that response to other wavelengths is substantially attenuated.
A third method to reduce effects of ambient light uses a standard visible laser source for OS1, and a color camera sensor for OS2. This method uses the same background subtraction algorithm described above. Let the following combination be defined, using the same terminology as above:
C(x, z)=min{d(c(x,z), sb(x, z)), d(c(x,z), st)}.
This combination will be exactly zero when c(x,z) is equal to the representative object tip summary st (since d(st, st)=0) and when it is equal to the background summary sb(x, z) (since d(sb(x, z), sb(x, z))=0), and close to zero for other object tip image patches and for visible parts of the background. In other words, object tips and background will be hardly visible in the image C(x,z). By comparison, at positions where the projection plane 30 from laser emitter OS1 intersects object tips 120R, the term d(c(x,z), st) will be significantly non-zero, which in turn yields a substantially non-zero value for C(x,z). This methodology achieves the desired goal of identifying essentially only the object tip pixels illuminated by the laser (or other emitter) OS1. This method can be varied to use light emitters of different colors, to use other definitions for the distance d, and to use different summaries sb(x, z) and st.
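As an illustration of the combination C(x, z), the sketch below computes it over a small image stored as nested lists and then picks out the pixels where it is substantially non-zero. The function names, the list-of-rows image layout, and the `threshold` value are assumptions made for the example; the text does not specify them.

```python
import math

def euclidean(c1, c2):
    """Euclidean distance between two color three-vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def combination_image(image, sb, st):
    """Return C = min(d(c, sb), d(c, st)) for every pixel.

    image : rows of (R, G, B) tuples for the new frame
    sb    : rows of background summaries, same shape as image
    st    : single tip summary vector
    """
    return [
        [min(euclidean(c, b), euclidean(c, st)) for c, b in zip(img_row, sb_row)]
        for img_row, sb_row in zip(image, sb)
    ]

def laser_pixels(C, threshold):
    """List (row, column) positions where C exceeds a hypothetical threshold."""
    return [
        (r, col)
        for r, row in enumerate(C)
        for col, value in enumerate(row)
        if value > threshold
    ]
```

Pixels that match either summary give a value near zero, so only pixels whose color differs from both, such as a tip lit by the emitter, survive the threshold.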
In
As noted in
Consider now the configuration of
Accordingly the configuration of
Such intermediate orientations do not satisfy the Scheimpflug condition, but violate it only to a lesser degree, and therefore still exhibit better focusing than a configuration whose lens axis points directly towards the center of the sensitive area of plane 30.
Touch position module 240 uses tip pixel coordinates from tip detection module 230 at the time a touch is reported from touch detection module 220 to find the (x-z) coordinates of the touch on the touch surface. As noted, a touch is tantamount to penetration of plane 30 associated with an optical emitter OS1 in a structured-light embodiment, or in a passive light embodiment, associated with a plane of view of a camera OS1. Mathematical methods to convert the pixel coordinates to the X-Z touch position are described elsewhere herein.
Key identification module 260 uses the X-Z position of a touch and maps the position to a key identification using a keyboard layout table 250 preferably stored in memory associated with computation unit 70. Keyboard layout table 250 typically defines the top, bottom, left, and right coordinates of each key relative to a zero origin. As such, a function of key identification module 260 is to search table 250 and determine which key contains the (x,z) coordinates of the touch point. When the touched (virtual) key is identified, translation module 270 maps the key to a predetermined KEYCODE value. The KEYCODE value is output or passed to an application that is executing on the companion device or system 80 and is waiting to receive a notification of a keystroke event. The application under execution interprets the keystroke event and assigns a meaning to it. For instance, a text input application uses the value to determine what symbol was typed. An electronic piano application determines what musical note was pressed and plays that note, etc.
Alternatively, as shown in
It is understood in
When the user's finger (or stylus) touches a region of virtual input device 50, touch position module 240 (see
In
In
In the embodiment of
Turning now to
If system 10 is a passive light system, a touch event is registered when the outline of a fingertip appears in a selected frame row of OS1, a digital camera. The (x,z) plane 30 location of the touch is determined by the pixel position of the corresponding object tip (e.g., 120R) in OS2, when a touch is detected in OS1. As shown in
As noted, in a structured-light embodiment, OS1 will typically be a laser line generator, and OS2 will be a camera primarily sensitive to the wavelength of the light energy emitted by OS1. As noted, this can be achieved by installing a narrowband light filter on OS2 such that only wavelengths corresponding to that emitted by OS1 will pass. Alternatively, OS2 can be understood to include a shutter that opens and closes in synchronism with the pulse output of OS1, e.g., OS2 can see optical energy only at times when OS1 emits optical energy. In either embodiment of a structured-light system, OS2 preferably will only detect objects that intercept plane 30 and thus reflect energy emitted by OS1.
In the above case, touch sense detection and range calculation are carried out by system 10. Thus, a touch event is registered when the outline of an object, e.g., fingertip 120R, appears within the viewing range of OS2. As in the above example, range distance may be calculated as an affine function of the number of pixels from the “near” end of pixel frame.
A further example of analytical steps carried out in
The upper left corner of virtual keyboard 50 is set by convention to be the origin, x=0 and z=0, i.e., (0,0). The homography H that maps points in the image to points on the virtual device depends on the tilt of camera OS2 60. An exemplary homography matrix for the configuration above is as follows:
The above matrix preferably need be determined only once during a calibration procedure, described elsewhere herein.
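The way a homography maps an image pixel to a point on the virtual device can be sketched as below. This is a generic illustration only: the function name and any matrix passed to it are assumptions, and the exemplary matrix of the specification is not reproduced here.

```python
def apply_homography(H, u, v):
    """Map image pixel (u, v) to plane coordinates (x, z).

    H is a 3x3 matrix given as a list of three rows. The pixel is
    written in homogeneous form p = (u, v, 1), multiplied by H, and
    the result is divided by its third component.
    """
    p = (u, v, 1.0)
    X, Z, W = (sum(H[i][k] * p[k] for k in range(3)) for i in range(3))
    if W == 0:
        raise ValueError("pixel maps to a point at infinity")
    return X / W, Z / W
```

For example, with a purely hypothetical matrix `[[0.1, 0, 1], [0, 0.2, 2], [0, 0, 1]]`, pixel (10, 20) maps to x = 0.1*10 + 1 and z = 0.2*20 + 2. Because H is fixed for a given camera tilt, the same matrix can be reused for every frame once calibration has determined it.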
Referring now to
Referring now to
As seen in
Had the user instead struck the spacebar or some other key closer to the bottom of virtual keyboard 50, that is, further away from the sensor OS1 20, the situation depicted by fingertip position 110 in
In the above example in which virtual key “T” is pressed, tip detection module 230 in
The homogeneous image coordinate vector p is then multiplied by the homography matrix H to yield the coordinates P of the user fingertip in the frame of reference of the virtual keyboard:
The user-object or finger 120L is thus determined to have touched virtual keyboard 50 at a location point having coordinates x=11.53 and z=2.49 cm. Key identification module 260 in
These conditions are satisfied for the virtual “T” key because 10.5<11.53<12.4, and 1.9<2.49<3.8. Referring to
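The containment test performed against the keyboard layout table can be sketched as follows. The `KeyRect` type, the `LAYOUT` list, and the function name are assumptions made for illustration; only the bounds of the “T” key are taken from the text, and a real table would hold one entry per key.

```python
from typing import NamedTuple, Optional

class KeyRect(NamedTuple):
    key_id: str
    left: float    # smallest x, in cm from the origin
    right: float   # largest x
    top: float     # smallest z
    bottom: float  # largest z

# Hypothetical one-entry layout table; bounds for "T" are from the text.
LAYOUT = [KeyRect("T", 10.5, 12.4, 1.9, 3.8)]

def identify_key(x: float, z: float, layout=LAYOUT) -> Optional[str]:
    """Return the id of the key whose rectangle contains (x, z), if any."""
    for key in layout:
        if key.left < x < key.right and key.top < z < key.bottom:
            return key.key_id
    return None
```

With the touch location from the example, `identify_key(11.53, 2.49)` returns `"T"`, and a point outside every rectangle returns `None`.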
The occurrence need not necessarily be a keystroke. For example, the user-object or finger may have earlier contacted the “T” key and may have remained in touch contact with the key thereafter. In such case, no keystroke event should be communicated to application 280 running on the companion device 80 or 90.
Key translation module 270 preferably stores the up-state or down-state of each key internally. This module determines at every frame whether any key has changed state. In the above example, if the key “T” is found to be in the down-state in the current frame but was in the up-state in the previous frame, translation module 270 sends a KEYCODE message to application 280. The KEYCODE will include a ‘KEY DOWN’ event identifier, along with a ‘KEY ID’ tag that identifies the “T” key, and thereby informs application 280 that the “T” key has just been “pressed” by the user-object. If the “T” key were found to have been also in the down-state during previous frames, the KEYCODE would include a ‘KEY HELD’ event identifier, together with the ‘KEY ID’ associated with the “T” key. Sending the ‘KEY HELD’ event at each frame (excepting the first frame) in which the key is in the down-state frees application 280 from having to maintain any state about the keys. Once the “T” key is found to be in the up-state in the current frame but was in the down-state in previous frames, translation module 270 sends a KEYCODE with a ‘KEY UP’ event identifier, again with a ‘KEY ID’ tag identifying the “T” key, informing application 280 that the “T” key was just “released” by the user-object.
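The per-frame state comparison described above can be sketched as a small class. The class and method names, and the representation of an event as an (event identifier, key id) tuple, are assumptions made for illustration rather than the disclosed message format.

```python
class KeyTranslator:
    """Tracks which keys were down in the previous frame."""

    def __init__(self):
        self._down = set()  # keys in the down-state last frame

    def update(self, keys_down_now):
        """Compare this frame with the last one and return the events."""
        now = set(keys_down_now)
        events = []
        for key in sorted(now - self._down):   # up -> down
            events.append(("KEY DOWN", key))
        for key in sorted(now & self._down):   # still down
            events.append(("KEY HELD", key))
        for key in sorted(self._down - now):   # down -> up
            events.append(("KEY UP", key))
        self._down = now
        return events
```

Calling `update({"T"})` on two consecutive frames and then `update(set())` yields a KEY DOWN event, a KEY HELD event, and a KEY UP event in turn, so the receiving application needs no key state of its own.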
From the foregoing, it will be appreciated that it suffices that frame images comprise only the tips of the user-object, e.g., fingertips. The various embodiments of the present invention use less than full three-dimensional image information acquired from within a relatively shallow volume defined slightly above a virtual input or virtual transfer device. A system implementing these embodiments can be relatively inexpensively fabricated and operated from a self-contained battery source. Indeed, the system could be constructed within common devices such as PDAs, cellular telephones, etc. to hasten the input or transfer of information from a user. As described, undesired effects from ambient light may be reduced by selection of wavelengths in active light embodiments, by synchronization of camera(s) and light sources, and by signal processing techniques that acquire and subtract out images representing background noise.
Modifications and variations may be made to the disclosed embodiments without departing from the subject and spirit of the invention as defined by the following claims.
Claims
1-13. (Cancelled)
14. A system enabling a user-manipulated user-object used with a virtual transfer device to transfer information to a companion device, the system comprising:
- a central processor unit including memory storing at least one software routine;
- a first optical system defining a plane substantially parallel-to and spaced-above a presumed location of said virtual transfer device;
- a second optical system having a relevant field of view encompassing at least portions of said plane and responsive to user-object penetration of said plane to interact with said virtual transfer device;
- means for determining relative position of a portion of said user-object on said plane;
- wherein said system transfers information to said companion device enabling user-object interaction with said virtual transfer device to affect operation of said companion device.
15. The system of claim 14, wherein said means for determining includes determining said relative position using triangulation analysis.
16. The system of claim 14, wherein said means for determining includes said processor unit executing said routine to determine said relative position.
17. The system of claim 14, wherein:
- said first optical system includes means for generating a plane of optical energy; and
- said second optical system includes a camera sensor that detects a reflected portion of said optical energy when said user-object penetrates said plane.
18. The system of claim 14, wherein:
- said first optical system includes at least one of (i) a laser to generate said plane, and (ii) an LED to generate said plane; and
- said second optical system includes a camera sensor that detects a reflected portion of said optical energy when said user-object penetrates said plane.
19. The system of claim 14, further including means for enhancing responsiveness of said second optical system to said user-object penetration while decreasing said responsiveness to ambient light.
20. The system of claim 19, wherein said means for enhancing includes at least one of (a) providing a signature associated with generation of said plane, (b) selecting a common wavelength for energy within said plane defined by said first optical system and for responsiveness of said second optical system, and (c) synchronizing operation of said first optical system and operation of said second optical system.
21. The system of claim 14, wherein said first optical system includes a first camera sensor that defines said plane.
22. The system of claim 14, wherein:
- said first optical system includes a first camera sensor that defines said plane;
- said second optical system includes a second camera to sense said penetration;
- and further including:
- a source of optical energy directed generally toward said virtual transfer device; and
- means for synchronizing operation of at least two of said first optical system, said second optical system, and said source of optical energy;
- wherein effects of ambient light upon accuracy of information obtained with said system are reduced.
23. The system of claim 14, wherein:
- said first optical system includes a generator of optical energy of a desired wavelength; and
- said second optical system is sensitive substantially only to optical energy of said desired wavelength.
24. The system of claim 14, wherein said companion device includes at least one of (i) a PDA, (ii) a portable communication device, (iii) an electronic device, (iv) an electronic game device, and (v) a musical instrument, and said virtual transfer device is at least one of (I) a virtual keyboard, (II) a virtual mouse, (III) a virtual trackball, (IV) a virtual pen, (V) a virtual trackpad, and (VI) a user-interface selector.
25. The system of claim 14, wherein said virtual transfer device is mapped to a work surface selected from at least one of (i) a table top, (ii) a desk top, (iii) a wall, (iv) a point-of-sale appliance, (v) a point-of-service appliance, (vi) a kiosk, (vii) a surface in a vehicle, (viii) a projected display, (ix) a physical display, (x) a CRT, and (xi) an LCD.
26. The system of claim 14, wherein at least one of said first optical system and said second optical system is a camera sensor having a lens and an image plane;
- wherein at least one of said lens and said image plane is tilted to enhance at least one of resolution and depth of field.
27. The system of claim 14, further including means for enhancing distinguishment of said user-object from a background object.
Type: Application
Filed: Dec 30, 2003
Publication Date: Feb 3, 2005
Inventors: Carlo Tomasi (Palo Alto, CA), Abbas Rafii (Palo Alto, CA)
Application Number: 10/750,452