Compound Gesture Recognition
One embodiment of the invention includes a method for executing and interpreting gesture inputs in a gesture recognition interface system. The method includes detecting and translating a first sub-gesture into a first device input that defines a given reference associated with a portion of displayed visual content. The method also includes detecting and translating a second sub-gesture into a second device input that defines an execution command for the portion of the displayed visual content to which the given reference refers.
The present invention relates generally to interface systems, and specifically to compound gesture recognition.
BACKGROUND
As the range of activities accomplished with a computer increases, new and innovative ways to provide an interface with a computer are often developed to complement the changes in computer functionality and packaging. For example, touch sensitive screens can allow a user to provide inputs to a computer without a mouse and/or a keyboard, such that desk area is not needed to operate the computer. Examples of touch sensitive screens include pressure sensitive membranes, beam break techniques with circumferential light sources and sensors, and acoustic ranging techniques. However, these types of computer interfaces can only provide information to the computer regarding the touch event, itself, and thus can be limited in application. In addition, such types of interfaces can be limited in the number of touch events that can be handled over a given amount of time, and can be prone to interpret unintended contacts, such as from a shirt cuff or palm, as touch events. Furthermore, touch sensitive screens can be prohibitively expensive and impractical for very large display sizes, such as those used for presentations.
SUMMARY
One embodiment of the invention includes a method for executing and interpreting gesture inputs in a gesture recognition interface system. The method includes detecting and translating a first sub-gesture into a first device input that defines a given reference associated with a portion of displayed visual content. The method also includes detecting and translating a second sub-gesture into a second device input that defines an execution command for the portion of the displayed visual content to which the given reference refers.
Another embodiment of the invention includes a method for executing and interpreting gesture inputs in a gesture recognition interface system. The method includes obtaining a plurality of sequential images of a gesture input environment and detecting a first sub-gesture based on a three-dimensional location of at least one feature of a first input object relative to displayed visual content in each of the plurality of sequential images of the gesture input environment. The method also includes translating the first sub-gesture into a first device input that defines a given reference associated with a portion of the displayed visual content. The method also includes detecting a second sub-gesture based on changes in the three-dimensional location of at least one feature of at least one of the first input object and a second input object in each of the plurality of sequential images of the gesture input environment. The method further includes translating the second sub-gesture into a second device input that defines an execution command for the portion of the displayed visual content to which the given reference refers.
Another embodiment of the invention includes a gesture recognition system. The system comprises means for displaying visual content and means for obtaining a plurality of sequential images of a gesture input environment that is associated with the visual content. The system also comprises means for determining compound gesture inputs associated with at least one input object based on three-dimensional locations of at least one feature of the at least one input object in each of the plurality of sequential images of the gesture input environment. The system further comprises means for translating the compound gesture inputs into a first device input and a second device input. The first device input can be configured to reference a portion of the visual content and the second device input can be configured to execute a command associated with the portion of the visual content to which the first device input refers in at least one of the buffered plurality of sequential images.
A user employs an input object to provide simulated inputs to a computer or other electronic device. It is to be understood that the simulated inputs can be provided by compound gestures using the input object. For example, the user could provide gestures that include pre-defined motion using the input object in a gesture recognition environment, such as defined by a foreground of a display screen that displays visual content. The input object could be, for example, one or both of the user's hands; a wand, stylus, or pointing stick; or a variety of other devices with which the user can gesture. The simulated inputs could be, for example, simulated mouse inputs, such as to establish a reference to the displayed visual content and to execute a command on portions of the visual content to which the reference refers. Thus, a compound gesture can be a gesture in which multiple sub-gestures can be employed to provide multiple related device inputs. For example, a first sub-gesture can be a reference gesture to refer to a portion of the visual content and a second sub-gesture can be an execution gesture that can be performed concurrently with or immediately sequential to the first sub-gesture, such as to execute a command on the portion of the visual content to which the first sub-gesture refers.
Any of a variety of gesture recognition interface systems can be implemented to recognize the compound gestures. As an example, one or more infrared (IR) light sources can illuminate a gesture recognition environment that is defined by the area of physical space in a foreground of a vertical or horizontal display surface. A set of stereo cameras can each generate a plurality of images of the input object. The plurality of images can be, for example, based on a reflected light contrast of the IR light reflected back from the input object relative to substantially non-reflected light or more highly reflected light from a retroreflective background surface. The plurality of images of the input object from each camera could be, for example, a plurality of matched sets of images of the input object, such that each image in the matched set of images corresponds to the input object from a different perspective at substantially the same time. A given matched set of images can be employed to determine a location of the input object and the plurality of matched sets of images can be employed to determine physical motion of the input object.
A controller can be configured to receive the plurality of images to determine three-dimensional location information associated with the input object. For example, the controller could apply an algorithm to determine features of the input object, such as endpoints, length, and pitch of elongated portions of the input object in three-dimensional space. The controller could then translate the simulated inputs into device inputs based on the three-dimensional location information. For example, the controller could interpret gesture inputs based on motion associated with the input object and translate the gesture inputs into inputs to a computer or other device. The controller could also compare the motion associated with the one or more endpoints of the input object with a plurality of pre-defined gestures stored in a memory, such that a match with a given pre-defined gesture could correspond with a particular device input.
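As a minimal sketch of the comparison with stored pre-defined gestures described above, the following Python example matches a tracked endpoint trajectory against a small template library. The gesture names, the templates, the device-input labels, and the distance threshold are hypothetical and chosen only for illustration. The description does not specify a matching algorithm, and the sketch assumes that the observed trajectory has already been resampled to the same number of points as each template.

```python
import math

# Hypothetical library of pre-defined gestures: name -> (template track, device input).
# Templates are short 3-D endpoint trajectories; all values are illustrative assumptions.
PREDEFINED_GESTURES = {
    "thrust": ([(0.0, 0.0, 0.0), (0.0, 0.0, -1.0), (0.0, 0.0, 0.0)], "left_click"),
    "swipe_right": ([(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)], "next_page"),
}


def normalize(track):
    """Shift a track so it starts at the origin and scale it by its largest extent."""
    x0, y0, z0 = track[0]
    shifted = [(x - x0, y - y0, z - z0) for x, y, z in track]
    scale = max(max(abs(c) for c in point) for point in shifted) or 1.0
    return [(x / scale, y / scale, z / scale) for x, y, z in shifted]


def mean_distance(track_a, track_b):
    """Mean Euclidean distance between corresponding points of two equal-length tracks."""
    return sum(math.dist(p, q) for p, q in zip(track_a, track_b)) / len(track_a)


def match_gesture(track, max_distance=0.25):
    """Return (gesture name, device input) for the closest template, or None."""
    observed = normalize(track)
    best = None
    for name, (template, device_input) in PREDEFINED_GESTURES.items():
        if len(template) != len(observed):
            continue  # this sketch only compares tracks of equal length
        d = mean_distance(observed, normalize(template))
        if d <= max_distance and (best is None or d < best[0]):
            best = (d, name, device_input)
    return None if best is None else (best[1], best[2])
```

Normalizing both the observed track and the template makes the comparison independent of where in the gesture recognition environment the motion occurs and of its overall size, which is one plausible design choice among many.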
An input object 24 can provide simulated inputs over the vertical display surface 20. In the example of
In the example of
In the example of
The first camera 12 and the second camera 14 can each provide their respective separate images of the input object 24 to a controller 26. The controller 26 could reside, for example, within a computer (not shown) for which the gesture recognition interface system 10 is designed to provide a gesture recognition interface. It is to be understood, however, that the hosting of a controller is not limited to a standalone computer, but could be included in embedded processors. The controller 26 can process the respective images associated with the input object 24 to generate three-dimensional location data associated with the input object 24.
For example, the first camera 12 and the second camera 14 could each be mounted at a pre-determined angle relative to the floor 28 beneath the vertical display surface 20. For a given matched pair of images of the input object 24, if the pre-determined angles of the cameras 12 and 14 are equal, then each point of the input object 24 in two-dimensional space in a given image from the camera 12 is equidistant from a corresponding point of the input object 24 in the respective matched image from the camera 14. As such, the controller 26 could determine the three-dimensional physical location of the input object 24 based on a relative parallax separation of the matched set of images of the input object 24 at a given time. In addition, using a computer algorithm, the controller 26 could also determine the three-dimensional physical location of features associated with portions of the input object 24, such as fingers and fingertips. As an example, the controller 26 can be configured to determine and interpret the gestures that are provided in the gesture recognition environment in any of a variety of ways, such as those described in either of U.S. patent applications entitled “Gesture Recognition Interface System”, Ser. No. 11/485,788, filed Jul. 13, 2006, and “Gesture Recognition Interface System with Vertical Display”, Ser. No. 12/133,836, filed Jun. 5, 2008, each assigned to the same assignee as the Present Application and incorporated herein by reference in its entirety.
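The relationship between parallax separation and three-dimensional location can be illustrated with a short Python sketch. It assumes a standard rectified pinhole stereo model with parallel optical axes, which is a simplification swapped in for the angled camera mounting described above. The function name, the parameter names, and the coordinate convention (pixel coordinates measured from each image center) are assumptions made for this illustration only.

```python
def triangulate(x_left, x_right, y, focal_px, baseline_m):
    """Estimate a 3-D point from one matched pair of image points.

    Assumes rectified cameras (hypothetical simplification): the same feature
    appears on the same image row in both views, and its horizontal shift
    (disparity) is inversely proportional to its depth.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity  # depth along the optical axis, in meters
    x = x_left * z / focal_px              # lateral offset from the left camera
    y_world = y * z / focal_px             # vertical offset from the left camera
    return (x, y_world, z)
```

For instance, with an assumed focal length of 800 pixels and a baseline of 0.1 m, a fingertip seen 40 pixels apart in the two views would be placed roughly 2 m from the cameras. Applying the same calculation to each matched set of images in sequence yields the motion of the feature over time.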
The gesture recognition interface system 10 can also include a projector 30. The projector 30 can provide visual content with which the user can interact and provide inputs. In the example of
As an example, the controller 26 can determine compound gestures that are performed by a user using the input object 24 and can translate the compound gestures into simulated mouse inputs. For example, the controller 26 could interpret pointing at the vertical display surface 20 by the input object 24, such as with an extended index finger, to establish a reference 32 on the visual content that is displayed on the vertical display surface 20. In the example of
The establishment of the reference 32 can be a first of multiple sub-gestures of a compound gesture. Specifically, an additional sub-gesture can be implemented using the input object 24, or an additional input object such as the user's other hand, to perform an execution gesture that can be translated as an execution command to interact with a portion of the visual content to which the reference 32 refers, such as based on a visual overlapping. The portion of the visual content with which the reference 32 overlaps could be an active portion, such as could provide interaction in response to execution commands. Therefore, the controller 26 can interpret the additional sub-gesture of the compound gesture as a left mouse-click, a right mouse-click, a double mouse-click, or a click-and-hold. Accordingly, a user of the gesture recognition interface system 10 could navigate through a number of computer menus and graphical user interface (GUI) icons, and/or execute programs associated with a computer, merely by moving his or her fingertip through the air in the gesture recognition environment 22 and initiating one or more complementary gestures without touching a mouse or the vertical display surface 20.
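The translation of a compound gesture into a pair of simulated mouse inputs can be sketched as follows in Python. The sketch is illustrative only: the class name, the function name, and the gesture and command labels are hypothetical. The pairings of thumb extension with click-and-hold and of a finger snap with a double click follow examples given elsewhere in this description, while the remaining entry is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Execution sub-gesture -> simulated mouse command (labels are hypothetical).
EXECUTION_COMMANDS = {
    "thumb_extended": "left_click_and_hold",
    "finger_snap": "left_double_click",
    "hand_thrust": "left_click",  # assumed pairing, for illustration only
}


@dataclass(frozen=True)
class DeviceInput:
    command: str
    position: Tuple[int, int]


def translate_compound_gesture(reference_xy, execution_gesture) -> List[DeviceInput]:
    """Translate a reference sub-gesture and an execution sub-gesture into device inputs.

    The first device input places the reference (for example, a mouse cursor);
    the second, if the execution gesture is recognized, executes a command on
    the visual content at that reference.
    """
    inputs = [DeviceInput("move_cursor", reference_xy)]
    command = EXECUTION_COMMANDS.get(execution_gesture)
    if command is not None:
        inputs.append(DeviceInput(command, reference_xy))
    return inputs
```

An unrecognized execution gesture yields only the cursor placement, so that pointing alone never triggers a command in this sketch.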
The first portion 52 of the diagram 50 demonstrates a user's hand 58 performing a first sub-gesture, such that the user's hand 58 is implemented as an input object in the associated gesture recognition interface system. The first sub-gesture is demonstrated in the example of
The second portion 54 of the diagram 50 demonstrates that, upon the reference 64 referring to OBJECT 3, the user performs a second sub-gesture of the compound gesture with the hand 58 by extending the thumb of the hand 58. The second sub-gesture that is performed by extending the thumb of the hand 58 can thus be an execution gesture. Therefore, in the second portion 54 of the diagram 50, the extension of the thumb could be translated by the associated controller as a “click-and-hold” command, such as to simulate a click-and-hold of a left mouse button. Accordingly, in the second portion 54 of the diagram 50, OBJECT 3 is selected for interaction by the user merely by the extension of the thumb.
The third portion 56 of the diagram 50 demonstrates the interaction of OBJECT 3 based on the user implementing the first gesture of the compound gesture. Specifically, as demonstrated in the example of
The example of
The compound gesture that is demonstrated in the example of
In the example of
In addition to translating the gestures into device inputs based on the sequential images stored in the image buffer 36, the controller 26 can also access the sequential images that are stored in the image buffer 36 to identify a portion of the visual content to which a reference gesture was referring prior to the performance of a subsequently performed execution gesture. As an example, the controller 26 can monitor an amount of time that a reference gesture refers to a given portion of the visual content and/or an amount of time between the termination of a reference gesture and the performance of an execution gesture. Accordingly, the controller 26 can associate the execution gesture with the reference gesture based on one or more timing thresholds, such that the controller 26 can access previous images in the sequential images stored in the image buffer 36 to perform the corresponding execution command on the appropriate portion of the visual content.
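One possible way to associate an execution gesture with an earlier reference gesture using timing thresholds is sketched below in Python. For brevity, the buffer in this sketch stores, for each frame, a timestamp and the portion of the visual content to which the reference gesture referred (or None), rather than the images themselves. The class name, the method names, and the threshold values are hypothetical assumptions and are not taken from the description.

```python
from collections import deque


class ReferenceBuffer:
    """Fixed-size buffer of (timestamp, referenced target) entries, one per frame."""

    def __init__(self, maxlen=64, min_dwell=0.3, max_gap=1.0):
        self.frames = deque(maxlen=maxlen)  # oldest entries are discarded automatically
        self.min_dwell = min_dwell  # seconds a target must be referenced to count
        self.max_gap = max_gap      # max seconds between reference end and execution

    def add_frame(self, timestamp, target):
        """Record which target (or None) the reference gesture referred to in a frame."""
        self.frames.append((timestamp, target))

    def resolve(self, execution_time):
        """Return the target an execution gesture at execution_time applies to, or None."""
        frames = list(self.frames)
        i = len(frames) - 1
        # Skip trailing frames with no reference, e.g. while the hand is snapping.
        while i >= 0 and frames[i][1] is None:
            i -= 1
        if i < 0:
            return None
        end_time, target = frames[i]
        if execution_time - end_time > self.max_gap:
            return None  # the reference ended too long before the execution gesture
        # Walk back to the start of the uninterrupted run that refers to this target.
        j = i
        while j > 0 and frames[j - 1][1] == target:
            j -= 1
        if end_time - frames[j][0] < self.min_dwell:
            return None  # the reference was too brief to be deliberate
        return target
```

A typical use would be to call add_frame once per processed frame and to call resolve when an execution gesture, such as a finger snap that interrupts the pointing gesture, is recognized.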
The first portion 102 of the diagram 100 demonstrates a user's hand 108 performing a first sub-gesture, such that the user's hand 108 is implemented as an input object in the associated gesture recognition interface system. The first sub-gesture is demonstrated in the example of
The second portion 104 of the diagram 100 demonstrates that, upon the reference 114 referring to OBJECT 3, the user performs a second sub-gesture of the compound gesture with the hand 108 by snapping the fingers of the hand 108. The second sub-gesture that is performed by snapping the fingers of the hand 108 can thus be an execution gesture. Therefore, in the second portion 104 of the diagram 100, the snapping of the fingers could be translated by the associated controller as an execution command, such as to simulate a double click of a left mouse button.
As demonstrated in the example of
The third portion 106 of the diagram 100 demonstrates the effect of the execution command that is performed on OBJECT 3. Specifically, as described above, OBJECT 3 is configured as a desktop folder. Therefore, the effect of a simulated double left mouse-click is to open the desktop folder, demonstrated in the example of
The example of
Referring back to the example of
The diagram 150 includes a set of compound gestures that each involve the use of a user's hand 152 to perform the compound gestures. Each of the compound gestures demonstrated in the example of
A first compound gesture 158 is demonstrated in the diagram 150 as similar to the compound gesture demonstrated in the example of
A second compound gesture 160 is demonstrated in the diagram 150 as beginning with the reference gesture 154. However, the execution gesture 156 is demonstrated as the user maintaining the reference gesture 154 with the hand 152, except that the hand 152 is thrust forward and backward rapidly. Thus, the controller 26 can interpret the execution gesture 156 based on the rapid change forward and backward of the hand 152. In addition, a user can maintain the reference gesture 154 while performing the execution gesture 156, similar to the compound gesture described above in the example of
A third compound gesture 162 is demonstrated in the diagram 150 as beginning with the reference gesture 154. However, the execution gesture 156 is demonstrated as the user maintaining the extension of the index finger while rotating the index finger in a circle. As an example, the third compound gesture 162 can be configured to scroll through a document or list that is displayed on the vertical display surface 20, depending on the direction of rotation of the index finger. For example, the controller 26 could be configured to access the image buffer 36 to determine the document or list to which the reference gesture 154 referred prior to the execution gesture 156. As another example, the third compound gesture 162 could be combined with another gesture, such that the list or document could be selected with a different compound gesture prior to the execution gesture 156 of the third compound gesture 162.
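The direction of rotation of the index finger could be estimated, for example, from the signed area enclosed by the fingertip path, as in the following Python sketch. The function names and the mapping of rotation direction to scroll direction are assumptions made for illustration, and the sketch assumes that the fingertip locations have been projected onto the plane of the display with the y-axis pointing up.

```python
def rotation_direction(points):
    """Classify a closed 2-D fingertip path as clockwise or counterclockwise.

    Uses the shoelace formula: the signed area is positive for a
    counterclockwise path when the y-axis points up.
    """
    pts = list(points)
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        twice_area += x1 * y2 - x2 * y1
    if twice_area > 0:
        return "counterclockwise"
    if twice_area < 0:
        return "clockwise"
    return None  # degenerate path, e.g. a straight line


def scroll_command(points):
    """Map rotation direction to a scroll command (the mapping is hypothetical)."""
    direction = rotation_direction(points)
    return {"clockwise": "scroll_down", "counterclockwise": "scroll_up"}.get(direction)
```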
A fourth compound gesture 164 is demonstrated in the diagram 150 as beginning with the reference gesture 154. However, the execution gesture 156 is demonstrated as the user forming a claw-grip with the thumb and all fingers. As an example, the fourth compound gesture 164 could be implemented to select a portion of the visual content for movement or for manipulation. It is to be understood that the fourth compound gesture 164 could include a subset of all of the fingers formed as a claw-grip, or each different amount or set of fingers could correspond to a different execution command. In addition, the claw-grip need not be implemented with the fingers and/or thumb touching, but could just include the fingers and/or thumb being slightly extended and bent.
A fifth compound gesture 166 is demonstrated in the diagram 150 as beginning with the reference gesture 154. However, the execution gesture 156 is demonstrated as the user forming an open palm. A sixth compound gesture 168 is demonstrated in the diagram 150 as beginning with the reference gesture 154, with the execution gesture 156 being demonstrated as the user forming a closed fist. As an example, the fifth compound gesture 166 and/or the sixth compound gesture 168 could be implemented to select a portion of the visual content for movement or for manipulation. In addition, for example, either of the fifth compound gesture 166 and the sixth compound gesture 168 could include motion of the thumb to incorporate a different execution gesture.
The diagram 150 in the example of
The diagram 200 includes a first compound gesture 202, a second compound gesture 203, a third compound gesture 204, and a fourth compound gesture 205 that all involve the use of a user's hand 206 to perform the compound gestures. Each of the compound gestures demonstrated in the example of
The first compound gesture 202 is demonstrated in the diagram 200 as similar to the fourth compound gesture 164 demonstrated in the example of
The second compound gesture 203 is demonstrated in the diagram 200 as similar to the fifth compound gesture 166 demonstrated in the example of
The third compound gesture 204 is demonstrated in the diagram 200 as similar to the first compound gesture 168 demonstrated in the example of
The fourth compound gesture 205 is demonstrated in the diagram 200 as similar to the first compound gesture 168 demonstrated in the example of
The diagram 200 in the example of
The two-handed compound gesture 250 demonstrated in the example of
The example of
The diagram 300 includes a set of compound gestures that each involve the use of a user's left hand 302 and right hand 304 to perform the compound gestures. Each of the compound gestures demonstrated in the example of
A first compound gesture 310 is demonstrated in the diagram 300 as similar to the compound gesture 168 demonstrated in the example of
A second compound gesture 314 is demonstrated in the diagram 300 as similar to the compound gestures 164 and 202 demonstrated in the examples of
A third compound gesture 318 is demonstrated in the diagram 300 as similar to the compound gestures 166 and 203 demonstrated in the examples of
It is to be understood that the diagram 300 is not intended to be limiting as to the two-handed compound gestures that are capable of being performed in the gesture recognition interface system 10. As an example, the two-handed compound gestures are not limited to implementation of the extended fingers and thumb of the ready position 308 of the right hand; a different arrangement of fingers and the thumb could instead be implemented. As another example, it is to be understood that the two-handed compound gestures in the diagram 300 can be combined with any of a variety of other gestures, such as the single-handed compound gestures in the examples of
The gesture recognition interface system 400 includes a first camera 402 and a second camera 404. Coupled to each of the first camera 402 and the second camera 404, respectively, is a first IR light source 406 and a second IR light source 408. The first camera 402 and the second camera 404 may each include an IR filter, such that the respective camera may pass IR light and substantially filter other light spectrums. The first IR light source 406 and the second IR light source 408 each illuminate a background surface 410 which can be retroreflective. As such, IR light from the first IR light source 406 can be reflected substantially directly back to the first camera 402 and IR light from the second IR light source 408 can be reflected substantially directly back to the second camera 404. Accordingly, an object that is placed above the background surface 410 may reflect a significantly lesser amount of IR light back to each of the first camera 402 and the second camera 404, respectively. Therefore, such an object can appear to each of the first camera 402 and the second camera 404 as a silhouette image, such that it can appear as a substantially darker object in the foreground of a highly illuminated background surface 410. It is to be understood that the background surface 410 may not be completely retroreflective, but may include a Lambertian factor to facilitate viewing by users at various angles relative to the background surface 410.
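The silhouette detection described above can be illustrated with a short Python sketch that separates a dark input object from a brightly illuminated background by thresholding pixel brightness. The function names and the threshold value are hypothetical, and the image is represented as a plain list of rows of 8-bit brightness values so that the example remains self-contained.

```python
def silhouette_mask(image, threshold=128):
    """Mark pixels darker than the threshold as part of the silhouette.

    With a retroreflective background, background pixels are bright in the
    IR image and the input object appears substantially darker.
    """
    return [[pixel < threshold for pixel in row] for row in image]


def silhouette_centroid(mask):
    """Return the (row, column) centroid of the silhouette, or None if it is empty."""
    coords = [(r, c) for r, row in enumerate(mask) for c, dark in enumerate(row) if dark]
    if not coords:
        return None
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)
```

A centroid is only a coarse location estimate; a practical system would more likely extract features such as fingertips from the silhouette outline in each camera image.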
An input object 412 can provide simulated inputs over the background surface 410. In the example of
In the example of
The first camera 402 and the second camera 404 can each provide their respective separate silhouette images of the input object 412 to a controller 414. The controller 414 could reside, for example, within a computer (not shown) for which the gesture recognition interface system 400 is designed to provide a gesture recognition interface. It is to be understood, however, that the hosting of a controller is not limited to a standalone computer, but could be included in embedded processors. The controller 414 can process the respective silhouette images associated with the input object 412 to generate three-dimensional location data associated with the input object 412.
For example, each of the first camera 402 and the second camera 404 could be mounted at a pre-determined angle relative to the background surface 410. For a given matched pair of images of the input object 412, if the predetermined angle of each of the cameras 402 and 404 is equal, then each point of the input object 412 in two-dimensional space in a given image from the camera 402 is equidistant from a corresponding point of the input object 412 in the respective matched image from the camera 404. As such, the controller 414 could determine the three-dimensional physical location of the input object 412 based on a relative parallax separation of the matched pair of images of the input object 412 at a given time. In addition, using a computer algorithm, the controller 414 could also determine the three-dimensional physical location of at least one end-point, such as a fingertip, associated with the input object 412.
The gesture recognition interface system 400 can also include a projector 416 configured to project image data. The projector 416 can provide an output interface, such as, for example, computer monitor data, with which the user can interact and provide inputs using the input object 412. In the example of
It is to be understood that the gesture recognition interface system 400 is not intended to be limited to the example of
The gesture recognition interface system 450 includes a three-dimensional display system 458, demonstrated in the example of
An input object 464, demonstrated as a user's hand in the example of
As an example, a user of the gesture recognition interface system 450 could perform a reference gesture with the input object 464 to refer to one of the functional components 462, demonstrated in the example of
The gesture recognition interface system 450 is demonstrated as yet another example of the use of compound gestures in providing device inputs to a computer. It is to be understood that the gesture recognition interface system 450 is not intended to be limited to the example of
In view of the foregoing structural and functional features described above, a methodology in accordance with various aspects of the present invention will be better appreciated with reference to
At 506, a first gesture input is determined based on a three-dimensional location of at least one feature of a first input object relative to displayed visual content in each of the plurality of sequential images of the gesture input environment. The first gesture input can be a portion of a compound gesture, such that it is a reference gesture. The gesture can be determined based on an IR brightness contrast as perceived by a controller in each of the sequential images. The three-dimensional location can be based on parallax separation of the features in each of the concurrent images in the sequence. At 508, the first gesture is translated into a first device input to the computer, the first device input being configured to refer to a portion of the visual content. The reference to the portion of the visual content can be based on establishing a reference, such as a mouse pointer, on the visual content in response to the first gesture. Thus the first gesture input could be a pointed index finger to simulate a mouse cursor.
At 510, a second gesture input is determined based on changes in the three-dimensional location of at least one feature of at least one of the first input object and a second input object in each of the plurality of sequential images of the gesture input environment, the second gesture being different than the first gesture. The second gesture input can be a portion of a compound gesture, such that it is an execution gesture. The second gesture input could be performed with the same hand as the first gesture input, the other hand, or with both hands. At 512, the second gesture is translated into a second device input to the computer, the second device input being configured to execute a command associated with the portion of the visual content to which the first device input refers in at least one of the buffered plurality of sequential images. The executed command can be any of a variety of commands that manipulate the portion of the visual content to which the first gesture input refers, such as left, right, or scrolling mouse commands, and/or such as single-click, double-click, or click-and-hold commands.
What have been described above are examples of the present invention. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the present invention, but one of ordinary skill in the art will recognize that many further combinations and permutations of the present invention are possible. Accordingly, the present invention is intended to embrace all such alterations, modifications and variations that fall within the spirit and scope of the appended claims.
Claims
1. A method for executing and interpreting gesture inputs in a gesture recognition interface system, the method comprising:
- detecting and translating a first sub-gesture into a first device input that defines a given reference associated with a portion of displayed visual content;
- detecting and translating a second sub-gesture into a second device input that defines an execution command for the portion of the displayed visual content to which the given reference refers.
2. The method of claim 1, wherein detecting and translating the first sub-gesture comprises detecting and translating the first sub-gesture that is provided by a first hand of a user.
3. The method of claim 2, wherein the first sub-gesture comprises pointing to the portion of the displayed visual content with at least one extended finger, and wherein the second sub-gesture comprises one of moving a thumb associated with the first hand of the user, rapidly extending and retracting the first hand of the user, rotating the at least one extended finger, forming a grip with the thumb and at least one finger of the first hand of the user, forming a fist, and snapping fingers of the first hand of the user.
4. The method of claim 2, wherein detecting and translating the second sub-gesture comprises detecting and translating the second sub-gesture that is provided by the second hand of the user, the method further comprising maintaining the first sub-gesture with the first hand of the user concurrently with employing the second hand of the user to provide the second sub-gesture.
5. The method of claim 1, further comprising maintaining the second sub-gesture to interact with the selected portion of the displayed visual content.
6. The method of claim 1, wherein the first sub-gesture corresponds to pointing at the portion of the displayed visual content, and wherein the reference corresponds to a mouse cursor.
7. The method of claim 1, wherein the second device input corresponds to one of a single mouse-click, a double mouse-click, and a mouse-click-and-hold.
8. The method of claim 1, wherein translating the first and second sub-gestures comprises:
- obtaining a plurality of sequential images of an input object;
- determining a three-dimensional location of the input object in each of the plurality of sequential images of the input object;
- determining motion of the input object based on changes in the three-dimensional location of the input object in each of the plurality of sequential images of the input object; and
- correlating the motion of the input object into one of a plurality of predefined gestures that each have an associated device input.
9. The method of claim 8, wherein the first sub-gesture and the second sub-gesture are each implemented by the first input object, wherein translating the first sub-gesture further comprises:
- buffering the plurality of sequential images in a memory;
- accessing the buffered plurality of sequential images from the memory subsequent to translating the second gesture; and
- determining to what the given reference refers on the displayed visual content based on the accessed plurality of sequential images.
10. The method of claim 8, wherein obtaining the plurality of sequential images of the input object comprises obtaining a plurality of sequential images of the input object concurrently from a plurality of stereo cameras, and wherein determining the three-dimensional location of the input object comprises:
- illuminating the input object with infrared (IR) light;
- determining a location of the input object in each of the plurality of sequential images of each of the plurality of stereo cameras based on an IR brightness contrast between the input object and a background; and
- determining the three-dimensional location of the input object based on a parallax separation of the input object in each of the concurrently obtained sequential images of each of the respective plurality of stereo cameras.
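The location and parallax steps of claim 10 can be illustrated with a minimal sketch. It assumes two rectified, parallel pinhole cameras, for which depth follows the standard relation Z = f * B / d (focal length f in pixels, baseline B, disparity d in pixels). It also assumes the input object appears brighter than the background; if the object is darker, the comparison would be inverted. The function names and parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def locate_bright_object(
    image: List[List[int]], threshold: int
) -> Optional[Tuple[float, float]]:
    """Centroid (column, row) of pixels brighter than the threshold."""
    count = 0
    sum_col = 0
    sum_row = 0
    for row_index, row in enumerate(image):
        for col_index, value in enumerate(row):
            if value > threshold:
                count += 1
                sum_col += col_index
                sum_row += row_index
    if count == 0:
        return None
    return (sum_col / count, sum_row / count)


def depth_from_parallax(
    x_left: float, x_right: float, focal_px: float, baseline_m: float
) -> float:
    """Depth from the horizontal parallax between two concurrent images."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity
```

For example, an object located at column 6 in the left image and column 2 in the right image has a disparity of 4 pixels; with a focal length of 800 pixels and a 0.1 m baseline, the relation gives a depth of about 20 m.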
11. The method of claim 1, further comprising:
- detecting and translating a third sub-gesture into a third device input that is configured to execute a command associated with manipulation of the portion of the displayed visual content.
12. A gesture recognition interface system configured to implement the method of claim 1.
13. A method for executing and interpreting gesture inputs in a gesture recognition interface system, the method comprising:
- obtaining a plurality of sequential images of a gesture input environment;
- detecting a first sub-gesture based on a three-dimensional location of at least one feature of a first input object relative to displayed visual content in each of the plurality of sequential images of the gesture input environment;
- translating the first sub-gesture into a first device input that defines a given reference associated with a portion of the displayed visual content;
- detecting a second sub-gesture based on changes in the three-dimensional location of at least one feature of at least one of the first input object and a second input object in each of the plurality of sequential images of the gesture input environment; and
- translating the second sub-gesture into a second device input that defines an execution command for the portion of the displayed visual content to which the given reference refers.
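The relationship between the two device inputs in claim 13 can be sketched as a small state holder: the first device input sets a reference, and the second executes a command on whatever that reference points to. The class, its method names, and the command strings are hypothetical and serve only to illustrate the ordering.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point2D = Tuple[int, int]


@dataclass
class CompoundGestureState:
    """Tracks the current reference and the commands executed against it."""

    reference: Optional[Point2D] = None
    executed: List[Tuple[str, Point2D]] = field(default_factory=list)

    def on_first_sub_gesture(self, pointed_at: Point2D) -> None:
        # First device input: define the reference (for example, a cursor position).
        self.reference = pointed_at

    def on_second_sub_gesture(self, command: str) -> bool:
        # Second device input: execute the command on the referenced portion.
        if self.reference is None:
            return False  # nothing has been referenced yet
        self.executed.append((command, self.reference))
        return True
```

In use, a second sub-gesture received before any reference exists is ignored, while one received after pointing is recorded together with the referenced location.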
14. The method of claim 13, wherein detecting the first sub-gesture comprises determining a three-dimensional location of at least one finger of a first hand of a user relative to the displayed visual content, and wherein detecting the second sub-gesture comprises determining changes in the three-dimensional location of at least one finger of at least one of the first hand of the user and a second hand of the user in each of the plurality of sequential images of the gesture input environment.
15. The method of claim 14, wherein determining the three-dimensional location of the at least one finger of a first hand comprises determining the portion of the visual content based on a location on the visual content where the at least one finger is pointing, and wherein the second sub-gesture comprises one of moving a thumb associated with the first hand of the user, rapidly extending and retracting the first hand of the user, rotating the at least one finger, forming a grip with a thumb and at least one finger of one of the first and second hand, and snapping fingers of one of the first and second hand.
16. The method of claim 14, wherein determining the second sub-gesture comprises maintaining the first sub-gesture with the first hand of the user concurrently with determining the changes in the three-dimensional location of the at least one finger of the second hand of the user.
17. The method of claim 13, further comprising:
- detecting a third sub-gesture based on changes in a three-dimensional location of at least one feature of the at least one of the first input object and a second input object in each of the plurality of sequential images of the gesture input environment; and
- translating the third sub-gesture into a third device input to the computer, the third device input being configured to execute a command associated with manipulation of the portion of the visual content.
18. The method of claim 13, wherein detecting the third sub-gesture comprises determining changes in the three-dimensional location of the at least one feature of the first input object in each of the plurality of sequential images of the gesture input environment, and wherein translating the second sub-gesture comprises accessing the plurality of sequential images prior to the changes in the three-dimensional location of the at least one feature of the first input object to determine the portion of the visual content on which to execute the command.
19. A gesture recognition interface system comprising:
- means for displaying visual content;
- means for obtaining a plurality of sequential images of a gesture input environment that is associated with the visual content;
- means for determining compound gesture inputs associated with at least one input object based on three-dimensional locations of at least one feature of the at least one input object in each of the plurality of sequential images of the gesture input environment; and
- means for translating the compound gesture inputs into a first device input and a second device input, the first device input being configured to reference a portion of the visual content and the second device input being configured to execute a command associated with the portion of the visual content to which the first device input refers in at least one of the plurality of sequential images.
20. The system of claim 19, wherein the means for obtaining the plurality of images comprises plural means for concurrently obtaining the plurality of sequential images from different perspectives, and wherein the means for determining the compound gesture inputs comprises:
- means for illuminating the at least one input object with infrared (IR) light; and
- means for determining the three-dimensional locations of the at least one feature of the at least one input object based on an IR brightness contrast between the at least one input object and a background, and based on a parallax separation of the at least one feature of the at least one input object in the different perspectives of the concurrently obtained plurality of sequential images.
21. The system of claim 19, wherein the means for determining the compound gesture inputs determines changes in the three-dimensional location of the at least one feature of the first input object in each of the plurality of sequential images of the gesture input environment, and wherein the means for translating the compound gesture inputs accesses the buffered plurality of sequential images prior to the changes in the three-dimensional location of the at least one feature of the first input object to determine the portion of the visual content on which to execute the command.
Type: Application
Filed: Aug 22, 2008
Publication Date: Feb 25, 2010
Patent Grant number: 8972902
Inventors: H. Keith Nishihara (Los Altos, CA), Shi-Ping Hsu (Pasadena, CA), Adrian Kaehler (Boulder Creek, CA), Bran Ferren (Glendale, CA), Lars Jangaard (Glendale, CA)
Application Number: 12/196,767