PORTABLE DEVICE INCLUDING MOUTH DETECTION TO INITIATE SPEECH RECOGNITION AND/OR VOICE COMMANDS

Embodiments generally relate to portable electronic devices such as a phone with a camera and a touchscreen. In one embodiment, a method includes detecting a voice and using the camera to check whether an image of a mouth is detected. An embodiment also includes activating a voice recognition application on the phone if both the voice and the mouth are detected.

Description
CROSS REFERENCES TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Patent Application Ser. No. 61/590,284, entitled "USER INTERFACE USING DEVICE AWARENESS", filed on Jan. 24, 2012, which is hereby incorporated by reference as if set forth in full in this document for all purposes.

BACKGROUND

Many conventional computing devices such as computers, tablets, game consoles, televisions, monitors, phones, etc., include a touchscreen. A touchscreen enables a user to interact directly with displayed objects on the touchscreen by touching the objects with a hand, finger, stylus, or other item. Such displayed objects may include controls for functions on a phone. Using the touchscreen, the user can activate controls by touching corresponding objects on the touchscreen. For example, the user can touch an object such as a button on the touchscreen to activate a voice recognition application on the phone. The user can also touch the touchscreen and swipe up or down to scroll a page up or down.

SUMMARY

Embodiments generally relate to a phone. In one embodiment, a method includes detecting a voice and checking if a mouth is detected. The method also includes activating a voice recognition application or voice command input on a phone if both the voice and the mouth are detected.

One embodiment provides a method comprising: detecting a voice; checking if a mouth is detected; and activating a voice recognition application on a phone if both the voice and the mouth are detected.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a diagram of a phone that is held up to the mouth of a user, where the user is talking into the phone, according to one embodiment.

FIG. 2 illustrates a block diagram of a phone, which may be used to implement the embodiments described herein.

FIG. 3 illustrates an example simplified flow diagram for enhancing phone functionality based on detection of a mouth of a user, according to one embodiment.

DETAILED DESCRIPTION

Embodiments described herein enhance phone functionality based on detection of a mouth of a user. In one embodiment, if a phone detects both a voice and a mouth, the phone automatically activates a voice recognition application on the phone. In other words, if a user holds the phone up to the user's mouth and talks, the phone automatically interprets what the user is saying without the user needing to manually activate the voice recognition application.

FIG. 1 illustrates a diagram of a phone 100 that is held up to the mouth 102 of a user, where the user is talking into phone 100, according to one embodiment. In one embodiment, phone 100 includes a display screen 104 and a camera lens 106 of a camera. Camera lens 106 is configured to detect objects (e.g., mouth 102) that are within a predetermined distance from display screen 104. In one embodiment, camera lens 106 may be configured with a field of view 108 that can detect mouth 102 when it is within a close proximity (e.g., 3 to 6 inches, or more) to display screen 104.

In one embodiment, camera lens 106 may be a wide angle lens that can capture an object that is anywhere in front of display screen 104. In one embodiment, camera lens 106 may be a transparent cover over an existing camera lens, where camera lens 106 alters the optics to achieve a wider field of view and closer focus. As an overlay, camera lens 106 may be a film or button placed over an existing lens to alter the optics. In one embodiment, if camera lens 106 overlays an existing camera lens, phone 100 corrects any image distortion that may occur. Camera lens 106 may be permanently or temporarily fixed to phone 100. In one embodiment, camera lens 106 may be a permanent auxiliary lens on phone 100, which may be used by an existing camera or by a separate dedicated camera for the purpose of detecting the user's mouth.
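
Where camera lens 106 overlays an existing lens, the distortion correction mentioned above could be performed in software. The following is a minimal sketch of one way to do this using the OpenCV library; the camera matrix and distortion coefficients shown are hypothetical placeholders that a real implementation would obtain by calibrating the combined optics (e.g., with cv2.calibrateCamera).

    # A hedged sketch of software distortion correction for the overlay
    # lens. The intrinsics below are made-up placeholder values.
    import numpy as np
    import cv2

    camera_matrix = np.array([[800.0,   0.0, 320.0],
                              [  0.0, 800.0, 240.0],
                              [  0.0,   0.0,   1.0]])
    dist_coeffs = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])  # barrel distortion

    def correct_distortion(frame):
        """Undo the wide-angle distortion introduced by the overlay lens."""
        return cv2.undistort(frame, camera_matrix, dist_coeffs)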

While camera lens 106 is shown in the upper center portion of phone 100, camera lens 106 may be located anywhere on the face of phone 100. One or more lenses or cameras may be used, placed and oriented on the device as desired.

FIG. 2 illustrates a block diagram of phone 100, which may be used to implement the embodiments described herein. In one embodiment, phone 100 may include a processor 202 and a memory 204. A phone aware application 206 may be stored in memory 204 or in any other suitable storage location or computer-readable medium. In one embodiment, memory 204 may be a volatile or non-volatile memory (e.g., random-access memory (RAM), flash memory, etc.). Phone aware application 206 provides instructions that enable processor 202 to perform the functions described herein. In one embodiment, processor 202 may include logic circuitry (not shown).

In one embodiment, phone 100 also includes a detection unit 210. In one embodiment, detection unit 210 may be a camera that includes an image sensor 212 and an aperture 214. Image sensor 212 captures images when image sensor 212 is exposed to light passing through camera lens 106 (FIG. 1). Aperture 214 regulates light passing through camera lens 106. In one embodiment, after detection unit 210 captures images, detection unit 210 may store the images in an image library 216 in memory 204.
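
A rough sketch of this capture-and-store behavior follows; the cv2.VideoCapture interface stands in for whatever platform camera API backs image sensor 212, and the class layout is an illustrative assumption rather than a structure required by the embodiments.

    # A hedged sketch of detection unit 210 archiving captures in image
    # library 216 (held in memory 204).
    import cv2

    class DetectionUnit:
        def __init__(self, camera_index=0):
            self.camera = cv2.VideoCapture(camera_index)  # sensor + aperture
            self.image_library = []                       # image library 216

        def capture(self):
            """Capture one frame and archive it in the image library."""
            ok, frame = self.camera.read()
            if ok:
                self.image_library.append(frame)
                return frame
            return None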

In other embodiments, phone 100 may not have all of the components listed and/or may have other components instead of, or in addition to, those listed above.

The components of phone 100 shown in FIG. 2 may be implemented by one or more processors or any combination of hardware devices, as well as any combination of hardware, software, firmware, etc.

While phone 100 is described as performing the steps as described in the embodiments herein, any suitable component or combination of components of phone 100 may perform the steps described.

FIG. 3 illustrates an example simplified flow diagram for enhancing phone functionality based on detection of a mouth of a user, according to one embodiment. Referring to both FIGS. 1 and 3, a method is initiated in block 302, where phone 100 detects a voice. In one embodiment, the voice includes speech. In block 304, phone 100 checks if mouth 102 is detected. In block 306, phone 100 activates a voice recognition application on the phone if both the voice and mouth 102 are detected. In one embodiment, detecting a face is sufficient to determine that the user intends to speak into phone 100. In other words, phone 100 activates the voice recognition application if both the voice and a face are detected.
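
A minimal sketch of the block 302-306 flow in Python follows; the voice check, mouth check, and activation hook are hypothetical placeholders, since the embodiments do not prescribe particular APIs.

    # A minimal sketch of the FIG. 3 flow (blocks 302-306). All three
    # helpers are hypothetical hooks that a real implementation would wire
    # to the platform's microphone, camera, and speech services.
    def voice_detected() -> bool:
        # Block 302: e.g., a voice-activity check on the microphone stream.
        return False  # placeholder

    def mouth_detected() -> bool:
        # Block 304: e.g., a camera-based mouth (or face) check.
        return False  # placeholder

    def activate_voice_recognition() -> None:
        # Block 306: launch the voice recognition application.
        pass

    def on_audio_event() -> None:
        # Activate only when both conditions hold, so stray speech in the
        # room does not trigger recognition by itself.
        if voice_detected() and mouth_detected():
            activate_voice_recognition()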

In one embodiment, phone 100 activates the voice recognition application if both the voice and mouth 102 with moving lips are detected. In one embodiment, if phone 100 detects moving lips, phone 100 activates a lip reading application. In one embodiment, phone 100 may interpret commands from the user solely by voice recognition, solely by lip reading, or by a combination of both.
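
If both modalities produce per-command confidence scores, one simple combination is a weighted vote. The sketch below is purely illustrative: the score dictionaries and the 0.7/0.3 weighting are assumptions, not part of the described embodiments.

    # A hypothetical late-fusion rule combining voice recognition and lip
    # reading. Inputs map candidate commands to confidences in [0, 1].
    def fuse_commands(voice_scores, lip_scores, voice_weight=0.7):
        """Return the command with the highest weighted combined score."""
        lip_weight = 1.0 - voice_weight
        commands = set(voice_scores) | set(lip_scores)
        return max(commands, key=lambda cmd: (
            voice_weight * voice_scores.get(cmd, 0.0)
            + lip_weight * lip_scores.get(cmd, 0.0)))

    # Example: noisy audio is ambiguous, but the lips clearly read "call home".
    print(fuse_commands({"call home": 0.4, "call Rome": 0.5},
                        {"call home": 0.9}))   # -> "call home"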

In one embodiment, to detect a mouth, phone 100 takes a picture if the voice is detected. Phone 100 then determines if a mouth is in the picture. If the mouth is in the picture, phone 100 determines a distance between the mouth and the phone. In one embodiment, the mouth is determined to be detected if it is within a predetermined distance from the phone. One or more pictures may be taken, and video may also be used.
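
One way to implement the mouth-in-picture check is with OpenCV's stock Haar cascades, detecting a face first and then searching the lower half of the face for a mouth. This particular pipeline is an assumption for illustration, not a method required by the embodiments; OpenCV's bundled smile cascade serves here as a rough mouth detector.

    # A hedged sketch of "is a mouth in the picture?" using OpenCV's
    # bundled Haar cascades. Restricting the mouth search to the lower
    # half of a detected face reduces false positives.
    import cv2

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    mouth_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_smile.xml")

    def find_mouth(picture):
        """Return the mouth's bounding box (x, y, w, h), or None."""
        gray = cv2.cvtColor(picture, cv2.COLOR_BGR2GRAY)
        for (fx, fy, fw, fh) in face_cascade.detectMultiScale(gray, 1.3, 5):
            lower_face = gray[fy + fh // 2:fy + fh, fx:fx + fw]
            mouths = mouth_cascade.detectMultiScale(lower_face, 1.7, 11)
            if len(mouths) > 0:
                mx, my, mw, mh = mouths[0]
                return (fx + mx, fy + fh // 2 + my, mw, mh)
        return None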

In one embodiment, the predetermined distance (e.g., 0 to 12 inches, or more, etc.) is set to a default distance at the factory. In one embodiment, the user may modify the predetermined distance. The user may also modify the angle of field of view 108. A face or mouth 102 that is close to display screen 104 is indicative of the user intending to speak into phone 100. For example, if the user's mouth or face is within 12 inches of display screen 104, the user probably intends to speak into phone 100 to activate a control.
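
The distance test can be approximated from the detected mouth's apparent size using the pinhole-camera relation distance = focal_length_px x real_width / pixel_width. In the sketch below, the average mouth width and the focal length in pixels are assumed typical values, not figures from the embodiments.

    # A hedged sketch of the predetermined-distance test based on the
    # pinhole-camera relation. Both constants are assumptions.
    AVG_MOUTH_WIDTH_IN = 2.0   # assumed average adult mouth width (~5 cm)
    FOCAL_LENGTH_PX = 800.0    # assumed focal length in pixels (calibrated)

    def estimate_distance_inches(mouth_pixel_width):
        return FOCAL_LENGTH_PX * AVG_MOUTH_WIDTH_IN / mouth_pixel_width

    def mouth_within_range(mouth_box, max_distance_in=12.0):
        """True if the mouth appears within the predetermined distance."""
        _, _, w, _ = mouth_box
        return estimate_distance_inches(w) <= max_distance_in

    # Example: a mouth 160 px wide estimates to 800 * 2.0 / 160 = 10 inches.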

In one embodiment, any suitable detection device or sensor may be used to check for a mouth. For example, such a sensor may be an image sensor, a proximity sensor, a distance sensor, an accelerometer, an infrared sensor, or an acoustic sensor, among others.
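
An implementation might hide whichever sensor is present behind a common interface so the activation logic stays sensor-agnostic. The sketch below shows one hypothetical arrangement with a proximity-sensor variant; the interface and class names are assumptions, not part of the embodiments.

    # A hypothetical common interface over the sensor types listed above.
    from abc import ABC, abstractmethod

    class MouthSensor(ABC):
        @abstractmethod
        def mouth_present(self) -> bool:
            """Return True if this sensor believes a mouth is in range."""

    class ProximityMouthSensor(MouthSensor):
        """Treats any object within the predetermined distance as a mouth cue."""
        def __init__(self, read_distance_inches, max_distance_in=12.0):
            self.read_distance_inches = read_distance_inches  # callable
            self.max_distance_in = max_distance_in

        def mouth_present(self) -> bool:
            return self.read_distance_inches() <= self.max_distance_in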

Although the description has been described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive.

Any suitable programming language may be used to implement the routines of particular embodiments including C, C++, Java, assembly language, etc. Different programming techniques may be employed such as procedural or object-oriented. The routines may execute on a single processing device or on multiple processors. Although the steps, operations, or computations may be presented in a specific order, the order may be changed in particular embodiments. In some particular embodiments, multiple steps shown as sequential in this specification may be performed at the same time.

Particular embodiments may be implemented in a computer-readable storage medium (also referred to as a machine-readable storage medium) for use by or in connection with an instruction execution system, apparatus, or device. Particular embodiments may be implemented in the form of control logic in software or hardware or a combination of both. The control logic, when executed by one or more processors, may be operable to perform that which is described in particular embodiments.

A “processor” includes any suitable hardware and/or software system, mechanism or component that processes data, signals or other information. A processor may include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor may perform its functions in “real time,” “offline,” in a “batch mode,” etc. Portions of processing may be performed at different times and at different locations, by different (or the same) processing systems. A computer may be any processor in communication with a memory. The memory may be any suitable processor-readable storage medium, such as random-access memory (RAM), read-only memory (ROM), magnetic or optical disk, or other tangible media suitable for storing instructions for execution by the processor.

Particular embodiments may be implemented by using a programmed general purpose digital computer, by using application specific integrated circuits, programmable logic devices, field programmable gate arrays, optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms. In general, the functions of particular embodiments may be achieved by any means known in the art. Distributed, networked systems, components, and/or circuits may be used. Communication or transfer of data may be wired, wireless, or by any other means.

It will also be appreciated that one or more of the elements depicted in the drawings/figures may also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that is stored in a machine-readable medium to permit a computer to perform any of the methods described above.

As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that the implementations are not limited to the disclosed embodiments. To the contrary, they are intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.

Thus, while particular embodiments have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular embodiments will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.

Claims

1. A method comprising:

detecting a voice;
checking if a mouth is detected; and
activating a voice recognition application on a phone if both the voice and the mouth are detected.

2. The method of claim 1, wherein the detecting of the mouth comprises detecting moving lips.

3. The method of claim 1, further comprising:

detecting moving lips; and
activating a lip reading application.

4. The method of claim 1, further comprising:

detecting moving lips;
activating a lip reading application; and
interpreting commands from a user by one or more of voice recognition and lip reading.

5. The method of claim 1, wherein the detecting of the mouth comprises:

taking a picture if the voice is detected; and
determining if a mouth is in the picture.

6. The method of claim 1, wherein the detecting of the mouth comprises:

taking a picture if the voice is detected;
determining if a mouth is in the picture; and
if the mouth is in the picture, determining a distance between the mouth and the phone, wherein the mouth is determined to be detected if the mouth is within a predetermined distance from the phone.

7. A computer-readable storage medium carrying one or more sequences of instructions thereon, the instructions when executed by a processor cause the processor to:

detect a voice;
check if a mouth is detected; and
activate a voice recognition application on a phone if both the voice and the mouth are detected.

8. The computer-readable storage medium of claim 7, wherein the instructions further cause the processor to detect moving lips.

9. The computer-readable storage medium of claim 7, wherein the instructions further cause the processor to:

detect moving lips; and
activate a lip reading application.

10. The computer-readable storage medium of claim 7, wherein the instructions further cause the processor to:

detect moving lips;
activate a lip reading application; and
interpret commands from a user by one or more of voice recognition and lip reading.

11. The computer-readable storage medium of claim 7, wherein the instructions further cause the processor to:

take a picture if the voice is detected; and
determine if a mouth is in the picture.

12. The computer-readable storage medium of claim 7, wherein the instructions further cause the processor to:

take a picture if the voice is detected;
determine if a mouth is in the picture; and
if the mouth is in the picture, determine a distance between the mouth and the phone, wherein the mouth is determined to be detected if the mouth is within a predetermined distance from the phone.

13. An apparatus comprising:

one or more processors; and
logic encoded in one or more tangible media for execution by the one or more processors, and when executed operable to:
detect a voice;
check if a mouth is detected; and
activate a voice recognition application on a phone if both the voice and the mouth are detected.

14. The apparatus of claim 13, further comprising a sensor that checks for the mouth.

15. The apparatus of claim 13, further comprising a sensor that checks for the mouth, wherein the sensor is one or more of an image sensor, a proximity sensor, a distance sensor, an accelerometer, and an acoustic sensor.

16. The apparatus of claim 13, further comprising a camera that has a lens configured to detect objects within a predetermined distance from the phone.

17. The apparatus of claim 13, wherein the logic when executed is further operable to detect moving lips.

18. The apparatus of claim 13, wherein the logic when executed is further operable to:

detect moving lips; and
activate a lip reading application.

19. The apparatus of claim 13, wherein the logic when executed is further operable to:

take a picture if the voice is detected; and
determine if a mouth is in the picture.

20. The apparatus of claim 13, wherein the logic when executed is further operable to:

take a picture if the voice is detected;
determine if a mouth is in the picture; and
if the mouth is in the picture, determine a distance between the mouth and the phone, wherein the mouth is determined to be detected if the mouth is within a predetermined distance from the phone.
Patent History
Publication number: 20130190043
Type: Application
Filed: Nov 30, 2012
Publication Date: Jul 25, 2013
Inventor: Charles J. Kulas (San Francisco, CA)
Application Number: 13/691,351
Classifications
Current U.S. Class: Integrated With Other Device (455/556.1); Having Voice Recognition Or Synthesization (455/563)
International Classification: H04M 1/247 (20060101);