Methods, systems, and products for language preferences
Methods, systems, and computer program products provide personalized feedback in a cloud-based environment. A client device routes image data to a server for analysis. The server analyzes the image data to recognize people of interest. Because the server performs image recognition, the client device is relieved of these intensive operations.
This application is a continuation of U.S. application Ser. No. 14/827,278 filed Aug. 15, 2015 and since issued as U.S. Pat. No. 9,507,770, which is a continuation of U.S. application Ser. No. 13/669,500 filed Nov. 6, 2012 and since issued as U.S. Pat. No. 9,137,314, with all applications incorporated herein by reference in their entireties.
BACKGROUND
Video and audio processing require intensive operations. Processors and memory may be taxed and even overwhelmed when executing image and audio instructions. Indeed, in today's mobile environment, video and audio processing can waste limited processing and battery resources.
The features, aspects, and advantages of the exemplary embodiments are better understood when the following Detailed Description is read with reference to the accompanying drawings.
The exemplary embodiments will now be described more fully hereinafter with reference to the accompanying drawings. The exemplary embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. These embodiments are provided so that this disclosure will be thorough and complete and will fully convey the exemplary embodiments to those of ordinary skill in the art. Moreover, all statements herein reciting embodiments, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future (i.e., any elements developed that perform the same function, regardless of structure).
Thus, for example, it will be appreciated by those of ordinary skill in the art that the diagrams, schematics, illustrations, and the like represent conceptual views or processes illustrating the exemplary embodiments. The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing associated software. Those of ordinary skill in the art further understand that the exemplary hardware, software, processes, methods, and/or operating systems described herein are for illustrative purposes and, thus, are not intended to be limited to any particular named manufacturer.
As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms “includes,” “comprises,” “including,” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. Furthermore, “connected” or “coupled” as used herein may include wirelessly connected or coupled. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
It will also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first device could be termed a second device, and, similarly, a second device could be termed a first device without departing from the teachings of the disclosure.
Here, though, the analysis is cloud-based. The client device 20 routes, sends, or forwards the stream 36 of image data to the server 22 for analysis, and the server 22 analyzes the stream 36 of image data to recognize people and/or objects of interest. Because image recognition may require significant processing and memory resources, exemplary embodiments relieve the client device 20 of these intensive operations.
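A minimal client-side sketch of this routing step appears below. It assumes a hypothetical HTTP endpoint on the server 22 and a `requests`-based upload; the disclosure does not fix any particular transport protocol or frame encoding.

```python
# Client-side upload loop: the client device only captures and forwards frames;
# all recognition runs on the server. The endpoint URL and frame encoding are
# assumptions for illustration.
import requests

SERVER_URL = "https://server.example.com/analyze"  # hypothetical endpoint

def route_image_stream(frames):
    """Forward each captured frame to the server for analysis."""
    positions = []
    for frame_bytes in frames:
        response = requests.post(
            SERVER_URL,
            data=frame_bytes,
            headers={"Content-Type": "application/octet-stream"},
            timeout=5,
        )
        response.raise_for_status()
        # The server replies with the position of any recognized person.
        positions.append(response.json().get("position"))
    return positions
```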
The server 22 generates instructions for the client device 20. Once the server 22 recognizes the people or objects of interest, the server 22 may then instruct the client device 20 where to aim the audio input system 30.
Exemplary embodiments thus establish personalized, bi-directional, web-based communication. Conventional natural speech systems require intensive processing capabilities that can bog down modern mobile devices. Exemplary embodiments, though, offload intensive video and audio processing to the web-based server 22. Mobile client devices, such as smart phones and tablet computers, may thus provide natural, synthesized speech without powerful hardware componentry or intensive battery consumption. The client device 20 merely aims its audio input system 30 and/or its audio output system 32 according to the position 38 calculated by the server 22.
The server 22 analyzes the stream 36 of image data. When the server 22 receives the stream 36 of image data, the server-side algorithm 58 instructs the server 22 to perform an image analysis 70. The image analysis 70 is executed to recognize one or more persons and/or objects in the stream 36 of image data. Any image analysis 70 may be performed, such as facial recognition 72 of a face in the stream 36 of image data. Regardless, once a person (or object) of interest is recognized, the server 22 may then determine the position 38 of the recognized person relative to the physical space 34 shown in the stream 36 of image data. The server 22 sends the position 38 to the network address associated with the client device 20, and the client device 20 further refines the vision system 28 to the position 38 of the recognized person.
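The disclosure leaves the image analysis 70 open ("any image analysis 70 may be performed"). The sketch below shows only the geometric step of turning a recognized face's bounding box into an aiming angle; the camera resolution and field of view are assumptions, and the recognizer itself is out of scope here.

```python
# Map a recognized face's bounding box (pixel columns) to an azimuth that can
# serve as the position 38 for aiming. Resolution and field of view are assumed.
from dataclasses import dataclass

FRAME_WIDTH_PX = 1920          # assumed camera resolution
HORIZONTAL_FOV_DEG = 70.0      # assumed camera field of view

@dataclass
class Position:
    azimuth_deg: float  # angle, relative to the camera axis, used for aiming

def position_from_bounding_box(x_left: int, x_right: int) -> Position:
    """Convert a face bounding box into an aiming azimuth."""
    face_center = (x_left + x_right) / 2.0
    # Offset of the face from the frame center, as a fraction of half the width.
    offset = (face_center - FRAME_WIDTH_PX / 2.0) / (FRAME_WIDTH_PX / 2.0)
    return Position(azimuth_deg=offset * (HORIZONTAL_FOV_DEG / 2.0))

# Example: a face centered at pixel 1440 maps to about +17.5 degrees.
print(position_from_bounding_box(1400, 1480))
```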
The server 22 analyzes the stream 40 of audio data. When the server 22 receives the stream 40 of audio data, the server-side algorithm 58 instructs the server 22 to perform a speech analysis 80. The speech analysis 80 is executed to recognize the semantic content contained within the stream 40 of audio data. While any speech analysis 80 may be performed, speech-to-text translation may be preferred due to its availability and low cost. Regardless, once the semantic content is determined, the feedback 42 is generated. The server 22 routes the feedback 42 into and along the communications network 24 to the network address associated with the client device 20.
The client device 20 then aims its audio output system 32. When the feedback 42 is received, the client device 20 also aligns its audio output system 32 to the position 38 determined by the server 22. The client device 20 thus points a beam of sound at the position 38 of the recognized person, providing the personalized audio feedback 42.
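The text does not specify how the audio output system 32 forms its beam of sound. One common technique is delay-and-sum steering of a small linear speaker array; the sketch below, with assumed element count and spacing, derives per-speaker delays from the azimuth of the position 38.

```python
# Delay-and-sum steering for a small linear speaker array: each element is
# delayed so the emitted wavefronts add up toward the target azimuth. The
# element count, spacing, and speed of sound are illustrative assumptions.
import math

SPEED_OF_SOUND_M_S = 343.0   # at roughly room temperature
ELEMENT_SPACING_M = 0.04     # assumed spacing between adjacent speakers
NUM_ELEMENTS = 8             # assumed array size

def steering_delays(azimuth_deg: float):
    """Per-element delays (seconds) that steer the beam toward azimuth_deg."""
    theta = math.radians(azimuth_deg)
    raw = [n * ELEMENT_SPACING_M * math.sin(theta) / SPEED_OF_SOUND_M_S
           for n in range(NUM_ELEMENTS)]
    # Shift so every delay is non-negative (only relative delays matter).
    offset = min(raw)
    return [d - offset for d in raw]

# Example: steering toward +17.5 degrees yields steps of a few tens of
# microseconds between adjacent elements.
print(steering_delays(17.5))
```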
Exemplary embodiments isolate the feedback 42. Because the audio feedback 42 is beamed to the position 38 of the recognized person, the audio feedback 42 is personal and unheard by others in the same physical space 34. That is, even if a crowd of people mingle in a room, exemplary embodiments may narrowly beam the audio feedback 42 to only the location of the recognized person. The recognized person, for example, may ask questions to the client device 20, and the client device 20 aims an audible answer back to the recognized person, without sending audio cues to the crowd. As the recognized person moves about the room, exemplary embodiments may track the movements, listen in, and provide the personalized feedback 42 based on audible interactions with the client device 20. Exemplary embodiments thus establish a bidirectional communication channel that follows the movements of the recognized person.
Suppose, for example, that multiple people watch a movie in the same media room.
The client device 20 thus aims each user's preferred language track 104. As this disclosure explains, the server 22 has determined the respective positions 38 of the recognized users. The client device 20 may thus instruct the audio output system 32 to dedicate and aim its output devices to the respective positions 38 of each recognized user. That is, as the display device 90 displays the movie, the audio output system 32 beams each user's preferred language track 104 to their respective position 38. The recognized user thus enjoys the movie according to her language preference 98.
Exemplary embodiments are especially helpful in multi-language environments. As multiple people view the movie, exemplary embodiments may beam different language tracks to different people. The image analysis 70 may be used to recognize several different people within a presentation space of the display device 90. Each different, recognized person may have a different language preference 98. One person may prefer an English language track 104, another person may prefer a German language track 104, and yet another person may prefer a Spanish language track 104. As the server 22 consults each recognized person's profile 94, the server 22 may set up each person's different language track 104. The multiple language tracks 104 may be streamed to the client device 20, and the audio output system 32 beams each person's preferred language track 104 to their respective positions 38. The multiple people thus enjoy the same, common visual content, but each person may enjoy a different, but personal, language track 104. Because each person's language track 104 is isolated to their respective position 38, the other language tracks 104 are unheard by the other viewers in the same physical space 34.
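A simple pairing of recognized viewers, language preferences 98, and positions 38 could be represented as below; the profile contents and track file names are illustrative assumptions, not the patent's data model.

```python
# Sketch of the per-viewer pairing described above: each recognized person's
# profile 94 supplies a language preference 98, and the matching language
# track 104 is associated with that person's position 38 for beaming.
profiles = {                       # database 96 of profiles (simplified)
    "alice": {"language": "en"},
    "bruno": {"language": "de"},
    "carla": {"language": "es"},
}
language_tracks = {"en": "movie_audio_en.aac",
                   "de": "movie_audio_de.aac",
                   "es": "movie_audio_es.aac"}

def assign_tracks(recognized):
    """recognized: mapping of person id -> position (azimuth in degrees)."""
    assignments = []
    for person_id, azimuth in recognized.items():
        language = profiles[person_id]["language"]
        assignments.append({
            "person": person_id,
            "track": language_tracks[language],
            "azimuth_deg": azimuth,  # where the output system beams this track
        })
    return assignments

print(assign_tracks({"alice": -20.0, "bruno": 5.0, "carla": 30.0}))
```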
Exemplary embodiments may be applied to other scenarios. As the users view the content on the display device 90, suppose one of the users wishes to call a friend. Even though separate channels have been established with the server 22, one of the users may audibly utter a command to “call” a “name.” The client device 20 and the server 22 may cooperate to initiate the call, while the client device 20 continues receiving and displaying content on the display device 90. So, in parallel, one of the users may speak commands to change the displayed content (such as “change channel”), while the other viewing user converses over the established call. The client device 20 and the server 22 may thus also cooperate to suppress cross-talk, preventing the channel-change commands from compromising the other user's call (and vice versa). Further, the client device 20 and the server 22 also cooperate to project or beam different audio to each user, thus isolating the call from the other's commands. Exemplary embodiments, in other words, directionally deliver each person's personal audio without mixing.
Input audio may also be received. As the users enjoy the movie and their respective language tracks 104, exemplary embodiments may still interpret their speech. As this disclosure explains, the client device 20 may aim the audio input system 30 to the different positions 38 of the viewers; the audio input system 30 may have individual microphones that are individually aimed to the position 38 of each recognized viewer. The client device 20 may thus receive and forward the separate streams 40 of audio data to the server 22 for analysis.
The client device 20 responds to the position 38. Once the position 38 of the recognized face is determined, the server 22 may send the position 38 to the client device 20. The client device 20 may use the position 38 to orient the vision system 28 and the audio input system 30. That is, the client device 20 aims or aligns its cameras and microphones according to the position 38 determined by the server 22. As the recognized face moves, the position 38 may be repeatedly determined as feedback to the client device 20. The client device 20 is thus able to train its cameras and microphones on the roving, recognized face.
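The resulting client-side behavior is a small feedback loop: poll for the latest position 38 and re-aim the mounts whenever it changes. The motor-control calls below are placeholders, since the disclosure does not name a specific actuator interface.

```python
# Sketch of the client-side tracking loop. aim_camera()/aim_microphone() stand
# in for whatever motor control the device actually exposes (assumptions here).
import time

def aim_camera(azimuth_deg):      # placeholder for the vision system mount
    print(f"camera -> {azimuth_deg:+.1f} deg")

def aim_microphone(azimuth_deg):  # placeholder for the audio input mount
    print(f"microphone -> {azimuth_deg:+.1f} deg")

def tracking_loop(get_next_position, poll_interval_s=0.2):
    """Re-aim sensors whenever the server reports a new position."""
    last = None
    while True:
        position = get_next_position()   # e.g. parsed from the server's reply
        if position is not None and position != last:
            aim_camera(position)
            aim_microphone(position)
            last = position
        time.sleep(poll_interval_s)
```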
The server 22 also generates the feedback 42. When the server 22 receives the stream 40 of audio data, the server 22 calls or invokes the speech analysis 80. The speech analysis 80 provides a real-time interaction with any recognized user. The server 22 may process the stream 40 of audio data to suppress all but the recognized user's audio input. If multiple people are recognized, exemplary embodiments may simultaneously track and listen to all the recognized users present in the room, through multiple instantiations, one instance per user. The speech analysis 80 may perform a speech-to-text translation to convert the stream 40 of audio data into text. The server 22 may then send or feed the text to a dialogue manager 112. The dialogue manager 112 analyzes the text for recognized commands, phrases, and other semantic content, and generates the acoustic feedback 42. The dialogue manager 112 may perform a text-to-speech translation that converts the feedback 42 into speech. However the feedback 42 is obtained, the feedback 42 is routed back to the client device 20 for directional delivery. The client device 20 pinpoints its audio output system 32 to the position 38 of the recognized user, thus delivering the personalized feedback 42 to the user. Because the feedback 42 is narrowly (highly directively) beamed to the position 38, though, the feedback 42 remains largely inaudible to other users in the same room. The server 22 may thus deliver audio content that differs for each individual user, as dictated by their personal profile 94.
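The server-side turn can be pictured as a three-stage pipeline: speech-to-text, dialogue manager 112, then text-to-speech, with the result routed back together with the position 38 for beaming. The stage functions below are stubs; the disclosure does not commit to any particular recognizer, dialogue engine, or synthesizer.

```python
# End-to-end sketch of one server-side turn producing the feedback 42.
def speech_to_text(audio_bytes):
    return "what time does the movie start"      # placeholder transcription

def dialogue_manager(text):
    replies = {"what time does the movie start": "The movie starts at 8 pm."}
    return replies.get(text, "Sorry, I did not catch that.")

def text_to_speech(text):
    return text.encode("utf-8")                  # placeholder waveform bytes

def handle_audio_turn(audio_bytes, position):
    """Produce personalized feedback plus the position to beam it toward."""
    text = speech_to_text(audio_bytes)
    reply = dialogue_manager(text)
    return {"feedback_audio": text_to_speech(reply), "beam_to": position}

print(handle_audio_turn(b"...", {"azimuth_deg": 17.5}))
```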
The server 22 may thus send an input alignment command 130 to the client device 20. The input alignment command 130 instructs the client device 20 where to aim the audio input system 30. The input alignment command 130 routes along the communications network 24 to the network address associated with the client device 20.
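The wire format of the input alignment command 130 is not spelled out in the text; one plausible shape, with hypothetical field names, is sketched below.

```python
# One possible serialization of the input alignment command 130. All field
# names are assumptions for illustration only.
import json

def build_input_alignment_command(microphone_id, azimuth_deg, elevation_deg):
    """Serialize a command telling the client which microphone to aim where."""
    return json.dumps({
        "type": "input_alignment",       # corresponds to command 130
        "microphone_id": microphone_id,  # element of the audio input system 30
        "azimuth_deg": azimuth_deg,      # direction toward the recognized person
        "elevation_deg": elevation_deg,
    })

print(build_input_alignment_command(2, 17.5, -3.0))
```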
The server 22 may thus send an output alignment command 142. The output alignment command 142 instructs the client device 20 to aim the corresponding speaker in a direction of the output vector 140. The output alignment command 142 routes along the communications network 24 to the network address associated with the client device 20. A motor control mechanism then aims the speaker in the direction of the output vector 140, thus directing the feedback 42 to the determined position 38 of the recognized person.
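On the client side, converting the output vector 140 into pan and tilt angles for the speaker mount is a small trigonometric step; the coordinate convention in the sketch below is an assumption for illustration.

```python
# Convert a direction vector (output vector 140) in the device's frame into
# pan/tilt angles for a motorized speaker mount. Axis convention is assumed.
import math

def vector_to_pan_tilt(x, y, z):
    """x: right, y: up, z: forward from the device; returns (pan, tilt) in deg."""
    pan = math.degrees(math.atan2(x, z))                   # left/right
    tilt = math.degrees(math.atan2(y, math.hypot(x, z)))   # up/down
    return pan, tilt

# Example: a vector pointing slightly right of and level with the device.
print(vector_to_pan_tilt(0.3, 0.0, 1.0))   # about (16.7, 0.0)
```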
The server 22 may thus determine a trajectory for the recognized person. As this disclosure explains, exemplary embodiments may track the movements of one or more recognized persons. As each person's movement is tracked, the server 22 may determine a trajectory vector 150 associated with the recognized person's movements. As the recognized people mingle, some trajectory vectors 150 will intersect.
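A simple way to decide that two trajectory vectors 150 are projected to intersect is a closest-point-of-approach test on constant-velocity tracks; the time horizon and radius thresholds below are illustrative assumptions.

```python
# Treat each tracked person as a 2-D position plus a constant velocity
# (the trajectory vector 150) and flag a projected intersection when the
# closest point of approach falls within a small radius in the near future.
import math

def closest_approach(p1, v1, p2, v2):
    """Return (time_s, distance_m) of closest approach for two 2-D tracks."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    vv = vx * vx + vy * vy
    t = 0.0 if vv == 0 else max(0.0, -(rx * vx + ry * vy) / vv)
    dx, dy = rx + t * vx, ry + t * vy
    return t, math.hypot(dx, dy)

def will_intersect(p1, v1, p2, v2, horizon_s=10.0, radius_m=1.0):
    t, d = closest_approach(p1, v1, p2, v2)
    return t <= horizon_s and d <= radius_m

# Two people walking toward each other across a room:
print(will_intersect((0, 0), (0.5, 0), (5, 0.4), (-0.5, 0)))  # True
```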
The server 22 may thus facilitate social interactions. When the two trajectory vectors 152 and 154 are projected to intersect, the server 22 may retrieve social information 156 associated with each respective person. The server 22 may again query the database 96 of profiles for the profile 94 associated with each recognized person. The server 22 then queries each respective profile 94 for each person's social information 156. Each person's social information 156, for example, may include their name, their spouse's name, and their children's names. The server 22 then sends the social information 156 to the client device 20 as the feedback 42. The client device 20 aims the social information 156 to the current position 38 of the respective person, as this disclosure explains.
The social information 156 helps jog memories. As two people converge, the server 22 can provide the personal, isolated feedback 42 of the approaching social interaction. The feedback 42 is preferably beamed for hearing just prior to the actual intersection 126, thus audibly providing names and other important social information 156 prior to interaction. Each person is thus audibly, but privately, informed of the other person's name and other social information 156. Social interaction may thus commence with less awkward moments of memory loss.
Exemplary embodiments may be applied to any signaling standard.
Exemplary embodiments may be physically embodied on or in a computer-readable storage medium. This computer-readable medium, for example, may include CD-ROM, DVD, tape, cassette, floppy disk, optical disk, memory card, memory drive, and large-capacity disks. This computer-readable medium, or media, could be distributed to end-subscribers, licensees, and assignees. A computer program product comprises processor-executable instructions for personalized audible feedback, as the above paragraphs explained.
While the exemplary embodiments have been described with respect to various features, aspects, and embodiments, those skilled and unskilled in the art will recognize the exemplary embodiments are not so limited. Other variations, modifications, and alternative embodiments may be made without departing from the spirit and scope of the exemplary embodiments.
Claims
1. A method, comprising:
- sending, by a device, image data to a server for an analysis;
- receiving, by the device, a position determined in response to the analysis, the position associated with a face recognized within the image data;
- receiving, by the device, an alignment command associated with the position determined in response to the analysis; and
- aligning, by the device, an audio output in response to the alignment command;
- wherein the audio output is aimed to the position associated with the face recognized within the image data.
2. The method of claim 1, further comprising determining an identification associated with the face recognized within the image data.
3. The method of claim 1, further comprising determining a language preference associated with the face recognized within the image data.
4. The method of claim 1, further comprising determining an audio track associated with the face recognized within the image data.
5. The method of claim 1, further comprising receiving an audio track associated with the face recognized within the image data.
6. The method of claim 1, further comprising determining a name associated with the face recognized within the image data.
7. The method of claim 1, further comprising receiving an instruction for the aligning of the audio output to the position associated with the face recognized within the image data.
8. A system, comprising:
- a hardware processor; and
- a memory device, the memory device storing instructions, the instructions when executed causing the hardware processor to perform operations, the operations comprising:
- sending image data to a server for an analysis;
- receiving a position determined in response to the analysis, the position associated with a face recognized within the image data;
- receiving an alignment command associated with the position determined in response to the analysis; and
- aligning an audio output in response to the alignment command;
- wherein the audio output is aimed to the position associated with the face recognized within the image data.
9. The system of claim 8, wherein the operations further comprise determining an identification associated with the face recognized within the image data.
10. The system of claim 8, wherein the operations further comprise determining a language preference associated with the face recognized within the image data.
11. The system of claim 8, wherein the operations further comprise determining an audio track associated with the face recognized within the image data.
12. The system of claim 8, wherein the operations further comprise receiving an audio track associated with the face recognized within the image data.
13. The system of claim 8, wherein the operations further comprise determining a name associated with the face recognized within the image data.
14. The system of claim 8, wherein the operations further comprise receiving an instruction for the aligning of the audio output to the position associated with the face recognized within the image data.
15. A memory device storing instructions that when executed cause a hardware processor to perform operations, the operations comprising:
- sending image data to a web-based server for an analysis;
- receiving a position determined in response to the analysis, the position associated with a face recognized within the image data;
- receiving an alignment command associated with the position determined in response to the analysis; and
- aligning an audio output in response to the alignment command;
- wherein the audio output is aimed to the position associated with the face recognized within the image data.
16. The memory device of claim 15, wherein the operations further comprise determining an identification associated with the face recognized within the image data.
17. The memory device of claim 15, wherein the operations further comprise determining a language preference associated with the face recognized within the image data.
18. The memory device of claim 15, wherein the operations further comprise determining an audio track associated with the face recognized within the image data.
19. The memory device of claim 15, wherein the operations further comprise receiving an audio track associated with the face recognized within the image data.
20. The memory device of claim 15, wherein the operations further comprise receiving an instruction to align the audio output to the position associated with the face recognized within the image data.
4863384 | September 5, 1989 | Slade |
5707128 | January 13, 1998 | Dugdale |
6380990 | April 30, 2002 | Bessel |
6714660 | March 30, 2004 | Ohba |
6937718 | August 30, 2005 | Scholte |
7369100 | May 6, 2008 | Zacks et al. |
8005680 | August 23, 2011 | Kommer |
8019818 | September 13, 2011 | Lorch |
8190645 | May 29, 2012 | Bashaw |
8208970 | June 26, 2012 | Cheung et al. |
8230367 | July 24, 2012 | Bell et al. |
8468581 | June 18, 2013 | Cuende Alonso |
8509730 | August 13, 2013 | Kim |
8558893 | October 15, 2013 | Persson et al. |
8605956 | December 10, 2013 | Ross |
8793580 | July 29, 2014 | Robinson |
8832564 | September 9, 2014 | McCoy |
8917913 | December 23, 2014 | Kritt |
8942109 | January 27, 2015 | Dorenbosch et al. |
9087357 | July 21, 2015 | Gershon |
9137314 | September 15, 2015 | Dimitriadis |
9507770 | November 29, 2016 | Dimitriadis |
20030001880 | January 2, 2003 | Holtz |
20040113939 | June 17, 2004 | Zacks et al. |
20040148197 | July 29, 2004 | Kerr et al. |
20040208324 | October 21, 2004 | Cheung et al. |
20050057491 | March 17, 2005 | Zacks et al. |
20050071166 | March 31, 2005 | Comerford |
20050180582 | August 18, 2005 | Guedalia |
20050272416 | December 8, 2005 | Ooi et al. |
20060078859 | April 13, 2006 | Mullin |
20070147610 | June 28, 2007 | Kethi Reddy |
20070165866 | July 19, 2007 | Super |
20070258108 | November 8, 2007 | Matsumoto |
20080109159 | May 8, 2008 | Shi |
20080140652 | June 12, 2008 | Millman et al. |
20080252596 | October 16, 2008 | Bell et al. |
20100031298 | February 4, 2010 | Iwanami |
20100041330 | February 18, 2010 | Elg |
20100302401 | December 2, 2010 | Oku et al. |
20100306249 | December 2, 2010 | Hill |
20110065453 | March 17, 2011 | Baldemair |
20110085061 | April 14, 2011 | Kim |
20110096963 | April 28, 2011 | Shekhara |
20110248935 | October 13, 2011 | Mellow et al. |
20110316996 | December 29, 2011 | Abe et al. |
20120035907 | February 9, 2012 | Lebeau |
20120066607 | March 15, 2012 | Song |
20120072898 | March 22, 2012 | Pappas |
20120078720 | March 29, 2012 | Pappas |
20120124603 | May 17, 2012 | Amada |
20120163625 | June 28, 2012 | Siotis et al. |
20120230512 | September 13, 2012 | Ojanpera |
20120254382 | October 4, 2012 | Watson |
20130093897 | April 18, 2013 | Fan |
20130254647 | September 26, 2013 | Amacker et al. |
20130286860 | October 31, 2013 | Dorenbosch et al. |
20140063057 | March 6, 2014 | Eronen et al. |
20140098240 | April 10, 2014 | Dimitriadis et al. |
20140108962 | April 17, 2014 | Olomskiy |
20140126741 | May 8, 2014 | Dimitriadis |
20150078681 | March 19, 2015 | Damola |
20160004689 | January 7, 2016 | Dimitriadis |
20170046335 | February 16, 2017 | Dimitriadis |
Type: Grant
Filed: Oct 31, 2016
Date of Patent: Dec 12, 2017
Patent Publication Number: 20170046335
Assignee: AT&T INTELLECTUAL PROPERTY I, L.P. (Atlanta, GA)
Inventors: Dimitrios B. Dimitriadis (Rutherford, NJ), Horst J. Schroeter (New Providence, NJ)
Primary Examiner: Gerald Gauthier
Application Number: 15/338,488
International Classification: G06F 17/28 (20060101); H04W 52/02 (20090101); H04L 29/08 (20060101); G06F 3/16 (20060101); G06K 9/00 (20060101); H04L 29/06 (20060101);