System And Method For Facilitating Cognitive Processing Of Simultaneous Remote Voice Conversations
A system and method for facilitating cognitive processing of simultaneous remote voice conversations is provided. A plurality of remote voice conversations participated in by distributed participants are provided over a shared communication channel. A main conversation between at least two of the distributed participants and one or more subconversations between at least two other of the distributed participants are identified from within the remote voice conversations. Segments of interest to one of the distributed participants are defined including a conversation excerpt having a lower attention activation threshold for the one distributed participant. Each of the subconversations is parsed into conversation excerpts. The conversation excerpts are compared to the segments of interest. One or more gaps between conversation flow in the main conversation are predicted. Segments of interest are selectively injected into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
This invention relates in general to computer-mediated group communication. In particular, this invention relates to a system and method for facilitating cognitive processing of simultaneous remote voice conversations.
BACKGROUND
Conversation analysis characterizes the order and structure of human spoken communication. Conversation can be formal, such as used in a courtroom, or more casual, as in a chat between old friends. One fundamental component of all interpersonal conversation, though, is turn-taking, whereby participants talk one at a time. Brief gaps in conversation often occur. Longer gaps, however, may indicate a pause in the conversation, a hesitation among the speakers, or a change in topic. As a result, conversation analysis involves consideration of both audible and temporal aspects.
Conversation is also dynamic. When groups of people gather, a main conversation might branch into subconversations between a subset of the participants. For example, coworkers discussing the weather may branch into a talk about one co-worker's weekend, while another part of the group debates the latest blockbuster movie. An individual involved in one discussion would find simultaneously following the other conversation difficult. Cognitive limits on human attention force him to focus his attention on only one conversation.
Passive listening is complicated by the dynamics of active conversation, such as where an individual is responsible for simultaneously monitoring multiple conversations. For example, a teacher may be listening to multiple groups of students discuss their class projects. Although the teacher must track each group's progress, simultaneously listening to and comprehending more than one conversation in detail is difficult, again due to cognitive limits on attention.
Notwithstanding, the human selective attention process enables a person to overhear or focus on certain words, even when many other conversations are occurring simultaneously. For example, an individual tends to overhear her name mentioned in another conversation, even if she is attentive to some other activity. Thus, the teacher would recognize her name being spoken by one student group even if she was listening to another group. These “high meaning” words have a lower attention activation threshold since they have more “meaning” to the listener. Each person's high meaning words are finite and context-dependent, and a large amount of subconversation may still be ignored or overlooked due to the limits, and inherent unreliability, of the selective attention process.
As well, cognition problems that occur when attempting to follow multiple simultaneous conversations are compounded when the participants are physically removed from one another. For instance, teleconferencing and shared-channel communications systems allow groups of participants to communicate remotely. Conversations between participants are mixed together on the same media channel and generally received by each group over a single set of speakers, which hampers following more than one conversation at a time. Moreover, visual cues may not be available and speaker identification becomes difficult.
Current techniques for managing simultaneous conversations place audio streams into separate media channels, mute or lower the volume of conversations in which a participant is not actively engaged, and use spatialization techniques to change the apparent positions of conversants. These techniques, however, primarily emphasize a main conversation to the exclusion of other conversations and noises.
Therefore, an approach is needed to facilitate monitoring multiple simultaneous remote conversations. Preferably, such an approach would mimic and enhance the human selective attention process and allow participants to notice those remote communications of likely importance to them, which occur in subconversations ongoing at the same time as a main conversation.
SUMMARY
A system and method provide insertion of segments of interest selectively extracted from voice conversations between remotely located participants into a main conversation of one of the participants. The voice conversations are first analyzed and conversation floors between the participants are identified. A main conversation for a particular participant, as well as remaining subconversations, is identified. A main conversation can be a conversation in which the particular participant is actively involved or one to which the particular participant is passively listening. The subconversations are preferably muted and analyzed for segments of likely interest to the particular participant. The segments of interest are “high meaning” excerpts of the subconversations that are of likely interest to the participant. Gaps or pauses in the natural conversation flow of the main conversation are predicted and the segments of interest are inserted into those predicted gaps of sufficient duration. Optionally, the participant can explore a specific segment of interest further by joining the subconversation from which the segment was taken or by listening to the subconversation at a later time.
One embodiment provides a system and method for facilitating cognitive processing of simultaneous remote voice conversations. A plurality of remote voice conversations participated in by distributed participants are provided over a shared communications channel. Each of a main conversation between at least two of the distributed participants and one or more subconversations between at least two other of the distributed participants are identified from within the remote voice conversations. Segments of interest to one of the distributed participants are defined including a conversation excerpt having a lower attention activation threshold for the one distributed participant. Each of the subconversations is parsed into live conversation excerpts. The live conversation excerpts are compared to the segments of interest. The main conversation is continually monitored and one or more gaps between conversation flow in the main conversation are predicted. The live conversation excerpts are selectively injected into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
A further embodiment provides a system and method for providing conversation excerpts to a participant from simultaneous remote voice conversations. A plurality of remote voice conversations actively participated in by distributed participants are provided over a shared communications channel. Each of a main conversation in which one of the distributed participants is actively involved and one or more subconversations between at least two other of the distributed participants are identified from within the remote voice conversations. Segments of interest to one of the distributed participants are defined including a conversation excerpt having a lower attention activation threshold for the one distributed participant. The subconversations as provided to the one distributed participant over the shared communications channel are muted. Each of the subconversations is parsed into live conversation excerpts. The live conversation excerpts are compared to the segments of interest. The main conversation is continually monitored and one or more gaps between conversation flow in the main conversation are predicted. The live conversation excerpts are selectively injected into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
Still other embodiments will become readily apparent to those skilled in the art from the following detailed description, wherein are described embodiments of the invention by way of illustrating the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and the scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
In-person conversations involve participants who are physically located near one another, while computer-mediated conversations involve distributed participants who converse virtually from remote and physically-scattered locations.
Preferably, each computer 12a-b is a general-purpose computing workstation, such as a personal desktop or notebook computer, for executing software programs. The computer 12a-b includes components conventionally found in computing devices, such as a central processing unit, memory, input/output ports, network interface, and storage. Other systems and components capable of providing audio communication through a microphone and speaker are possible, for example, cell phones 15, wireless devices, Web-enabled television set-top boxes 16, and telephone or network conference call systems 17. User input devices, for example, a keyboard and mouse, may also be interfaced to each computer. Other input devices are possible.
The computers 12a-b connect to the server 13, which enables the participants 11a-b to remotely participate in a collective conversation over a shared communication channel. The server 13 is a server-grade computing platform configured as a uni-, multi- or distributed processing system, which includes those components conventionally found in computing devices, as discussed above.
Conversation Modes
A participant 11a-e can be actively involved in a conversation or passively listening, that is, monitoring.
In participating mode, a participant 11a is an active part of the main conversation.
In monitoring mode, a participant 11a is a listener or third party to the main conversation and subconversations.
Distributed participants remotely converse with one another via communication devices that relay the conversation stream.
A server 54 is communicatively interposed between the devices 53a-d, and the conversation stream being delivered through the conversation channels 52a-d flows through the server 54. In operation, the server 54 receives conversation streams via the communications devices 53a-d and, upon receiving the conversation streams, the server 54 assesses the conversation floors and identifies a main conversation in which the participants 51a-d are involved. In a further embodiment, the main conversation is a conversation substream originating with a specific participant, who is actively involved in the main conversation, as described above with reference to
Each participant 11a-e, through the server 54, can monitor multiple simultaneous streams within a remote conversation. Substreams are processed to mimic the human selective attention capability.
Certain steps are performed prior to the running of the application. Conversation segments are identified (step 61). The segments can include “high meaning” words and phrases that have a lower activation threshold, as discussed further with reference to
As a participant 11a-e receives a conversation containing multiple audio subconversations, conversation floors are identified (step 63). After identifying the available conversation floors, the conversation floor of a particular participant 11a-e, or main conversation, is identified (step 63). The conversation floors and main conversation can be identified directly by the action of the participant 11a-e or indirectly, such as described in commonly-assigned U.S. Patent Application Publication No. US2004/0172255, filed Apr. 16, 2003, pending, by Aoki et al., the disclosure of which is incorporated by reference, as further discussed below with reference to
All parallel conversations are muted. Parallel conversations are the conversations that remain after the main conversation is identified, that is, the subconversations. Although the parallel conversations are muted, the server 54 analyzes them (step 64) for segments of interest by parsing the parallel conversations into conversational excerpts and comparing the conversational excerpts to segments, previously identified in step 61, that may be of interest to the participant 11a-e, as further described below with reference to
The parallel conversations are analyzed as they occur. Once a suitable gap, as predicted in step 62, in the conversation flow of the main conversation occurs, the segment, if of possible participant interest, can be injected into the gap provided the predicted gap is of sufficient duration (step 65). In a still further embodiment, the segments are stored on a suitable form of recording medium and injected into a gap at a later time point. In a further embodiment, a participant can choose to perform an action on the injected segment (step 66), as further discussed below with reference to
For example, a group of co-workers are talking in a shared audio space, such as an online conference, about an upcoming project. Two of the participants, Alan and Bob, begin talking about marketing considerations, while Chris and David discuss a problem related to a different project. The conversation floors of Alan and Bob's subconversation, Chris and David's subconversation, as well as the continuing main conversation of the other participants are each identified. Alan and Bob's conversation, from their perspective, is identified as the main conversation for Alan and Bob and all remaining conversations are muted as parallel conversations, which are analyzed for segments of possible interest. Chris says to David that the marketing budget for the other project should be slashed in half. Since Alan is the marketing manager of the other project, the segments “marketing budget” and “slashed” are injected into a predicted gap in Alan and Bob's subconversation. Alan can then choose to join Chris and David's subconversation, as further described below with reference to
Monitoring mode allows a participant 11a-e to focus on a main conversation, although not actively engaged in the conversation. For example, a 911 emergency services dispatch supervisor is monitoring a number of dispatchers coordinating police activities, including a car chase, an attempted burglary, and a complaint about a noisy neighbor. The conversation floors of the car chase subconversation, the attempted burglary subconversation, and the noise complaint subconversation are each identified. The supervisor, judging that the car chase requires the most immediate attention, places the car chase subconversation as the main conversation. All remaining conversations are muted and analyzed for possible segments of interest. During a gap in the main conversation, the segments “gun” and “shots fired” are injected from the noise complaint subconversation. The supervisor can shift his attention to the noise complaint conversation as the main conversation, as further described below with reference to
High meaning segments of interest are identified and injected into a main conversation of a participant 11a-e.
The conversation features can be combined into an algorithm, such as the Bayesian network mentioned above, to produce a probability estimate that a gap of a certain length in conversation will occur. When the probability is high enough, the system will inject content into the predicted gap. In a further embodiment, the threshold for the probability can be user-defined.
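The thresholding step can be illustrated with a minimal sketch. It is not the Bayesian network of the disclosure: it substitutes a simpler naive Bayes combination of binary features, and the feature names (`falling_pitch`, `long_turn`), the likelihood values, and the default threshold of 0.8 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Feature:
    """A binary conversation feature with assumed (hypothetical) likelihoods."""
    name: str
    p_given_gap: float     # P(feature observed | a gap follows)
    p_given_no_gap: float  # P(feature observed | no gap follows)


def gap_probability(prior: float, features: Iterable[Feature],
                    observed: Dict[str, bool]) -> float:
    """Posterior P(gap | observations), assuming features are independent."""
    like_gap = prior
    like_no_gap = 1.0 - prior
    for f in features:
        seen = observed[f.name]
        like_gap *= f.p_given_gap if seen else 1.0 - f.p_given_gap
        like_no_gap *= f.p_given_no_gap if seen else 1.0 - f.p_given_no_gap
    return like_gap / (like_gap + like_no_gap)


def should_inject(probability: float, threshold: float = 0.8) -> bool:
    """Inject only when the estimate reaches the (user-definable) threshold."""
    return probability >= threshold


# Hypothetical features and likelihoods, for illustration only.
FEATURES = [
    Feature("falling_pitch", p_given_gap=0.7, p_given_no_gap=0.2),
    Feature("long_turn", p_given_gap=0.6, p_given_no_gap=0.3),
]
```

Passing a lower `threshold` makes injection more eager, which corresponds to the user-defined threshold of the further embodiment.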
Conversation Floor Identification
Conversation floors are identified using conversation characteristics shared between participants engaged in conversation.
Referring now to
The segment of interest is “injected” by including select portions, or excerpts, of the parsed parallel conversations into gaps in conversation flow within a main conversation.
For example, with reference to the main conversation between Alan and Bob discussed above, parts of words 102, words 103, sentence fragments 104, or entire sentences 105 can be injected from Chris and David's parallel conversation. Chris's statement to David that the marketing budget for the other project should be slashed in half provides an example. Parts of words 102 “marketing” and “slashed” are injected into a predicted gap as “market” and “slash.” Alternatively, whole words 103 “marketing” and “slashed” could be injected. Additionally, the sentence fragments 104 “marketing budget for the other project” and “slashed in half” can be injected into the gap in the main conversation. Similarly, Chris's entire sentence 105 “the marketing budget for the other project should be slashed in half” could be injected.
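The four text granularities in this example can be sketched as follows. This is an illustrative assumption about how excerpts might be cut from a transcribed sentence, not the method of the disclosure: the suffix stripping is deliberately naive, and the fragment rule (the keyword plus up to `window` following words) is a hypothetical choice that happens to reproduce the fragments quoted above.

```python
from enum import Enum
from typing import List, Set


class Granularity(Enum):
    PART_OF_WORD = "part of word"
    WORD = "word"
    FRAGMENT = "sentence fragment"
    SENTENCE = "sentence"


def strip_suffix(word: str) -> str:
    """Naive stemming: drop a trailing 'ing' or 'ed' from longer words."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def excerpts(sentence: str, keywords: Set[str],
             granularity: Granularity, window: int = 5) -> List[str]:
    """Cut excerpts around each keyword at the requested granularity."""
    words = sentence.split()
    if granularity is Granularity.SENTENCE:
        return [sentence] if any(w in keywords for w in words) else []
    out = []
    for i, word in enumerate(words):
        if word not in keywords:
            continue
        if granularity is Granularity.PART_OF_WORD:
            out.append(strip_suffix(word))
        elif granularity is Granularity.WORD:
            out.append(word)
        else:  # FRAGMENT: the keyword plus up to `window` following words
            out.append(" ".join(words[i : i + window + 1]))
    return out


SENTENCE = "the marketing budget for the other project should be slashed in half"
KEYWORDS = {"marketing", "slashed"}
```

Sounds 106 are not text and fall outside this sketch.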
Further, sounds 106 can be injected into gaps of the main conversation. With reference to the 911 dispatch supervisor example discussed above, the sound of a gun discharging from the noise complaint subconversation can be injected into a gap. The supervisor can then choose to shift his attention to that subconversation, as further described below with reference to
After segments of interest have been injected, a participant can choose to ignore the information or investigate the information further.
If the participant 11a-e joins 113 a parallel conversation, the main conversation is muted 115 and placed with other parallel conversations, while the selected 113 parallel conversation becomes 116 the main conversation. Segments of interest from the parallel conversations can then be injected into gaps of the new main conversation.
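The swap of main and parallel conversations can be sketched as a small state holder. The class name `Floors` and its methods are hypothetical, and conversations are represented here simply as string labels; the sketch only mirrors the bookkeeping described above (the old main conversation is muted and placed with the parallel conversations, and the selected one becomes the main conversation).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Floors:
    """Tracks which conversation is the main one for a single participant."""
    main: str
    parallel: List[str] = field(default_factory=list)

    def join(self, selected: str) -> None:
        """Make `selected` the main conversation and demote the old main."""
        if selected not in self.parallel:
            raise ValueError(f"{selected!r} is not a parallel conversation")
        self.parallel.remove(selected)
        self.parallel.append(self.main)  # old main joins the muted set
        self.main = selected

    def is_muted(self, conversation: str) -> bool:
        """Every parallel conversation is muted; the main one is not."""
        return conversation in self.parallel
```

In the dispatch supervisor example, calling `join("noise complaint")` on a state whose main conversation is the car chase leaves the car chase muted among the parallel conversations.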
Segment Playback
Playback of injected segments of interest can be modified to differentiate the segments from the main conversation.
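As a rough sketch, two of the playback modifiers named in the claims (volume and pause length) can be applied to a segment held as a list of samples. The representation (floating-point samples in the range -1.0 to 1.0), the `PlaybackModifier` fields, and the function name are assumptions made for illustration; speed and pitch changes need resampling and are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlaybackModifier:
    """Hypothetical modifier covering volume and pause length only."""
    volume: float = 1.0     # linear gain applied to every sample
    pause_samples: int = 0  # samples of silence added before and after


def apply_modifier(samples: List[float], modifier: PlaybackModifier) -> List[float]:
    """Scale the segment, clip to [-1.0, 1.0], and pad it with silence."""
    scaled = [max(-1.0, min(1.0, s * modifier.volume)) for s in samples]
    silence = [0.0] * modifier.pause_samples
    return silence + scaled + silence
```

Playing an injected segment more quietly and setting it off with short pauses is one way to keep it distinguishable from the main conversation.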
Multiple simultaneous conversations within a remote conversation are monitored and processed by a system to mimic the human selective attention capability.
In one embodiment, the server 131 includes pre-application module 132 and application module 133. The pre-application module 132 includes submodules to identify 134 and gap train 135. The application module 133 contains submodules to find floors 136, analyze 137, inject 138, and take action 139, as appropriate. The server 131 is coupled to a database (not shown) or other form of structured data store, within which segments of interest (not shown) are maintained. Other modules and submodules are possible.
The identify submodule 134 identifies conversation segments that are of likely interest to a participant 11a-e. The segments can include “high meaning” words and phrases, as further discussed above with reference to
The floor find submodule 136 identifies conversation floors from audio streams. The particular conversation floor, or main conversation, is identified as well. The conversation floors and main conversation can be identified directly by the action of the participant 11a-e or indirectly, such as described in commonly-assigned U.S. Patent Application Publication No. US2004/0172255, filed Apr. 16, 2003, pending, by Aoki et al., the disclosure of which is incorporated by reference, as further discussed above with reference to
The analyze submodule 137 parses the parallel conversations into conversation excerpts and analyzes the parallel conversations for excerpts that match the segments previously identified by the identify submodule 134. The analysis can be carried out by common information retrieval techniques, for example, term frequency-inverse document frequency (TF-IDF). Other analysis functions are possible.
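A minimal sketch of TF-IDF matching follows, assuming the excerpts have already been transcribed to text. The whitespace tokenization, the smoothed IDF formula, the cosine similarity scoring, and the function names are illustrative choices, not details taken from the disclosure.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per document (smoothed IDF)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_excerpts(segments: List[str], excerpts: List[str]) -> List[Tuple[float, str]]:
    """Score each excerpt by its best match against any segment of interest."""
    vecs = tfidf_vectors(segments + excerpts)
    seg_vecs, exc_vecs = vecs[: len(segments)], vecs[len(segments):]
    scored = [
        (max((cosine(e, s) for s in seg_vecs), default=0.0), text)
        for e, text in zip(exc_vecs, excerpts)
    ]
    return sorted(scored, reverse=True)
```

Excerpts that share no terms with any segment of interest score zero and would not be candidates for injection.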
The inject submodule 138 injects segments of possible participant interest into a predicted gap of sufficient expected length in the main conversation. Other injection functions are possible. The action submodule 139 chooses an action to be taken on the injected segment. For example, the participant 11a-e can choose to join the parallel conversation from which the injected segment was extracted, as further discussed above with reference to
While the invention has been particularly shown and described as referenced to the embodiments thereof, those skilled in the art will understand that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.
Claims
1. A system for facilitating cognitive processing of simultaneous remote voice conversations, comprising:
- a communication module configured to receive a plurality of remote voice conversations between distributed participants provided over a shared communications channel;
- a floor module to identify from within the remote voice conversations each of a main conversation between at least two of the distributed participants and one or more subconversations between at least two other of the distributed participants;
- an identification module to define segments of interest to one of the distributed participants comprising a conversation excerpt having a lower attention activation threshold for the one distributed participant;
- an analysis module to parse each of the subconversations into live conversation excerpts and to compare the live conversation excerpts to the segments of interest;
- a gap prediction module to continually monitor the main conversation and to predict one or more gaps between conversation flow in the main conversation; and
- an injection module to selectively inject the live conversation excerpts into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
2. A system according to claim 1, wherein the one distributed participant is at least one of actively involved in the main conversation and passively listening to the main conversation.
3. A system according to claim 1, further comprising:
- a sound module to mute the subconversations as provided to the one distributed participant over the shared communications channel.
4. A system according to claim 1, wherein the segments of interest are selected from the group comprising parts of words, words, sentence fragments, sentences, and sounds.
5. A system according to claim 1, further comprising:
- a playback modifier module to apply a playback modifier to the live conversation excerpts; and
- a playback presentation module to modify presentation of the live conversation excerpts based on the playback modifier.
6. A system according to claim 5, wherein the playback modifier is selected from the group comprising volume, speed, pitch, and pause length.
7. A system according to claim 1, further comprising one or more of:
- a selection module to select at least one of the segments of interest, and an action module to place the subconversation from which the one segment of interest was comprised as the main conversation; and
- a storage module to store the subconversations corresponding to the live conversation excerpts, and a replay module to replay the subconversations upon termination of the main conversation.
8. A method for facilitating cognitive processing of simultaneous remote voice conversations, comprising:
- participating in a plurality of remote voice conversations between distributed participants provided over a shared communications channel;
- identifying from within the remote voice conversations each of a main conversation between at least two of the distributed participants and one or more subconversations between at least two other of the distributed participants;
- defining segments of interest to one of the distributed participants comprising a conversation excerpt having a lower attention activation threshold for the one distributed participant;
- parsing each of the subconversations into live conversation excerpts and comparing the live conversation excerpts to the segments of interest;
- continually monitoring the main conversation and predicting one or more gaps between conversation flow in the main conversation; and
- selectively injecting the live conversation excerpts into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
9. A method according to claim 8, wherein the one distributed participant is at least one of actively involved in the main conversation and passively listening to the main conversation.
10. A method according to claim 8, further comprising:
- muting the subconversations as provided to the one distributed participant over the shared communications channel.
11. A method according to claim 8, wherein the segments of interest are selected from the group comprising parts of words, words, sentence fragments, sentences, and sounds.
12. A method according to claim 8, further comprising:
- applying a playback modifier to the live conversation excerpts; and
- modifying presentation of the live conversation excerpts based on the playback modifier.
13. A method according to claim 12, wherein the playback modifier is selected from the group comprising volume, speed, pitch, and pause length.
14. A method according to claim 8, further comprising one or more of:
- selecting at least one of the segments of interest, and placing the subconversation from which the one segment of interest was comprised as the main conversation; and
- storing the subconversations corresponding to the live conversation excerpts, and replaying the subconversations upon termination of the main conversation.
15. A system for providing conversation excerpts to a participant from simultaneous remote voice conversations, comprising:
- a communication module configured to receive a plurality of remote voice conversations between distributed participants provided over a shared communications channel;
- a floor module to identify from within the remote voice conversations each of a main conversation in which one of the distributed participants is actively involved and one or more subconversations between at least two other of the distributed participants;
- an identification module to define segments of interest to the one of the distributed participants comprising a conversation excerpt having a lower attention activation threshold for the one distributed participant;
- a sound module to mute the subconversations as provided to the one distributed participant over the shared communications channel;
- an analysis module to parse each of the subconversations into live conversation excerpts and to compare the live conversation excerpts to the segments of interest;
- a gap prediction module to continually monitor the main conversation and to predict one or more gaps between conversation flow in the main conversation; and
- an injection module to selectively inject the live conversation excerpts into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
16. A system according to claim 15, wherein the identification module further comprises:
- a criteria module to define a selection criteria for the one distributed participant; and
- a segment selection module to identify segments of interest based on the selection criteria.
17. A system according to claim 16, wherein the selection criteria is selected from the group comprising personal details, interests, projects, terms, and term frequency.
18. A system according to claim 15, further comprising one or more of:
- a selection module to select at least one of the segments of interest, and an action module to place the subconversation from which the one segment of interest was comprised as the main conversation; and
- a storage module to store the subconversations corresponding to the live conversation excerpts, and a replay module to replay the subconversations upon termination of the main conversation.
19. A system according to claim 15, wherein the segments of interest are selected from the group comprising parts of words, words, sentence fragments, sentences, and sounds.
20. A method for providing conversation excerpts to a participant from simultaneous remote voice conversations, comprising:
- actively participating in a plurality of remote voice conversations between distributed participants provided over a shared communications channel;
- identifying from within the remote voice conversations each of a main conversation in which one of the distributed participants is actively involved and one or more subconversations between at least two other of the distributed participants;
- defining segments of interest to the one of the distributed participants comprising a conversation excerpt having a lower attention activation threshold for the one distributed participant;
- muting the subconversations as provided to the one distributed participant over the shared communications channel;
- parsing each of the subconversations into live conversation excerpts and comparing the live conversation excerpts to the segments of interest;
- continually monitoring the main conversation and predicting one or more gaps between conversation flow in the main conversation; and
- selectively injecting the live conversation excerpts into the gaps of the main conversation as provided to the one distributed participant over the shared communications channel.
21. A method according to claim 20, further comprising:
- defining a selection criteria for the one distributed participant; and
- identifying segments of interest based on the selection criteria.
22. A method according to claim 21, wherein the selection criteria is selected from the group comprising personal details, interests, projects, terms, and term frequency.
23. A method according to claim 20, further comprising one or more of:
- selecting at least one of the segments of interest, and placing the subconversation from which the one segment of interest was comprised as the main conversation; and
- storing the subconversations corresponding to the live conversation excerpts, and replaying the subconversations upon termination of the main conversation.
24. A method according to claim 20, wherein the segments of interest are selected from the group comprising parts of words, words, sentence fragments, sentences, and sounds.
Type: Application
Filed: Apr 11, 2008
Publication Date: Oct 15, 2009
Patent Grant number: 8265252
Applicant: PALO ALTO RESEARCH CENTER INCORPORATED (Palo Alto, CA)
Inventors: Nicolas B. Ducheneaut (Sunnyvale, CA), Trevor F. Smith (Seattle, WA)
Application Number: 12/101,764
International Classification: G10L 15/00 (20060101);