APPARATUS AND METHOD FOR PACKET-BASED MEDIA COMMUNICATIONS
The performance of a voice conference using a packet-based conference bridge can be improved with a number of modifications. In one modification, the conference bridge receives speech indication signals from the individual packet-based terminals within the voice conference and uses these speech indication signals to select the talkers within the voice conference. This removes the need for speech detection techniques within the conference bridge, hence decreasing the required processing power and the latency within the conference bridge. In another modification, the conference bridge sends addressing control signals to the individual packet-based terminals selected as talkers, these addressing control signals directing the terminals selected as talkers to transmit their voice data packets directly to the other terminals within the voice conference. This direct transmission of voice data packets can reduce transcoding and latency within the network. These two modifications could further be combined, resulting in a conference bridge that receives speech indication signals, selects the talkers for the voice conference and outputs addressing control signals to the talkers. In this case, the advantages of both modifications are gained, as well as additional capacity advantages resulting from no voice signals actually traversing the conference bridge.
The present application is a Continuation Application of U.S. patent application Ser. No. 11/113,050, filed on Apr. 25, 2005, which is a Continuation Application of U.S. patent application Ser. No. 09/750,015, filed on Dec. 29, 2000, which issued as U.S. Pat. No. 6,956,828, the disclosures of which are hereby incorporated by reference in their entireties.
FIELD OF THE INVENTION
This invention relates generally to packet-based media communications and more specifically to media conferencing within a packet-based communication network.
BACKGROUND OF THE INVENTION
Prior to the use of packet-based voice communications, telephone conferences were a service option available within standard non-packet-based telephone networks such as Pulse Code Modulation (PCM) telephone networks. As depicted in
One such algorithm used to control a conference session, referred to as a “party line” approach, comprises the steps of mixing the voice communications received from each telephone terminal 16 within the conference session and distributing the result to each of the telephone terminals 16 for broadcasting. A problem with this algorithm is the amount of noise accumulated during the mixing step, since the background noise from every telephone terminal 16 within the conference session is added into the mix.
An improved algorithm for controlling a conference session is disclosed within U.S. patent application Ser. No. 08/987,216 entitled “Method of Providing Conferencing in Telephony” by Dal Farra et al, filed on Dec. 9, 1997, assigned to the assignee of the present invention, and herein incorporated by reference. This algorithm comprises the steps of selecting primary and secondary talkers, mixing the voice communications from these two talkers and forwarding the result of the mixing to all the participants within the conference session except for the primary and secondary talkers. The primary and secondary talkers receive the voice communications corresponding to the secondary and primary talkers respectively. The selection and mixing of only two talkers at any one time can reduce the background noise level within the conference session when compared to the “party line” approach described above.
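The distribution rule of the primary/secondary algorithm can be sketched as follows. This is a minimal illustration, not the Dal Farra et al. implementation: it assumes each participant's contribution is already available as a frame of 16-bit linear PCM samples, and the names `distribute` and `clip16` are hypothetical.

```python
from typing import Dict, List

Frame = List[int]  # one frame of 16-bit linear PCM samples (assumed format)


def clip16(x: int) -> int:
    """Saturate a summed sample to the signed 16-bit range."""
    return max(-32768, min(32767, x))


def distribute(frames: Dict[str, Frame], primary: str, secondary: str) -> Dict[str, Frame]:
    """Decide what each participant hears under primary/secondary talker selection.

    Non-talkers hear the primary+secondary mix; each talker hears only the
    other talker, so no one hears their own voice.
    """
    mixed = [clip16(a + b) for a, b in zip(frames[primary], frames[secondary])]
    heard: Dict[str, Frame] = {}
    for name in frames:
        if name == primary:
            heard[name] = list(frames[secondary])
        elif name == secondary:
            heard[name] = list(frames[primary])
        else:
            heard[name] = mixed
    return heard


frames = {"A": [10, 20], "B": [1, 2], "C": [100, 200], "D": [5, 5]}
heard = distribute(frames, primary="C", secondary="B")
```

In this example the frames from the non-talking terminals A and D never enter the mix, which is where the reduction in background noise relative to the party line approach comes from.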
In a standard PCM telephone network as is depicted in
Currently, packet-based voice communications are being utilized more frequently as Voice-over-Internet Protocol (VoIP) becomes increasingly popular. In these standard VoIP communications, voice data in PCM form is encapsulated with a header and footer to form voice data packets; the header in these packets includes, among other things, a Real-time Transport Protocol (RTP) header that contains a time stamp corresponding to when the packet was generated. One area that requires considerable improvement is the use of packet-based voice communications to provide telephone conferencing capabilities.
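For illustration, the RTP time stamp mentioned above sits at a fixed offset within the 12-byte RTP fixed header defined in RFC 3550. The following sketch parses that header with Python's standard `struct` module; it ignores CSRC lists and header extensions, and the names `RtpHeader` and `parse_rtp_header` are hypothetical.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int  # sampling-clock time stamp used for ordering and playout
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the 12-byte RTP fixed header (network byte order)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return RtpHeader(
        version=b0 >> 6,          # top two bits of the first byte
        marker=bool(b1 & 0x80),   # top bit of the second byte
        payload_type=b1 & 0x7F,   # remaining seven bits
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
    )


# Example: version 2, payload type 4 (G.723 in the RTP audio/video profile), 24-byte payload.
pkt = struct.pack("!BBHII", 0x80, 4, 1000, 160000, 0x1234ABCD) + bytes(24)
hdr = parse_rtp_header(pkt)
```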
As depicted within
The inputting apparatus 30 performs a number of functions on the packets that are received at the conference bridge 28 from the terminals within a voice conference. These functions include protocol stack, jitter buffer and decompression operations. During the protocol stack operation, the inputting apparatus 30 receives packets comprising compressed voice signals, hereinafter referred to as voice data packets, and strips off the packet overhead required for transmitting the voice data packets through the packet-based network 20. During the jitter buffer operation, the inputting apparatus 30 receives the compressed voice signals, ensures that the compressed voice signals are in the proper sequence (i.e. correctly time-ordered), buffers the compressed voice signals to ensure smooth playback and ideally implements packet loss concealment. During the decompression operation, the inputting apparatus 30 receives the buffered compressed voice signals, converts them into standard PCM format and outputs the resulting PCM voice signals to the energy detection, talker selection and mixing block 32.
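The jitter buffer operation described above can be illustrated with a minimal reordering buffer keyed by RTP sequence number. This is a sketch under stated assumptions, not the design of the inputting apparatus 30: it assumes sequence numbers do not wrap during the session and that a constant playout delay of `depth` packets is acceptable; the class name and parameters are hypothetical. A missing packet is released as an empty slot so that a decoder could apply packet loss concealment.

```python
import heapq
from typing import List, Optional, Tuple


class JitterBuffer:
    """Minimal reordering jitter buffer keyed by RTP sequence number (sketch)."""

    def __init__(self, depth: int = 3) -> None:
        self.depth = depth                      # packets held before playout
        self._heap: List[Tuple[int, bytes]] = []
        self._next_seq: Optional[int] = None    # next sequence number to release

    def push(self, seq: int, payload: bytes) -> None:
        """Store an arriving packet; packets older than the playout point are dropped."""
        if self._next_seq is not None and seq < self._next_seq:
            return
        heapq.heappush(self._heap, (seq, payload))

    def pop(self) -> Optional[Tuple[int, Optional[bytes]]]:
        """Release the next frame in sequence order, or None while still filling.

        A missing frame is released as (seq, None) so the decoder can conceal the loss.
        """
        # Discard duplicates that fell behind the playout point.
        while self._heap and self._next_seq is not None and self._heap[0][0] < self._next_seq:
            heapq.heappop(self._heap)
        if not self._heap or len(self._heap) < self.depth:
            return None
        if self._next_seq is None:
            self._next_seq = self._heap[0][0]
        seq = self._next_seq
        self._next_seq += 1
        if self._heap[0][0] == seq:
            return seq, heapq.heappop(self._heap)[1]
        return seq, None


released = []
jb = JitterBuffer(depth=2)
jb.push(2, b"f2")
jb.push(1, b"f1")            # arrives out of order
released.append(jb.pop())    # (1, b"f1")
released.append(jb.pop())    # None: buffer refilling
jb.push(4, b"f4")            # packet 3 is lost
released.append(jb.pop())    # (2, b"f2")
jb.push(5, b"f5")
released.append(jb.pop())    # (3, None): gap handed to loss concealment
released.append(jb.pop())    # (4, b"f4")
```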
The energy detection, talker selection and mixing block 32 performs almost identical functionality to the conference bridge 17 within
The outputting apparatus 34 performs a number of functions on the outputs from the block 32, these functions including compression and transmission operations. During the compression operation, the outputting apparatus 34 receives and compresses respective ones of the three outputs from the energy detection, talker selection and mixing block 32. During the transmission operation, the outputting apparatus 34 performs a protocol stack operation on the compressed voice signals, encapsulates the compressed voice signals within the packet-based format required for transmission on the packet-based network 20 and transmits voice data packets comprising the compressed voice signals to the appropriate terminals 22,24,26 within the conference session. It is noted that, in the case of the talker selection algorithm described above, the mixed voice signal is forwarded to all the terminals with the exception of the primary and secondary talkers while the primary and secondary talkers are sent the appropriate unmixed voice signals.
One problem with the setup depicted within
Hence, a new design within a packet-based voice communication network is required to implement voice conferencing functionality. In this new design, a reduction in transcoding, latency and/or required signal processing power within the conferencing network is needed.
SUMMARY OF THE INVENTION
The present invention is directed to methods and apparatus that can be utilized within a packet-based media communication system for media conferences. In one embodiment of the present invention, a packet-based conference bridge receives speech indication signals from the individual packet-based terminals within a voice conference and uses these speech indication signals to select the talkers within the voice conference. The speech indication signals could be a talking/listening indication, an energy level indication or another parameter that a talker selection algorithm could use to select packet-based terminals as talkers. In another embodiment of the present invention, the packet-based conference bridge sends addressing control signals to the individual packet-based terminals selected as talkers. These addressing control signals indicate the packet-based network addresses of all the packet-based terminals to which the talker should transmit its voice data packets directly. Yet another embodiment of the present invention combines both of the above embodiments, such that the packet-based conference bridge essentially comprises a talker selection block that receives speech indication signals from packet-based terminals within a voice conference and transmits addressing control signals to the terminals selected as talkers, in order to direct the voice data packets from the talker(s) to the appropriate other packet-based terminals within the voice conference.
There are numerous advantages of the embodiments of the present invention compared to well-known voice conferencing techniques. For one, all of the embodiments of the present invention reduce the amount of processing power required within the conference bridge. This is done by removing the need for an energy detection block and/or an outputting apparatus within the conference bridge, which in turn can reduce the latency of the voice data packets. Another advantage of some embodiments of the present invention is a reduction in the transcoding that must be performed. This reduction could result from the reduced need to decompress the compressed voice signals within the conference bridge, since the speech indication signals are received independently of the voice data packets. Further, by transmitting voice data packets directly from their source to their destination in some embodiments, a significant reduction in transcoding can be achieved. Yet another advantage of embodiments of the present invention is the reduced concentration of traffic that results from the implementation of the combined embodiments. In this case, the conference bridge does not receive or transmit high-bandwidth voice data packets, but rather receives and transmits control signals to manage the voice conference. This also reduces any strain on the limited input/output capacity of the conference bridge.
The present invention, according to a first broad aspect, is a conference bridge including an input unit, a talker selection unit and an output unit. The input unit operates to receive at least one media data packet from at least two sources forming a media conference, each media data packet defining a media signal. The talker selection unit operates to receive speech indication signals from at least one of the sources within the media conference and to process the speech indication signals including selecting a set of the sources within the media conference as talkers. The output unit operates to output the media signals that correspond to the set of sources within the media conference selected as talkers.
The present invention, according to a second broad aspect, is a conference bridge including an input unit, an energy detection and talker selection unit and an output unit. The input unit operates to receive at least one media data packet from at least two sources forming a media conference, each media data packet defining a media signal. The energy detection and talker selection unit operates to determine at least one speech parameter corresponding to each of the media signals and select a set of the sources within the media conference as talkers based on the determined speech parameters. The output unit operates to output addressing control signals to the sources within the media conference selected as talkers. The addressing control signals comprise instructions for the sources within the media conference selected as talkers to output their media signals directly to other sources within the media conference.
The present invention, according to a third broad aspect, is a conference bridge arranged to be coupled to a packet-based network that includes at least two sources of media signals forming a media conference. In this aspect, the conference bridge includes a talker selection unit similar to that of the first broad aspect and an output unit similar to the second broad aspect.
According to a fourth broad aspect, the present invention is a packet-based apparatus arranged to be coupled to a conference bridge via a packet-based network. The packet-based apparatus includes an output unit and a speech detection unit. The output unit operates to receive at least one media signal from at least one participant within a media conference and output the received media signal to the conference bridge via the packet-based network. The speech detection unit operates to process the received media signal, generate a speech indication signal based upon the received media signal and output the speech indication signal to the conference bridge.
According to a fifth broad aspect, the present invention is a packet-based apparatus arranged to be coupled to a conference bridge via a packet-based network, the apparatus including an addressing control unit and an output unit. The addressing control unit operates to receive at least one addressing control signal from the conference bridge. The output unit operates to receive at least one media signal from at least one participant within a media conference and output the received media signal, via the packet-based network, to at least one other participant within the media conference based upon the addressing control signal. In another embodiment of the fifth broad aspect, the apparatus further includes a speech detection unit similar to that of the fourth broad aspect.
In yet further aspects, the present invention is a method for controlling a media conference, a method for a packet-based apparatus to operate within a media conference controlled by a conference bridge and a network incorporating a conference bridge according to one of the first three broad aspects.
Other aspects and features of the present invention will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments of the invention in conjunction with the accompanying figures.
Embodiments of the present invention are described with reference to the following figures, in which:
The present invention is directed to a number of different methods and apparatus that can be utilized within a packet-based voice communication system. Primarily, the embodiments of the present invention are directed to methods and apparatus used for voice conferences within packet-based communication networks, but this is not meant to limit the scope of the present invention.
One skilled in the art would understand that the operation of a telephone session involves two essential sectors, or planes. These are a control plane that performs administrative functions such as access approval and build-up/tear-down of telephone sessions and/or conference sessions, and a media plane that performs the signal processing required on media (voice or video) streams, such as format conversions and mixing operations. As described below, the present invention is applicable to modifications within the media plane, which could be implemented with a variety of different control planes while remaining within the scope of the present invention.
Embodiments of the present invention described herein below are directed to packet-based conference bridges and packet-based apparatus coupled within a packet-based network that enable media conferences between numerous sources of media signals. These sources of media signals can be any device through which a person can output media data for transmission within the packet-based network. In some embodiments, the packet-based apparatus are packet-based terminals coupled together with the packet-based conference bridge within a packet-based network, each of the packet-based terminals being a source of media signals for the other packet-based apparatus.
In other embodiments, one or more of the packet-based apparatus are packet-based network interfaces which couple standard non-packet-based terminals, such as PCM or analog telephone terminals, to a packet-based network, each of the non-packet-based terminals being a source for media signals for the media conference. This situation is illustrated within
In the following description, it should be understood that despite referring to the sources of media signals as packet-based terminals within the packet-based network throughout this document, such references could alternatively be directed to another form of media signal source. Further, although the packet-based apparatus described below are the packet-based terminals that also serve as the source for media signals, it should be understood that, alternatively, the packet-based apparatus could be packet-based network interfaces. Yet further, although the following description of the present invention is specific to voice data packets that contain compressed voice signals and generally to voice conferencing, this should not limit the scope of the present invention as is described in further detail herein below.
A first embodiment of the present invention, in which reduced processing is required within the packet-based conference bridge compared to well-known conference bridge designs, is now described with reference to
In operation, the talker selection block 44 receives the speech indication signals from the packet-based terminals within the voice conference, via the packet-based network 20, and performs a predefined talker selection algorithm. This talker selection algorithm could be similar to that disclosed within U.S. patent application Ser. No. 08/987,216, as incorporated by reference herein above, in which primary and secondary talkers are selected, though the present invention should not be limited to this implementation. The technique used by the talker selection block 44 to select talkers depends upon the particular design. For instance, in one implementation, talkers are selected based upon the order in which participants in the voice conference begin to speak. In this case, the talkers are selected as the first terminals that send speech indication signals to the talker selection block 44 indicating that a participant local to the particular packet-based terminal has begun to speak. In other designs, the energy level of the voice signals, as indicated within the speech indication signals received from the packet-based terminals, is used by the talker selection block 44 to select the talkers. In yet other designs, some of the talkers could be pre-selected while the talker selection block 44 uses the speech indication signals only to select the other talker(s) within the voice conference. This could be applicable in cases where a monitor or prearranged speaker for the voice conference is always selected as a talker.
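The three selection strategies described above can be sketched as two small functions. This is a minimal illustration, assuming a limit of two talkers as in the primary/secondary scheme and a numeric energy value carried in each speech indication signal; the function names and terminal labels are hypothetical, and the first talker in each returned list plays the role of primary.

```python
from typing import Dict, List, Sequence


def select_by_onset(onset_order: Sequence[str], max_talkers: int,
                    preselected: Sequence[str] = ()) -> List[str]:
    """Select talkers in the order their 'talking' indications arrived."""
    talkers = list(preselected)  # e.g. a monitor who is always a talker
    for terminal in onset_order:
        if len(talkers) >= max_talkers:
            break
        if terminal not in talkers:
            talkers.append(terminal)
    return talkers


def select_by_energy(energy: Dict[str, float], max_talkers: int,
                     preselected: Sequence[str] = ()) -> List[str]:
    """Select the loudest terminals, as reported in their speech indication signals."""
    talkers = list(preselected)
    candidates = sorted((t for t in energy if t not in talkers),
                        key=lambda t: energy[t], reverse=True)
    for terminal in candidates:
        if len(talkers) >= max_talkers:
            break
        if energy[terminal] > 0:  # nil energy: the participant is not speaking
            talkers.append(terminal)
    return talkers


by_onset = select_by_onset(["B", "C", "A"], max_talkers=2)
by_energy = select_by_energy({"A": 3.0, "B": 9.0, "C": 5.0}, max_talkers=2)
with_monitor = select_by_energy({"A": 3.0, "B": 9.0, "C": 5.0}, max_talkers=2,
                                preselected=["M"])
```

With onset ordering, terminal A is not selected because both talker slots are already filled, so its participant is effectively muted until a slot frees up.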
Within the implementation of
It should be noted that a procedure for de-selecting talkers is another operation within the talker selection block 44. In one embodiment, the de-selection of a packet-based terminal as a talker occurs if a speech indication signal received from the particular terminal indicates that a participant local to the terminal has stopped speaking. In another embodiment, the de-selection of a packet-based terminal as a talker occurs if speech indication signals received from the particular terminal indicate that the speech from a participant local to the terminal has decreased in energy. In yet another embodiment, the de-selection of a terminal as a talker is performed if a predetermined time interval has elapsed since the receipt of a speech indication signal indicating that a participant local to the particular terminal is speaking.
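The three de-selection rules can be expressed as a single predicate over the state the talker selection block 44 keeps for each selected talker. This is a sketch only: the `TalkerState` fields, the policy names, and the interpretation of "decreased in energy" as falling below a fraction of the peak reported energy are all hypothetical choices, and `drop_ratio` and `timeout_s` are placeholder tuning values.

```python
from dataclasses import dataclass


@dataclass
class TalkerState:
    talking: bool             # latest talking/listening indication from the terminal
    energy: float             # latest reported energy level
    peak_energy: float        # highest energy reported since the terminal was selected
    last_talking_time: float  # time (s) of the most recent "talking" indication


def should_deselect(state: TalkerState, now: float, policy: str,
                    drop_ratio: float = 0.5, timeout_s: float = 2.0) -> bool:
    """Apply one of the three de-selection rules to a selected talker."""
    if policy == "stopped":   # participant has stopped speaking
        return not state.talking
    if policy == "energy":    # speech energy has decreased (assumed: below a fraction of peak)
        return state.energy < drop_ratio * state.peak_energy
    if policy == "timeout":   # no talking indication within the predetermined interval
        return now - state.last_talking_time > timeout_s
    raise ValueError(f"unknown policy: {policy}")


s = TalkerState(talking=True, energy=2.0, peak_energy=10.0, last_talking_time=100.0)
```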
There are numerous alternative implementations for the packet-based conference bridge according to the first embodiment of the present invention. For one, modifications within the conference bridge could be made similar to those described within U.S. patent application Ser. No. 09/475,047 entitled “APPARATUS AND METHOD FOR PACKET-BASED MEDIA COMMUNICATIONS” by Simard et al, filed on Dec. 29, 1999 and incorporated herein by reference. As indicated within U.S. patent application Ser. No. 09/475,047, numerous implementations are possible for the inputting apparatus 30, the talker selection and mixing block 42 and the outputting apparatus 34. For instance, the jitter buffer operation could be removed from the inputting apparatus 30 in some implementations. Further, in some implementations, the inputting apparatus 30 does not need to perform a decompression operation, and the outputting apparatus 34 does not need to perform a compression operation, on any voice signals corresponding to talkers that do not require a mixing operation. This reduced transcoding can result in higher quality voice signals being broadcast to the participants of the voice conference and can also reduce the latency of the voice data packets through the conference bridge 28.
In yet further alternatives, the talker selection block 44 is coupled to the inputting apparatus 30 so as to prevent the unnecessary processing of voice data packets that are received from packet-based terminals that are not selected as talkers. This can be accomplished with the present invention since the selection of the talkers within the voice conference is independent of the processing of the received voice data packets.
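Because talker selection no longer depends on the contents of the voice data packets, the inputting apparatus can discard packets from non-talkers before doing any work on them. A minimal sketch of that gating, assuming packets arrive as (source, payload) pairs and that a decoder is supplied as a callable; `gate_and_decode` and `fake_decode` are hypothetical names, and the stand-in decoder exists only to show that non-talker payloads are never decoded.

```python
from typing import Callable, Iterable, List, Set, Tuple


def gate_and_decode(packets: Iterable[Tuple[str, bytes]], talkers: Set[str],
                    decode: Callable[[bytes], List[int]]) -> List[Tuple[str, List[int]]]:
    """Decode only packets whose source is a currently selected talker.

    Packets from non-talkers are dropped before jitter buffering or
    decompression, so no processing is spent on them.
    """
    return [(src, decode(payload)) for src, payload in packets if src in talkers]


# Stand-in decoder (hypothetical) that records every payload it is asked to decode.
decoded_payloads: List[bytes] = []


def fake_decode(payload: bytes) -> List[int]:
    decoded_payloads.append(payload)
    return list(payload)


out = gate_and_decode([("A", b"\x01"), ("B", b"\x02"), ("C", b"\x03")], {"B"}, fake_decode)
```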
It should be noted that although the blocks 30,34,44,46 within
In operation, the inputting apparatus 50 receives the voice data packets output from the packet-based conference bridge 28 and, along with the decompression unit 52, performs similar operations as described above for the inputting apparatus 30 within
The microphone 58 operates to receive sound waves local to the microphone 58 and generate analog voice signals corresponding to the sound waves, these analog voice signals being input to the A/D converter 60. The A/D converter 60 converts the analog voice signals to a digital format and forwards these voice signals to the compression unit 62. The compression unit 62, together with the outputting apparatus 64, performs similar operations to those described above for the outputting apparatus 34 within
Both of the above described operations within the packet-based terminal of
There are numerous alternative implementations for the speech detector 66. For instance, in one implementation, the speech detector 66 sends the talking signal to the packet-based conference bridge 28 when it first detects that the energy level of the received voice signals has exceeded the predetermined energy threshold for a first predetermined time interval, and sends the listening signal to the packet-based conference bridge 28 when it detects that the energy level of the received voice signals has remained below the predetermined energy threshold for a second predetermined time interval.
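The two predetermined time intervals act as an onset delay and a hangover, which keep brief noise bursts or short pauses from toggling the talker selection. A minimal sketch, assuming intervals counted in frames and a mean-square frame energy; the class name, the threshold and the frame counts are hypothetical values, not parameters from the text.

```python
from typing import Optional, Sequence


def frame_energy(samples: Sequence[int]) -> float:
    """Mean-square energy of one frame of PCM samples."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


class SpeechDetector:
    """Emit 'talking'/'listening' indications using onset and hangover intervals (in frames)."""

    def __init__(self, threshold: float, onset_frames: int, hangover_frames: int) -> None:
        self.threshold = threshold
        self.onset_frames = onset_frames        # first predetermined interval
        self.hangover_frames = hangover_frames  # second predetermined interval
        self.talking = False
        self._above = 0  # consecutive frames above the threshold
        self._below = 0  # consecutive frames at or below the threshold

    def process(self, energy: float) -> Optional[str]:
        """Return the signal to send to the conference bridge, or None if nothing changes."""
        if energy > self.threshold:
            self._above += 1
            self._below = 0
        else:
            self._below += 1
            self._above = 0
        if not self.talking and self._above >= self.onset_frames:
            self.talking = True
            return "talking"
        if self.talking and self._below >= self.hangover_frames:
            self.talking = False
            return "listening"
        return None


det = SpeechDetector(threshold=1000.0, onset_frames=2, hangover_frames=3)
events = [det.process(e) for e in [0.0, 5000.0, 5000.0, 5000.0, 0.0, 0.0, 0.0, 0.0]]
```

Only the transitions are reported, so the conference bridge receives two control messages for an entire utterance rather than one per frame.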
In other embodiments, the speech indication signals are not talking and listening signals. Instead, the speech indication signals correspond to specific parameters extracted from the received voice signals. For instance, the speech indication signals in one implementation correspond to energy levels of the voice signals. In one example, these speech indication signals could indicate nil energy (0), a low energy level (E1) or a high energy level (E2). For this example, multiple energy thresholds could be used for comparison in order to classify the energy level of talking at the specific packet-based terminal. In another implementation, the parameter extracted from the voice signals could be the pitch of the voice signals. In this case, the pitch could either be forwarded directly to the talker selection block 44 or, alternatively, the speech detector 66 could determine whether the pitch indicates the presence of speech. In the latter case, a talking or listening signal as described above could be sent after processing the pitch values.
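The three-level energy indication can be produced with two thresholds, as in the following sketch. The threshold values are hypothetical placeholders; using an `IntEnum` is a design choice that lets the talker selection block compare and rank the reported levels directly.

```python
from enum import IntEnum


class EnergyLevel(IntEnum):
    NIL = 0   # "0": no speech
    LOW = 1   # "E1"
    HIGH = 2  # "E2"


def classify_energy(energy: float, low_threshold: float = 500.0,
                    high_threshold: float = 5000.0) -> EnergyLevel:
    """Map a measured frame energy onto the three-level speech indication signal."""
    if energy < low_threshold:
        return EnergyLevel.NIL
    if energy < high_threshold:
        return EnergyLevel.LOW
    return EnergyLevel.HIGH


levels = [classify_energy(e) for e in (100.0, 1200.0, 9000.0)]
```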
It should be noted that, although not illustrated within
Although the speech detector 66 is illustrated in
In other implementations, the speech detector 66 receives the compressed voice signals from the compression unit 62 and/or the voice data packets from the outputting apparatus 64. In these cases, speech detection operations as disclosed within U.S. patent application Ser. No. 09/475,047, previously incorporated by reference, could be utilized. In one implementation, as disclosed within U.S. patent application Ser. No. 09/475,047, a Voice Activity Detection (VAD) operation is enabled at the packet-based terminal. In this embodiment, packets (and therefore compressed voice signals) that contain speech can be distinguished from packets that do not by the number of bytes contained within the packet. In other words, the size of the compressed voice signal can determine whether it contains speech. For example, in the case that the G.723.1 VoIP standard is utilized, voice data packets containing voice would contain a compressed voice signal of 24 bytes while voice data packets containing essentially silence would contain a compressed voice signal of 4 bytes. In another implementation as disclosed within U.S. patent application Ser. No. 09/475,047, the speech detector 66 could determine if there is speech within a compressed voice signal by monitoring a pitch-related sector within the corresponding voice data packet. For example, within the G.723.1 VoIP standard, the pitch sector is an 18-bit field that contains pitch lag information for all subframes. In this particular implementation, the speech detector 66 could use the pitch sector to generate a pitch value for each subframe. If the pitch value is within a particular predetermined range, the corresponding compressed voice signal is said to contain speech. If not, the compressed voice signal is said to not contain speech. This predetermined range can be determined by experimentation or alternatively calculated mathematically. 
It is noted that many current VoIP standard codecs include pitch information as part of the transmitted packet and a similar comparison of pitch values with a predetermined range can be used with these standards.
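The two compressed-domain tests above can be sketched as follows. The byte sizes are those given in the text for G.723.1. Treating any frame larger than a silence frame as speech, and requiring every subframe's pitch value to fall within the range, are assumptions, as is the range itself. Extracting the pitch values from the 18-bit pitch field is codec-specific and is not shown, so the sketch starts from already-decoded per-subframe values.

```python
from typing import Sequence

SID_FRAME_BYTES = 4      # size the text gives for an essentially silent G.723.1 frame
SPEECH_FRAME_BYTES = 24  # size the text gives for a G.723.1 frame containing voice


def speech_by_size(frame: bytes) -> bool:
    """Size-based decision: frames larger than a silence frame carry speech (assumption)."""
    return len(frame) > SID_FRAME_BYTES


def speech_by_pitch(pitch_values: Sequence[int], lo: int, hi: int) -> bool:
    """Pitch-based decision: every subframe's pitch value must lie in [lo, hi] (assumption)."""
    return len(pitch_values) > 0 and all(lo <= p <= hi for p in pitch_values)


# Hypothetical predetermined range; in practice found by experimentation or calculation.
PITCH_LO, PITCH_HI = 20, 140
```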
Although the blocks within
There are a number of advantages of the packet-based network according to the first embodiment of the present invention. For one, the processing power required within the conference bridge 28 is decreased compared to well-known designs, because the energy detection operation is removed from the conference bridge. As described above, this removal of the energy detection operation could further reduce the need for decoding, decompression and transcoding operations, and thus lead to higher quality voice signals with significantly reduced latency.
As depicted within
Next within the signalling diagram of
Subsequently, terminal A 22 sends a talking signal 78 to the conference bridge 28, this talking signal 78 indicating that a participant within the voice conference local to terminal A 22 has begun to speak. In this case, since primary and secondary talkers are already selected and, in this particular example, only two talkers are selected at a time, the receipt of talking signal 78 causes no change within the conference bridge 28. Effectively, the participant at terminal A 22 is muted within the voice conference.
Next as depicted in
It should be noted that the above descriptions of sample signalling diagrams within a network according to the first embodiment of the present invention should not be used to limit the scope of the present invention. These signalling diagrams are included to illustrate two possible implementations of the present invention.
A second embodiment of the present invention, in which the transmission of voice data packets is routed directly between packet-based terminals according to instructions from a packet-based conference bridge, is now described with reference to
In operation, the energy detection and talker selection block 100 receives the voice signals corresponding to participants within a voice conference from the inputting apparatus 30, performs an energy detection operation on the received voice signals to determine which packet-based terminals within the voice conference have participants local to the terminals speaking, and selects the talker(s) within the voice conference based upon the results of the energy detection operation. Further, the block 100 within
The energy detection operation performed within the energy detection and talker selection block 100 could be implemented in a number of different manners. For instance, it could include one of the speech detection algorithms described above for speech detector 66. As mentioned previously, the operation of energy detection/speech detection algorithms is disclosed within U.S. patent application Ser. No. 09/475,047, previously incorporated by reference. The talker selection operation performed within the block 100 could also be implemented in numerous different manners. Essentially, all of the possible implementations previously described for the talker selection block 44 of
As described above, the selection of the talkers within block 100 determines which packet-based terminals within the voice conference receive the addressing control signals, the addressing control signals giving the talkers permission to transmit their voice data packets to the other terminals within the voice conference. As well, the addressing control signals preferably forward the packet-based network addresses of the other packet-based terminals that are needed to transmit the voice data packets directly. In alternative implementations, the talker(s) do not require the packet-based network addresses since they have them stored internally. In this case, the addressing control signals are simply permission signals allowing the talkers to transmit to the other packet-based terminals within the voice conference.
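A sketch of how a conference bridge might build the addressing control signals for the selected talkers, covering both the address-carrying and the permission-only variants described above. The message structure, the function name and the `host:port` address format are hypothetical, and the addresses are documentation-range placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AddressingControl:
    talker: str
    # Addresses the talker should send its voice data packets to directly.
    # None means a permission-only signal: the talker uses addresses it already stores.
    destinations: Optional[List[str]]


def build_addressing_controls(participants: Dict[str, str], talkers: List[str],
                              include_addresses: bool = True) -> List[AddressingControl]:
    """Build one addressing control signal per selected talker."""
    controls = []
    for talker in talkers:
        dests = ([addr for name, addr in participants.items() if name != talker]
                 if include_addresses else None)
        controls.append(AddressingControl(talker=talker, destinations=dests))
    return controls


participants = {"A": "192.0.2.1:5004", "B": "192.0.2.2:5004", "C": "192.0.2.3:5004"}
controls = build_addressing_controls(participants, talkers=["B"])
permission_only = build_addressing_controls(participants, talkers=["B"], include_addresses=False)
```

Each talker is directed to every other participant, including any other selected talker, so that with primary and secondary talkers each talker still hears the other.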
As an option to the conference bridge according to the second embodiment of the present invention depicted in
There are numerous alternative implementations for the packet-based conference bridge according to the second embodiment of the present invention. For one, similar to the first embodiment of the present invention, modifications within the conference bridge could be made similar to those described within U.S. patent application Ser. No. 09/475,047, previously incorporated by reference. As indicated within U.S. patent application Ser. No. 09/475,047, numerous implementations are possible for the inputting apparatus 30 and the energy detection and talker selection block 100.
It should be noted that although the blocks 30,100,46,34 within
In the operation of the packet-based terminal of
It should be recognized that modifications are required within the inputting apparatus 50 of the packet-based terminal for the second embodiment of the present invention if more than one talker may be selected at a time. This is because, according to the second embodiment of the present invention, more than one stream of voice data packets would then arrive at the inputting apparatus 50. In the case of primary and secondary talkers being selected by the block 100, a particular terminal may receive voice data packets from two different talkers. In this situation, the packet-based terminal mixes the primary and secondary voice signals to generate mixed voice signals.
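The terminal-side mixing step can be sketched as a sample-by-sample sum of the two decoded streams. This assumes both streams have already been decoded to 16-bit linear PCM and aligned frame by frame; the function name is hypothetical, and saturation is used so that loud passages clip rather than wrap around.

```python
from typing import List, Sequence


def mix_talkers(primary: Sequence[int], secondary: Sequence[int]) -> List[int]:
    """Sum two decoded 16-bit PCM frames sample by sample, saturating on overflow.

    A shorter (or empty) frame is treated as silence for the remaining samples.
    """
    n = max(len(primary), len(secondary))
    mixed = []
    for i in range(n):
        a = primary[i] if i < len(primary) else 0
        b = secondary[i] if i < len(secondary) else 0
        mixed.append(max(-32768, min(32767, a + b)))
    return mixed


frame = mix_talkers([30000, 10, -5], [5000, 20])
```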
Although depicted as separate components within
Although the blocks within
There are a number of advantages of the packet-based network according to the second embodiment of the present invention. With the direct transmission of voice data packets from one packet-based terminal to other packet-based terminals, there is a significantly lighter load on the conference bridge which translates into higher capacity. Further, the conferencing configuration of the second embodiment reduces the concentration effect in which conference bridges are traditionally significant sources and sinks of traffic within the network and redistributes the traffic more evenly within the packet-based network. Yet further, the direct transmission of the voice data packets can reduce the need for transcoding and also decrease the overall latency.
As depicted within
Next, within
As depicted in
A third embodiment of the present invention, in which the first and second embodiments of the present invention are combined, is now described with reference to
In this third embodiment of the present invention, the packet-based conference bridge 28 is reduced to simply a talker selection block 150 as illustrated in
As depicted within
Next within
As depicted in
The packet-based terminals of the embodiments described above are not specific to any one packet-based voice communications standard (such as VoIP using G.711, G.729, G.723, etc.), as they can be modified for use with numerous different standards. In one alternative embodiment, the packet-based terminal is a multi-mode terminal that allows voice conferences using a number of different standards to use a single packet-based terminal.
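A multi-mode terminal has to adjust its packetization to whichever standard a given conference uses. The Python sketch below keeps a hypothetical mode table and computes the voice payload size per packet. The bit rates are the nominal ITU-T rates (G.711 at 64 kbit/s, G.729 at 8 kbit/s with 10 ms frames, and G.723.1, the codec usually meant by "G.723" in VoIP, at 6.3 or 5.3 kbit/s with 30 ms frames); the `CodecMode` type, the mode names and the `payload_bytes` function are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CodecMode:
    bitrate_bps: int  # nominal ITU-T bit rate
    frame_us: int     # duration of one coded frame in microseconds


# Hypothetical mode table for a multi-mode terminal. G.711 has no frame
# structure, so one 8 kHz sample (125 us, 8 bits) is used as its "frame".
CODEC_MODES: Dict[str, CodecMode] = {
    "G.711": CodecMode(64_000, 125),
    "G.729": CodecMode(8_000, 10_000),
    "G.723.1-6.3": CodecMode(6_300, 30_000),
    "G.723.1-5.3": CodecMode(5_300, 30_000),
}


def payload_bytes(mode_name: str, packet_us: int) -> int:
    """Voice payload size in bytes for one packet of packet_us microseconds.

    Each coded frame is padded up to a whole number of bytes, then the
    per-frame size is multiplied by the number of frames in the packet.
    """
    mode = CODEC_MODES[mode_name]
    if packet_us % mode.frame_us:
        raise ValueError("packet duration must be a multiple of the frame duration")
    bits_per_frame = mode.bitrate_bps * mode.frame_us // 1_000_000
    bytes_per_frame = -(-bits_per_frame // 8)  # ceiling division
    return (packet_us // mode.frame_us) * bytes_per_frame


if __name__ == "__main__":
    for name, packet_us in [("G.711", 20_000), ("G.729", 20_000), ("G.723.1-6.3", 30_000)]:
        print(name, payload_bytes(name, packet_us))
    # G.711 160
    # G.729 20
    # G.723.1-6.3 24
```

The per-frame sizes this yields (10 bytes for G.729, 24 and 20 bytes for the two G.723.1 rates) match the standard frame sizes, so switching standards only means switching the table entry.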
It should be noted that, although the network described above for embodiments of the present invention was specific to networks used for voice conferencing, this should not limit the scope of the present invention. For instance, the network of packet-based terminals could be used for point-to-point communications as well as voice conferencing. In the case of a point-to-point voice communication, both terminals would select the other participant as a lone talker. This allows a point-to-point conversation to be expanded to a larger voice conference with no major configuration modifications.
In general, although the operation of the present invention was described herein above with use of the terms voice data packets and voice signals, these packets and signals can be referred to broadly as media data packets and media signals respectively. In this case, media data packets are any data packets that are transmitted via the media plane, these media data packets preferably being either audio or audio/video data packets. It is noted that use of the term voice data packets above is specific to the described embodiments in which the audio signals are voice. Further, it should be understood that video data packets may incorporate audio data packets.
Although the present invention has been described above with a single voice conference being established using a network of packet-based apparatus and a conference bridge, it should be understood that in some embodiments the conference bridge and/or one or more of the packet-based apparatus could be capable of handling a plurality of voice conferences simultaneously.
Persons skilled in the art will appreciate that there are yet more alternative implementations and modifications possible for implementing the present invention, and that the above implementation is only an illustration of this embodiment of the invention. The scope of the invention, therefore, is only to be limited by the claims appended hereto.
Claims
1. A method of selecting at least one media packet stream for use in a conference, the method comprising:
- receiving packets comprising speech indication data characterizing a plurality of media packet streams at a media packet stream selection entity; and
- selecting at least one media packet stream for use in the conference based on the speech indication data.
2. The method of claim 1, wherein selecting at least one media packet stream comprises selecting at least one media packet stream based on speech indication data characterizing the selected media packet stream.
3. The method of claim 2, wherein selecting at least one media packet stream comprises selecting at least one media packet stream based on speech indication data indicating that the selected media packet stream comprises speech data.
4. The method of claim 1, wherein selecting at least one media packet stream comprises selecting a plurality of media packet streams based on the speech indication data.
5. The method of claim 1, wherein:
- receiving packets comprising speech indication data comprises receiving packet data streams comprising media data and speech indication data characterizing the media data; and
- selecting at least one media packet stream comprises forwarding at least one selected media packet stream based on speech indication data characterizing the selected media packet stream at the media packet stream selection entity.
6. The method of claim 5, wherein selecting at least one media packet stream comprises selecting a plurality of media packet streams based on speech indication data characterizing the selected media packet streams by forwarding the plurality of selected media packet streams at the media packet stream selection entity.
7. The method of claim 6, wherein forwarding the plurality of selected media packet streams comprises:
- mixing the plurality of selected media packet streams to provide a mixed media packet data stream; and
- forwarding the mixed packet data stream.
8. The method of claim 1, wherein selecting at least one media packet stream comprises:
- selecting at least one packet-based apparatus based on the speech indication data characterizing a media packet stream available at the packet-based apparatus; and
- sending control information to the at least one selected packet-based apparatus.
9. The method of claim 8, further comprising:
- receiving the control information at the at least one selected packet-based apparatus; and
- forwarding at least one media packet stream available at the selected packet-based apparatus in response to the received control information.
10. The method of claim 4, wherein selecting a plurality of media packet streams comprises:
- selecting a plurality of packet-based apparatuses based on speech indication data characterizing the media packet streams available at the packet-based apparatuses; and
- sending respective control information to each selected packet-based apparatus.
11. The method of claim 10, further comprising:
- receiving the respective control information at each selected packet-based apparatus; and
- forwarding at least one respective media packet stream available at each selected packet-based apparatus in response to the received control information.
12. The method of claim 1, wherein selecting at least one media packet stream comprises:
- selecting at least one packet-based apparatus based on speech indication data characterizing a media packet stream available at the packet-based apparatus; and
- sending address information to the selected packet-based apparatus.
13. The method of claim 12, further comprising:
- receiving the address information at the at least one selected packet-based apparatus; and
- forwarding at least one media packet stream available at the selected packet-based apparatus to at least one packet-based apparatus in response to the received address information, the at least one packet-based apparatus being determined by the address information.
14. The method of claim 13, wherein forwarding the media packet stream comprises forwarding the media packet stream to plural selected packet-based apparatuses in response to the received address information, the plural packet-based apparatuses being determined by the address information.
15. The method of claim 4, wherein selecting a plurality of media packet streams comprises:
- selecting a plurality of packet-based apparatuses based on speech indication data characterizing the media packet streams available at the packet-based apparatuses; and
- sending respective address information to each selected packet-based apparatus.
16. The method of claim 15, further comprising:
- receiving the respective address information at each selected packet-based apparatus; and
- forwarding the respective media packet stream available at each selected packet-based apparatus to at least one packet-based apparatus in response to the received address information, the at least one packet-based apparatus being determined by the address information.
17. The method of claim 16, wherein forwarding the media packet stream comprises forwarding the media packet stream to plural selected packet-based apparatuses in response to the received address information, the plural packet-based apparatuses being determined by the address information.
18. The method of claim 1, further comprising sending the packets comprising speech indication data characterizing a plurality of media packet streams from at least one packet-based apparatus to the media packet stream selection entity.
19. The method of claim 1, further comprising sending the packets comprising speech indication data characterizing a plurality of media packet streams from a plurality of packet-based apparatuses to the media packet stream selection entity.
20. The method of claim 18, wherein sending the packets comprising speech indication data comprises sending packet data streams comprising media data and speech indication data from the at least one packet-based apparatus to the media packet stream selection entity.
21. The method of claim 19, wherein sending the packets comprising speech indication data comprises sending packet data streams comprising media data and speech indication data from the plurality of packet-based apparatuses to the media packet stream selection entity.
22. The method of claim 21, wherein selecting at least one media packet stream comprises forwarding at the media packet stream selection entity at least one selected media packet stream based on speech indication data, the forwarding being to packet-based apparatuses other than a packet-based apparatus from which the at least one selected media packet stream was sent to the media packet stream selection entity.
23. The method of claim 1, wherein selecting at least one media packet stream comprises selecting a media packet stream comprising video data packets.
24. The method of claim 23, wherein the video data packets comprise audio information.
25. A system for selecting at least one media packet stream for use in a conference, the system comprising:
- a speech indication data input operable to receive packets comprising speech indication data characterizing a plurality of media packet streams; and
- a media packet stream selector operable to select at least one media packet stream for use in the conference based on the speech indication data.
26. The system of claim 25, wherein the media packet stream selector is operable to select at least one media packet stream by selecting at least one media packet stream based on speech indication data characterizing the selected media packet stream.
27. The system of claim 26, wherein the media packet stream selector is operable to select at least one media packet stream by selecting at least one media packet stream based on speech indication data indicating that the selected media packet stream comprises speech data.
28. The system of claim 25, wherein the media packet stream selector is operable to select at least one media packet stream by selecting a plurality of media packet streams based on speech indication data characterizing the selected media packet streams.
29. The system of claim 25, wherein:
- the speech indication data input is operable to receive packets comprising speech indication data by receiving packet data streams comprising media data and speech indication data; and
- the media packet stream selector is operable to select at least one media packet stream by forwarding at least one selected media packet stream based on speech indication data.
30. The system of claim 29, wherein the media packet stream selector is operable to select at least one media packet stream by selecting a plurality of media packet streams based on speech indication data by forwarding the plurality of selected media packet streams at the media packet stream selection entity.
31. The system of claim 30, wherein the media packet stream selector is operable to forward the plurality of selected media packet streams by:
- mixing the plurality of selected media packet streams to provide a mixed media packet data stream; and
- forwarding the mixed packet data stream.
32. The system of claim 25, wherein the media packet stream selector is operable to select at least one media packet stream by:
- selecting at least one packet-based apparatus based on speech indication data characterizing a media packet stream available at the packet-based apparatus; and
- sending control information to the at least one selected packet-based apparatus.
33. The system of claim 32, further comprising at least one packet-based apparatus, the packet-based apparatus comprising:
- an input operable to receive the control information; and
- an output operable to forward at least one media packet stream available at the selected packet-based apparatus in response to the received control information.
34. The system of claim 28, wherein the media packet stream selector is operable to select a plurality of media packet streams by:
- selecting a plurality of packet-based apparatuses based on speech indication data characterizing the media packet streams available at the packet-based apparatuses; and
- sending respective control information to each selected packet-based apparatus.
35. The system of claim 34, further comprising a plurality of packet-based apparatuses, each packet based apparatus comprising:
- an input operable to receive the control information; and
- an output operable to forward at least one media packet stream available at the packet-based apparatus in response to the received control information.
36. The system of claim 26, wherein the media packet stream selector is operable to select at least one media packet stream by:
- selecting at least one packet-based apparatus based on speech indication data characterizing a media packet stream available at the packet-based apparatus; and
- sending address information to the selected packet-based apparatus.
37. The system of claim 36, further comprising at least one packet-based apparatus, the packet-based apparatus comprising:
- an input operable to receive the address information at the packet-based apparatus; and
- an output operable to forward the media packet stream available at the packet-based apparatus in response to the received address information.
38. The system of claim 37, wherein the output is operable to forward the media packet stream by forwarding at least one media packet stream to plural selected packet-based apparatuses in response to the received address information, the plural packet-based apparatuses being determined by the address information.
39. The system of claim 38, wherein the media packet stream selector is operable to select a plurality of media packet streams by:
- selecting a plurality of packet-based apparatuses based on speech indication data characterizing media packet streams available at the packet-based apparatuses; and
- sending respective address information to each selected packet-based apparatus.
40. The system of claim 39, further comprising a plurality of packet-based apparatuses, each packet based apparatus comprising:
- an input operable to receive the address information; and
- an output operable to forward at least one media packet stream available at the packet-based apparatus in response to the received address information.
41. The system of claim 40, wherein each output is operable to forward the at least one media packet stream by forwarding the at least one media packet stream to plural selected packet-based apparatuses in response to the received address information, the plural packet-based apparatuses being determined by the address information.
42. The system of claim 25, further comprising at least one packet-based apparatus operable to send the packets comprising speech indication data characterizing at least one media packet stream to the media packet stream selection entity.
43. The system of claim 25, further comprising a plurality of packet-based apparatuses, each packet-based apparatus operable to send packets comprising speech indication data characterizing at least one media packet stream to the media packet stream selection entity.
44. The system of claim 42, wherein the at least one packet-based apparatus is operable to send the packets comprising speech indication data by sending at least one packet data stream comprising media data and speech indication data to the media packet stream selection entity.
45. The system of claim 43, wherein the plurality of packet-based apparatuses is operable to send the packets comprising speech indication data by sending packet data streams comprising media data and speech indication data to the media packet stream selection entity.
46. The system of claim 45, wherein the media packet stream selector is operable to select at least one media packet stream by forwarding at least one selected media packet stream based on speech indication data characterizing the selected media packet stream, the forwarding being to packet-based apparatuses other than a packet-based apparatus from which the at least one selected media packet stream was sent to the media packet stream selector.
47. The system of claim 25, wherein the media packet stream selector is operable to select at least one media packet stream by selecting a media packet stream comprising video data packets.
48. The system of claim 47, wherein the video data packets comprise audio information.
49. A packet-based apparatus arranged to be coupled to a conference bridge via a packet-based network, the packet-based apparatus comprising:
- a media signal input operable to receive at least one media signal from at least one participant in a media conference;
- a media packet stream output operable to output at least one media packet stream based on the received media signal to the packet-based network;
- at least one media signal processor operable to generate at least one speech indication based on the received media signal; and
- an output operable to output the speech indication to the packet-based network.
50. A packet-based apparatus according to claim 49, wherein the media signal processor is operable to generate the speech indication by determining whether the received media signal contains speech.
51. A packet-based apparatus according to claim 50, wherein the media signal processor is operable to determine whether the media signal contains speech by measuring an energy level of the media signal.
52. A packet-based apparatus according to claim 51, wherein the media signal processor is operable to determine whether the media signal contains speech by comparing the measured energy to a predetermined energy threshold.
53. A packet-based apparatus according to claim 52, wherein the media signal processor is operable to determine whether the media signal contains speech by measuring the energy level over a predetermined time interval.
54. A packet-based apparatus according to claim 52, wherein the media signal processor is operable to generate a speech indication comprising a talking indication when the received media signal is determined to contain speech.
55. A packet-based apparatus according to claim 50, wherein the media signal processor is operable to generate a speech indication that characterizes a parameter of the media signal.
56. A packet-based apparatus according to claim 51, wherein the media signal processor is operable to determine whether the received media signal contains speech by comparing the measured energy level to a set of predetermined energy thresholds.
57. A packet-based apparatus according to claim 56, wherein the media signal processor is operable to generate a speech indication which comprises a value representative of a threshold below which the measured energy level lies.
58. A packet-based apparatus according to claim 50, wherein the media signal processor is operable to determine whether the received media signal contains speech by measuring a pitch of the media signal.
59. A packet-based apparatus according to claim 58, wherein the media signal processor is operable to generate a speech indication comprising a value characterizing the measured pitch.
60. A packet-based apparatus according to claim 59, wherein:
- the media signal processor is operable to determine whether the received media signal contains speech by comparing the measured pitch to a predetermined pitch threshold; and
- wherein the speech indication comprises a talking indication if the received media signal is determined to contain speech.
61. A packet-based apparatus according to claim 50, wherein the media signal processor is further operable:
- to compress the received media signal into a media packet stream; and
- to couple the media packet stream to the output for output on the packet-based network.
62. A packet-based apparatus according to claim 61, wherein the media signal processor is operable to determine if the received media signal contains speech by determining if the number of bytes of the media packet stream indicates that the received media signal contains speech.
63. A packet-based apparatus according to claim 61, wherein the media signal processor is operable to determine if the received media signal contains speech by evaluating a set of bits of the media packet stream.
64. A packet-based apparatus according to claim 61, wherein the media signal processor is operable to determine if the received media signal contains speech by evaluating at least one bit of the media packet stream.
65. A packet-based apparatus according to claim 64, wherein the media signal processor is operable to evaluate at least one bit corresponding to pitch information of the media packet stream.
66. A packet-based apparatus according to claim 49, further comprising a microphone operable to receive audio signals from the at least one participant within the media conference to generate the media signal, the microphone being coupled to the media signal input.
67. A packet-based network interface arranged to be coupled between a packet-based network and a non-packet-based network, the network interface comprising a packet-based apparatus according to claim 49, wherein the media signal input is operable to receive the media signal from the at least one participant in the media conference from a non-packet-based terminal via the non-packet-based network.
68. A packet-based apparatus according to claim 49, wherein the media signal processor is operable to couple the media packet stream to a conference bridge via the output.
69. A packet-based apparatus according to claim 49, further comprising an address information input operable to receive address information from the conference bridge, the output being coupled to the address information input and being operable to output a media packet stream, via the packet-based network, to at least one other participant in the media conference based upon the address information.
70. A packet-based apparatus according to claim 69, wherein the output is operable to output a media packet stream based on at least one packet-based network address in the address information.
71. A method of operating a packet-based apparatus coupled to a conference bridge via a packet-based network, the method comprising:
- receiving at least one media signal from at least one participant in a media conference;
- outputting at least one media packet stream based on the received media signal to the packet-based network;
- generating at least one speech indication based on the received media signal; and
- outputting the speech indication to the packet-based network.
72. A method according to claim 71, wherein generating at least one speech indication comprises generating the speech indication by determining whether the received media signal contains speech.
73. A method according to claim 72, wherein generating at least one speech indication comprises measuring an energy level of the media signal.
74. A method according to claim 73, wherein generating at least one speech indication comprises comparing the measured energy to a predetermined energy threshold.
75. A method according to claim 74, wherein generating at least one speech indication comprises measuring the energy level over a predetermined time interval.
76. A method according to claim 75, wherein generating at least one speech indication comprises generating a speech indication comprising a talking indication when the received media signal is determined to contain speech.
77. A method according to claim 72, wherein generating at least one speech indication comprises generating a speech indication that characterizes a parameter of the media signal.
78. A method according to claim 73, wherein determining whether the received media signal contains speech comprises comparing the measured energy level to a set of predetermined energy thresholds.
79. A method according to claim 78, wherein generating at least one speech indication comprises generating a speech indication which comprises a value representative of a threshold below which the measured energy level lies.
80. A method according to claim 72, wherein generating at least one speech indication comprises determining whether the received media signal contains speech by measuring a pitch of the media signal.
81. A method according to claim 80, wherein generating at least one speech indication comprises generating a speech indication comprising a value characterizing the measured pitch.
82. A method according to claim 80, wherein:
- determining whether the received media signal contains speech comprises comparing the measured pitch to a predetermined pitch threshold; and
- wherein the speech indication comprises a talking indication if the received media signal is determined to contain speech.
83. A method according to claim 72, further comprising:
- compressing the received media signal into a media packet stream; and
- coupling the media packet stream to the output for output on the packet-based network.
84. A method according to claim 83, wherein determining if the received media signal contains speech comprises determining if the number of bytes of the media packet stream indicates that the received media signal contains speech.
85. A method according to claim 83, wherein determining if the received media signal contains speech comprises evaluating a set of bits of the media packet stream.
86. A method according to claim 83, wherein determining if the received media signal contains speech comprises evaluating at least one bit of the media packet stream.
87. A method according to claim 86, comprising evaluating at least one bit corresponding to pitch information of the media packet stream.
88. A method according to claim 71, further comprising receiving audio signals from the at least one participant within the media conference via a microphone, the microphone being coupled to the media signal input.
89. A method according to claim 71, wherein:
- the packet-based apparatus comprises a packet-based network interface coupled between a packet-based network and a non-packet-based network; and
- receiving at least one media signal comprises receiving the media signal from the at least one participant in the media conference from a non-packet-based terminal via the non-packet-based network.
90. A method according to claim 71, further comprising coupling the media packet stream to a conference bridge.
91. A method according to claim 71, further comprising:
- receiving address information from the conference bridge; and
- outputting a media packet stream, via the packet-based network, to at least one other participant in the media conference based upon the address information.
92. A method according to claim 91, comprising outputting a media packet stream based on at least one packet-based network address in the address information.
93. In a packet data network having a plurality of media signals entering the packet data network at respective ingress points, a method of selecting at least one media packet stream for use in a conference, the method comprising:
- monitoring a characteristic of the media signals at the ingress points before packetizing the media signals to provide the media packet streams; and
- selecting the at least one media packet stream based on the monitoring.
94. The method of claim 93, wherein monitoring a characteristic comprises monitoring speech activity of the media signals at the ingress points.
95. The method of claim 93, further comprising:
- communicating results of the monitoring to a media packet stream selector connected to the packet data network; and
- selecting the at least one media packet stream at the media packet stream selector.
96. The method of claim 93, further comprising:
- packetizing the media signals at the ingress points to provide the media packet streams; and
- transmitting the media packet streams to the media packet stream selector.
97. The method of claim 96, further comprising:
- forwarding selected media packet streams from the media packet stream selector to egress points of the packet data network.
98. The method of claim 93, further comprising:
- communicating results of the media packet stream selection to the ingress points; and
- forwarding selected media packet streams at the ingress points to egress points of the packet data network in accordance with the results of the media packet stream selection.
99. A packet data network comprising:
- a plurality of ingress points, a respective media signal entering the packet data network at each ingress point and each ingress point being operable to monitor a characteristic of its respective media signal before packetizing the media signal to provide a respective media packet stream;
- a plurality of egress points corresponding to the ingress points;
- a media packet stream selector operable to select, based on the monitoring, at least one media packet stream for forwarding to the egress points.
100. The network of claim 99, wherein each ingress point is operable to monitor speech activity in its respective media signal before packetizing the media signal.
101. The network of claim 99, wherein:
- the ingress points are operable to communicate the media packet streams and results of the monitoring to the media packet stream selector; and
- the media packet stream selector is operable to forward the at least one selected media packet stream to the egress points.
102. The network of claim 99, wherein:
- the media packet stream selector is operable to communicate results of the media packet stream selection to the ingress points; and
- the ingress points are operable to forward the at least one selected media packet stream to egress points of the packet data network in accordance with the results of the media packet stream selection.
103. The network of claim 99, wherein:
- the media packet stream selector is operable to communicate results of the media packet stream selection by sending address information to the ingress points; and
- the ingress points are operable to address respective media packet streams based on the address information to selected egress points of the packet data network.
104. A conference bridge comprising:
- means for receiving media data packets from at least two sources forming a media conference, each media data packet comprising a media signal and packet overhead, wherein the means for receiving media data packets is adapted to remove the packet overhead;
- means for receiving speech indication signals from at least one of the sources within the media conference;
- means for processing the speech indication signals including selecting a set of the sources within the media conference as talkers; and
- means for outputting the media signals that correspond to the set of sources within the media conference selected as talkers.
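Claims 51 to 57 (and the parallel method claims 73 to 79) describe how a terminal could derive its speech indication: measure the energy of the media signal over a predetermined interval, compare it with a set of predetermined thresholds, and report a value representative of the threshold below which the energy lies, together with a talking indication. Claims 62 and 84 add that the byte count of the media packet stream can itself indicate speech. The Python sketch below illustrates both ideas under stated assumptions: 16-bit linear PCM input, a dBFS energy scale, and the threshold values shown are all hypothetical, as are the function names. The byte-count test relies on the G.729 frame sizes given in RFC 3551 (10-byte speech frames, 2-byte Annex B comfort-noise frames) and ignores other G.729 annexes.

```python
import math
from typing import Sequence, Tuple

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit PCM sample


def frame_energy_dbfs(samples: Sequence[int]) -> float:
    """Mean power of one measurement interval of 16-bit PCM, in dB full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / FULL_SCALE ** 2)


def speech_indication(samples: Sequence[int],
                      thresholds_dbfs: Sequence[float] = (-50.0, -40.0, -30.0, -20.0)
                      ) -> Tuple[bool, int]:
    """Return (talking, level) for one measurement interval.

    thresholds_dbfs must be in ascending order. level is the index of the
    lowest threshold that the measured energy lies below, or
    len(thresholds_dbfs) if it is above all of them. talking is True when
    the energy is not below the lowest threshold.
    """
    energy = frame_energy_dbfs(samples)
    level = next((i for i, t in enumerate(thresholds_dbfs) if energy < t),
                 len(thresholds_dbfs))
    return level > 0, level


def g729_payload_has_speech(payload: bytes) -> bool:
    """Byte-count test: speech is present if the payload holds at least one
    10-byte G.729 speech frame rather than only a 2-byte SID frame."""
    return len(payload) >= 10


if __name__ == "__main__":
    print(speech_indication([1000, -1000] * 80))  # (True, 2): about -30.3 dBFS
    print(speech_indication([0] * 160))           # (False, 0): digital silence
    print(g729_payload_has_speech(bytes(2)))      # False: comfort noise only
```

Sending the small `(talking, level)` pair rather than the audio itself is what lets the selection entity choose talkers without running its own speech detection.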
Type: Application
Filed: Jul 15, 2011
Publication Date: Nov 3, 2011
Applicant: Nortel Networks Limited (Mississauga)
Inventors: Frederic F. Simard (Nepean), David R. Cuddy (Ottawa), Philip K. Edholm (Pleasanton, CA)
Application Number: 13/183,732