METHOD FOR CONDUCTING AN AUDIO AND/OR VIDEO CONFERENCE
A method for conducting an audio and/or video conference, in which one of the terminals coupled to a central conference control unit takes on the role of a media server under the control of the central conference control unit.
The present application is the U.S. national stage application of International Patent Application No. PCT/EP2018/059464, filed on Apr. 12, 2018, claiming priority to German Patent Application No. 10 2017 108 017.1, filed on Apr. 13, 2017.
FIELD
The invention relates to a method for conducting an audio and/or video conference, in which multiple terminals are coupled to a central conference control unit via a data network. At least one predefined terminal of the terminals can comprise a data processing unit that enables the terminal to participate in the conference by running an application, particularly a browser application. Typically, such a predefined terminal is a personal computer with an installed (or connected) camera and microphone, a screen, and an installed or connected speaker. The data processing unit can equally be an appropriately equipped tablet computer, a smartphone, or a similar portable terminal. The browser makes the personal computer (or other device) conference-enabled: the camera captures images and the microphone records speech. The user's own image and/or the images of the other participants are displayed on the screen, and speech is output through the speaker.
BACKGROUND
The prior standard practice for audio and/or video conferencing was for multiple participants (clients) to log in with their browsers to the central conference unit, which is typically located in a cloud. This is explained by an example based on
The disadvantage to a procedure such as
US 2014/0122600 A1 discloses a conference server that takes on the task of mixing the audio or video data. The individual browsers, running JavaScript, provide the capability to participate in the conference, but do not have a mixer of their own.
EP 1 014 666 A1 discloses a method for implementing multipoint connections (i.e., conferences) between multiple terminals of an H.323 data communication network, in which the conferencing unit opens user data channels (i.e., audio and video channels) between the terminals via the conferencing unit.
EP 1 001 596 A2 describes a multimedia terminal for telephony that itself enables multipoint connections.
US 2004/0210637 A1 describes providing external conference bridges in addition to other paths.
An article available on Apr. 3, 2017 at https://Webrtchacks.com/web-audio-conference/ describes the principle of a browser taking over the role of a central conferencing control unit to conduct local audio conferences. The article calls this a “poor man's conference solution” because no server is used. The article does not disclose how such audio conferencing can be accomplished.
SUMMARY
It is the object of embodiments of the present invention to provide a method for conducting an audio and/or video conference, preferably using clients provided via an application, particularly a browser application, which achieves the most efficient possible use of the data networks and/or the best possible security during the exchange of audio and/or video data (user data).
In embodiments of the inventive method for conducting an audio and/or video conference, multiple terminals are coupled with a central conference control unit via a data network, and at least one first terminal comprises a data processing unit that enables the terminal to participate in the conference by running an application, particularly a browser application. Such embodiments can be characterized in that, when at least one predefined criterion is met and based on control commands (signaling) from the central conference control unit, a predefined terminal of the at least one first terminal receives audio and/or video data streams (i.e., user data streams) from other terminals, and:
a) mixes them by means of the application, particularly a browser application (particularly also with audio and/or video streams self-generated by the first terminal) and sends them back in mixed form to the other terminals and/or
b) relays to the other terminals a selection of the received audio and/or video data and the audio and/or video data generated by the at least one first terminal.
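Variant a) can be illustrated with a minimal audio-mixing sketch. It assumes that each terminal delivers one frame of signed 16-bit PCM samples per mixing interval and that the predefined terminal returns to each participant the sum of all *other* participants' audio (a "mix-minus", a common conference-bridge convention that the text above does not prescribe). The function names and the participant identifiers are hypothetical.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def clamp16(value: int) -> int:
    """Saturate a summed sample to the signed 16-bit PCM range."""
    return max(INT16_MIN, min(INT16_MAX, value))


def mix_minus(frames: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Return one mixed frame per participant, excluding that participant's own audio.

    `frames` maps a (hypothetical) participant id such as "EP1K", "EP2" or "EP3"
    to one frame of 16-bit PCM samples; all frames must have the same length.
    The predefined terminal's own frame is included like any other input.
    """
    lengths = {len(frame) for frame in frames.values()}
    if len(lengths) > 1:
        raise ValueError("all frames must have the same number of samples")
    n = lengths.pop() if lengths else 0
    # Sum every participant once, then subtract each receiver's own samples.
    total = [sum(frame[i] for frame in frames.values()) for i in range(n)]
    return {
        pid: [clamp16(total[i] - own[i]) for i in range(n)]
        for pid, own in frames.items()
    }
```

Summing once and subtracting each receiver's own contribution keeps the cost linear in the number of participants instead of building a separate sum for every receiver.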
Embodiments of the invention can be based on the principle of using a (central or distributed) media server outside of the clients, as is typical in central conferences; in this case, however, one of the clients (that is, the predefined terminal of the at least one first terminal) itself functions as the media server or media node, which is normally the case only in conferencing methods that do not involve a conference server. This allows the data to be exchanged between the individual browsers and/or the associated terminals without forgoing the advantages of a central conference control unit (such as the corresponding conference room user experience, user authentication, etc.). If all terminals are located in the same company network, for example, security-sensitive audio and/or video data do not have to leave the company network. By contrast, if the audio and video data between the clients and a central WebRTC conference server are transmitted encrypted in accordance with the WebRTC standard, the audio packets must be decrypted, for example by the conference server (in the Cloud), so that they can be mixed, and then re-encrypted and sent back to the clients. This means that complete “end-to-end confidentiality” is often not ensured. For reasons of confidentiality, it can therefore be advantageous to keep the media data local in the local area network (LAN). A further advantage is the significant reduction of the required wide area network (WAN) bandwidth, that is, the bandwidth between the location/company network/clients and the data center of the Cloud service provider of the WebRTC conference solution.
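The WAN saving can be made concrete with a back-of-the-envelope calculation. This sketch assumes that every client in the company LAN sends one stream to the mixer and receives one mixed stream back, both at the same bitrate, and it ignores signaling traffic; the function name is hypothetical.

```python
def wan_media_kbps(clients_in_lan: int, bitrate_kbps: float, local_media_node: bool) -> float:
    """Media bandwidth between the company LAN and the cloud conference service.

    Assumes one upstream and one mixed downstream stream per client at
    `bitrate_kbps`; signaling traffic is ignored.
    """
    if clients_in_lan < 0 or bitrate_kbps < 0:
        raise ValueError("counts and bitrates must be non-negative")
    if local_media_node:
        # All media is mixed by a terminal inside the LAN; only signaling crosses the WAN.
        return 0.0
    # Cloud mixing: every client's stream goes up and a mixed stream comes back down.
    return 2 * clients_in_lan * bitrate_kbps
```

For example, five LAN clients using G.711 (64 kbit/s payload) would need about 640 kbit/s of WAN media bandwidth with cloud mixing under these assumptions, and none when a terminal in the LAN acts as the media node.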
The application is preferably a WebRTC (Web Real-Time Communication) browser application. The method can thus build on existing WebRTC technology. The predefined terminal of the at least one first terminal only needs to be equipped with a suitable application plug-in. An application plug-in is not to be confused here with a browser plug-in, because under the W3C (World Wide Web Consortium) WebRTC standardization, WebRTC browsers (e.g., Google Chrome® browser, Mozilla Firefox® browser, etc.) by definition do not require browser plug-ins, at least not for basic WebRTC functionality.
In one variation, the predefined first terminal receives audio and/or video data streams from all other terminals and mixes them or subjects them to a selection. In principle, the entire conference is thereby supported by the predefined first terminal with regard to user data streams. Further preferably, all audio and video data streams run exclusively via the predefined first terminal, so that there are no longer any parallel user data streams. In this way, security can be ensured by appropriately placing the clients and/or their browsers in a closed network (the company network, for example).
In another variation, the predefined first terminal receives audio and/or video data streams from only a subgroup of the other terminals and mixes them or subjects them to a selection. In this case, a subconference can be conducted: participants sitting at terminals in a company network can communicate separately, for example to agree on how to conduct themselves towards a participant outside the company network with whom they are negotiating. Here, too, the security-related data remains in the closed network, i.e., the company network.
It is therefore preferable that the predefined criterion, which, when met, causes the central conference control unit to have the predefined terminal of the at least one first terminal carry out the receiving and mixing/selecting, includes the condition that the other terminals from which audio and/or video data streams are received are coupled with each other in a common Local Area Network (LAN).
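A central conference control unit needs some concrete test for "coupled in a common LAN". The sketch below assumes, for illustration only, that this is approximated by all terminal addresses lying within one configured IP prefix (for example 10.1.0.0/16); real deployments behind NAT or VPNs would need a different test. The function names are hypothetical; the `ipaddress` calls are from the Python standard library.

```python
import ipaddress
from typing import Dict, Iterable, List


def share_common_lan(addresses: Iterable[str], lan: str) -> bool:
    """Return True if every terminal address lies inside the given LAN prefix.

    Treating "common LAN" as "same IP prefix" is an illustrative assumption.
    """
    network = ipaddress.ip_network(lan, strict=True)
    addrs = [ipaddress.ip_address(a) for a in addresses]
    # An IPv6 address is never "in" an IPv4 network, so mixed versions yield False.
    return bool(addrs) and all(a in network for a in addrs)


def local_subgroup(terminals: Dict[str, str], lan: str) -> List[str]:
    """Return the ids of terminals inside the LAN prefix, e.g. for a subconference."""
    network = ipaddress.ip_network(lan, strict=True)
    return [tid for tid, ip in terminals.items() if ipaddress.ip_address(ip) in network]
```

`share_common_lan` corresponds to the all-terminals variation, while `local_subgroup` corresponds to the subgroup variation in which only the terminals inside the company network exchange their media via the predefined terminal.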
In one preferred embodiment of the invention, each terminal must log in to the central conference control unit to participate in the conference. The predefined terminal transmits (upon such a login) or has transmitted (upon an earlier first login) information to the central conference control unit stating that it is enabled to receive audio and/or video data streams and to a) mix them and b) process a selection of them, and the predefined criterion includes this information having been transmitted. In this way, the central conference control unit can by default be equipped to mix the user data streams or subject them to a selection itself. Only when a terminal logs in with an appropriately configured browser, which a suitable browser extension enables to conduct the conference itself, is this task transferred to that browser.
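The login-based handover can be sketched as a small registry in the central conference control unit. The capability check uses the four criteria named in claim 11 (conference type, conference mode, supported codec, conference credential); the payload fields, class names and the "first qualifying terminal wins" policy are assumptions made for illustration, not the method's defined message format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LoginInfo:
    """Hypothetical login payload a terminal sends to the central conference control unit."""
    terminal_id: str
    media_server_capable: bool = False                       # advertised by the conferencing plug-in
    conference_types: Set[str] = field(default_factory=set)  # e.g. {"audio", "video"}
    modes: Set[str] = field(default_factory=set)             # e.g. {"mixing", "sfu"}
    codecs: Set[str] = field(default_factory=set)            # e.g. {"OPUS", "VP8"}
    credential: str = ""


@dataclass
class ConferenceRequirements:
    conference_type: str
    mode: str
    codec: str
    valid_credentials: Set[str]


class CentralConferenceControl:
    """Decides whether the media role stays central or moves to a terminal."""

    def __init__(self, requirements: ConferenceRequirements) -> None:
        self.requirements = requirements
        self.logins: Dict[str, LoginInfo] = {}
        self.media_node: Optional[str] = None  # None: the central unit mixes/selects itself

    def can_be_media_server(self, info: LoginInfo) -> bool:
        r = self.requirements
        return (
            info.media_server_capable
            and r.conference_type in info.conference_types
            and r.mode in info.modes
            and r.codec in info.codecs
            and info.credential in r.valid_credentials
        )

    def login(self, info: LoginInfo) -> Optional[str]:
        """Register a terminal; hand the media role to the first qualifying terminal."""
        self.logins[info.terminal_id] = info
        if self.media_node is None and self.can_be_media_server(info):
            self.media_node = info.terminal_id
        return self.media_node
```

Until a capable terminal logs in, `media_node` stays `None`, which models the default case in which the central conference control unit handles mixing or selection itself.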
In another aspect of the invention, a computer program product is provided to provide or extend (the latter in the form of a plug-in) an application, particularly a browser application, on a first data processing unit. This computer program product can include code stored on a non-transitory computer-readable medium (e.g., flash memory, a hard drive, etc.) that is configured to confer on the first data processing unit the capability to receive audio and/or video data from at least one second data processing unit, to mix and/or process a selection of the audio and/or video data received, on the one hand, from the second data processing unit and provided, on the other hand, by the first data processing unit and/or another second data processing unit, and to relay the mixed and/or selected audio and/or video data to the at least one second data processing unit. This computer program product therefore ensures that a browser is running which has the above-described properties, i.e., which accepts from the central conference control unit the task of mixing and/or making a selection from the audio and/or video data. In particular, the computer program product can enable the browser application to state during the initial login to the central conference control unit that the browser is capable of conferencing. The functionality of the computer program product can be defined by code that defines a method that is performed when a processor of the first data processing unit executes the code.
In a further aspect of the invention, a computer program product is provided to provide or extend (in the form of a plug-in) an application, particularly a browser application, on a second data processing unit; it is configured to give the second data processing unit the capability to exchange control signals with a central data processing unit and to transmit audio and/or video signals to a first data processing unit outside of the central data processing unit. This computer program product allows a browser that is itself not conferencing-enabled to use the conferencing-enabled browser application under the control of the central data processing unit. The functionality of the computer program product can be defined by code that defines a method that is performed when a processor of the second data processing unit executes the code.
In a still further aspect, the invention provides a computer program product which serves to provide or extend (in the form of a plug-in) an application, particularly a conferencing application, on a third data processing unit and is configured to confer on the third data processing unit the capability to obtain information from a first data processing unit stating that this first data processing unit is capable of receiving audio and/or video data from second data processing units, mixing it and/or running a selection process on it, and transmitting the mixed and/or selected audio and/or video data to additional data processing units; upon receipt of such information, the third data processing unit transfers the task of such mixing and/or selection to the first data processing unit that transmitted the information.
Other details, objects, and advantages of the invention will become apparent as the following description of certain present preferred embodiments thereof and certain present preferred methods of practicing the same proceeds.
Preferred embodiments of the invention are described in detail with reference to the drawings, in which:
In the embodiment of the invented method shown based on
As an alternative or in addition to mixing, it is possible for a selection to be made by the conference-enabled browser EP 1K, so that, for example, whenever a participant is currently speaking, their image is displayed on the browsers, and otherwise not. The advantage of the procedure according to
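The selection mode can be sketched as active-speaker switching. This assumes that "currently speaking" is detected from the root-mean-square level of each participant's latest audio frame compared against a fixed threshold; the threshold value and all function names are illustrative assumptions.

```python
import math
from typing import Dict, List, Optional


def rms(samples: List[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def active_speaker(audio_frames: Dict[str, List[int]], threshold: float = 500.0) -> Optional[str]:
    """Pick the loudest participant, or None if nobody exceeds the threshold.

    The default threshold (in 16-bit sample units) is an arbitrary illustrative value.
    """
    levels = {pid: rms(frame) for pid, frame in audio_frames.items()}
    if not levels:
        return None
    loudest = max(levels, key=levels.get)
    return loudest if levels[loudest] >= threshold else None


def select_video(video_frames: Dict[str, bytes], speaker: Optional[str]) -> Dict[str, Optional[bytes]]:
    """Relay only the active speaker's video frame to every other participant."""
    return {
        receiver: (video_frames[speaker] if speaker is not None and speaker != receiver else None)
        for receiver in video_frames
    }
```

Unlike mixing, selection forwards an unmodified stream, so the predefined terminal does not have to decode and re-encode video for this mode.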
The configuration of conferencing-enabled browsers EP 1K is described in greater detail below using
First, the browser is provided with a client application, for example a JavaScript application 22, that gives it the capability to communicate with the central conference control application K-A, so that it can take on the role of a client. A conferencing client application 24, possibly also in JavaScript, is provided as a plug-in to, or installed in, the client application; it gives the client the capability of signaling that it is conferencing-enabled. While the client application 22 generally handles the WebRTC client signaling with the conference application K-A (see reference designation “SC”, “Signaling Client”), the additional application 24 ensures that additional conference-client signaling takes place (“Signaling Conference Client”, SKC) and, for example, also accepts the conferencing commands of the central conference control unit (media server role in EP 1K).
The browser 25 comprises a web interface, particularly a web real-time communication interface 26 (WebRTC API, Web Real-Time Communication Application Programming Interface). It comprises a unit 28 for managing the session and the conference (“Session Management and Conference Management”). Furthermore, it requires a corresponding voice unit 30 (“Voice Engine” with corresponding codecs) for coding and decoding voice data, and a corresponding unit 32 for video data (“Video Engine” with corresponding codecs). Examples of codecs are G.711, Opus, H.264, VP8, etc. In addition, there is a unit 34 for mixing and/or connecting (selecting, routing). The transmission interface (carrier interface) 36 routes the data and assigns a client/browser OS interface to the additional browsers EP 2 and EP 3.
The browsers EP 2 and EP 3 (where EP stands for “end point,” as discussed above, e.g., a telecommunication participant) also exchange corresponding signals SC with the conference application K-A. Based on the SKC signaling, the conference application decides that the browser EP 1K should handle the media for the conference. In response, the browsers EP 2 and EP 3 transmit their user data N2 and N3 to the unit 34, where this user data is mixed with, or undergoes selection together with, the corresponding user data N1 generated by units 30 or 32; the mixed data is sent back as data M2 and M3, or the selected (“switched”) data is sent back as SW2 and SW3. The signal path 20 displayed in
The data exchange proceeds in detail as follows:
Initially, in a Step S10, the query is sent by the browser EP 1K to the conference control application K-A to enter the conference room or to initiate it (see “EP 1K JoinRequest”) in
In the next Step S12, the conference application K-A asks the browser EP 1K (in its role as “Media Node”) to provide a corresponding media resource in EP 1K (see “Conf Create Request” in
In Step S20, the browser EP 2 asks whether it can enter the conference (“EP 2 Join Request”). In Step S22, the browser EP 1K (in its role as “Media Server”) then receives the request from the conference control unit to add EP 2 to the conference (“Conf Add Request”) and confirms this on its side in Step S24 (“Conf Add Confirm”). The conference application K-A then confirms to the browser EP 2 in Step S26 that it has entered the conference with EP 1K. Just as the receiving IP addresses and ports were exchanged between client EP 1K and the conference resource (in client EP 1K) in Steps S10, S12, S14 and S16, the receiving IP addresses and ports are exchanged between EP 2 and the conference resource (in EP 1K) in Steps S20, S22, S24 and S26.
Corresponding Steps S28, S30, S32 and S34 also take place for the third browser EP 3. After Steps S26 and/or S34 are completed, the browsers EP 2 and EP 3 send their audio and video data to the conference resource of the browser EP 1K in Steps S36 and S38. The arrow S40 shows that the browser EP 1K internally (in its role as client) contributes its own audio and video data, recorded with the assigned microphone or camera, for mixing. The client EP 1K does not have to send its locally generated media data via the LAN interface (IP address) to the media resource of the same EP 1K, but can pass it on internally within the browser. In the mixing Step S42, the received user data is mixed in such a way that corresponding data can be output in Step S44 by the browser EP 1K itself, or can be transmitted in Steps S46 and S48 to the browsers EP 2 and EP 3 and output there.
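The signaling sequence of Steps S10 to S34 can be simulated in a few lines. The sketch models the conference control application K-A as a pure signaling component and the conference resource in EP 1K as the place where receiving addresses are registered. The labels “EP 1K JoinRequest”, “Conf Create Request”, “Conf Add Request” and “Conf Add Confirm” follow the description; the labels used for Steps S14 and S16 and the confirmations in S26/S34, as well as all class and method names, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Address = Tuple[str, int]  # (receiving IP address, receiving port)


@dataclass
class MediaNode:
    """Conference resource hosted inside the conferencing-enabled browser EP 1K."""
    owner: str
    address: Address
    members: Dict[str, Address] = field(default_factory=dict)

    def create(self) -> Address:
        # The owner's own media stays inside the browser (arrow S40), so only
        # the resource's receiving address has to be reported back.
        return self.address

    def add(self, endpoint: str, endpoint_addr: Address) -> Address:
        # Remember where to send the mixed/selected stream; return where to send media.
        self.members[endpoint] = endpoint_addr
        return self.address


@dataclass
class ConferenceApp:
    """Central conference control application K-A: it signals but carries no media."""
    node: Optional[MediaNode] = None
    log: List[str] = field(default_factory=list)

    def join(self, endpoint: str, addr: Address,
             offered_node: Optional[MediaNode] = None) -> Address:
        self.log.append(f"{endpoint} Join Request")      # S10 / S20 / S28
        if self.node is None:
            if offered_node is None:
                raise RuntimeError("no conferencing-enabled browser has joined yet")
            self.node = offered_node
            self.log.append("Conf Create Request")         # S12
            resource = self.node.create()
            self.log.append("Conf Create Confirm")         # S14 (hypothetical label)
        else:
            self.log.append("Conf Add Request")            # S22 / S30
            resource = self.node.add(endpoint, addr)
            self.log.append("Conf Add Confirm")            # S24 / S32
        self.log.append(f"{endpoint} Join Confirm")        # S16 / S26 / S34 (hypothetical label)
        return resource
```

Each joining browser learns the receiving address of the resource in EP 1K, and the resource learns each browser's receiving address, mirroring the address and port exchange described for Steps S20 to S26.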
The browsers shown in the figures so far are all conference participants, each running on a respective communication device (e.g., a communication terminal such as a laptop computer, personal computer, etc.). The invention is also applicable if a conference is already in progress and only a subgroup in a company network would like to conduct a subconference. In this case, the requesting user needs the appropriate authorization and graphical user interface (GUI) controls on his browser client (e.g., “Split Local Conference”), together with the corresponding payload reconfiguration orders sent via the central conference control application.
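A “Split Local Conference” request could be processed roughly as follows. This sketch assumes that authorization is a simple set of permitted user ids, that the local subgroup is identified by an IP prefix, and that each payload reconfiguration order tells one local participant to send its media to the local media node; the function name and the order format are hypothetical.

```python
import ipaddress
from typing import Dict, List, Set, Tuple


def split_local_conference(requester: str, authorized: Set[str], members: Dict[str, str],
                           lan: str, media_node: str) -> List[Tuple[str, str]]:
    """Build hypothetical payload reconfiguration orders for a local subconference."""
    if requester not in authorized:
        raise PermissionError(f"{requester} may not split the conference")
    network = ipaddress.ip_network(lan, strict=True)
    local = [pid for pid, ip in members.items() if ipaddress.ip_address(ip) in network]
    if media_node not in local:
        raise ValueError("the media node must be inside the local subgroup")
    # One order per local participant (other than the media node itself),
    # sent via the central conference control application.
    return [(pid, f"route-media-to:{media_node}") for pid in local if pid != media_node]
```

Participants outside the prefix receive no order, so they remain in the original conference while the local subgroup exchanges its media only within the company network.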
While certain present preferred embodiments of the communication apparatus, communication system, communication device, non-transitory computer readable medium, and embodiments of methods for making and using the same have been shown and described above, it is to be distinctly understood that the invention is not limited thereto but may be otherwise variously embodied and practiced within the scope of the following claims.
Claims
1-10. (canceled)
11. A method for conducting a conference, the method comprising:
- receiving information indicating that a terminal is running an application that enables the terminal to support conferences;
- determining that the application is capable of assuming a role of a media server based on a conference type, a conference mode, a supported codec, and a conference credential for authentication;
- sending control commands to the terminal.
12. The method of claim 11, wherein the control commands comprise commands for:
- receiving a first audio or video data stream from another terminal;
- mixing the first audio or video data stream from the other terminal and a second audio and/or video data stream from the terminal to generate a mixed audio or video data stream; and
- sending the mixed audio or video data stream to the other terminal.
13. The method of claim 11, wherein the control commands comprise commands for:
- receiving a first audio or video data stream from another terminal;
- selecting an audio or video data stream from either the first audio or video data stream from the other terminal or a second audio or video data stream from the terminal to identify a selected audio or video data stream; and
- sending the selected audio or video data stream back to the other terminal.
14. The method of claim 11, wherein the method further comprises:
- sending, to the terminal, a confirmation that the application of the terminal can assume the role of the media server.
15. The method of claim 11, wherein the application is a Web Real-Time Communication (WebRTC) browser application.
16. The method of claim 12, wherein the terminal and the other terminal are communicatively connected to each other through a Local Area Network (LAN).
17. The method of claim 12, wherein the method further comprises:
- receiving login information from the terminal and the other terminal.
18. The method of claim 11, wherein the conference type comprises an audio conference, a video conference, or a screen share conference.
19. The method of claim 11, wherein the conference mode comprises a conference mixing or a selective forwarding unit.
20. The method of claim 11, wherein the supported codec comprises G.711, OPUS, H.264, VP8, or VP9.
21. A non-transitory computer-readable medium storing instructions that, when executed by a processor, cause the processor to:
- receive information indicating that a terminal is running an application that enables the terminal to support conferences;
- determine that the application is capable of assuming a role of a media server based on a conference type, a conference mode, a supported codec, and a conference credential for authentication;
- send control commands to the terminal.
22. The non-transitory computer-readable medium of claim 21, wherein the control commands comprise commands for:
- receiving a first audio or video data stream from another terminal;
- mixing the first audio or video data stream from the other terminal and a second audio and/or video data stream from the terminal to generate a mixed audio or video data stream; and
- sending the mixed audio or video data stream to the other terminal.
23. The non-transitory computer-readable medium of claim 21, wherein the control commands comprise commands for:
- receiving a first audio or video data stream from another terminal;
- selecting an audio or video data stream from either the first audio or video data stream from the other terminal or a second audio or video data stream from the terminal to identify a selected audio or video data stream; and
- sending the selected audio or video data stream back to the other terminal.
24. The non-transitory computer-readable medium of claim 21, wherein the instructions further comprise:
- sending, to the terminal, a confirmation that the application of the terminal can assume the role of the media server.
25. The non-transitory computer-readable medium of claim 21, wherein the application is a Web Real-Time Communication (WebRTC) browser application.
26. The non-transitory computer-readable medium of claim 22, wherein the terminal and the other terminal are communicatively connected to each other through a common Local Area Network (LAN).
27. The non-transitory computer-readable medium of claim 22, wherein the instructions further comprise:
- receiving login information from the terminal and the other terminal.
28. The non-transitory computer-readable medium of claim 21, wherein the conference type comprises an audio conference, a video conference, or a screen share conference.
29. The non-transitory computer-readable medium of claim 21, wherein the conference mode comprises a conference mixing or a selective forwarding unit.
30. The non-transitory computer-readable medium of claim 21, wherein the supported codec comprises G.711, OPUS, H.264, VP8, or VP9.
Type: Application
Filed: Aug 17, 2022
Publication Date: Dec 8, 2022
Applicant: RingCentral, Inc. (Belmont, CA)
Inventor: Karl Klaghofer (Munchen)
Application Number: 17/889,698