Voice-based entertainment activity in a networked environment

Info

Publication number: 20110224000
Type: Application
Filed: Jan 14, 2011
Publication Date: Sep 15, 2011
Inventors: James Toga (Wayland, MA), Siddhartha Gupta (Boston, MA), Kenneth Cox (Marlborough, MA), Rafal K. Boni (Needham, MA)
Application Number: 12/930,713

Abstract

Methods and apparatus, including computer program products, for voice-based entertainment activity in a networked environment. A method includes, in a network of interconnected computers, establishing a connection between a client and a server, the client comprising a web browser and an audio input/output device, receiving a list of entertainment games over the connection from the server, selecting an entertainment game from the list, receiving a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel, in response to a client request to initiate game play, sending audio to the server through the audio input/output device and receiving audio from the server through the audio input/output device, in response to a client request to terminate game play, terminating the associated voice channel, and receiving text messages from the server over the associated text channel.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/295,739, filed Jan. 17, 2010, and titled TECHNIQUES FOR VOICE-BASED GAMES IN A NETWORKED ENVIRONMENT, which is incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION

The present invention relates to data processing by digital computer, and more particularly to voice-based entertainment activity in a networked environment.

The Internet continues to make available ever-increasing amounts of information that can be stored in databases and accessed therefrom. Additionally, with a proliferation of portable terminals (e.g., notebook computers, cellular telephones, personal data assistants (PDAs), smartphones and so forth), users are becoming more mobile, and hence, more reliant upon information accessible via the Internet. Accordingly, the connectivity available via the Internet is frequently used to chat, socialize and communicate with friends and family.

The advent of the Internet and networked applications has also been marked by an increase in the variety of entertainment activities, such as multi-user applications where users may interact with each other and not just with computers. In more recent times, voice communications between users have been added as an additional feature to a number of these applications. More specifically, voice communications generally have been added as a conversational medium, and not an element of a user interface (UI), for interacting with the application or entertainment activity—the modes and forms of user input are still generally limited to modes such as text and pointing.

SUMMARY OF THE INVENTION

The present invention provides methods and apparatus, including computer program products, for voice-based entertainment in a networked environment.

In general, in one aspect, the invention features a method including, in a network of interconnected computers, establishing a connection between a client and a server, the client comprising a web browser and an audio input/output device, receiving a list of entertainment games over the connection from the server, selecting an entertainment game from the list, receiving a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel, in response to a client request to initiate game play, sending audio to the server through the audio input/output device and receiving audio from the server through the audio input/output device, in response to a client request to terminate game play, terminating the associated voice channel, and receiving text messages from the server over the associated text channel.

In another aspect, the invention features a method including, in a network of interconnected computers, establishing a connection between a client and a server, the client comprising a web browser and an audio input/output device, sending a list of entertainment games over the connection to the client, in response to a selected entertainment game received from the client, sending a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel, receiving audio from the client over the voice channel and sending audio to the client over the voice channel, in response to a client request to terminate game play, terminating the voice channel, and sending messages to the client over the text channel.

Other features and advantages of the invention are apparent from the following description, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be more fully understood by reference to the detailed description, in conjunction with the following figures, wherein:

FIG. 1 is a block diagram.

FIG. 2 is a flow diagram.

FIG. 3 is a flow diagram.

FIG. 4 is an exemplary user interface (UI).

FIG. 5 is an exemplary UI.

FIG. 6 is an exemplary UI.

FIG. 7 is an exemplary UI.

FIG. 8 is an exemplary UI.

FIG. 9 is an exemplary UI.

FIG. 10 is an exemplary database structure.

FIG. 11 is a flow diagram.

FIG. 12 is a flow diagram.

FIG. 13 is a flow diagram.

FIG. 14 is a flow diagram.

FIG. 15 is a flow diagram.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

Voice-based entertainment activities, such as voice-based games, refer to those entertainment activities in which a primary form of input for play is a user's voice.

As shown in FIG. 1, an exemplary system 10 includes a client system 12 linked to a network of interconnected computer systems (e.g., the Internet) 14. The system 10 includes a server system 16 linked to the network 14. The client system 12 can include a processor 20, a memory 22 and one or more input/output devices 24, such as an audio microphone, a keyboard and a mouse. Memory 22 includes an operating system 26, such as Linux, MAC Snow Leopard® or Windows® 7, and a web browser 28, such as Mozilla Firefox®, Opera® or Internet Explorer®.

Server system 16 can include a processor 30 and memory 32. Memory 32 includes an operating system 34, such as Linux or Windows®, and a voice-based game process 100. In a preferred embodiment, the server 16 is a Vivox Network voice game IVR system from Vivox, Inc., of Natick, Mass. The Vivox Network is the backbone of the number one integrated voice platform for the Social Web, which supports over 35 million users and over 3 billion voice chat minutes per month. It is built on a cloud computing paradigm.

As shown in FIG. 2, the process 100 includes establishing (102) a connection between the client and a server, the client comprising a web browser and an audio input/output device.

Process 100 receives (104) a list of entertainment games over the connection from the server and selects (106) an entertainment game from the list.

Process 100 receives (108) a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel. The globally unique identifier can have an associated database table to maintain game information. In implementations, the globally unique identifier can be received by a number of users and the server can maintain a history of game interaction of the users.

In response to a client request to initiate game play, process 100 sends (110) audio to the server through the audio input/output device and receives (112) audio from the server through the audio input/output device. The audio sent to the server through the audio input/output device can enable game play. The audio can be recorded by the server. The recorded audio can be sent by the server upon a request from the client. In implementations, the client used to initiate game play need not be the same client used to complete subsequent portions of the game play. For example, a game can be started on a personal computer and finished on an iPhone device from any physical location.

In response to a client request to terminate game play, process 100 terminates (114) the associated voice channel.

Process 100 receives (116) text messages from the server over the associated text channel. The text message can be an interactive chat between two or more users.

As shown in FIG. 3, a process 200 includes establishing (202) a connection between a client and a server, the client comprising a web browser and an audio input/output device.

Process 200 sends (204) a list of entertainment games over the connection to the client.

In response to a selected entertainment game received from the client, process 200 sends (206) a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel. The globally unique identifier can have an associated database table to maintain game information. The globally unique identifier can be received by a number of users and the server can maintain a history of game interaction of the users.

Process 200 receives (208) audio from the client over the voice channel and sends (210) audio to the client over the voice channel. The received audio from the client can enable game play. The client audio can be recorded and the recorded audio can be sent to the client upon a client request. The server can maintain a history of game play.

In response to a client request to terminate game play, process 200 terminates (212) the voice channel and sends (214) messages to the client over the text channel. The messages can include live feeds of information pertaining to the game. The live feeds of information can include one or more of completed game turns, game scores and leader board changes.
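The server-side steps of process 200 can be sketched in a few lines of Python. This is only an illustrative in-memory model under stated assumptions, not the Vivox Network implementation: the class names `GameSession` and `GameServer`, the method names, and the choice to echo received audio back to the client are all hypothetical, and real voice and text channels are replaced by plain lists.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class GameSession:
    """Hypothetical record for one game: a GUID plus its two channels."""
    name: str
    game_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    voice_open: bool = False                         # state of the voice channel
    recordings: list = field(default_factory=list)   # audio received over the voice channel
    text_feed: list = field(default_factory=list)    # messages sent over the text channel


class GameServer:
    """Toy stand-in for the server side of process 200."""

    def __init__(self, game_names):
        self._by_name = {name: GameSession(name) for name in game_names}
        self._by_id = {s.game_id: s for s in self._by_name.values()}

    def list_games(self):
        # Step 204: send the list of entertainment games to the client.
        return sorted(self._by_name)

    def select_game(self, name):
        # Step 206: return the globally unique identifier for the selected game.
        return self._by_name[name].game_id

    def receive_audio(self, game_id, audio):
        # Steps 208 and 210: record client audio, then send audio back.
        # Echoing the same bytes is an assumption made purely for illustration.
        session = self._by_id[game_id]
        session.voice_open = True
        session.recordings.append(audio)
        return audio

    def terminate_voice(self, game_id):
        # Step 212: terminate the voice channel on client request.
        self._by_id[game_id].voice_open = False

    def send_message(self, game_id, text):
        # Step 214: send a message over the text channel, which stays open.
        self._by_id[game_id].text_feed.append(text)
```

In this sketch the text feed keeps accepting messages after `terminate_voice` is called, mirroring the description's point that terminating the voice channel does not end the text channel.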

The processes described above enable new ways of interacting with voice-based games. Voice-based games typically involve quick playing turns, and may be played with short overall play time, which makes them particularly attractive for casual or spontaneous participation. Voice-based games are inherently social: social interaction is an aspect that attracts users to play the game. By their nature, such games can also have a low cost for providing content, since a large part of the content (e.g., the segments) may be provided by the participants as part of the play. Voice-based games are readily adaptable to different player demographics or interests. For example, a game of Antakshari can be adapted simply by stating a particular genre of music for play, or a language for lyrics, and so forth. These aspects among others make such applications particularly attractive in many Internet applications.

Antakshari is a voice-based game that has been implemented incompletely and with limited success for the Internet, in part due to limitations that the present techniques address. For example, one limitation of this game is that user play input is limited to typing the text of a segment of lyrics to a song. Another limitation is that scoring is limited to determining when a player has failed to make a correct response within a timeout period. This limited form of Antakshari for the Internet attracts a number of players, in spite of many burdens and deficiencies that are readily apparent, including the following. There is no musical content to the player input, making play less engaging. Text input by typing is inherently less spontaneous and convenient than voice input. Typing is also a computer skill with which many potential players may not have much facility, particularly for rapid play. Text input is particularly problematic for play from hand-held or hands-free devices such as web-enabled cell phones. Many social attributes of the game, such as good-natured vocal heckling and the social effects of spontaneous laughter, are lost. Scoring is limited to aspects that can be determined from the text of the lyrics. The game cannot be played readily while engaging in other activities, because each player must both type and read a display to play.

Another example application is web offerings of Karaoke Machines that are typical of the current state-of-the-art for playing the voice-based game Karaoke. A Karaoke Machine is a separate device that plays accompaniment and displays the melody and/or lyrics for a player to sing. All participants playing with a Karaoke Machine must be present at the same time and same place together, with the Karaoke Machine, to participate or to play.

Techniques of the present invention overcome these limitations by using voice input for play. Using a microphone and voice connection such as that provided by Vivox, a player plays a round by singing a segment of a song. Players may play at their own computer or other device, and need not be located together to play together.

The player's input is audible to other participants; not all participants need be players. A player's input is scored as correct or not correct; if not correct, the player has lost the round. Play may also be scored in other ways. Scoring of a round may be in part or in whole automatic, for example by the application incorporating techniques for speech or music recognition.

Process 100 and process 200 may also determine a skill rating or ranking for a player, which may be analogous to a chess player rating, based on the player's past play or other indicators.
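The description does not say how such a rating would be computed. As one possible reading of "analogous to a chess player rating", the sketch below applies the standard Elo update used in chess; the choice of Elo, the function names, and the constant `k=32` are assumptions made for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Elo expectation that player A beats player B (a value between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating, expected, actual, k=32):
    """Move a rating toward the observed result.

    actual is 1.0 for a won round and 0.0 for a lost round;
    k controls how quickly ratings react (32 is an assumed value).
    """
    return rating + k * (actual - expected)
```

For two equally rated players the expectation is 0.5, so the winner of a round gains half of `k` points and the loser gives up the same amount.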

Further aspects of the present techniques include real-time voice communications for social content during play, such as good-natured heckling, whispered hints, and cheering. Games that are very attractive to players for their social content, such as Karaoke, can now be played successfully via the Internet.

Further, software control of the voice communications permits the play and interactions during play to be enhanced in many ways, such as by keeping heckling audible without drowning out player inputs, enabling players on a team to confer by voice on a “private channel” separately from other participants, and rendering participants' voices at different spatial positions. The application software may modify the quality or tone of a participant's voice for social or other effect; this may also be done under the control of a participant. The techniques of the present invention permit players to participate from many kinds of devices, such as a computer, a hand-held communications or Internet device, or a traditional telephone. The techniques further permit players to participate in a hands-free fashion, or while engaging in another activity requiring the use of the hands or not permitting visual engagement with a display. As is readily appreciated, the techniques of the present invention can make the application inherently accessible, for example, permitting ready participation by players who are visually or physically impaired.

The techniques of the present invention permit the same play to be scored in more than one way. Scoring can be based not only on aspects of a player's play segment (such as the response time), but also on a second player's response to a first player's segment, or how well or poorly an opposing player was able to respond. Scoring may be based on the correctness or any aspect of the melody or the lyrics, and may be based on an algorithm for voting by other participants. Scoring may be based on combined or multiple criteria, for example accumulated or weighted response times, or maximum or minimum times.
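To make the combined criteria concrete, the following sketch scores one turn from weighted response times and participant votes. The function names, the equal 50/50 weighting, and the 30-second time limit are hypothetical choices; the description leaves the actual formula open.

```python
def weighted_response_time(times, weights):
    """Weighted average of a player's response times, in seconds."""
    return sum(t * w for t, w in zip(times, weights)) / sum(weights)


def vote_fraction(votes):
    """Share of participants who voted the segment correct (0.0 if nobody voted)."""
    return sum(1 for v in votes if v) / len(votes) if votes else 0.0


def score_turn(times, weights, votes, time_limit=30.0):
    """Combine speed and votes into a score between 0 and 1.

    Faster responses and more favorable votes both raise the score.
    The equal weighting of the two criteria is an assumption.
    """
    speed = max(0.0, 1.0 - weighted_response_time(times, weights) / time_limit)
    return 0.5 * speed + 0.5 * vote_fraction(votes)
```

Because the same recorded turn is the only input, a different scoring rule (for example, maximum or minimum response time) could be applied to it later without replaying the game.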

Additional aspects of the techniques of the present invention also permit players to participate who are separated in space, as well as separated in time, including players who play irregularly or who play shifted in time. For example, a round of play or input may be played in non-real-time, in which a player listens to a recording of one or more previous turns, and makes their input for play within a time limit at the end of the recording being played back. A player may start play from a game or a portion of a game that has been played previously. A player may play a game or a practice game in which the player responds to one or more segments that are determined by software, or in an environment including simulated participants, or responding to play segments of their own prior play.
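The non-real-time rule, in which the time limit starts at the end of the played-back recording, reduces to a small deadline calculation. The function and parameter names below are hypothetical, and times are assumed to be seconds on a common clock.

```python
def response_deadline(playback_start, recording_length, time_limit):
    """Latest time at which the player may begin a response.

    The limit is counted from the end of the recording being played back.
    """
    return playback_start + recording_length + time_limit


def is_on_time(response_start, playback_start, recording_length, time_limit):
    """True if the response began no later than the deadline."""
    return response_start <= response_deadline(
        playback_start, recording_length, time_limit
    )
```

Since the deadline depends only on when playback started for this particular player, each player can be timed independently, whenever that player happens to listen to the earlier turn.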

The preferred embodiment employs the Vivox Network for voice and text input and output during play. As is readily apparent, the techniques of the present invention may be employed with other networks, forms of communication, and devices, such as for analog communications, visual communications, and other forms of communication or user input and output. The Vivox Network enables user game clients to join channels in either text or voice mode (or both).

When a game client first “connects” to the Vivox Network, it is presented with a list of ongoing games, as well as the option to start a new game. Once the player using the client makes a selection, the client determines what the “Game ID” (a globally unique identifier) for the game is (if a new game is being created, the Vivox Network creates a new game ID, and adds a new database table). Each ongoing game ID has a unique voice and text channel associated with it. The client joins this “game channel” (in text mode). The “game channel” is used to communicate a “live feed” of information, using the Vivox Network's scalable text chat capability (primarily a list of just completed turns, along with the users' score), as well as a way to signal to the client when the “leader board” has been updated.

When a player is ready to take a next turn, the player presses a “Take turn” button in a user interface (UI). This causes the client to join a private voice chat channel, through which the client, and thereby the player, interacts with the Vivox Network's voice game IVR system. The IVR system is responsible for, among other things, playing back the previous turn to the player, prompting the player to start singing the player's song segment, creating a high quality recording of the player's turn in the network, and then providing the player the ability to review (and possibly rerecord) the turn. If the player is satisfied with her or his turn, the player may hit a “submit turn” button.

This causes the client to drop the player from the private voice channel (while staying connected to the game text channel; the Vivox Network supports multichannel mode for communications). The Vivox Network then updates the “turns” database table (by adding a new row to it), and also sends a control text message to the “game text channel”, instructing all connected clients to update their “leader board” and “live feed” windows (by reading the appropriate database tables). Finally, if there are no other users currently taking a turn, the network makes the just submitted turn the “current” one (and moves the last player's turn into the “concluded turns” database table).
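The bookkeeping performed on turn submission can be sketched with Python's standard `sqlite3` module. The table and column names, the `is_current` flag, and the wording of the control message are assumptions made for illustration; the actual database structures are those of FIG. 10, which are not reproduced here. The game text channel is modeled as a plain list.

```python
import sqlite3


def open_game_db():
    """Create an in-memory database with hypothetical turn tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE turns (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            player TEXT, recording TEXT, is_current INTEGER DEFAULT 0);
        CREATE TABLE concluded_turns (
            id INTEGER PRIMARY KEY, player TEXT, recording TEXT);
        """
    )
    return db


def submit_turn(db, text_channel, player, recording, others_taking_turn=False):
    """Record a submitted turn and notify connected clients."""
    # Add a new row to the "turns" table.
    cur = db.execute(
        "INSERT INTO turns (player, recording) VALUES (?, ?)", (player, recording)
    )
    new_id = cur.lastrowid
    # Control message telling clients to refresh their windows.
    text_channel.append("CONTROL refresh leader_board live_feed")
    if not others_taking_turn:
        # Move the previous current turn into "concluded_turns" ...
        db.execute(
            "INSERT INTO concluded_turns (id, player, recording) "
            "SELECT id, player, recording FROM turns WHERE is_current = 1"
        )
        db.execute("DELETE FROM turns WHERE is_current = 1")
        # ... and make the just submitted turn the current one.
        db.execute("UPDATE turns SET is_current = 1 WHERE id = ?", (new_id,))
    db.commit()
    return new_id
```

When `others_taking_turn` is true, the new row is stored but the current turn is left unchanged, matching the condition stated in the description.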

In this way, the Vivox Network enables two (or more) players to actively participate in the game in an asynchronous but interactive manner (primarily through the use of the scalable and scriptable Vivox Network game IVR system).

Employing techniques of the present invention, scoring and play can also be extended beyond a single game. For example, accurate tracking of the scoring makes it possible to show players' histories in “leader boards” of the best players, or to show a ranking of players in absolute or relative terms.

As is readily apparent, the techniques of the present invention are not limited to game play applications.

FIG. 4 shows an exemplary user interface (UI) 400 enabling a participant to start play for the game Antakshari. The participant can join a group of players at a particular “game table”, or start a new game table. The element “Discography” is a drop-down list for the participant to select a genre of music for songs during play: in this example, Bollywood, Indipop, or All Genres. The element “Game Type” is a selection element for the participant to select a type of game: in this example, Regular (a continuing sequence of turns) or Challenge. The element “Speed” is an element for the participant to select a mode of timeliness for the game: Fast (play is all “live” in real time), Medium (play may be interrupted and resumed after a delay, as described above), or Slow (play may be interrupted and scored without a game being completed).

FIG. 5 shows an exemplary UI 500 displaying current games matching selections made by the participant in the UI 400 of FIG. 4, and enabling the user to select the particular game to join. The participant may click on “Join this Game” for the game the participant wishes to join: several games are shown in FIG. 5.
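The selections of UI 400 and the matching performed for UI 500 can be modeled as a small filter over game tables. The type names, the field names, and the rule that “All Genres” matches every discography are assumptions; the figures do not specify the matching logic.

```python
from dataclasses import dataclass
from enum import Enum


class Speed(Enum):
    FAST = "Fast"
    MEDIUM = "Medium"
    SLOW = "Slow"


class GameType(Enum):
    REGULAR = "Regular"
    CHALLENGE = "Challenge"


ALL_GENRES = "All Genres"


@dataclass(frozen=True)
class GameTable:
    """Hypothetical description of one ongoing game table."""
    name: str
    discography: str
    game_type: GameType
    speed: Speed


def matching_games(tables, discography, game_type, speed):
    """Return the tables that match the participant's three selections."""
    return [
        t for t in tables
        if (discography == ALL_GENRES or t.discography == discography)
        and t.game_type == game_type
        and t.speed == speed
    ]
```

A participant who picks “All Genres” would then see every table of the chosen game type and speed, regardless of its discography.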

The UI selection for “Sidd's Bollywood Challenge” is exemplary: as seen in FIG. 5, this element shows the Speed and Genre characteristics of the game. Also shown is an indication of a general skill level or rating of the participants in the game. The player can select to see additional statistics or information about the game by clicking on the UI element provided for further information. A short description related to the game or the players is also shown.

Similar elements for games are shown in FIG. 5 for “Meera's Indipop Game”, “Deepak's Bollywood Oldies”, and “Geeta's Free-for-all”. Games may be varied in details such as skill levels, difficulty, or scoring: for example, as indicated in the UI 500, “Geeta's Free-for-all” shows a game that is scored by lowest cumulative playing time, rather than first player or team that is unable to play a valid segment within a timeout period.

FIG. 6 shows an exemplary leader board UI 600 for the game showing the names of a number of players, the number of turns of play each named player has made in a set of games, and the current score for that player: in this example, the score is based on the lowest average response time, as shown.
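A leader board ranked by lowest average response time can be computed from a flat list of turns, as in this sketch. The input format (pairs of player name and response time in seconds) and the tie-break by player name are assumptions.

```python
from collections import defaultdict


def leader_board(turns):
    """Rank players by lowest average response time.

    turns is an iterable of (player, response_time_seconds) pairs.
    Returns rows of (player, number_of_turns, average_response_time),
    best player first; ties are broken alphabetically by name.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for player, response_time in turns:
        totals[player][0] += 1
        totals[player][1] += response_time
    rows = [(p, n, total / n) for p, (n, total) in totals.items()]
    return sorted(rows, key=lambda row: (row[2], row[0]))
```

Each row carries the same three columns that the description attributes to UI 600: name, number of turns, and current score.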

FIGS. 7, 8 and 9 show three steps in a player taking a turn. FIG. 7 shows an exemplary UI 700 for the player to start recording a response: the player clicks on “Record” to start recording a segment. FIG. 8 shows a response by the UI 800 to let the user start recording the segment after an audible indication consisting of a “Beep”. After the beep, the UI allows the player to complete the turn: the player can click on “stop” to stop recording of the player's response, “review” to review the segment, or “submit” to submit the segment as the player's turn. The player can also click on “cancel”, and not play that turn.

FIG. 9 shows the UI 900 after a user has clicked on “review”: the player's recorded segment is played back to the user. The other UI elements are active as shown: for example, the user can click on “record” to record a different turn segment, “stop” to stop the review, or “submit” to submit the segment for that turn.
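The turn-taking steps of FIGS. 7, 8 and 9 behave like a small state machine. The sketch below is one possible encoding; the state names and the exact set of allowed transitions are assumptions inferred from the button descriptions, not taken from the figures themselves.

```python
# (current state, button clicked) -> next state
TRANSITIONS = {
    ("ready", "record"): "recording",       # FIG. 7: start recording
    ("recording", "stop"): "stopped",       # FIG. 8: buttons offered after the beep
    ("recording", "review"): "reviewing",
    ("recording", "submit"): "submitted",
    ("recording", "cancel"): "cancelled",
    ("stopped", "record"): "recording",
    ("stopped", "review"): "reviewing",
    ("stopped", "submit"): "submitted",
    ("stopped", "cancel"): "cancelled",
    ("reviewing", "record"): "recording",   # FIG. 9: record a different segment
    ("reviewing", "stop"): "stopped",
    ("reviewing", "submit"): "submitted",
}


def next_state(state, action):
    """Return the state reached by clicking a button, or raise ValueError."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"button {action!r} is not active in state {state!r}")


def run(actions, state="ready"):
    """Apply a sequence of button clicks and return the final state."""
    for action in actions:
        state = next_state(state, action)
    return state
```

A turn that is recorded, reviewed, rerecorded and then submitted would be the click sequence `["record", "review", "record", "submit"]`.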

FIGS. 10 through 15 illustrate details of a presently-preferred exemplary implementation incorporating a number of the present techniques. The exemplary implementation employs a computer system programmed to implement the game, storage, a communications service accessed via a client program such as a browser, a server, and data storage accessible to the server. The exemplary implementation employs the Facebook application programming interface (API) and the Facebook Platform PHP5 (Hypertext Preprocessor) client software. The UI is displayed to the user using the browser and a network connection. The implementation further supports asynchronous play, and participants may join or rejoin a game at different turns.

FIG. 10 is an exemplary listing of database structures used to hold information of game play segments, including recordings.

FIG. 11 is an exemplary flow diagram illustrating form processing for a participant joining play.

FIG. 12 is an exemplary flow diagram illustrating details of presenting the principles of the game to a player joining a game for the first time.

FIG. 13 is an exemplary flow diagram illustrating processing for a participant playing a turn by recording a segment.

FIG. 14 is an exemplary flow diagram illustrating processing to validate and score a participant's segment for a turn of play.

FIG. 15 is an exemplary flow diagram illustrating processing for displaying a participant's score for a turn and updating database structures, and the processing for posting an MP3 file containing a recording of a player's play.

The description and the figures are of course exemplary, and the techniques may be implemented in many other fashions or employing any suitable component, and further may be applied to other applications, including other games. Other forms of implementations and other applications of the techniques are readily apparent and understood from the descriptions and figures.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (“LAN”), a wide area network (“WAN”), and the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

The foregoing description does not represent an exhaustive list of all possible implementations consistent with this disclosure or of all possible variations of the implementations described. A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the systems, devices, methods and techniques described here. For example, various forms of the flows shown above may be used, with steps re-ordered, added, or removed. Accordingly, other implementations are within the scope of the following claims.

Claims

1. A method comprising:

in a network of interconnected computers, establishing a connection between a client and a server, the client comprising a web browser and an audio input/output device;

receiving a list of entertainment games over the connection from the server;

selecting an entertainment game from the list;

receiving a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel;

in response to a client request to initiate game play, sending audio to the server through the audio input/output device and receiving audio from the server through the audio input/output device;

in response to a client request to terminate game play, terminating the associated voice channel; and

receiving text messages from the server over the associated text channel.

2. The method of claim 1 wherein the sent audio to the server through the audio input/output device enables game play.

3. The method of claim 1 wherein the audio is recorded by the server.

4. The method of claim 3 wherein the recorded audio is sent by the server upon a request from the client.

5. The method of claim 1 wherein the globally unique identifier has an associated database table to maintain game information.

6. The method of claim 2 wherein the server maintains a history of game play.

7. The method of claim 1 wherein audio from the client is generated from a first user.

8. The method of claim 1 wherein the globally unique identifier is received by a plurality of users and the server maintains a history of game interaction of the users.

9. A method comprising:

in a network of interconnected computers, establishing a connection between a client and a server, the client comprising a web browser and an audio input/output device;

sending a list of entertainment games over the connection to the client;

in response to a selected entertainment game received from the client, sending a globally unique identifier from the server corresponding to the selected entertainment game, the globally unique identifier having an associated voice channel and text channel;

receiving audio from the client over the voice channel and sending audio to the client over the voice channel;

in response to a client request to terminate game play, terminating the voice channel; and

sending messages to the client over the text channel.

10. The method of claim 9 wherein the received audio from the client enables game play.

11. The method of claim 9 wherein the client audio is recorded.

12. The method of claim 11 wherein the recorded audio is sent to the client upon a client request.

13. The method of claim 9 wherein the globally unique identifier has an associated database table to maintain game information.

14. The method of claim 9 wherein the messages comprise live feeds of information pertaining to the game.

15. The method of claim 14 wherein the live feeds of information include one or more of completed game turns, game scores and leader board changes.

16. The method of claim 9 wherein the server maintains a history of game play.

17. The method of claim 9 wherein audio from the client is generated from a first user.

18. The method of claim 9 wherein the globally unique identifier is received by a plurality of users and the server maintains a history of game interaction of the users.

19. The method of claim 1 wherein the client used to initiate game play is not the same client used to complete subsequent portions of game play.