Speech based personal information manager

Info

Publication number: 20040117188
Type: Application
Filed: Jul 2, 2003
Publication Date: Jun 17, 2004
Inventors: Daniel Kiecza (Cambridge, MA), Francis Kubala (Boston, MA)
Application Number: 10610699

Abstract

A personal information manager (PIM) [100] stores user personal data. The PIM may be a personal-computer-like box located in the home of the user. The PIM includes an audio interface [108] and a visual interface [110]. Users may equivalently interact with and manage their data through the audio interface or the visual interface. Accordingly, the user has full access to the PIM whenever the user can establish a voice or data connection.

Description

RELATED APPLICATIONS

[0001] This application claims priority under 35 U.S.C. § 119 based on U.S. Provisional Application Nos. 60/394,064 and 60/394,082 filed Jul. 3, 2002 and Provisional Application No. 60/419,214 filed Oct. 17, 2002, the disclosures of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to personal information management systems and, more particularly, to a personal information manager that includes speech recognition capabilities.

[0004] 2. Description of Related Art

[0005] Personal information management devices assist users in organizing personal information. Typically, conventional personal information management devices are portable and relatively small computing systems that provide a number of information management functions. For example, the personal information management devices may store user telephone contact numbers, to-do lists, and calendar/scheduling information. In some cases, these personal information management devices may also function as communication devices by, for example, providing the ability to transmit and receive email.

[0006] Conventionally, users enter information into personal information management devices through a text-based input device, such as a keyboard or a stylus, and view the contents of the personal information management devices through a visual display associated with the device, such as an LCD.

[0007] One reason for the popularity of conventional personal information management devices is that the small size of these devices allows users to easily carry the devices with them during the course of their day. Because the devices are always accessible, users can interact with the devices whenever the need arises.

[0008] Despite the portability and general accessibility of conventional personal information management devices, these devices still have a number of limitations. These limitations include a limited interface for entering and retrieving data. Further, although conventional personal information management devices are portable, the user still has to remember to carry the device with him.

[0009] Accordingly, there is a need in the art for an improved personal information management device. In particular, it would be desirable for a personal information management device to have an improved user interface and be always accessible.

SUMMARY OF THE INVENTION

[0010] Systems and methods consistent with the present invention provide improved personal information management services.

[0011] One aspect of the invention is directed to an information management device that includes a voice interface, a database, and a dialog manager. The database stores data for a user that includes at least voicemail data and email data. The dialog manager provides access to the voicemail data and the email data when the user connects to the information management device through the voice interface.

[0012] A second aspect of the invention is directed to a method of managing personal information. The method includes storing the personal information in a database associated with a personal information management device; establishing a connection to the personal information management device via a voice interface; and receiving spoken commands over the voice interface, the spoken commands initiating retrieval of voicemail, email, and personal organization information.

[0013] Another aspect of the invention is directed to a system. The system includes a database configured to store personal data of a user and a voice interface configured to provide access to the personal data via a voice connection. Further, the system includes a network interface configured to provide access to the personal data via a data connection that provides visual information to the user. The system further includes a control processing component configured to receive spoken commands from the voice interface and respond to the spoken commands. The system responds to the spoken commands by providing the personal data to the user as audio data. The system receives logical commands from the network interface and responds to the logical commands by providing the personal data to the user as visual data.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate the invention and, together with the description, explain the invention. In the drawings,

[0015] FIG. 1 is a diagram illustrating a personal information manager (PIM) implemented in a manner consistent with the present invention;

[0016] FIG. 2 is a diagram illustrating a speech recognition component in the PIM of FIG. 1;

[0017] FIG. 3 is an exemplary diagram illustrating portions of the speech recognition component in additional detail;

[0018] FIG. 4 is a diagram illustrating contents of a database used by the PIM; and

[0019] FIG. 5 is a flow chart illustrating exemplary operation of the PIM in accessing personal data stored in a database through a voice interface.

DETAILED DESCRIPTION

[0020] The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims and equivalents.

[0021] A personal information manager (PIM), as described below, stores a variety of information for a user. The PIM is remotely accessible through both a voice port and a network data port. The PIM may permanently reside at a single location, such as the user's home, but is remotely accessible by the user whenever the user has access to a voice or data line. Information in the PIM can be accessed and/or modified through either the voice or data ports.

[0022] FIG. 1 is a diagram illustrating a PIM 100 implemented in a manner consistent with the present invention. PIM 100 includes database 102, control processing component 104, dialog manager 106, voice interface 108, network interface 110, and local interface 112.

[0023] In one implementation, PIM 100 is designed to be a personal computer-like box that resides in the home of the user. Unlike some conventional personal information management devices, which users carry with them, PIM 100 does not necessarily need to be designed to be portable. Instead, PIM 100 includes a number of interfaces 108-112 that allow users to connect and interact with PIM 100 from anywhere the user has a telephone or a network connection.

[0024] Voice interface 108 connects PIM 100 to the user via a voice line. Voice interface 108 may be, for example, a standard telephone connection that connects the user to a standard public switched telephone network (PSTN) 120. By dialing the number assigned to voice interface 108, users can verbally give instructions to and receive information from PIM 100.

[0025] Network interface 110 connects PIM 100 to network 121. Network 121 may be a wide area network, such as the Internet. Users may connect to PIM 100 via an HTTP (hyper-text transfer protocol) web-based connection. In this situation, PIM 100 may act as a web server in receiving and responding to user commands.

[0026] Through voice interface 108 and network interface 110, users can connect to PIM 100 whenever they have access to either a voice line, such as via a cell or wireless phone, or a network connection, such as via the Internet.

[0027] In one implementation, PIM 100 may also include local interface 112. Local interface 112 may include connections for wired or wireless devices, such as a keyboard and/or a display device (not shown). Through local interface 112, users may interact directly with PIM 100.

[0028] As previously mentioned, PIM 100 may interact with a user through a voice interface, such as through a telephone line. Dialog manager 106 handles the speech recognition and text-to-speech synthesis functions necessary for this interaction. Dialog manager 106 includes speech recognition component 115, command interface 116, and text-to-speech synthesis component 117. Speech recognition component 115 processes incoming audio signals and converts the audio signals into their textual transcriptions. Command interface 116 analyzes the transcriptions produced by speech recognition component 115 to, for example, spot user commands in the transcription. An example of a user command may include the command “check email,” which may cause PIM 100 to audibly alert the user if he has new email. Text-to-speech synthesis component 117 outputs information stored in a textual format as one or more audio signals appropriate for transmission through voice interface 108. For example, textual email may be converted to speech and sent to a user over voice interface 108.
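The command spotting performed by command interface 116 may be sketched as follows. This is a minimal illustration only, assuming a simple phrase lookup; the command table, the logical command names, and the functions `normalize` and `spot_command` are hypothetical and do not appear in the description (only the phrase "check email" does).

```python
import re
from typing import Optional

# Hypothetical command vocabulary. Only "check email" is named in the
# description; the other phrases and all logical names are assumptions.
COMMANDS = {
    "check email": "CHECK_EMAIL",
    "play voicemail": "PLAY_VOICEMAIL",
    "read to-do list": "READ_TODO",
}


def normalize(text: str) -> str:
    """Lowercase the text and keep only letter/digit/hyphen tokens."""
    return " ".join(re.findall(r"[a-z0-9\-]+", text.lower()))


def spot_command(transcription: str) -> Optional[str]:
    """Return the logical name of the earliest known phrase, or None."""
    padded = f" {normalize(transcription)} "
    best = None  # (position, logical name) of the earliest match so far
    for phrase, name in COMMANDS.items():
        pos = padded.find(f" {phrase} ")
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, name)
    return best[1] if best else None
```

For example, `spot_command("Please check email now")` yields the hypothetical logical command `CHECK_EMAIL`, while a transcription containing no known phrase yields `None`.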

[0029] FIG. 2 is a diagram illustrating speech recognition component 115 of dialog manager 106 in additional detail. As shown in FIG. 2, speech recognition component 115 may include training system 210, statistical model 220, and recognition system 230. Training system 210 may include logic that estimates parameters of statistical model 220 from a corpus of training data. The training data may initially include human-produced data. For example, the training data might include one hundred hours of audio data that has been meticulously and accurately transcribed by a human. Training system 210 may use the training data to generate parameters for statistical model 220 that recognition system 230 may later use to recognize future data that it receives (i.e., new audio that it has not heard before).

[0030] Statistical model 220 may include acoustic models and language models. The acoustic models may describe the time-varying evolution of feature vectors for each sound or phoneme. The acoustic models may employ continuous Hidden Markov Models (HMMs) to model each of the phonemes in the various phonetic contexts.

[0031] The language models may include n-gram language models, where the probability of each word is a function of the previous word (for a bi-gram language model) or the previous two words (for a tri-gram language model). Typically, the higher the order of the language model, the higher the recognition accuracy, at the cost of slower recognition speeds.
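A bi-gram language model of the kind mentioned above may be illustrated by the following sketch, which estimates the probability of a word given the previous word from raw counts. The function names and the `<s>` start-of-sentence marker are assumptions made for illustration; the description does not specify how the parameters of statistical model 220 are estimated, and a practical system would typically add smoothing for unseen word pairs.

```python
from collections import Counter
from typing import List, Tuple


def train_bigram(sentences: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count contexts and (previous word, word) pairs in the corpus."""
    contexts: Counter = Counter()
    pairs: Counter = Counter()
    for words in sentences:
        tokens = ["<s>"] + words  # "<s>" marks the start of a sentence
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            pairs[(prev, cur)] += 1
    return contexts, pairs


def bigram_prob(contexts: Counter, pairs: Counter, prev: str, word: str) -> float:
    """Maximum-likelihood estimate of P(word | prev); 0.0 if prev is unseen."""
    if contexts[prev] == 0:
        return 0.0
    return pairs[(prev, word)] / contexts[prev]
```

A tri-gram model would follow the same pattern with the two previous words as the context, which enlarges the count tables and is one reason higher-order models tend to be slower.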

[0032] Recognition system 230 may use statistical model 220 to process input audio data. FIG. 3 is an exemplary diagram of recognition system 230 according to an implementation consistent with the principles of the invention. Recognition system 230 may include audio classification logic 310, speech recognition logic 320, speaker identification logic 340, name spotting logic 350, and topic classification logic 360. Audio classification logic 310 may distinguish speech from silence, noise, and other audio signals in input audio data. For example, audio classification logic 310 may analyze five-second windows of the input data to determine whether each window contains speech.
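The window-based analysis performed by audio classification logic 310 may be sketched as follows. The description does not state how a window is classified, so this sketch substitutes a simple energy threshold as a stand-in; the threshold value, the labels, and the function names are all assumptions.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a window (0.0 for an empty window)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def classify_windows(
    samples: Sequence[float],
    sample_rate: int,
    window_seconds: float = 5.0,   # five-second windows, as in the description
    threshold: float = 0.1,        # assumed energy threshold, not from the source
) -> List[str]:
    """Label each consecutive window as "speech" or "non-speech"."""
    size = int(sample_rate * window_seconds)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    labels = []
    for start in range(0, len(samples), size):
        window = samples[start:start + size]
        labels.append("speech" if rms(window) >= threshold else "non-speech")
    return labels
```

An energy threshold cannot separate speech from loud noise; it is used here only to show the windowing, and a classifier driven by statistical model 220 would replace the comparison.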

[0033] Speech recognition logic 320 may perform continuous speech recognition to recognize the words spoken in the segments that it receives from audio classification logic 310. Speech recognition logic 320 may generate a transcription of the speech using statistical model 220. Speaker identification logic 340 may identify the speaker by comparing acoustic features of the speaker's voice with a set of pre-stored acoustic features. Speaker identification may be useful for security verification.
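The comparison performed by speaker identification logic 340 may be illustrated with the following sketch. It assumes, purely for illustration, that acoustic features are fixed-length numeric vectors compared by cosine similarity against a threshold; the description does not specify the feature representation or the matching criterion, and the names used here are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(
    features: Sequence[float],
    enrolled: Dict[str, Sequence[float]],  # pre-stored features per speaker
    threshold: float = 0.9,                # assumed acceptance threshold
) -> Optional[str]:
    """Return the best-matching enrolled speaker, or None if none is close."""
    best_name, best_score = None, threshold
    for name, stored in enrolled.items():
        score = cosine_similarity(features, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```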

[0034] Name spotting logic 350 may locate the names of people, places, and organizations in the transcription. Name spotting logic 350 may extract the names and store them in a database. Topic classification logic 360 may assign topics to the transcription. Each of the words in the transcription may contribute differently to each of the topics assigned to the transcription. Topic classification logic 360 may generate a rank-ordered list of all possible topics and corresponding scores for the transcription. Topics identified for a speaker segment may be stored with the segment and later used when performing searches over an archive of spoken segments.
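The rank-ordered topic list produced by topic classification logic 360 may be sketched as follows. The per-word topic weights, the topic names, and the additive scoring rule are assumptions chosen to show how each word may contribute differently to each topic; the description does not define the scoring model.

```python
from typing import Dict, List, Tuple

# Hypothetical per-word contributions to each topic.
TOPIC_WEIGHTS: Dict[str, Dict[str, float]] = {
    "meeting": {"scheduling": 2.0, "work": 1.0},
    "dentist": {"health": 3.0, "scheduling": 1.0},
    "invoice": {"finance": 3.0, "work": 1.0},
}
ALL_TOPICS = ("finance", "health", "scheduling", "work")


def rank_topics(transcription: str) -> List[Tuple[str, float]]:
    """Score every topic and return (topic, score) pairs, best first."""
    scores = {topic: 0.0 for topic in ALL_TOPICS}
    for raw in transcription.lower().split():
        word = raw.strip(".,!?")
        for topic, weight in TOPIC_WEIGHTS.get(word, {}).items():
            scores[topic] += weight
    # Sort by descending score; ties are broken alphabetically by topic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Because every topic receives a score, the result covers all possible topics, and the highest-ranked entries could be stored with the segment for later searches.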

[0035] Referring back to FIG. 1, PIM 100 includes control processing component 104 and database 102. Database 102 stores user information. FIG. 4 is a diagram illustrating potential contents of database 102. Database 102 may include user-created to-do lists 401, user memos 402, contact lists 403, user calendar information 404, voicemail 405-406, email 407, and other documents or files 408. To-do lists 401 may include lists of “to-do” action items created by the user. Memos 402 may include brief written documents created by the user. Contact lists 403 may include contact information, such as names, telephone numbers, and email addresses, for a number of people.

[0036] Voicemails received through voice interface 108 may be stored as digitized audio 405. Similarly, voicemail transcriptions 406 may include a rich transcription of the received voicemails, as transcribed by speech recognition component 115. Email received via network interface 110 may also be stored in database 102.

[0037] Items 401-408, stored in database 102, are exemplary. In general, database 102 functions as a central storage location for a user's personal information. Accordingly, database 102 could be used to store other appropriate user information, such as any miscellaneous files 408 (e.g., documents, audio files, video files, etc.) that the user wishes to store.
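The exemplary contents of database 102 may be modeled, as one possible sketch, by the following in-memory structure. The class and field names are hypothetical, and the use of plain strings for most items is a simplification; the description does not prescribe a schema or storage technology.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Contact:
    name: str
    phone: str = ""
    email: str = ""


@dataclass
class Voicemail:
    caller: str
    audio: bytes              # digitized audio (item 405)
    transcription: str = ""   # transcription of the audio (item 406)


@dataclass
class PersonalDatabase:
    todo_items: List[str] = field(default_factory=list)        # item 401
    memos: List[str] = field(default_factory=list)             # item 402
    contacts: List[Contact] = field(default_factory=list)      # item 403
    calendar: List[str] = field(default_factory=list)          # item 404
    voicemails: List[Voicemail] = field(default_factory=list)  # items 405-406
    emails: List[str] = field(default_factory=list)            # item 407
    files: List[str] = field(default_factory=list)             # item 408

    def find_contact(self, name: str) -> Optional[Contact]:
        """Case-insensitive lookup of a contact by name."""
        wanted = name.lower()
        for contact in self.contacts:
            if contact.name.lower() == wanted:
                return contact
        return None
```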

[0038] Referring again to FIG. 1, control processing component 104 may manage the information in database 102. Control processing component 104 may, for example, act as a web server and process requests, such as HTTP requests, received via network interface 110. The requests may relate to the display or manipulation of the user data in database 102. Control processing component 104 may also interact with dialog manager 106 to transfer data to or receive data from dialog manager 106. Thus, for example, voice commands received from a user may be identified by speech recognition component 115 and then transmitted from command interface 116 to control processing component 104.
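The role of control processing component 104 in serving the same data to different interfaces may be sketched as follows. This is an illustration under stated assumptions: the class `ControlProcessor`, the logical command names `ADD_TODO` and `LIST_TODO`, and the two rendering functions are hypothetical, and an actual implementation would sit behind a web server and dialog manager 106 rather than being called directly.

```python
import html
from typing import Callable, Dict, List


class ControlProcessor:
    """Hypothetical stand-in for control processing component 104."""

    def __init__(self) -> None:
        self.todo_items: List[str] = []  # stands in for to-do lists 401
        self._handlers: Dict[str, Callable[[str], List[str]]] = {
            "ADD_TODO": self._add_todo,
            "LIST_TODO": self._list_todo,
        }

    def handle(self, command: str, argument: str = "") -> List[str]:
        """Execute a logical command and return the resulting to-do items."""
        handler = self._handlers.get(command)
        if handler is None:
            raise ValueError(f"unknown command: {command}")
        return handler(argument)

    def _add_todo(self, item: str) -> List[str]:
        self.todo_items.append(item)
        return list(self.todo_items)

    def _list_todo(self, _argument: str) -> List[str]:
        return list(self.todo_items)


def render_for_voice(items: List[str]) -> str:
    """Text to hand to a text-to-speech component."""
    if not items:
        return "Your to-do list is empty."
    return "Your to-do list: " + "; ".join(items) + "."


def render_for_web(items: List[str]) -> str:
    """HTML fragment to return over an HTTP connection."""
    rows = "".join(f"<li>{html.escape(item)}</li>" for item in items)
    return f"<ul>{rows}</ul>"
```

Keeping the command handling separate from the rendering is one way to give the voice and network interfaces equivalent access to the same data.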

[0039] FIG. 5 is a flow chart illustrating exemplary operation of PIM 100 in accessing personal data stored in database 102 through voice interface 108.

[0040] A user begins by placing a call to voice interface 108 of PIM 100 (Act 501). The call may be placed from any conventional phone. If voice interface 108 is connected to a telephone line that shares other household functions, such as an answering machine and/or a general phone line, PIM 100 may monitor the connection to determine if the caller intends to interact with PIM 100. The caller may, for example, press a predetermined number sequence on the phone or speak a predetermined command to indicate that the caller wishes to interact with PIM 100.

[0041] PIM 100 may check to determine whether the caller is authorized to use PIM 100 (Act 502). This authorization may take the form of a predetermined password entered on the phone keypad, a predetermined spoken password, or speaker identification performed by speaker identification logic 340. When speaker identification logic 340 is used to authorize a user, it may compare pre-stored acoustic features of the user's voice to acoustic features derived from the active voice connection. If the acoustic features match, the user is authorized.
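The three authorization options of Act 502 may be combined as in the following sketch. The function `authorize`, its parameters, and the idea of passing the voice comparison in as a callable are assumptions made for illustration; the description does not state how the options are ordered or combined.

```python
import hmac
from typing import Callable, Optional, Sequence


def _same(a: str, b: str) -> bool:
    """Constant-time string comparison to avoid leaking match length."""
    return hmac.compare_digest(a.encode("utf-8"), b.encode("utf-8"))


def authorize(
    keypad_digits: Optional[str] = None,
    spoken_password: Optional[str] = None,
    voice_features: Optional[Sequence[float]] = None,
    *,
    stored_pin: str,
    stored_phrase: str,
    voice_matches: Callable[[Sequence[float]], bool],
) -> bool:
    """Return True if any one of the supplied credentials is accepted."""
    if keypad_digits is not None and _same(keypad_digits, stored_pin):
        return True
    if spoken_password is not None and _same(
        spoken_password.strip().lower(), stored_phrase
    ):
        return True
    # voice_matches is a hypothetical hook for the speaker identification
    # comparison against pre-stored acoustic features.
    if voice_features is not None and voice_matches(voice_features):
        return True
    return False
```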

[0042] Once authorized, the user may interact with PIM 100 by giving spoken audio commands to PIM 100 (Act 503). The spoken commands may relate to any of the data in database 102. The spoken commands are processed by recognition system 230 of speech recognition component 115 to generate a logical representation of the command, which command interface 116 may transmit to control processing component 104.

[0043] One set of commands may relate to voicemail data 405 and 406. Through these commands, PIM 100 assists the user in retrieving voicemail (Act 504). The user may, for example, control the playback of voice messages that other parties left for the user. Thus, via voice commands, the user may listen to, delete, and/or file voice messages for future reference. In some embodiments, the user may search or otherwise interact with voicemail transcriptions 406. For example, after listening to a voice message from “Bob,” the user may instruct PIM 100 to search archived voice messages for other messages from Bob, and then play back a particular one of those messages.
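A search over archived voicemail, such as the search for other messages from “Bob,” may be sketched as a simple text match over caller names and transcriptions. The record layout and the function `search_voicemail` are hypothetical; the description does not specify how the archive is indexed or searched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoicemailRecord:
    message_id: int
    caller: str          # e.g., a name found by name spotting (assumed)
    transcription: str   # text of the message (item 406)


def search_voicemail(archive: List[VoicemailRecord], query: str) -> List[VoicemailRecord]:
    """Return messages whose caller or transcription mentions the query."""
    needle = query.lower()
    return [
        message
        for message in archive
        if needle in message.caller.lower()
        or needle in message.transcription.lower()
    ]
```

The `message_id` of a returned record could then be used to select the corresponding digitized audio for playback.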

[0044] Another set of commands may relate to email. Through these commands, PIM 100 allows the user to retrieve, compose, and/or manage email account(s) (Act 505). The user may, for example, command PIM 100 to play back, through text-to-speech synthesis component 117, the “subject” and “from” fields of all newly received emails. Emails that the user is particularly interested in may be selected for playback of the “body” of the email. In addition to merely retrieving emails, in some implementations, the user may compose emails. In this situation, the user may speak the name of the intended recipient, the subject line of the email, and the body of the email. Speech recognition component 115 transcribes the user's spoken email. Alternatively, the body of the email could be transmitted as an acoustic file (i.e., what the person said is sent as an acoustic file attachment). Control processing component 104 may then look up the address corresponding to the intended recipient in database 102 and prepare the email with the transcribed speech.
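The preparation of a dictated email may be sketched with Python's standard `email.message.EmailMessage` class. The function `compose_email`, the contact dictionary keyed by lowercase name, the WAV attachment type, and the file name `message.wav` are assumptions for illustration; sending the message is outside the scope of this sketch.

```python
from email.message import EmailMessage
from typing import Dict, Optional


def compose_email(
    contacts: Dict[str, str],       # lowercase name -> address (assumed layout)
    sender: str,
    recipient_name: str,            # name as spoken by the user
    subject: str,                   # transcribed subject line
    body: str,                      # transcribed body
    audio: Optional[bytes] = None,  # optional recording of what was said
) -> EmailMessage:
    """Look up the recipient's address and build the email message."""
    address = contacts.get(recipient_name.lower())
    if address is None:
        raise KeyError(f"no contact named {recipient_name!r}")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = address
    message["Subject"] = subject
    message.set_content(body)
    if audio is not None:
        # Attach the spoken message as an acoustic file.
        message.add_attachment(
            audio, maintype="audio", subtype="wav", filename="message.wav"
        )
    return message
```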

[0045] In addition to email and voicemail control, PIM 100 may respond to commands to manage personal information of the user, such as to-do lists 401, memos 402, contact lists 403, and/or calendar information 404 (Act 506). A user may, for example, add action items to to-do list 401 or review action items already in the to-do list. The user may similarly edit, review, and manage memos 402, contact lists 403, and calendar information 404. As with voicemail and email management, speech recognition component 115 and text-to-speech synthesis component 117 enable recognition of user voice commands, provide dictation of user speech, and convert data to audio for playback to the user.

[0046] In addition to accessing personal data in database 102 through voice interface 108, the user may access his personal data via network interface 110. In one implementation, PIM 100 presents a web-browser-based interface to the user. Accordingly, in the situation in which network 121 is the Internet, the user may connect to PIM 100 through any computing device that contains a suitable browser program and is connected to the Internet. Control processing component 104 may act as a web server that provides access to database 102.

[0047] In general, when interacting with a user through network interface 110, control processing component 104 allows the user to perform all of the functions that are available through voice interface 108. Thus, the user may access and manage voicemail and email accounts, as well as access and manage to-do lists 401, memos 402, contact lists 403, and calendar information 404. Web servers that provide access to email, to-do lists, memos, contact lists, and calendars are known in the art and will not be discussed further herein.

[0048] When providing access to voicemail functions through network interface 110, control processing component 104 may transmit transcriptions of the received voicemails to the user. Alternatively, control processing component 104 may stream audio corresponding to the voicemail over network 121.

[0049] Local interface 112 provides direct access to PIM 100. Through local interface 112, a user may connect, for example, a keyboard, a mouse, and a monitor. Control processing component 104 may provide functionality through local interface 112 that is similar to that provided through network interface 110. In some implementations, a user may connect a microphone and speakers through local interface 112. In this situation, the microphone and speaker lines may be coupled to dialog manager 106, which may provide functionality through local interface 112 that is similar to that provided through voice interface 108.

[0050] In summary, whether the user accesses PIM 100 through voice interface 108, network interface 110, or local interface 112, PIM 100 gives the user full access to the personal data stored in database 102. Users may experience equivalent personal data access features when accessing their data through audio or visual interfaces.

[0051] Because PIM 100 stores the user's personal data at a single location that can be accessed on-demand from virtually anywhere, PIM 100 can provide a number of additional data management services. PIM 100 may, for example, act as centralized storage for user passwords and login names. The user can simply call PIM 100 and request that PIM 100 log onto a web site and retrieve user-specified information, which PIM 100 may then return to the user. Thus, in addition to managing the user's personal information, PIM 100 can retrieve, store, and provide on-demand access to other information in which the user is interested.

Conclusion

[0052] Systems and methods consistent with principles of the invention provide convenient and easily obtainable access to personal data. Users can manage their personal data through traditional visual interfaces or, equivalently, through an audio interface. In this way, the user's personal data remains totally under the control of the user at all times.

[0053] The foregoing description of preferred embodiments of the invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. For example, while a series of acts has been presented with respect to FIG. 5, the order of the acts may be different in other implementations consistent with the present invention.

[0054] Certain portions of the invention have been described as software that performs one or more functions. The software may more generally be implemented as any type of logic. This logic may include hardware, such as an application specific integrated circuit or a field programmable gate array, software, or a combination of hardware and software.

[0055] No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items. Where only one item is intended, the term “one” or similar language is used.

[0056] The scope of the invention is defined by the claims and their equivalents.

Claims

1. An information management device comprising:

a voice interface;

a database configured to store data for a user of the information management device, the data including at least voicemail data and email data; and

a dialog manager component configured to provide access to the voicemail data and the email data when the user connects to the information management device through the voice interface.

2. The information management device of claim 1, wherein the dialog manager component further comprises:

a speech recognition component configured to analyze verbal commands given by the user;

wherein the information management device accesses the database based on the analyzed verbal commands.

3. The information management device of claim 1, wherein the dialog manager component further comprises:

a text-to-speech component that transmits the email data to the user as audio information synthesized from text of the email data.

4. The information management device of claim 1, further comprising:

a network interface configured to provide access to the database through a visual user interface.

5. The information management device of claim 4, wherein the network interface and the voice interface provide equivalent access to the database.

6. The information management device of claim 1, wherein the data in the database includes additional data related to at least one of personal to-do lists, contact information, and calendar information.

7. The information management device of claim 6, further comprising:

a network interface configured to provide access to the database through a visual user interface;

wherein the additional data is accessible through the voice interface and the network interface.

8. The information management device of claim 1, wherein the information management device is a personal-computer-like device located at a residence of the user.

9. The information management device of claim 1, wherein the access provided by the dialog manager component includes reviewing and deleting the voicemail data and the email data.

10. A device comprising:

means for receiving and transmitting audio data;

means for storing personal data of a user, the personal data including at least voicemail data and email data;

means for providing access to the voicemail data and the email data to the user via an audio interface implemented via the means for receiving and transmitting; and

means for creating email data based on speech received through the means for receiving and transmitting.

11. The device of claim 10, further comprising:

means for recognizing commands spoken by the user; and

means for accessing the voicemail data and the email data based on the commands spoken by the user.

12. A method of managing personal information, comprising:

storing the personal information in a database associated with a personal information management device;

establishing a connection to the personal information management device via a voice interface; and

receiving spoken commands over the voice interface, the spoken commands initiating retrieval of voicemail, email, and personal organization information.

13. The method of claim 12, wherein the personal organization information includes at least one of personal to-do lists, contact information, and calendar information.

14. The method of claim 12, wherein the spoken commands additionally include commands to create an email message.

15. The method of claim 12, wherein establishing the connection includes authorizing a user based on an identification of acoustic properties of voice information of the user.

16. The method of claim 12, wherein the personal information includes passwords of a user.

17. A device for managing personal information, comprising:

means for storing the personal information;

means for accepting a connection initiated over a voice interface; and

means for receiving spoken commands over the voice interface, the spoken commands initiating retrieval of voicemail, email, and personal organization information.

18. The device of claim 17, wherein the personal organization information includes at least one of personal to-do lists, contact information, and calendar information.

19. The device of claim 17, further comprising:

means for authorizing the connection based on an identification of acoustic properties of voice information of a user.

20. A system comprising:

a database configured to store personal data of a user;

a voice interface configured to provide access to the personal data via a voice connection;

a network interface configured to provide access to the personal data via a data connection that provides visual information to the user; and

a control processing component configured to receive spoken commands from the voice interface and respond to the spoken commands by providing the personal data to the user as audio data, the control processing component being further configured to receive logical commands from the network interface and to respond to the logical commands by providing the personal data to the user as visual data.

21. The system of claim 20, further comprising:

a dialog manager component configured to analyze the spoken commands and generate logical commands based on the spoken commands.

22. The system of claim 21, wherein the dialog manager component further comprises:

a speech recognition component configured to perform speech recognition functions on the spoken commands; and

a text-to-speech component configured to convert textual information from the database to the audio data.

23. The system of claim 22, wherein the speech recognition component further includes:

speaker identification logic configured to identify the user as a speaker of the spoken commands by comparing acoustic features of the voice of the user to a set of pre-stored acoustic features.

24. The system of claim 23, wherein the speech recognition component further comprises:

name spotting logic configured to locate the names of at least one of people, places, and organizations in speech spoken by the user; and

topic classification logic configured to assign topics to the speech spoken by the user.

25. The system of claim 20, wherein the personal data in the database includes data related to at least one of personal to-do lists, contact information, and calendar information.

26. The system of claim 20, wherein the personal data in the database includes data related to voicemail and email.