Voice print identification portal
Systems and methods providing for secure voice print authentication over a network are disclosed herein. During an enrollment stage, a client's voice is recorded and characteristics of the recording are used to create and store a voice print. When an enrolled client seeks access to secure information over a network, a sample voice recording is created. The sample voice recording is compared to at least one voice print. If a match is found, the client is authenticated and granted access to secure information. Systems and methods providing for a dual use voice analysis system are also disclosed herein. Speech recognition is achieved by comparing characteristics of words spoken by a speaker to one or more templates of human language words. Speaker identification is achieved by comparing characteristics of a speaker's speech to one or more templates, or voice prints. The system is adapted to increase or decrease matching constraints depending on whether speaker identification or speech recognition is desired.
The present invention claims priority to U.S. Provisional Patent Application No. 60/894,627, entitled “VOICE PRINT IDENTIFICATION PORTAL,” filed Mar. 13, 2007, which is hereby incorporated by reference.
FIELD OF THE INVENTION
The present invention relates generally to system access control based on user identification by biometric acquisition and speech signal processing for word recognition. More particularly, the present invention relates to combining voice based biometric identification for securing various computer related devices and speech recognition for device control and automated entry of information.
BACKGROUND
The field of processing voice signals for use within a computerized device has traditionally been split into two distinct fields, speaker identification and speech recognition. These two fields have historically required separate and uniquely designed and configured systems. These systems are often provided by different vendors.
Speech recognition involves recognizing a human language word spoken by a speaker. In one example, speech recognition is utilized for computerized dictation, where a user speaks into a microphone and her words are recognized and entered into a document. Another example of speech recognition is controlling personal electronics, such as a cellular telephone or car stereo, through the use of verbal commands. Other applications for speech recognition include: command recognition, dictation, interactive voice response systems, automotive speech recognition, medical transcription, pronunciation teaching, automatic translation, and hands-free computing. Speech recognition is typically achieved through comparison of characteristic qualities of spoken words, phrases, or sentences to one or more templates. A variety of algorithms are known in the art that allow qualification and/or comparison of speech to templates. These algorithms include: hidden Markov models, neural network-based systems, dynamic time warping based systems, frequency estimation, pattern matching algorithms, matrix representation, decision trees, and knowledge based systems. Some systems will employ a combination of these techniques to achieve higher accuracy rates.
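By way of illustration only, the following is a minimal sketch of one of the template comparison techniques listed above, dynamic time warping. It assumes each utterance has already been reduced to a one-dimensional sequence of numeric features; the function names `dtw_distance` and `recognize` and the template dictionary are hypothetical and are not part of the disclosed system.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sample, templates):
    """Return the label of the stored template closest to the sample (hypothetical helper)."""
    return min(templates, key=lambda label: dtw_distance(sample, templates[label]))
```

Because the alignment may stretch or compress either sequence, a word spoken slowly can still match a template recorded at a faster pace.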
Speaker identification involves the process of identifying or verifying the identity of a specific person based on unique qualities of human speech. Human speech is often referred to as a biometric identification mechanism similar to fingerprints or retinal scans. Like fingerprints and retinal scans, every individual has a unique voice print that can be analyzed and matched against known voice prints. Like other biometric identification mechanisms, voice prints can be utilized for verification or identification.
Verification using a voice print is commonly referred to as voice authentication. Voice authentication is achieved in a similar manner to speech recognition: characteristic qualities of spoken words or phrases are compared to one or more templates. However, voice authentication is much more difficult to successfully achieve than speech recognition. First, speech recognition requires a less stringent match between the spoken word and a speech template. All that must be determined is what word was said, not who said that word based on a specific accent, pitch and tone. Second, speaker identification requires matching the speaker to a much larger number of possibilities, because one person must be identified out of many, not just what word they spoke. Whereas it may be acceptable to take up to several seconds to perform voice authentication, speech recognition must be done at a relatively fast pace in order for an interface to be reasonably useable.
Traditionally, the use of speech for identification purposes versus speech for recognition purposes has been very segmented. While speech authentication requires complex and demanding comparisons, speech recognition requires real-time performance in order to meet user needs. Due to these differing requirements, existing systems (including computer hardware, software, or both) have been limited to performing one of these two functions.
The use of speech to authenticate a user has a variety of advantages over other identification methods. First, like fingerprints or iris scans, every human being has an entirely unique speech pattern that can be quantifiably recognized using existing technology. Second, unlike fingerprints or iris scans, the input to a speaker identification system (the spoken word) may be different every time, even where the speaker is saying the same word. Therefore, unlike other methods of human authentication, speech authentication provides the additional advantage of an ability to prevent multiple uses of the same voice print.
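By way of illustration only, the following sketch shows one way a system might refuse a recording that has already been used to authenticate. Because a live utterance differs every time, a byte-for-byte identical recording suggests a replay. The class name `ReplayGuard` and its methods are hypothetical. Comparing exact digests is a simplifying assumption; it detects only verbatim copies, and a practical system would more likely compare stored characteristics of earlier recordings.

```python
import hashlib


class ReplayGuard:
    """Remembers digests of recordings that were previously accepted (hypothetical helper)."""

    def __init__(self):
        self._accepted = set()

    @staticmethod
    def _digest(recording: bytes) -> str:
        return hashlib.sha256(recording).hexdigest()

    def seen(self, recording: bytes) -> bool:
        """True if this exact recording was already used to authenticate."""
        return self._digest(recording) in self._accepted

    def record(self, recording: bytes) -> None:
        """Store the digest of a recording after a successful authentication."""
        self._accepted.add(self._digest(recording))
```

In use, the system would call `seen` before granting access and `record` after each successful authentication.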
The rise of the computer age has drastically changed the manner in which people interact with each other in both business and personal settings. Along with the rise of the use of technology to conduct everyday life, security concerns with the use of computers have risen dramatically due to identity theft. Identity theft typically occurs where personal information such as bank accounts, social security numbers, passwords, identification numbers, etc., or corporate information is accessible when transferred over networks such as the internet, or when personal information or corporate information is entered into a user interface. For typical internet transactions such as consumer purchases, bank account transfers, etc., the transaction involves both a business side (back-end) and a customer side (front-end). The customer typically uses a computer, or handheld device such as a Smartphone or Personal Digital Assistant (PDA) to communicate during the transaction. Typically, communications during internet transactions are made very secure by using high security protocols such as Transport Layer Security (TLS) or Secure Sockets Layer (SSL). However, when a customer enters information (before it is transferred) at the front-end side of the transaction, the information is highly vulnerable to theft. In fact, in most cases of identity theft, personal information is stolen from the front-end side of the transaction. Therefore, a need exists to provide an efficient, more secure means of protecting the identity of one who wishes to interact in a secure environment over networks such as the internet. More specifically, a need exists to provide a secure transaction environment in which personal or corporate information is not communicated to the customer front-end in an accessible or repeatable format.
SUMMARY OF THE INVENTION
The invention described herein seeks to remedy the issues discussed above by providing a system and method of voice authentication. In one embodiment, a method of securely authenticating a client seeking access to secure information or services available through a network is disclosed herein. In an embodiment, the method includes an enrollment process. The enrollment process may include receiving, at a server, an enrollment request and a voice recording. The process further includes processing, at the server, the voice recording to determine identifying characteristics of the client's voice, and creating a voice print identification of the client and storing the voice print identification.
In an embodiment, the method also includes an authentication process. The authentication process includes receiving, at the server, a request for authentication of a client with an existing voice print. In one embodiment, the existing voice print was created according to the enrollment process discussed above. In one embodiment, the authentication process includes receiving a sample recording of the client's voice. In one embodiment, the process includes processing the sample recording. In one embodiment, the process includes comparing characteristics of the sample recording to at least one voice print identification. In one embodiment, the process includes determining, based at least in part on the comparing, that the client is authenticated. In one embodiment, the process includes communicating, over the network, an indication that the client is authenticated. In one embodiment, receiving, at the server, a sample recording of the client's voice is the only information received from the client that is used to determine that the client is authenticated.
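By way of illustration only, the enrollment and authentication processes described above can be sketched as follows. The sketch assumes that processing a recording yields a fixed-length numeric feature vector, and it uses cosine similarity with an arbitrary threshold of 0.95 as the comparison; neither the feature representation, the similarity measure, the threshold, nor the class name `VoicePrintStore` is specified by the disclosure. Consistent with the embodiment in which the sample recording is the only information received from the client, `authenticate` takes no user identifier.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


class VoicePrintStore:
    """Server-side store of voice print identifications (hypothetical)."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # assumed minimum similarity for a match
        self._prints = {}

    def enroll(self, client_id, features):
        """Create and store a voice print from the enrollment recording's features."""
        self._prints[client_id] = list(features)

    def authenticate(self, features):
        """Return the id of the best matching client, or None if no print matches."""
        best_id, best_score = None, 0.0
        for client_id, voice_print in self._prints.items():
            score = cosine_similarity(features, voice_print)
            if score > best_score:
                best_id, best_score = client_id, score
        return best_id if best_score >= self.threshold else None
```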
In another embodiment, a method of securely authenticating a client seeking access to secure information available through a network is described herein. In an embodiment, the method includes an enrollment process. In an embodiment, the enrollment process includes sending, to a server, an enrollment request. In an embodiment, the enrollment process includes recording the voice of a client. In an embodiment, the enrollment process includes sending, to a server, the voice recording. In an embodiment, the enrollment process includes receiving, from the server, an indication that a voice print for the client has been created and stored based on the voice recording.
In an embodiment, the method also includes an authentication process. In an embodiment, the authentication process includes sending, to the server, a request to authenticate the client. In an embodiment, the authentication process includes sending, to the server, a sample voice recording of the client. In an embodiment, the authentication process includes receiving, from the server, an indication that the client is authenticated. In an embodiment, the authentication process includes permitting the client access to secure information over the network based on the indication that the client is authenticated. In one embodiment sending, to the server, a sample voice recording of the client is the only information originating from the client that is used to authenticate the client.
In an embodiment, a system for securely authenticating a client seeking access to secure information available through a network is described herein. In an embodiment, the system includes a back-end computer system adapted to manage and control access to secure information. In an embodiment, the system includes a front-end interface, adapted to provide the client with access to the back-end computer system. In an embodiment, the system includes a voice analysis computer system, adapted to verify a client's identity based on a voice sample. In an embodiment, the front-end interface is adapted to provide the client with the ability to record a client voice sample and communicate the client's voice sample to the voice analysis computer system. In an embodiment, the voice analysis computer system is adapted to compare the received client's voice sample to at least one voice print and authenticate the client based at least in part on the comparison. In an embodiment, the voice analysis computer system is adapted to communicate an indication of authentication. In an embodiment, the sample voice recording of the client is the only information originating from the client that is used to authenticate the client.
In an embodiment, a method of operating a voice analysis system is described herein. In an embodiment, the method includes receiving, by a voice analysis system, at least one parameter indicating whether the system is to operate in a first mode or a second mode. In an embodiment, the method includes receiving, by the voice analysis system, a voice recording. In an embodiment, the method includes setting voice analysis constraints to a first level if the parameter indicates the first mode, or setting the voice analysis constraints to a second level if the parameter indicates the second mode. In an embodiment, the method includes comparing the voice recording to at least one template. In an embodiment, the comparison is based at least in part on the constraints. In an embodiment, the first mode indicates that the voice analysis system is to perform speaker identification. In an embodiment, the second mode indicates that the voice analysis system is to perform word recognition. In an embodiment, if the parameter indicates the first mode, an indication of authentication is provided. In an embodiment, if the parameter indicates the second mode, an indication of the textual value of the voice recording is provided.
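By way of illustration only, the dual mode operation described above can be sketched as follows. The sketch assumes recordings and templates are equal-length, non-empty numeric feature vectors and that the constraint is a maximum allowed distance; the numeric limits, the distance measure, and the names `Mode`, `CONSTRAINTS`, and `analyze` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SPEAKER_IDENTIFICATION = 1  # first mode
    WORD_RECOGNITION = 2        # second mode


# Assumed constraint levels: the maximum distance still counted as a match.
CONSTRAINTS = {Mode.SPEAKER_IDENTIFICATION: 0.1, Mode.WORD_RECOGNITION: 0.5}


def distance(a, b):
    """Mean absolute difference between two equal-length, non-empty vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def analyze(mode, recording, templates):
    """Compare a recording to the templates under the constraint level for the mode."""
    limit = CONSTRAINTS[mode]
    label, best = None, None
    for name, template in templates.items():
        d = distance(recording, template)
        if best is None or d < best:
            label, best = name, d
    matched = best is not None and best <= limit
    if mode is Mode.SPEAKER_IDENTIFICATION:
        return {"authenticated": matched}
    return {"text": label if matched else None}
```

The same comparison routine serves both modes; only the constraint level and the form of the returned indication change.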
The invention may be more completely understood in consideration of the following detailed description of various embodiments of the invention in connection with the accompanying drawings, in which:
While the invention is amenable to various modifications and alternative forms, specifics thereof have been shown by way of example in the drawings and will be described in detail. It should be understood, however, that the intention is not to limit the invention to the particular embodiments described. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
According to the example illustrated in
According to the embodiment illustrated in
When client 301 visits webpage 302, the client is offered the ability to, or required to, use voice authentication to access secure information. In various embodiments, client 301 is provided with means to create a sample voice recording. In various embodiments, the client is provided an interface through the webpage to record his/her voice. The recording (and possibly a user id associated with the service provider as discussed in reference to
In one embodiment, voice analysis computer system 303 communicates, using a secure connection, with back-end computer system 304 to determine whether the particular client 301 has permission to access particular content. In one embodiment, voice analysis computer system 303 has access to a client security key (and possibly security keys allowing access to back-end computer system 304 itself) that allows access to back-end computer system 304. According to this embodiment, voice analysis computer system 303 transmits the client security key to back-end computer system 304. In response, back-end computer system 304 may determine whether client 301 should be granted access, and communicates (using a secure connection) authorization of access to voice analysis computer system 303. Voice analysis computer system 303 may then allow access to secure content through webpage 302.
In another embodiment, voice analysis computer system 303 does not have access to a client security key to determine permission. Instead, voice analysis computer system 303 attempts to verify the identity of client 301, and, if successful, communicates success to back-end computer system 304. According to this embodiment, back-end computer system 304 determines whether client 301 is to be granted permission to access webpage 302, and back-end computer system 304 itself communicates and allows access to webpage 302.
In another embodiment, voice analysis computer system 303 verifies permission by reviewing client and business specific information stored on voice analysis computer system 303. According to this embodiment, voice analysis computer system 303 does not communicate security keys to back-end computer system 304 and receive authorization from back-end computer system 304. Instead, the entire authentication process is achieved in voice analysis computer system 303. When a client's identity and permission are verified, authorized access is communicated to webpage 302.
The various embodiments of client authentication illustrated in
In various other embodiments, client authentication illustrated in
System 801 may further include Java JSP application 803. Java JSP application 803 is adapted to run on voice analysis computer system 303. JSP application 803 is further adapted to communicate with applet 802 to receive and transfer commands and information from applet 802. In one embodiment, JSP application 803 is adapted to receive a voice recording from applet 802, and process that voice recording. System 801 may further include one or more databases such as MySQL Database(s) 804. JSP application 803, among other applications, may be adapted to store and manage data in Databases 804.
In some embodiments, system 801 also includes Secure Web Based Administration Pages 805. In various embodiments, administration pages 805 provide an interface to create, modify, and configure client users.
In some embodiments, system 801 further includes Web Administration and Company Administration JSP applications 806. In various embodiments, Web Administration and Company Administration JSP applications 806 provide a web-based interface to configure companies, including companies' access to system 801.
In one embodiment, applet 802 is adapted to run on front-end interface 101, while JSP application 803 is adapted to run on voice analysis computer system 303. In an alternative embodiment, both applet 802 and JSP application 803 are adapted to run on front-end interface 101. In yet another alternative embodiment, JSP application 803 is adapted to run on back-end computer system 304.
At 1003, applet 802 is adapted to capture a client's voice. Voice capture may include: 1) providing a user interface to allow the client to record voice, 2) providing instructions to the client, 3) controlling front-end interface 101 in order to record voice (including measuring background noise and setting detection thresholds), 4) verifying that the resultant recording meets requirements for further processing, and 5) preparing the recording for communication.
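By way of illustration only, steps 3 and 4 of the voice capture listed above (measuring background noise, setting a detection threshold, and verifying that the recording is usable) can be sketched as follows. The sketch assumes audio is available as a list of numeric sample amplitudes; the margin, frame size, minimum voiced frame count, and function names are hypothetical choices, not values from the disclosure.

```python
def rms(samples):
    """Root mean square amplitude of a list of samples (0.0 for an empty list)."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5


def detection_threshold(background, margin=3.0):
    """Set the voice detection threshold as a multiple of the measured background noise."""
    return rms(background) * margin


def recording_is_usable(recording, background, frame=4, min_voiced_frames=2):
    """Check that enough frames of the recording rise above the detection threshold."""
    threshold = detection_threshold(background)
    frames = [recording[i:i + frame] for i in range(0, len(recording), frame)]
    voiced = sum(1 for f in frames if rms(f) > threshold)
    return voiced >= min_voiced_frames
```

A recording that fails this check could be rejected before it is sent for processing, prompting the client to record again.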
In one embodiment, the voice recording is communicated using a TCP protocol. At 1004, after the user's voice is recorded, applet 802 sends the voice recording to JSP application 803 for processing, and verifies that the communication was successful. In one embodiment, applet 802 sends the voice recording over a secure connection such as an SSL connection. In one embodiment, JSP application 803 runs on voice analysis computer system 303.
At 1005, when JSP application 803 has completed processing the voice recording, applet 802 processes return values from JSP application 803. Applet 802 processes the return values based on what function was desired at 1002. Also at 1005, applet 802 provides the user with a results display. In one embodiment, if authentication or enrollment were requested, applet 802 provides the user with an indication that authentication was successful or unsuccessful. In another embodiment, where speech recognition was requested, applet 802 provides the user with a textual indication of the words that were spoken. In a similar embodiment, applet 802 provides the client with a verbal indication of words spoken by the client, or applet 802 may also act in response to words spoken by the client. Once the results have been provided to the user, applet 802 returns to 1002 and allows the client to re-enter parameters.
At 1102, JSP application 803 awaits a request from applet 802. When a request is received, JSP application processes the request. At 1103, JSP application 803, based on the request from applet 802, determines what function is desired of JSP application 803. JSP application 803 determines whether applet 802 requested: enrollment of a new user, re-enrollment of an existing user, authentication of an enrolled user, or speech recognition.
At 1104 and 1105, where enrollment of a new user or re-enrollment of an existing user is requested by applet 802, JSP application 803 validates the user ID of the user, processes the voice recording, updates an enrollment template, and stores the template in databases 804. At 1110, data is transferred back to applet 802.
At 1106 and 1107, where authentication of an existing user is requested by applet 802, the user's user id is validated, the user's voice recording is processed, and the voice recording is compared to existing voice templates to determine whether the client is authenticated. If the client is authenticated, security tokens are prepared for transmission to applet 802. At 1110, security tokens and other data are communicated to applet 802.
At 1109, where speech recognition is requested, JSP application 803 is adapted to modify (lessen) voice recognition constraints such that JSP application 803 is only adapted to verify a particular word, not a particular client's voice. At 1108, the voice recording is processed and compared to stored voice commands. If a match is found, an identification of a voice command is prepared for communication to applet 802. At 1110, the identification of a voice command and other data are communicated to applet 802.
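By way of illustration only, the request handling described at 1103 through 1110 can be sketched as a single dispatch function. The sketch assumes each request is a dictionary that already carries extracted features, that templates and voice commands are held in in-memory dictionaries rather than databases 804, and that the security token is a random hexadecimal string. The field names, limits, and function names are hypothetical.

```python
import secrets

STRICT_LIMIT = 0.1  # assumed constraint for verifying a particular client's voice
LOOSE_LIMIT = 0.5   # assumed lessened constraint for recognizing a word


def distance(a, b):
    """Mean absolute difference between two equal-length, non-empty vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def handle_request(request, templates, commands):
    """Determine the requested function and return data for the applet."""
    function = request["function"]
    features = request["features"]
    if function in ("enroll", "re-enroll"):
        user_id = request["user_id"]
        # Re-enrollment is only valid for a user who already has a template.
        if function == "re-enroll" and user_id not in templates:
            return {"ok": False, "error": "unknown user"}
        templates[user_id] = list(features)
        return {"ok": True}
    if function == "authenticate":
        template = templates.get(request["user_id"])
        ok = template is not None and distance(features, template) <= STRICT_LIMIT
        # A token is prepared only when the client is authenticated.
        return {"ok": ok, "token": secrets.token_hex(16) if ok else None}
    if function == "recognize":
        for name, template in commands.items():
            if distance(features, template) <= LOOSE_LIMIT:
                return {"ok": True, "command": name}
        return {"ok": False, "command": None}
    return {"ok": False, "error": "unsupported function"}
```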
In various embodiments, alternatives are provided for a client who does not have access to a front-end interface 101 that is capable of recording voice. In one embodiment, a client is provided the ability to select a “Call In” button. When the “Call In” button has been selected, the client is provided an ordinary telephone number. The user may call the number in order to record his/her voice.
In another embodiment, the client does not have any access to a front-end interface 101 or the internet. According to this embodiment, a client is provided with the ability to operate the entire system through ordinary telephone service. The client may communicate with and request system 801 functions through voice commands or through dialing numbers on a telephone keypad. In one embodiment, this telephone-only system is implemented using telephony systems such as IPPC or IPPC express offered by Cisco Systems, Inc.
Finally, while the present invention has been described with reference to certain embodiments, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiments as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the spirit and scope of the invention as defined by the appended claims.
Claims
1. A method of securely authenticating a client seeking access to secure information available through a network, comprising:
- receiving, at a server, an enrollment request;
- receiving, at the server, a voice recording;
- processing, at the server, the voice recording to determine identifying characteristics of the speaker's voice;
- creating, based on the identifying characteristics, a voice print identification of the speaker;
- storing, at the server, the voice print identification;
- receiving, at the server, a request for authentication of a client with an existing voice print identification;
- receiving, at the server, a sample recording of the client's voice;
- processing, at the server, the sample recording of the client's voice;
- comparing characteristics of the sample recording to at least one voice print identification;
- determining, based at least in part on the comparing, that the client is authenticated; and
- communicating, over the network, an indication that the client is authenticated.
2. The method of claim 1, wherein the communicating an indication that the client is authenticated comprises:
- communicating only non-critical information.
3. The method of claim 1, wherein the communicating an indication that the client is authenticated comprises:
- communicating only a positive or negative indication that the client is authenticated.
4. The method of claim 1, wherein receiving, at the server, a sample recording of the client's voice further comprises:
- receiving an indication of the client's identity.
5. The method of claim 1, wherein determining, based at least in part on the comparing, that the client is authenticated further comprises:
- comparing the sample recording of the client's voice to at least one stored indication of characteristics of sample recordings that were previously used to authenticate the client; and
- if the sample recording has been previously used to authenticate, not providing an indication that the client is authenticated.
6. The method of claim 1, wherein receiving, at the server, a sample recording of the client's voice is the only information received from the client used to authenticate the client.
7. A method of securely authenticating a client who seeks access to secure information available through a network, comprising:
- sending, to a server, an enrollment request;
- recording the voice of a client;
- sending, to a server, the voice recording;
- receiving, from the server, an indication that a voice print for the client has been created and stored based on the voice recording;
- sending, to the server, a request to authenticate the client;
- sending, to the server, a sample voice recording of the client;
- receiving, from the server, an indication that the client is authenticated; and
- permitting the client access to secure information over the network based on the indication that the client is authenticated.
8. The method of claim 7, wherein the receiving an indication that the client is authenticated comprises:
- receiving only non-critical information.
9. The method of claim 7, wherein the receiving an indication that the client is authenticated comprises:
- receiving only a positive or negative indication that the client is authenticated.
10. The method of claim 7, wherein sending, to the server, a sample voice recording of the client further comprises:
- sending an indication of the client's identity.
11. The method of claim 7, wherein sending, to the server, a sample voice recording of the client is the only information originating from the client that is used to authenticate the client.
12. A system for securely authenticating a client seeking access to secure information available through a network, comprising:
- a back-end computer system adapted to manage and control access to secure information;
- a front-end interface, adapted to provide the client with access to the back-end computer system;
- a voice analysis computer system, adapted to verify a client's identity based on a voice sample;
- wherein the front-end interface is adapted to provide the client with the ability to record a client voice sample and communicate the client's voice sample to the voice analysis computer system;
- wherein the voice analysis computer system is adapted to compare the received client's voice sample to at least one voice print and authenticate the client based at least in part on the comparison; and
- wherein the voice analysis computer system is adapted to communicate an indication of authentication.
13. The system of claim 12, wherein the voice analysis computer system is adapted to communicate the indication of authentication to the front-end interface, and wherein the front-end interface is adapted to allow the client access to secure information.
14. The system of claim 12, wherein the voice analysis computer system is adapted to communicate the indication of authentication to the back-end computer system, and wherein the back-end computer system is adapted to allow the client access to secure information.
15. The system of claim 12, wherein the voice analysis computer system is adapted to allow the client access to secure information.
16. The system of claim 12, wherein the front-end interface is adapted to provide the client with the ability to record a client voice sample and communicate the client's voice sample to the voice analysis computer system along with an indication of the client's identity.
17. The system of claim 12, wherein the sample voice recording of the client is the only information originating from the client that is used to authenticate the client.
18. A method of operating a voice analysis system, comprising:
- receiving, by a voice analysis system, at least one parameter indicating whether the system is to operate in a first mode or a second mode;
- receiving, by the voice analysis system, a voice recording;
- setting voice analysis constraints to a first level if the parameter indicates the first mode, or setting the voice analysis constraints to a second level if the parameter indicates the second mode; and
- comparing the voice recording to at least one template, wherein the comparison is based at least in part on the constraints, wherein the first mode indicates that the voice analysis system is to perform speaker identification, wherein the second mode indicates that the voice analysis system is to perform word recognition; and
- wherein if the parameter indicates the first mode, providing an indication of authentication, and wherein if the parameter indicates the second mode, providing an indication of the textual value of the voice recording.
Type: Application
Filed: Mar 13, 2008
Publication Date: Oct 16, 2008
Inventor: Noel J. Grover (Minneapolis, MN)
Application Number: 12/075,799
International Classification: G06F 21/00 (20060101); G10L 11/00 (20060101); G10L 15/26 (20060101); G10L 17/00 (20060101);