Speaker Identification Or Verification (EPO) Patents (Class 704/E17.001)
- Preprocessing operations, e.g., segment selection, etc., pattern representation or modeling, e.g., based on linear discriminant analysis (LDA), principal components, etc.; feature selection or extraction (EPO) (Class 704/E17.005)
- Training, model building, enrollment (EPO) (Class 704/E17.006)
- Decision making techniques, pattern matching strategies (EPO) (Class 704/E17.007)
- Hidden Markov Models (HMMs) (EPO) (Class 704/E17.012)
- Artificial neural networks, connectionist approaches (EPO) (Class 704/E17.013)
- Pattern transformations and operations aimed at increasing system robustness, e.g., against channel noise, different working conditions, etc. (EPO) (Class 704/E17.014)
- Interactive procedures, man-machine interface (EPO) (Class 704/E17.015)
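The "principal components" preprocessing named in the subclass list above can be illustrated with a minimal sketch. This is synthetic and not tied to any particular patent: it projects cepstral-like frame features onto the directions of largest variance via an eigendecomposition of the covariance matrix.

```python
import numpy as np

# Toy feature matrix: 100 frames x 13 cepstral-like coefficients (synthetic).
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 13))

# PCA via eigendecomposition of the covariance matrix.
centered = frames - frames.mean(axis=0)
cov = centered.T @ centered / (len(frames) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # re-sort by descending variance
components = eigvecs[:, order[:5]]          # keep the top 5 components

# Project each frame onto the reduced basis.
reduced = centered @ components
```

The per-column variance of `reduced` equals the top eigenvalues, so the first projected coordinate always carries the most variance.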
- Publication number: 20120278077
  Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
  Type: Application
  Filed: July 11, 2012
  Publication date: November 1, 2012
  Applicant: MICROSOFT CORPORATION
  Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
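The abstract above describes training a classifier over a pooled set of audio and video features. A minimal sketch of that general idea (not the patent's actual algorithm) is AdaBoost over decision stumps on synthetic pooled features, where columns stand in for audio and video cues:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic pooled features: columns 0-1 play the role of audio cues,
# columns 2-3 of video cues. Label +1 means "speaker present".
X = rng.normal(size=(200, 4))
y = np.where(X[:, 0] + X[:, 2] > 0, 1, -1)   # speaker iff joint audio+video cue

def best_stump(X, y, w):
    """Pick the single-feature threshold that minimises weighted error."""
    best = (0, 0.0, 1, np.inf)               # (feature, threshold, sign, error)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for s in (1, -1):
                pred = np.where(X[:, f] > t, s, -s)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (f, t, s, err)
    return best

# AdaBoost: reweight misclassified examples, accumulate weighted stumps.
w = np.full(len(y), 1 / len(y))
stumps = []
for _ in range(10):
    f, t, s, err = best_stump(X, y, w)
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))
    pred = np.where(X[:, f] > t, s, -s)
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()
    stumps.append((f, t, s, alpha))

# Final classifier: sign of the weighted stump votes.
score = sum(a * np.where(X[:, f] > t, s, -s) for f, t, s, a in stumps)
accuracy = (np.sign(score) == y).mean()
```

Because the label depends on an audio column plus a video column, the booster must draw stumps from both input types, which is the point of pooling heterogeneous features.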
- Publication number: 20120259635
  Abstract: A system for the storing of client information in an independent repository is disclosed. Client data may be uploaded by the client or those authorized by the client, or collected and stored by the repository. Data about the client file, such as, for example, the time of upload and modifications, are stored in a metadata file associated with the client file.
  Type: Application
  Filed: April 5, 2012
  Publication date: October 11, 2012
  Inventors: Gregory J. Ekchian, Jack A. Ekchian
- Publication number: 20120253811
  Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are upd…
  Type: Application
  Filed: August 23, 2011
  Publication date: October 4, 2012
  Applicant: Kabushiki Kaisha Toshiba
  Inventors: Catherine Breslin, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
- Publication number: 20120253808
  Abstract: According to an embodiment, a voice recognition device includes a voice inputting unit, a voice recognition processing unit, a vibration movement pattern model holding unit, and a vibration movement unit. The voice recognition processing unit performs voice recognition processing using a digital signal output from the voice inputting unit to output a voice recognition result and outputs voice reliability of the received voice signal. The vibration movement pattern model holding unit stores models prepared according to a number of patterns of the voice reliability output from the voice recognition processing unit and holds vibration movements corresponding to the models. The vibration movement unit detects whether or not the voice reliability output from the voice recognition processing unit matches any one of the models in the vibration movement pattern model holding unit and performs vibration movement predetermined for a matched model.
  Type: Application
  Filed: October 17, 2011
  Publication date: October 4, 2012
  Inventors: Motonobu Sugiura, Hiroshi Fujimura
- Publication number: 20120239399
  Abstract: Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary having words to be recognized whose number is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) for a vocabulary having words to be recognized whose number is smaller than the threshold in an interactive situation.
  Type: Application
  Filed: March 30, 2010
  Publication date: September 20, 2012
  Inventors: Michihiro Yamazaki, Yuzo Maruta
- Publication number: 20120221335
  Abstract: According to one embodiment, the method may include constructing a first voice tag for registration speech based on a Hidden Markov acoustic model (HMM), constructing a second voice tag for the registration speech based on template matching, and combining the first voice tag and the second voice tag to construct the voice tag of the registration speech.
  Type: Application
  Filed: February 24, 2012
  Publication date: August 30, 2012
  Inventors: Rui Zhao, Lei He
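The abstract above combines an HMM-based voice tag with a template-matching voice tag. A common template-matching primitive is dynamic time warping (DTW); the sketch below is a hypothetical illustration, not the patent's method, and the HMM score is a stubbed placeholder value rather than a real model likelihood:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic-time-warping distance between two feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

rng = np.random.default_rng(2)
template = rng.normal(size=(20, 3))                  # enrolled voice-tag template
same = template + 0.05 * rng.normal(size=(20, 3))    # repeat of the same tag
other = rng.normal(size=(20, 3))                     # a different utterance

# Hypothetical fusion: turn the DTW distance into a similarity and average it
# with an (assumed, stubbed) HMM score in [0, 1].
def fused_score(utt, hmm_score):
    dtw_sim = 1.0 / (1.0 + dtw_distance(utt, template))
    return 0.5 * dtw_sim + 0.5 * hmm_score

s_same = fused_score(same, hmm_score=0.9)    # stub values for the sketch
s_other = fused_score(other, hmm_score=0.2)
```

A repeat of the enrolled tag warps onto the template cheaply, so its fused score exceeds that of an unrelated utterance.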
- Publication number: 20120213428
  Abstract: A training device comprises a first regenerating unit that regenerates at least one of an image and a voice for training during the training courses which lead the user to train the operation of an input device; an operation accepting unit that accepts the user operation for at least one of the image and the voice for training from a simulated user interface which simulates a user interface of the input device during training; a second regenerating unit that regenerates at least one of the image and the voice for training when the training is ended; and a normal operation instructing unit that instructs a normal operation to the user by outputting at least one of the image and the voice of the normal operation of the user, which show at least one of the image and the voice for training, synchronous with the regeneration of the second regenerating unit.
  Type: Application
  Filed: February 7, 2012
  Publication date: August 23, 2012
  Applicant: TOSHIBA TEC KABUSHIKI KAISHA
  Inventors: Daigo Kudou, Masanori Sambe, Takesi Kawaguti
- Publication number: 20120191454
  Abstract: A system is described to monitor various parameters of a conversation, for example distinguishing voices in a conversation and reporting who in the group is violating the proper etiquette rules of conversation. These results would indicate any disruptive individuals in a conversation, so they can be identified, monitored, and trained, and their etiquette improved, to prevent further disturbances. Some of the functions the system can perform include: report the identity of the voices, report how long one has spoken, report how often one interrupts, report how often one raises their voice, count the occurrences of obscenities, and determine the length of silences.
  Type: Application
  Filed: January 26, 2011
  Publication date: July 26, 2012
  Inventors: Quinton Andrew Gabara, Constance Marie Gabara, Helen Mary Gabara, Simone Marie Gabara, Cassandra Marlene Gabara, Asher Thomas Gabara, Thaddeus John Gabara
- Publication number: 20120173239
  Abstract: The invention refers to a method of verifying the identity of a speaker based on the speaker's voice comprising the steps of: receiving (1, 5) a first and a second voice utterance; using biometric voice data to verify (2, 6) that the speaker's voice corresponds to the speaker the identity of which is to be verified based on the received first and/or second voice utterance; and determining (8) the similarity of the two received voice utterances, characterized in that the similarity is determined using biometric voice characteristics of the two voice utterances or data derived from such biometric voice characteristics.
  Type: Application
  Filed: June 26, 2009
  Publication date: July 5, 2012
  Inventors: Marta Sánchez Asenjo, Marta Garcia Gomar
- Publication number: 20120162470
  Abstract: A moving image photographing apparatus that recognizes the shape of a speaker's mouth, and/or recognizes the speaker's voice to detect a speaker area, and selectively performs image signal processing with respect to the detected speaker area, and a moving image photographing method using the moving image photographing apparatus. The moving image photographing apparatus may selectively reproduce a moving image by generating a still image including the speaker area and using the still image as a bookmark.
  Type: Application
  Filed: September 22, 2011
  Publication date: June 28, 2012
  Applicant: Samsung Electronics Co., Ltd.
  Inventors: Eun-young Kim, Seung-a Yi
- Publication number: 20120143608
  Abstract: An audio signal source verification system is presented that, in certain embodiments, receives a first template for an audio signal and compares it to templates from different sound sources to determine a correlation between them. A question and response format may be used to eliminate false verifications and to increase the probability that an audio signal is from the purported source of the signal. Moreover, mobile devices may be operated to provide audio signals generated by users of those phones, and the audio signals and templates derived from those signals may be compared to known templates to determine a confidence level or other indication that the mobile device user is who they purport to be. Moreover, comparisons can be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons.
  Type: Application
  Filed: June 10, 2011
  Publication date: June 7, 2012
  Inventor: John D. Kaufman
- Publication number: 20120123786
  Abstract: A method for identifying and authenticating a user and protecting information. The identification process is enabled by using a mobile device such as a smartphone, laptop, or thin client device. A user speaks a phrase to create an audio voiceprint while a camera streams video images and creates a video print. The video data is converted to a color band calculated pattern of numbers. The audio voiceprint, video print, and color band are registered in a database as a digital fingerprint. Processing of all audio and video input occurs on a human key system server, so there is no processing by the thin client systems used by the user to access the human key server for authentication and verification. When a user registers, an audio and video fingerprint is created and stored in the database as a reference to identify that individual for the purpose of verification.
  Type: Application
  Filed: December 20, 2011
  Publication date: May 17, 2012
  Inventors: David Valin, Alex Socolof
- Publication number: 20120095763
  Abstract: Digital method for authentication of a person by comparing a current voice profile with a previously stored initial voice profile, wherein to determine the relevant voice profile the person speaks at least one speech sample into the system, this speech sample is conveyed to a voice-profile calculation unit and thereby, on the basis of a prespecified voice-profile algorithm, the voice profile is calculated, such that the overall size of the speech sample and/or parameters of its evaluation to determine the relevant voice profile are established dynamically and automatically as the sample is spoken, in response to the result of an evaluation of a first partial speech sample.
  Type: Application
  Filed: February 19, 2008
  Publication date: April 19, 2012
  Applicant: VOICE.TRUST AG
  Inventors: Raja Kuppuswamy, Christian Pilz
- Publication number: 20120084078
  Abstract: A scalable voice signature authentication capability is provided herein. The scalable voice signature authentication capability enables authentication of varied services such as speaker identification (e.g. private banking and access to healthcare account records), voice signature as a password (e.g. secure access for remote services and document retrieval) and the Internet and its various services (e.g. …
  Type: Application
  Filed: September 30, 2010
  Publication date: April 5, 2012
  Applicant: Alcatel-Lucent USA Inc.
  Inventors: Madhav Moganti, Anish Sankalia
- Publication number: 20120078638
  Abstract: A communications system includes a receiver and at least one transmitter. The receiver receives, from different intermediate systems, biometric samples from parties attempting to obtain services from the intermediate systems and information characterizing the expected identities of the parties. The at least one transmitter transmits, to the intermediate systems, verification that the biometric samples match pre-registered biometric information obtained from a storage device such that the expected identities of the parties are verified as the identities of the parties.
  Type: Application
  Filed: November 10, 2011
  Publication date: March 29, 2012
  Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
  Inventors: Brian M. Novack, Daniel Larry Madsen, Timothy R. Thompson
- Publication number: 20120078624
  Abstract: The present invention relates to a method for detecting a voice section in time-space by using audio and video information. According to an embodiment of the present invention, a method for detecting a voice section in time-space by using audio and video information comprises the steps of: detecting a voice section in an audio signal which is inputted into a microphone array; verifying a speaker from the detected voice section; sensing the face of the speaker by using a video signal which is inputted into a camera if the speaker is successfully verified, and then estimating the direction of the face of the speaker; and determining the detected voice section as the voice section of the speaker if the estimated face direction corresponds to a reference direction which is previously stored.
  Type: Application
  Filed: February 10, 2010
  Publication date: March 29, 2012
  Applicant: Korea University-Industrial & Academic Collaboration Foundation
  Inventors: Dongsuk Yook, Hyeowoo Lee
- Publication number: 20120065973
  Abstract: A method and apparatus for performing microphone beamforming. The method includes recognizing a speech of a speaker, searching for a previously stored image associated with the speaker, searching for the speaker through a camera based on the image, recognizing a position of the speaker, and performing microphone beamforming according to the position of the speaker.
  Type: Application
  Filed: September 13, 2011
  Publication date: March 15, 2012
  Applicant: Samsung Electronics Co., Ltd.
  Inventors: Sung-Jae Cho, Hyun-Soo Kim
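The simplest form of the microphone beamforming mentioned above is delay-and-sum: once the speaker's direction is known, each microphone's signal is delayed to compensate for the wavefront's arrival times and the channels are averaged. A minimal sketch under stated assumptions (far-field plane wave, uniform linear array, integer-sample alignment; the array geometry and tone are invented for illustration):

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
FS = 16_000              # sample rate (Hz)

# Uniform linear array of 4 mics, 5 cm apart; speaker at 30 degrees off broadside.
mic_x = np.arange(4) * 0.05
angle = np.deg2rad(30.0)

# Per-mic arrival delays (in samples) for a far-field plane wave from that angle.
delays = mic_x * np.sin(angle) / SPEED_OF_SOUND * FS

# Simulate the same 1 kHz tone arriving with those delays, plus sensor noise.
rng = np.random.default_rng(3)
t = np.arange(1024) / FS
channels = np.stack([
    np.sin(2 * np.pi * 1000 * (t - d / FS)) + 0.5 * rng.normal(size=t.size)
    for d in delays
])

# Delay-and-sum: undo each delay (integer-sample approximation), then average.
aligned = np.stack([np.roll(ch, -int(round(d))) for ch, d in zip(channels, delays)])
beamformed = aligned.mean(axis=0)
```

Averaging the aligned channels keeps the speech coherent while the independent noise averages down, so the beamformed output is closer to the clean tone than any single microphone.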
- Publication number: 20120053941
  Abstract: A wireless voice activation surgical laser system utilizing wireless transmitter receivers. The present invention integrates wireless communication between a surgical laser, a voice recognition device, and a microphone to allow surgeons to verbally activate or deactivate a surgical laser. The voice recognition device is able to recognize a surgeon's commands and relay the commands directly to the surgical laser.
  Type: Application
  Filed: August 27, 2011
  Publication date: March 1, 2012
  Inventor: Michael D. Swick
- Publication number: 20120035929
  Abstract: The messaging system (100) comprises a message data engine (101), a message generation engine (102), and an inference analysis engine (107). The message data engine (101) is operable to identify the desired recipient of the message, analyse and deconstruct the desired message content into syllables, words or phrases as desired or as appropriate. The message or part message may then be passed to the inference engine (107) for review of the message content and context. The message may then be referred back to the requestor (111) through the interface (103) with details of the problem for remediation, or passed to the message generation engine (102) for transcription. The message generation engine (102) may apply a range of speech samples and/or speech parameters as appropriate to the input message in order to compile a representation of this message with the speaker characteristics that were requested.
  Type: Application
  Filed: February 9, 2010
  Publication date: February 9, 2012
  Inventors: Allan Gauld, Sarah Elizabeth Roberts
- Publication number: 20120016673
  Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.
  Type: Application
  Filed: September 27, 2011
  Publication date: January 19, 2012
  Applicant: Microsoft Corporation
  Inventor: Amitava Das
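The codebook idea above is classically implemented with vector quantization: cluster each enrolled speaker's training frames into a codebook, then identify a probe by whichever codebook quantizes it with the least distortion. A single-classifier sketch of that general technique (synthetic stand-in features, a hand-rolled k-means; not the patent's actual system, which fuses several such classifiers):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Tiny k-means to build a codebook (code vectors) from training frames."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def distortion(X, codebook):
    """Average distance from each frame to its nearest code vector."""
    return np.sqrt(((X[:, None] - codebook) ** 2).sum(-1)).min(axis=1).mean()

rng = np.random.default_rng(4)
# Two "speakers" with different feature distributions (synthetic stand-ins).
alice_train = rng.normal(loc=0.0, size=(300, 8))
bob_train = rng.normal(loc=2.0, size=(300, 8))
codebooks = {"alice": kmeans(alice_train, 16), "bob": kmeans(bob_train, 16)}

# Identify an unseen sample: lower distortion against a codebook = better match.
probe = rng.normal(loc=0.0, size=(50, 8))           # actually Alice's distribution
scores = {spk: -distortion(probe, cb) for spk, cb in codebooks.items()}
identified = max(scores, key=scores.get)
```

In a multi-classifier setup as described, each classifier would compute such a score over its own feature set and the per-classifier scores would be combined into the overall score.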
- Publication number: 20120010886
  Abstract: A language identification system suitable for use with voice data transmitted through either telephonic or computer network systems is presented. Embodiments that automatically select the language to be used based upon the content of the audio data stream are presented. In one embodiment, the content of the data stream is supplemented with the context of the audio stream. In another embodiment, the language determination is supplemented with preferences set in the communication devices, and in yet another embodiment, global position data for each user of the system is used to supplement the automated language determination.
  Type: Application
  Filed: July 6, 2011
  Publication date: January 12, 2012
  Inventor: Javad Razavilar
- Publication number: 20120004914
  Abstract: A system generates an audio challenge that includes a first voice and one or more second voices, the first voice being audibly distinguishable, by a human, from the one or more second voices. The first voice conveys first information and the second voice conveys second information. The system provides the audio challenge to a user and verifies that the user is human based on whether the user can identify the first information in the audio challenge.
  Type: Application
  Filed: September 12, 2011
  Publication date: January 5, 2012
  Applicant: Tell Me Networks c/o Microsoft Corporation
  Inventors: Nikko Strom, Dylan F. Salisbury
- Publication number: 20110320200
  Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
  Type: Application
  Filed: September 7, 2011
  Publication date: December 29, 2011
  Applicant: American Express Travel Related Services Company, Inc.
  Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
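The one-to-many comparison described above reduces, in modern practice, to scoring one probe voice print against every enrolled print and thresholding the similarity. A minimal sketch assuming voice prints are fixed-length embedding vectors compared by cosine similarity (the embedding dimension, database, and threshold are all invented for illustration):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(5)
# Hypothetical database of known voice-print embeddings (64-dim vectors).
known_prints = {f"person_{i}": rng.normal(size=64) for i in range(100)}

# Segmented customer voice print: here, a noisy copy of person_42's print.
customer = known_prints["person_42"] + 0.1 * rng.normal(size=64)

THRESHOLD = 0.8     # assumed decision threshold for "likely the same person"
matches = [name for name, vp in known_prints.items()
           if cosine(customer, vp) >= THRESHOLD]
```

Unrelated high-dimensional prints have near-zero cosine similarity, so only the true enrollee clears the threshold; the match list then drives downstream decisions such as transaction authorization.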
- Publication number: 20110313776
  Abstract: A system, method and computer-readable medium for controlling devices connected to a network. The method includes receiving an utterance from a user for remotely controlling a device in a network; converting the received utterance to text using an automatic speech recognition module; accessing a user profile in the network that governs access to a plurality of devices on the network and identifiers which control a conversion of the text to a device specific control language; identifying based on the text a device to be controlled; converting at least a portion of the text to the device control language; and transmitting the device control language to the identified device, wherein the identified device implements a function based on the transmitted device control language.
  Type: Application
  Filed: August 27, 2011
  Publication date: December 22, 2011
  Applicant: AT&T Intellectual Property II, L.P.
  Inventors: Joseph A. Alfred, Joseph M. Sommer
- Publication number: 20110313766
  Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
  Type: Application
  Filed: August 30, 2011
  Publication date: December 22, 2011
  Applicant: MICROSOFT CORPORATION
  Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
- Publication number: 20110313765
  Abstract: A method for assessing quality of conversational speech between nodes of a communication network (1), comprising establishing a voice communication session via the communication network (1) between a user at a user terminal (2) and a virtual subject system (4), the virtual subject system (4) and user terminal (2) being connected to the communication network (1), the user terminal enabling the user to communicate by voice with the virtual subject system (4), during the session, acting as a conversation partner in a voice conversation with the virtual subject system (4), the virtual subject system being equipped with a speech generation module (42) to enable speaking during the session and a voice recognition module (41) to enable interpreting speech of the user during the session, and assessing the quality of speech over the communication network based on the voice conversation during the session, the assessing being performed by the user.
  Type: Application
  Filed: November 24, 2009
  Publication date: December 22, 2011
  Applicant: ALCATEL LUCENT
  Inventor: Nicolas Tranquart
- Publication number: 20110295603
  Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition. In one aspect, a computer-based method includes receiving a speech corpus at a speech management server system that includes multiple speech recognition engines tuned to different speaker types; using the speech recognition engines to associate the received speech corpus with a selected one of multiple different speaker types; and sending a speaker category identification code that corresponds to the associated speaker type from the speech management server system over a network. The speaker category identification code can be used by any one of the speech-interactive applications coupled to the network to select an appropriate one of multiple application-accessible speech recognition engines tuned to the different speaker types in response to an indication that a user accessing the application is associated with a particular one of the speaker category identification codes.
  Type: Application
  Filed: April 28, 2011
  Publication date: December 1, 2011
  Inventor: William S. Meisel
- Publication number: 20110295604
  Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.
  Type: Application
  Filed: August 8, 2011
  Publication date: December 1, 2011
  Applicant: AT&T Intellectual Property II, L.P.
  Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
- Publication number: 20110285807
  Abstract: A videoconferencing apparatus automatically tracks speakers in a room and dynamically switches between a controlled, people-view camera and a fixed, room-view camera. When no one is speaking, the apparatus shows the room view to the far-end. When there is a dominant speaker in the room, the apparatus directs the people-view camera at the dominant speaker and switches from the room-view camera to the people-view camera. When there is a new speaker in the room, the apparatus switches to the room-view camera first, directs the people-view camera at the new speaker, and then switches to the people-view camera directed at the new speaker. When there are two near-end speakers engaged in a conversation, the apparatus tracks and zooms in the people-view camera so that both speakers are in view.
  Type: Application
  Filed: May 18, 2010
  Publication date: November 24, 2011
  Applicant: POLYCOM, INC.
  Inventor: Jinwei Feng
- Publication number: 20110282665
  Abstract: Provided is a method for measuring environmental parameters for multi-modal fusion. The method for measuring environmental parameters for multi-modal fusion includes: preparing at least one enrolled modality; receiving at least one input modality; calculating image-related environmental parameters of input images in at least one input modality based on illumination of the enrolled image in at least one enrolled modality; and comparing the image-related environmental parameters with a predetermined reference value and discarding the input image or outputting it as recognition data according to the comparison result.
  Type: Application
  Filed: January 31, 2011
  Publication date: November 17, 2011
  Applicant: Electronics and Telecommunications Research Institute
  Inventors: Hye Jin Kim, Do Hyung Kim, Su Young Chi, Jae Yeon Lee
- Publication number: 20110276331
  Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern by using the voice data where the utterance stability verification unit 13 determines that registration is acceptable.
  Type: Application
  Filed: October 8, 2009
  Publication date: November 10, 2011
  Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
- Publication number: 20110270611
  Abstract: A system for inspecting an oil level in each part of a railroad car truck includes: an imaging unit that obtains an image of an oil level gauge; an oil level inspection unit that inspects whether or not the oil level in the each part of the railroad car truck is within a predetermined range based on the image of the oil level gauge obtained by the imaging unit; a voice input unit adapted for an inspector to input, via voice, an inspection result; a voice processing unit that determines whether or not the inspection result inputted via the voice input unit is good based on the inputted inspection result, and converts a determination result into displayable data; a display unit that displays an oil level inspection result and the determination result; and a storage unit that stores, as data, the oil level inspection result and the determination result.
  Type: Application
  Filed: December 3, 2008
  Publication date: November 3, 2011
  Applicant: CENTRAL JAPAN RAILWAY COMPANY
  Inventors: Kyouichi Nishimura, Nozomu Nakamura, Kazuhiro Okada, Yoshitaka Tanaka
- Publication number: 20110267531
  Abstract: An image capturing apparatus and method for selective real-time focus/parameter adjustment. The image capturing apparatus includes a display unit, an interface unit, an adjustment unit, and a generation unit. The display unit is configured to display an image. The interface unit is configured to enable a user to select a plurality of regions of the image displayed on the display unit. The adjustment unit is configured to enable the user to adjust at least one focus/parameter of at least one selected region of the image displayed on the display unit. The generation unit is configured to convert the image including at least one adjusted selected region into image data, where at least one focus/parameter of the at least one adjusted selected region has been adjusted by the adjustment unit prior to conversion.
  Type: Application
  Filed: May 3, 2010
  Publication date: November 3, 2011
  Applicant: CANON KABUSHIKI KAISHA
  Inventor: Francisco Imai
- Publication number: 20110246198
  Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker the identity of which is to be verified based on the received voice utterance; and c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; d) accepting (16) the speaker's identity to be verified in case that both verification steps give a positive result and not accepting (15) the speaker's identity to be verified if any of the verification steps gives a negative result. The invention further refers to a corresponding computer-readable medium and a computer.
  Type: Application
  Filed: December 10, 2008
  Publication date: October 6, 2011
  Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martin De Los Santos De Las Heras, Marta Garcia Gomar
- Publication number: 20110231182
  Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain-specific behavior and information into agents that are distributable or updateable over a wide area network.
  Type: Application
  Filed: April 11, 2011
  Publication date: September 22, 2011
  Applicant: VoiceBox Technologies, Inc.
  Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
- Publication number: 20110224986
  Abstract: A method for configuring a voice authentication system employing at least one authentication engine comprises utilising the at least one authentication engine to systematically compare a plurality of impostor voice samples against a voice sample of a legitimate person to derive respective authentication scores. The resultant authentication scores are analysed to determine a measure of confidence for the voice authentication system.
  Type: Application
  Filed: July 21, 2009
  Publication date: September 15, 2011
  Inventor: Clive Summerfield
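A standard way to turn impostor-versus-genuine authentication scores into a confidence measure, as the abstract above contemplates, is to sweep a decision threshold and locate the equal-error rate (EER), where the false-accept and false-reject rates cross. A sketch on synthetic score distributions (the Gaussian score model is an assumption for illustration, not the patent's analysis):

```python
import numpy as np

rng = np.random.default_rng(6)
# Synthetic authentication scores: genuine attempts score higher on average.
genuine = rng.normal(loc=2.0, scale=1.0, size=1000)
impostor = rng.normal(loc=0.0, scale=1.0, size=1000)

# Sweep thresholds; the equal-error rate is where false accepts = false rejects.
thresholds = np.linspace(-3, 5, 801)
far = np.array([(impostor >= t).mean() for t in thresholds])  # false-accept rate
frr = np.array([(genuine < t).mean() for t in thresholds])    # false-reject rate
i = np.argmin(np.abs(far - frr))
eer = (far[i] + frr[i]) / 2
```

For two unit-variance Gaussians whose means are 2 apart, the theoretical EER is about 15.9%; a lower EER means better separation between impostor and legitimate scores, i.e. higher system confidence.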
-
Publication number: 20110218798Abstract: Techniques implemented as systems, methods, and apparatuses, including computer program products, for obfuscating sensitive content in an audio source representative of an interaction between a contact center caller and a contact center agent. The techniques include performing, by an analysis engine of a contact center system, a context-sensitive content analysis of the audio source to identify each audio source segment that includes content determined by the analysis engine to be sensitive content based on its context; and processing, by an obfuscation engine of the contact center system, one or more identified audio source segments to generate corresponding altered audio source segments each including obfuscated sensitive content.Type: ApplicationFiled: March 5, 2010Publication date: September 8, 2011Applicant: Nexidia Inc.Inventor: Marsal Gavalda
-
Publication number: 20110196677Abstract: According to one illustrative embodiment, a method is provided for analyzing an audio interaction. At least one change in an emotion of a speaker in an audio interaction and at least one aspect of the audio interaction are identified. The at least one change in an emotion is analyzed in conjunction with the at least one aspect to determine a relationship between the at least one change in an emotion and the at least one aspect, and a result of the analysis is provided.Type: ApplicationFiled: February 11, 2010Publication date: August 11, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Om D. Deshmukh, Chitra Dorai, Shailesh Joshi, Maureen E. Rzasa, Ashish Verma, Karthik Visweswariah, Gary J. Wright, Sai Zeng
-
Publication number: 20110166859Abstract: A voice recognition unit is constructed to create, for each language, a voice label string for a voice uttered by a user, on the basis of a feature vector time series of the inputted voice and data from a sound standard model, and to register the voice label string into a voice label memory 2, while automatically switching among the languages of the sound standard model memory 1 used to create the voice label string, and among the languages of the voice label memory 2 holding the created voice label string, by using a first language switching unit SW1 and a second language switching unit SW2.Type: ApplicationFiled: October 20, 2009Publication date: July 7, 2011Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta
-
Publication number: 20110137635Abstract: The present disclosure describes a system and method of transliterating Semitic languages with support for diacritics. An input module receives and pre-processes Romanized characters and forwards the pre-processed Romanized characters to a transliteration engine. The transliteration engine selects candidate transliteration rules, applies the rules, and scores and ranks the results for output. To optimize the search for candidate transliteration rules, the transliteration engine may apply word-stemming strategies to process inflections indicated by affixes. The present disclosure further describes optimizations such as pre-processing emphasis text, caching, dynamic transliteration-rule pruning, and buffering/throttling of input. The system and methods are suitable for multiple applications including, but not limited to, web applications, Windows applications, client-server applications, and input method editors such as those built on the Microsoft Text Services Framework (TSF™).Type: ApplicationFiled: December 8, 2009Publication date: June 9, 2011Applicant: MICROSOFT CORPORATIONInventors: Achraf Chalabi, Hany Grees, Mostafa Ashour, Roaa Mohammed
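Longest-match selection of candidate transliteration rules can be sketched as follows. The rule table is invented for illustration; the engine's actual rules, stemming strategies and scoring are not public:

```python
# Hypothetical rule table mapping Romanized sequences to Arabic letters.
RULES = {"sh": "ش", "th": "ث", "kh": "خ", "a": "ا", "b": "ب",
         "k": "ك", "t": "ت", "s": "س", "h": "ه", "l": "ل", "m": "م"}

def transliterate(word):
    out, i = [], 0
    while i < len(word):
        for length in (2, 1):          # prefer the longest matching rule
            chunk = word[i:i + length]
            if len(chunk) == length and chunk in RULES:
                out.append(RULES[chunk])
                i += length
                break
        else:
            out.append(word[i])        # pass unmatched characters through
            i += 1
    return "".join(out)

print(transliterate("shams"))  # "sh" matches as one digraph rule, not "s"+"h"
```

A production engine would keep multiple candidate rule applications and rank them by score; this sketch greedily takes the longest match.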
-
PHONE CONVERSATION RECORDING SYSTEM USING CALL CONTROL AND FUNCTIONS OF PHONE CONVERSATION RECORDING
Publication number: 20110135069Abstract: New functions are added to the existing telephone network to provide telecommunications-carrier services intended to deter frauds and crimes committed using telephony. The telephonic circumstances during the commission of a fraud or crime are also preserved to help prevent its recommission. A voice announcement indicating that the telephone conversation now starting will be recorded is issued to the sender in advance, a function that deters frauds and crimes by creating psychological resistance. A warning is issued to the recipient after a voiceprint check is performed. The contents of telephone conversations during the commission of a fraud or crime can be played back to provide information necessary for countermeasures against frauds and crimes.Type: ApplicationFiled: July 27, 2010Publication date: June 9, 2011Inventor: Kazuki Yoshida
-
Publication number: 20110131044Abstract: An apparatus, program product and method are provided for separating a target voice from a plurality of other voices having different directions of arrival. The method comprises the steps of disposing a first and a second voice input device at a predetermined distance from one another and, upon receipt of voice signals at said devices, calculating discrete Fourier transforms of the signals, calculating a CSP (cross-power spectrum phase) coefficient by superpositioning multiple frequency-bin components based on correlation of the spectra of the two received signals, and then calculating a weighted CSP coefficient from said two discrete Fourier-transformed speech signals. A target voice received by said devices is then separated from the other voice signals in the spectrum by using the calculated weighted CSP coefficient.Type: ApplicationFiled: November 29, 2010Publication date: June 2, 2011Applicant: International Business Machines CorporationInventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
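The core CSP computation (a phase-normalized cross-power spectrum whose peak locates the inter-channel delay of the dominant source) can be sketched as below. The weighting here is the plain phase transform, not the patent's specific weighted-superposition scheme:

```python
import numpy as np

def csp_coefficient(x1, x2):
    """CSP (cross-power spectrum phase) coefficient of two microphone
    signals: the inverse FFT of the phase-normalized cross-power
    spectrum. Its peak index estimates the inter-channel delay, i.e.
    the direction of arrival of the dominant source."""
    X1 = np.fft.rfft(x1)
    X2 = np.fft.rfft(x2)
    cross = np.conj(X1) * X2
    cross /= np.abs(cross) + 1e-12   # keep phase only (phase transform)
    return np.fft.irfft(cross, n=len(x1))

# Two channels: white noise, channel 2 circularly delayed by 5 samples
rng = np.random.default_rng(0)
x1 = rng.standard_normal(1024)
x2 = np.roll(x1, 5)
csp = csp_coefficient(x1, x2)
delay = int(np.argmax(csp))
print(delay)  # the CSP coefficient peaks at the true delay, 5
```

A real front end would compute this per frame and steer the separation filter using the estimated delay for each talker.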
-
Publication number: 20110125498Abstract: One embodiment of the invention provides a computer-implemented method of handling a telephone call. The method comprises monitoring a conversation between an agent and a customer on a telephone line as part of the telephone call to extract the audio signal therefrom. Real-time voice analytics are performed on the extracted audio signal while the telephone call is in progress. The results from the voice analytics are then passed to a computer-telephony integration system responsible for the call for use by the computer-telephony integration system for determining future handling of the call.Type: ApplicationFiled: June 19, 2009Publication date: May 26, 2011Applicant: NEWVOICEMEDIA LTDInventors: Richard Pickering, Joseph Moussalli, Ashley Unitt
-
Publication number: 20110119060Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models relative to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.Type: ApplicationFiled: November 15, 2009Publication date: May 19, 2011Applicant: International Business Machines CorporationInventor: Hagai Aronowitz
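The feature extension can be sketched with simple diagonal Gaussians standing in for the pre-trained acoustic models; the model parameters and data below are invented for illustration:

```python
import numpy as np

def diag_gauss_loglik(frames, mean, var):
    """Frame-wise log-likelihood under a diagonal-covariance Gaussian."""
    d = frames - mean
    return -0.5 * (np.log(2 * np.pi * var).sum() + (d * d / var).sum(axis=1))

def extend_features(frames, speaker_models, background):
    """Append one log-likelihood ratio per pre-trained speaker model
    (model vs. background population model) to each frame's base
    acoustic feature vector."""
    bg = diag_gauss_loglik(frames, *background)
    llrs = np.stack([diag_gauss_loglik(frames, m, v) - bg
                     for m, v in speaker_models], axis=1)
    return np.hstack([frames, llrs])

# Toy example: 2-dim "acoustic" frames, two speaker models, broad background
rng = np.random.default_rng(0)
frames = rng.normal(loc=[1.0, 1.0], size=(5, 2))
models = [(np.array([1.0, 1.0]), np.ones(2)),     # matches the data
          (np.array([-4.0, -4.0]), np.ones(2))]   # mismatched speaker
background = (np.zeros(2), 4.0 * np.ones(2))
ext = extend_features(frames, models, background)
print(ext.shape)  # (5, 4): 2 base dims + 2 log-likelihood-ratio dims
```

The matched model yields consistently higher ratios than the mismatched one, which is what makes the extended vectors useful for segmentation and clustering.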
-
Publication number: 20110102142Abstract: A system and associated method verify the attendance and/or identity of viewers of audio/video/data streams transmitted over the internet. The system and method capture various types of interaction with the viewers and either take appropriate action, as configured by a webcast program administrator, or simply log this interaction to a database where audience attention and identity can be validated at a later date.Type: ApplicationFiled: November 4, 2009Publication date: May 5, 2011Inventors: Ian J. Widger, Steven J. Silves, Jeremy M. Knight
-
Publication number: 20110099011Abstract: A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user.Type: ApplicationFiled: October 26, 2009Publication date: April 28, 2011Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Peeyush Jaiswal
-
Publication number: 20110093267Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device to calculate the confidence score may be tuned to increase a likelihood of recognition of voice data for a particular age range of callers.Type: ApplicationFiled: December 22, 2010Publication date: April 21, 2011Applicant: VERIZON PATENT AND LICENSING INC.Inventor: Kevin R. Witzman
-
Publication number: 20110093261Abstract: Systems and methods are operable to associate each of a plurality of stored audio patterns with at least one of a plurality of digital tokens, identify a user based on user identification input, access a plurality of stored audio patterns associated with a user based on the user identification input, receive from a user at least one audio input from a custom language made up of custom language elements wherein the elements include at least one monosyllabic representation of a number, letter or word, select one of the plurality of stored audio patterns associated with the identified user, in the case that the audio input received from the identified user corresponds with one of the plurality of stored audio patterns, determine the digital token associated with the selected one of the plurality of stored audio patterns, and generate the output signal for use in a device based on the determined digital token.Type: ApplicationFiled: October 15, 2010Publication date: April 21, 2011Inventor: Paul Angott
-
Publication number: 20110080289Abstract: A device may include a sensor configured to detect when a user is wearing or holding the device. The device may also include a display and a communication interface. The communication interface may be configured to forward an indication to a media playing device when the user is wearing or holding the device and receive content from the media playing device, where the content is received in response to the indication that the user is wearing or holding the device. The communication interface may also output the content to the display.Type: ApplicationFiled: October 22, 2009Publication date: April 7, 2011Applicant: SONY ERICSSON MOBILE COMMUNICATIONS ABInventor: Wayne Christopher Minton
-
Publication number: 20110071831Abstract: The present invention refers to a method for localizing a person, comprising the following steps carried out in a computing system (1): determining (20) the localization of a telecommunication means (3, 6, 8), or determining a telecommunication means (3, 6, 8) at a specific location (this can be implemented using ANI, i.e. the received calling number, with a database lookup of a fixed telephone's address; for a cellular device, cell-ID or triangulation can be used); receiving (21) a voice utterance of a person via the telecommunication means; and verifying (22) the identity of that person based on the received voice utterance using biometric voice data (speech/speaker recognition). The invention further relates to a corresponding system and computer-readable medium.Type: ApplicationFiled: May 9, 2008Publication date: March 24, 2011Applicant: AGNITIO, S.L.Inventors: Marta Garcia Gomar, Marta Sanchez Asenjo