Speaker Identification Or Verification (epo) Patents (Class 704/E17.001)
  • Publication number: 20120278077
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Application
    Filed: July 11, 2012
    Publication date: November 1, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
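As a rough illustration of the multimodal approach in the entry above (not the patent's actual algorithm), the sketch below pools audio- and video-derived features into one vector and trains a simple perceptron as a stand-in for the unspecified learning algorithm; all feature names and values are invented for the example.

```python
# Sketch: pool audio and video features into one vector, learn a classifier
# that flags "speaker present" (1) vs "absent" (0). Toy data throughout.

def pool_features(audio_feats, video_feats):
    """Concatenate per-frame audio and video features into one pooled vector."""
    return list(audio_feats) + list(video_feats)

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn linear weights over the pooled feature vector."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def classify(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Toy samples: [audio energy, voicing] + [motion, face score]
X = [pool_features(a, v) for a, v in [((0.9, 0.8), (0.7, 0.9)),
                                      ((0.1, 0.2), (0.1, 0.0)),
                                      ((0.8, 0.9), (0.6, 0.8)),
                                      ((0.2, 0.1), (0.2, 0.1))]]
y = [1, 0, 1, 0]
w, b = train_perceptron(X, y)
```

The resulting `classify` plays the role of the evaluated classifier; a real system would use stronger learners (e.g. boosting) over far larger feature pools.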
  • Publication number: 20120259635
    Abstract: A system for the storing of client information in an independent repository is disclosed. Client data may be uploaded by the client or those authorized by the client, or collected and stored by the repository. Data about the client file, such as the time of upload and modifications, is stored in a metadata file associated with the client file.
    Type: Application
    Filed: April 5, 2012
    Publication date: October 11, 2012
    Inventors: Gregory J. Ekchian, Jack A. Ekchian
  • Publication number: 20120253811
    Abstract: A method for identifying a plurality of speakers in audio data and for decoding the speech spoken by said speakers; the method comprising: receiving speech; dividing the speech into segments as it is received; processing the received speech segment by segment in the order received to identify the speaker and to decode the speech, processing comprising: performing primary decoding of the segment using an acoustic model and a language model; obtaining segment parameters indicating the differences between the speaker of the segment and a base speaker during the primary decoding; comparing the segment parameters with a plurality of stored speaker profiles to determine the identity of the speaker, and selecting a speaker profile for said speaker; updating the selected speaker profile; performing a further decoding of the segment using a speaker independent acoustic model, adapted using the updated speaker profile; outputting the decoded speech for the identified speaker, wherein the speaker profiles are upd
    Type: Application
    Filed: August 23, 2011
    Publication date: October 4, 2012
    Applicant: Kabushiki Kaisha Toshiba
    Inventors: Catherine BRESLIN, Mark John Francis Gales, Kean Kheong Chin, Katherine Mary Knill
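The profile-matching step described above can be sketched as follows. This is an illustrative simplification, not the patent's method: segment parameters become a plain vector, profile comparison is cosine similarity, and the profile update is a running mean; all names and values are assumed.

```python
# Sketch: match a segment's speaker parameters against stored speaker
# profiles, then update the winning profile with the new segment.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match_and_update(segment_params, profiles, counts):
    """Pick the closest stored profile, then fold the segment into it."""
    best = max(profiles, key=lambda s: cosine(segment_params, profiles[s]))
    n = counts[best]
    profiles[best] = [(n * p + x) / (n + 1)
                      for p, x in zip(profiles[best], segment_params)]
    counts[best] = n + 1
    return best

profiles = {"spk_A": [1.0, 0.0], "spk_B": [0.0, 1.0]}
counts = {"spk_A": 1, "spk_B": 1}
who = match_and_update([0.9, 0.1], profiles, counts)
```

In the patent's pipeline the matched, updated profile would then adapt the speaker-independent acoustic model before the second decoding pass.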
  • Publication number: 20120253808
    Abstract: According to an embodiment, a voice recognition device includes a voice inputting unit, a voice recognition processing unit, a vibration movement pattern model holding unit, and a vibration movement unit. The voice recognition processing unit performs voice recognition processing using a digital signal output from the voice inputting unit to output a voice recognition result and outputs voice reliability of the received voice signal. The vibration movement pattern model holding unit stores models prepared according to a number of patterns of the voice reliability output from the voice recognition processing unit and holds vibration movements corresponding to the models. The vibration movement unit detects whether or not the voice reliability output from the voice recognition processing unit matches any one of the models in the vibration movement pattern model holding unit and performs vibration movement predetermined for a matched model.
    Type: Application
    Filed: October 17, 2011
    Publication date: October 4, 2012
    Inventors: Motonobu Sugiura, Hiroshi Fujimura
  • Publication number: 20120239399
    Abstract: Disclosed is a voice recognition device which creates a recognition dictionary (statically-created dictionary) in advance for a vocabulary having words to be recognized whose number is equal to or larger than a threshold, and creates a recognition dictionary (dynamically-created dictionary) for a vocabulary having words to be recognized whose number is smaller than the threshold in an interactive situation.
    Type: Application
    Filed: March 30, 2010
    Publication date: September 20, 2012
    Inventors: Michihiro Yamazaki, Yuzo Maruta
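The static/dynamic split above can be sketched in a few lines. This is a hedged illustration, assuming a toy `build_dictionary` stand-in for real recognition-dictionary compilation; the threshold and vocabulary names are invented.

```python
# Sketch: vocabularies at or above a threshold get a dictionary built in
# advance; smaller ones are built on demand during the interaction.

THRESHOLD = 3

def build_dictionary(vocab):
    """Toy stand-in for real recognition-dictionary compilation."""
    return {word: len(word) for word in vocab}

class DictionaryManager:
    def __init__(self, vocabularies):
        # Statically create dictionaries for large vocabularies up front.
        self.static = {name: build_dictionary(v)
                       for name, v in vocabularies.items()
                       if len(v) >= THRESHOLD}
        self.pending = {name: v for name, v in vocabularies.items()
                        if len(v) < THRESHOLD}
        self.dynamic = {}

    def get(self, name):
        if name in self.static:
            return self.static[name]
        if name not in self.dynamic:  # dynamically create in the dialogue
            self.dynamic[name] = build_dictionary(self.pending[name])
        return self.dynamic[name]

mgr = DictionaryManager({"cities": ["osaka", "tokyo", "nagoya"],
                         "yes_no": ["yes", "no"]})
```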
  • Publication number: 20120221335
    Abstract: According to one embodiment, the method may include constructing a first voice tag for registration speech based on Hidden Markov acoustic model (HMM), constructing a second voice tag for the registration speech based on template matching, and combining the first voice tag and the second voice tag to construct voice tag of the registration speech.
    Type: Application
    Filed: February 24, 2012
    Publication date: August 30, 2012
    Inventors: Rui Zhao, Lei He
  • Publication number: 20120213428
    Abstract: A training device comprises a first regenerating unit regenerates at least one of an image and a voice for training during the training courses which lead the user to train the operation of an input device, an operation accepting unit accepts the user operation for at least one of the image and the voice for training from a simulated user interface which simulates a user interface of the input device during training, a second regenerating unit regenerates at least one of the image and the voice for training when the training is ended, and a normal operation instructing unit instructs a normal operation to the user by outputting at least one of the image and the voice of the normal operation of the user, which show at least one of the image and the voice for training, which is synchronous with the regeneration of the second regenerating unit.
    Type: Application
    Filed: February 7, 2012
    Publication date: August 23, 2012
    Applicant: TOSHIBA TEC KABUSHIKI KAISHA
    Inventors: Daigo Kudou, Masanori Sambe, Takesi Kawaguti
  • Publication number: 20120191454
    Abstract: A system is described that monitors various parameters of a conversation, for example distinguishing the voices in a conversation and reporting who in the group is violating proper conversational etiquette. These results indicate any disruptive individuals in a conversation, so they can be identified, monitored, and trained to prevent further disturbances and improve their etiquette. Some of the functions the system can perform include: reporting the identity of the voices, reporting how long one has spoken, reporting how often one interrupts, reporting how often one raises their voice, counting occurrences of obscenities, and determining the length of silences.
    Type: Application
    Filed: January 26, 2011
    Publication date: July 26, 2012
    Inventors: Quinton Andrew Gabara, Constance Marie Gabara, Helen Mary Gabara, Simone Marie Gabara, Cassandra Marlene Gabara, Asher Thomas Gabara, Thaddeus John Gabara
  • Publication number: 20120173239
    Abstract: The invention refers to a method of verifying the identity of a speaker based on the speaker's voice, comprising the steps of: receiving (1, 5) a first and a second voice utterance; using biometric voice data to verify (2, 6) that the speaker's voice corresponds to the speaker whose identity is to be verified based on the received first and/or second voice utterance; and determining (8) the similarity of the two received voice utterances, characterized in that the similarity is determined using biometric voice characteristics of the two voice utterances or data derived from such biometric voice characteristics.
    Type: Application
    Filed: June 26, 2009
    Publication date: July 5, 2012
    Inventors: Marta Sánchez Asenjo, Marta Garcia Gomar
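A minimal sketch of the two checks described above, under assumed toy feature vectors and thresholds (the patent does not specify the similarity measure; cosine similarity stands in here): each utterance is scored against the claimed speaker's enrolled features, and the two utterances are also compared with each other.

```python
# Sketch: accept only if both utterances match the enrolled voice AND
# the two utterances are biometrically similar to each other.
import math

def similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def verify(utt1, utt2, enrolled, spk_thresh=0.8, pair_thresh=0.9):
    speaker_ok = (similarity(utt1, enrolled) >= spk_thresh and
                  similarity(utt2, enrolled) >= spk_thresh)
    pair_ok = similarity(utt1, utt2) >= pair_thresh
    return speaker_ok and pair_ok

enrolled = [0.6, 0.8]  # toy enrolled biometric feature vector
accepted = verify([0.58, 0.81], [0.62, 0.79], enrolled)
rejected = verify([0.9, 0.1], [0.62, 0.79], enrolled)
```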
  • Publication number: 20120162470
    Abstract: A moving image photographing apparatus that recognizes the shape of a speaker's mouth, and/or recognizes the speaker's voice to detect a speaker area, and selectively performs image signal processing with respect to the detected speaker area, and a moving image photographing method using the moving image photographing apparatus. The moving image photographing apparatus may selectively reproduce a moving image by generating a still image including the speaker area and using the still image as a bookmark.
    Type: Application
    Filed: September 22, 2011
    Publication date: June 28, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Eun-young Kim, Seung-a Yi
  • Publication number: 20120143608
    Abstract: An audio signal source verification system is presented that, in certain embodiments, receives a first template for an audio signal and compares it to templates from different sound sources to determine a correlation between them. A question-and-response format may be used to eliminate false verifications and to increase the probability that an audio signal is from the purported source of the signal. Moreover, mobile devices may be operated to provide audio signals generated by users of those devices, and the audio signals and templates derived from those signals may be compared to known templates to determine a confidence level or other indication that the mobile device user is who they purport to be. Comparisons can also be made using templates of different richness to achieve confidence levels, and confidence levels may be represented based on the results of the comparisons.
    Type: Application
    Filed: June 10, 2011
    Publication date: June 7, 2012
    Inventor: John D. KAUFMAN
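The template-comparison step above can be sketched as follows; this is illustrative only, with Pearson correlation standing in for the patent's unspecified measure and toy templates for the known sources.

```python
# Sketch: correlate a received template against templates from candidate
# sources and report the best match with a confidence score.
import math

def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def best_source(received, known):
    """Return the most correlated source and its confidence level."""
    scores = {src: correlation(received, tpl) for src, tpl in known.items()}
    src = max(scores, key=scores.get)
    return src, scores[src]

known = {"alice": [1.0, 3.0, 2.0, 4.0], "bob": [4.0, 1.0, 3.0, 0.0]}
src, conf = best_source([1.1, 2.9, 2.2, 3.9], known)
```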
  • Publication number: 20120123786
    Abstract: A method for identifying and authenticating a user and protecting information. The identification process is enabled by using a mobile device such as a smartphone, laptop, or thin client device. A user speaks a phrase to create an audio voiceprint while a camera streams video images and creates a video print. The video data is converted to a color-band pattern calculated as numbers. The audio voiceprint, video print, and color band are registered in a database as a digital fingerprint. Processing of all audio and video input occurs on a human key system server, so none occurs on the thin client systems the user employs to access the human key server for authentication and verification. When a user registers, an audio and video fingerprint is created and stored in the database as a reference to identify that individual for the purpose of verification.
    Type: Application
    Filed: December 20, 2011
    Publication date: May 17, 2012
    Inventors: David Valin, Alex Socolof
  • Publication number: 20120095763
    Abstract: Digital method for authentication of a person by comparing a current voice profile with a previously stored initial voice profile, wherein to determine the relevant voice profile the person speaks at least one speech sample into the system, this speech sample is conveyed to a voice-profile calculation unit and thereby, on the basis of a prespecified voice-profile algorithm, the voice profile is calculated, such that the overall size of the speech sample and/or parameters of its evaluation to determine the relevant voice profile are established dynamically and automatically as the sample is spoken, in response to the result of an evaluation of a first partial speech sample.
    Type: Application
    Filed: February 19, 2008
    Publication date: April 19, 2012
    Applicant: VOICE.TRUST AG
    Inventors: Raja Kuppuswamy, Christian Pilz
  • Publication number: 20120084078
    Abstract: A scalable voice signature authentication capability is provided herein. The scalable voice signature authentication capability enables authentication of varied services such as speaker identification (e.g. private banking and access to healthcare account records), voice signature as a password (e.g. secure access for remote services and document retrieval) and the Internet and its various services (e.g.
    Type: Application
    Filed: September 30, 2010
    Publication date: April 5, 2012
    Applicant: Alcatel-Lucent USA Inc.
    Inventors: Madhav Moganti, Anish Sankalia
  • Publication number: 20120078638
    Abstract: A communications system includes a receiver and at least one transmitter. The receiver receives, from different intermediate systems, biometric samples from parties attempting to obtain services from the intermediate systems and information characterizing the expected identities of the parties. The at least one transmitter transmits, to the intermediate systems, verification that the biometric samples match pre-registered biometric information obtained from a storage device, such that the expected identities of the parties are verified as the identities of the parties.
    Type: Application
    Filed: November 10, 2011
    Publication date: March 29, 2012
    Applicant: AT&T INTELLECTUAL PROPERTY I, L.P.
    Inventors: Brian M. NOVACK, Daniel Larry MADSEN, Timothy R. THOMPSON
  • Publication number: 20120078624
    Abstract: The present invention relates to a method for detecting a voice section in time-space by using audio and video information. According to an embodiment of the present invention, a method for detecting a voice section in time-space by using audio and video information comprises the steps of: detecting a voice section in an audio signal which is inputted into a microphone array; verifying a speaker from the detected voice section; sensing the face of the speaker by using a video signal which is inputted into a camera if the speaker is successfully verified, and then estimating the direction of the face of the speaker; and determining the detected voice section as the voice section of the speaker if the estimated face direction corresponds to a reference direction which is previously stored.
    Type: Application
    Filed: February 10, 2010
    Publication date: March 29, 2012
    Applicant: Korea University-Industrial & Academic Collaboration Foundation
    Inventors: Dongsuk Yook, Hyeowoo Lee
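A toy sketch of the gating logic above (all thresholds and values assumed, not from the patent): a frame-energy detector finds candidate voice sections, and a section is attributed to the speaker only if the estimated face direction agrees with the stored reference.

```python
# Sketch: energy-based voice-section detection gated by face direction.

def detect_voice_sections(frame_energies, energy_thresh=0.5):
    """Return (start, end) frame index pairs where energy exceeds threshold."""
    sections, start = [], None
    for i, e in enumerate(frame_energies):
        if e > energy_thresh and start is None:
            start = i
        elif e <= energy_thresh and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(frame_energies)))
    return sections

def confirm_speaker_section(section, face_direction_deg, reference_deg,
                            tol=15.0):
    """Accept the section only if the face direction matches the reference."""
    return section if abs(face_direction_deg - reference_deg) <= tol else None

sections = detect_voice_sections([0.1, 0.7, 0.9, 0.2, 0.8, 0.1])
confirmed = confirm_speaker_section(sections[0], face_direction_deg=5.0,
                                    reference_deg=0.0)
```

A real implementation would use a statistical voice-activity detector and a face-pose estimator in place of these stand-ins.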
  • Publication number: 20120065973
    Abstract: A method and apparatus for performing microphone beamforming. The method includes recognizing a speech of a speaker, searching for a previously stored image associated with the speaker, searching for the speaker through a camera based on the image, recognizing a position of the speaker, and performing microphone beamforming according to the position of the speaker.
    Type: Application
    Filed: September 13, 2011
    Publication date: March 15, 2012
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Sung-Jae Cho, Hyun-Soo Kim
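The final beamforming step above can be sketched with the classic delay-and-sum scheme; this is a generic illustration, not the patent's implementation, and the per-microphone integer sample delays (which a real system would derive from the speaker's position) are assumed inputs.

```python
# Sketch: delay-and-sum beamforming. Shift each channel by its steering
# delay and average, so signals from the target direction add coherently.

def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them."""
    n = len(channels[0])
    out = []
    for i in range(n):
        acc, cnt = 0.0, 0
        for ch, d in zip(channels, delays):
            j = i - d  # delaying channel by d samples
            if 0 <= j < n:
                acc += ch[j]
                cnt += 1
        out.append(acc / cnt if cnt else 0.0)
    return out

# Two mics hear the same pulse one sample apart; the steering delay
# realigns the pulses so they reinforce instead of smearing.
mic1 = [0.0, 1.0, 0.0, 0.0]
mic2 = [0.0, 0.0, 1.0, 0.0]
beam = delay_and_sum([mic1, mic2], delays=[1, 0])
```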
  • Publication number: 20120053941
    Abstract: A wireless voice-activated surgical laser system utilizing wireless transmitter-receivers. The present invention integrates wireless communication between a surgical laser, a voice recognition device, and a microphone to allow surgeons to verbally activate or deactivate a surgical laser. The voice recognition device is able to recognize a surgeon's commands and relay the commands directly to the surgical laser.
    Type: Application
    Filed: August 27, 2011
    Publication date: March 1, 2012
    Inventor: Michael D. SWICK
  • Publication number: 20120035929
    Abstract: The messaging system (100) comprises a message data engine (101), a message generation engine (102), and an inference analysis engine (107). The message data engine (101) is operable to identify the desired recipient of the message, analyse and deconstruct the desired message content into syllables, words or phrases as desired or as appropriate. The message or part message may then be passed to the inference engine (107) for review of the message content and context. The message may then be referred back to the requestor (111) through the interface (103) with details of the problem for remediation, or passed to the message generation engine (102) for transcription. The message generation engine (102) may apply a range of speech samples and/or speech parameters as appropriate to the input message in order to compile a representation of this message with the speaker characteristics that were requested.
    Type: Application
    Filed: February 9, 2010
    Publication date: February 9, 2012
    Inventors: Allan Gauld, Sarah Elizabeth Roberts
  • Publication number: 20120016673
    Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.
    Type: Application
    Filed: September 27, 2011
    Publication date: January 19, 2012
    Applicant: Microsoft Corporation
    Inventor: Amitava Das
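A stdlib-only sketch of the codebook scheme above, with assumed toy codebooks and features (not the patent's): each "classifier" holds a per-speaker codebook over its own feature set, a sample is scored by distance to the nearest code vector, and per-classifier scores are summed into an overall score.

```python
# Sketch: vector-quantization speaker recognition with score fusion
# across multiple classifiers, each using a different feature set.

def nearest_distance(vec, codebook):
    """Squared distance from a feature vector to its closest code vector."""
    return min(sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook)

def recognize(feature_sets, classifier_stores):
    """Combine per-classifier distances; the lowest overall score wins."""
    totals = {}
    for feats, store in zip(feature_sets, classifier_stores):
        for speaker, codebook in store.items():
            totals[speaker] = (totals.get(speaker, 0.0) +
                               nearest_distance(feats, codebook))
    return min(totals, key=totals.get), totals

stores = [
    {"anna": [[0.0, 0.0], [0.1, 0.1]], "ben": [[1.0, 1.0]]},  # classifier 1
    {"anna": [[0.5, 0.0]], "ben": [[0.0, 0.9]]},              # classifier 2
]
winner, totals = recognize([[0.05, 0.05], [0.45, 0.1]], stores)
```

A real system would add the recognition criterion from the abstract: accept the winner only if its combined score beats a calibrated threshold.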
  • Publication number: 20120010886
    Abstract: A language identification system suitable for use with voice data transmitted through either a telephonic or computer network systems is presented. Embodiments that automatically select the language to be used based upon the content of the audio data stream are presented. In one embodiment the content of the data stream is supplemented with the context of the audio stream. In another embodiment the language determination is supplemented with preferences set in the communication devices and in yet another embodiment, global position data for each user of the system is used to supplement the automated language determination.
    Type: Application
    Filed: July 6, 2011
    Publication date: January 12, 2012
    Inventor: Javad Razavilar
  • Publication number: 20120004914
    Abstract: A system generates an audio challenge that includes a first voice and one or more second voices, the first voice being audibly distinguishable, by a human, from the one or more second voices. The first voice conveys first information and the second voice conveys second information. The system provides the audio challenge to a user and verifies that the user is human based on whether the user can identify the first information in the audio challenge.
    Type: Application
    Filed: September 12, 2011
    Publication date: January 5, 2012
    Applicant: Tell Me Networks c/o Microsoft Corporation
    Inventors: Nikko Strom, Dylan F. Salisbury
  • Publication number: 20110320200
    Abstract: One-to-many comparisons of callers' voice prints with known voice prints to identify any matches between them. When a customer communicates with a particular entity, such as a customer service center, the system makes a recording of the real-time call including both the customer's and agent's voices. The system segments the recording to extract at least a portion of the customer's voice to create a customer voice print, and it formats the segmented voice print for network transmission to a server. The server compares the customer's voice print with multiple known voice prints to determine any matches, meaning that the customer's voice print and one of the known voice prints are likely from the same person. The identification of any matches can be used for a variety of purposes, such as determining whether to authorize a transaction requested by the customer.
    Type: Application
    Filed: September 7, 2011
    Publication date: December 29, 2011
    Applicant: American Express Travel Related Services Company, Inc.
    Inventors: Vicki Broman, Vernon Marshall, Seshasayee Bellamkonda, Marcel Leyva, Cynthia Hanson
  • Publication number: 20110313776
    Abstract: A system, method and computer-readable medium for controlling devices connected to a network. The method includes receiving an utterance from a user for remotely controlling a device in a network; converting the received utterance to text using an automatic speech recognition module; accessing a user profile in the network that governs access to a plurality of devices on the network and identifiers which control a conversion of the text to a device specific control language; identifying based on the text a device to be controlled; converting at least a portion of the text to the device control language; and transmitting the device control language to the identified device, wherein the identified device implements a function based on the transmitted device control language.
    Type: Application
    Filed: August 27, 2011
    Publication date: December 22, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Joseph A. ALFRED, Joseph M. Sommer
  • Publication number: 20110313766
    Abstract: Systems and methods for detecting people or speakers in an automated fashion are disclosed. A pool of features including more than one type of input (like audio input and video input) may be identified and used with a learning algorithm to generate a classifier that identifies people or speakers. The resulting classifier may be evaluated to detect people or speakers.
    Type: Application
    Filed: August 30, 2011
    Publication date: December 22, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Cha Zhang, Paul A. Viola, Pei Yin, Ross G. Cutler, Xinding Sun, Yong Rui
  • Publication number: 20110313765
    Abstract: A method for assessing quality of conversational speech between nodes of a communication network (1), comprising establishing a voice communication session via the communication network (1) between a user at a user terminal (2) and a virtual subject system (4), the virtual subject system (4) and user terminal (2) being connected to the communication network (1), the user terminal enabling the user to communicate by voice with the virtual subject system (4), during the session, acting as a conversation partner in a voice conversation with the virtual subject system (4), the virtual subject system being equipped with a speech generation module (42) to enable speaking during the session and a voice recognition module (41) to enable interpreting speech of the user during the session, and assessing the quality of speech over the communication network based on the voice conversation during the session, the assessing being performed by the user.
    Type: Application
    Filed: November 24, 2009
    Publication date: December 22, 2011
    Applicant: ALCATEL LUCENT
    Inventor: Nicolas Tranquart
  • Publication number: 20110295603
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition. In one aspect, a computer-based method includes receiving a speech corpus at a speech management server system that includes multiple speech recognition engines tuned to different speaker types; using the speech recognition engines to associate the received speech corpus with a selected one of multiple different speaker types; and sending a speaker category identification code that corresponds to the associated speaker type from the speech management server system over a network. The speaker category identification code can be used by any one of the speech-interactive applications coupled to the network to select an appropriate one of multiple application-accessible speech recognition engines tuned to the different speaker types in response to an indication that a user accessing the application is associated with a particular one of the speaker category identification codes.
    Type: Application
    Filed: April 28, 2011
    Publication date: December 1, 2011
    Inventor: William S. Meisel
  • Publication number: 20110295604
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for processing a message received from a user to determine whether an estimate of intelligibility is below an intelligibility threshold. The method includes recognizing a portion of a user's message that contains the one or more expected utterances from a critical information list, calculating an estimate of intelligibility for the recognized portion of the user's message that contains the one or more expected utterances, and prompting the user to repeat at least the recognized portion of the user's message if the calculated estimate of intelligibility for the recognized portion of the user's message is below an intelligibility threshold. In one aspect, the method further includes prompting the user to repeat at least a portion of the message if any of a measured speech level and a measured signal-to-noise ratio of the user's message are determined to be below their respective thresholds.
    Type: Application
    Filed: August 8, 2011
    Publication date: December 1, 2011
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Harvey S. Cohen, Randy G. Goldberg, Kenneth H. Rosen
  • Publication number: 20110285807
    Abstract: A videoconferencing apparatus automatically tracks speakers in a room and dynamically switches between a controlled, people-view camera and a fixed, room-view camera. When no one is speaking, the apparatus shows the room view to the far-end. When there is a dominant speaker in the room, the apparatus directs the people-view camera at the dominant speaker and switches from the room-view camera to the people-view camera. When there is a new speaker in the room, the apparatus switches to the room-view camera first, directs the people-view camera at the new speaker, and then switches to the people-view camera directed at the new speaker. When there are two near-end speakers engaged in a conversation, the apparatus tracks and zooms-in the people-view camera so that both speakers are in view.
    Type: Application
    Filed: May 18, 2010
    Publication date: November 24, 2011
    Applicant: POLYCOM, INC.
    Inventor: Jinwei FENG
  • Publication number: 20110282665
    Abstract: Provided is a method for measuring environmental parameters for multi-modal fusion. The method for measuring environmental parameters for multi-modal fusion includes: preparing at least one enrolled modality; receiving at least one input modality; calculating image-related environmental parameters of input images in at least one input modality based on illumination of the enrolled image in at least one enrolled modality; and comparing the image-related environmental parameters with a predetermined reference value and discarding the input image or outputting it as recognition data according to the comparison result.
    Type: Application
    Filed: January 31, 2011
    Publication date: November 17, 2011
    Applicant: Electronics and Telecommunications Research Institute
    Inventors: Hye Jin KIM, Do Hyung KIM, Su Young CHI, Jae Yeon LEE
  • Publication number: 20110276331
    Abstract: A voice recognition system includes: a voice input unit 11 for inputting a voice uttered a plurality of times; a registering voice data storage unit 12 for storing voice data uttered the plurality of times and input into the voice input unit 11; an utterance stability verification unit 13 for determining a similarity between the voice data uttered the plurality of times that are read from the registering voice data storage unit 12, and determining that registration of the voice data is acceptable when the similarity is greater than a threshold T1; and a standard pattern creation unit 14 for creating a standard pattern by using the voice data where the utterance stability verification unit 13 determines that registration is acceptable.
    Type: Application
    Filed: October 8, 2009
    Publication date: November 10, 2011
    Inventors: Michihiro Yamazaki, Jun Ishii, Hiroki Sakashita, Kazuyuki Nogi
  • Publication number: 20110270611
    Abstract: A system for inspecting an oil level in each part of a railroad car truck includes: an imaging unit that obtains an image of an oil level gauge; an oil level inspection unit that inspects whether or not the oil level in the each part of the railroad car truck is within a predetermined range based on the image of the oil level gauge obtained by the imaging unit; a voice input unit adapted for an inspector to input, via voice, an inspection result; a voice processing unit that determines whether or not the inspection result inputted via the voice input unit is good based on the inputted inspection result, and converts a determination result into displayable data; a display unit that displays an oil level inspection result and the determination result; and a storage unit that stores, as data, the oil level inspection result and the determination result.
    Type: Application
    Filed: December 3, 2009
    Publication date: November 3, 2011
    Applicant: CENTRAL JAPAN RAILWAY COMPANY
    Inventors: Kyouichi Nishimura, Nozomu Nakamura, Kazuhiro Okada, Yoshitaka Tanaka
  • Publication number: 20110267531
    Abstract: An image capturing apparatus and method for selective real-time focus/parameter adjustment. The image capturing apparatus includes a display unit, an interface unit, an adjustment unit, and a generation unit. The display unit is configured to display an image. The interface unit is configured to enable a user to select a plurality of regions of the image displayed on the display unit. The adjustment unit is configured to enable the user to adjust at least one focus/parameter of at least one selected region of the image displayed on the display unit. The generation unit is configured to convert the image including at least one adjusted selected region into image data, where at least one focus/parameter of the at least one adjusted selected region has been adjusted by the adjustment unit prior to conversion.
    Type: Application
    Filed: May 3, 2010
    Publication date: November 3, 2011
    Applicant: CANON KABUSHIKI KAISHA
    Inventor: Francisco Imai
  • Publication number: 20110246198
    Abstract: The present invention refers to a method for verifying the identity of a speaker based on the speaker's voice, comprising the steps of: a) receiving a voice utterance; b) using biometric voice data to verify (10) that the speaker's voice corresponds to the speaker whose identity is to be verified based on the received voice utterance; c) verifying (12, 13) that the received voice utterance is not falsified, preferably after having verified the speaker's voice; and d) accepting (16) the speaker's identity if both verification steps give a positive result, and not accepting (15) the speaker's identity if either verification step gives a negative result. The invention further refers to a corresponding computer-readable medium and a computer.
    Type: Application
    Filed: December 10, 2008
    Publication date: October 6, 2011
    Inventors: Marta Sánchez Asenjo, Alfredo Gutiérrez Navarro, Alberto Martin De Los Santos De Las Heras, Marta Garcia Gomar
  • Publication number: 20110231182
    Abstract: A mobile system is provided that includes speech-based and non-speech-based interfaces for telematics applications. The mobile system identifies and uses context, prior information, domain knowledge, and user-specific profile data to achieve a natural environment for users that submit requests and/or commands in multiple domains. The invention creates, stores, and uses extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command. The invention may organize domain-specific behavior and information into agents that are distributable or updateable over a wide area network.
    Type: Application
    Filed: April 11, 2011
    Publication date: September 22, 2011
    Applicant: VoiceBox Technologies, Inc.
    Inventors: Chris Weider, Richard Kennewick, Mike Kennewick, Philippe Di Cristo, Robert A. Kennewick, Samuel Menaker, Lynn Elise Armstrong
  • Publication number: 20110224986
    Abstract: A method for configuring a voice authentication system employing at least one authentication engine comprises utilising the at least one authentication engine to systematically compare a plurality of impostor voice samples against a voice sample of a legitimate person to derive respective authentication scores. The resultant authentication scores are analysed to determine a measure of confidence for the voice authentication system.
    Type: Application
    Filed: July 21, 2009
    Publication date: September 15, 2011
    Inventor: Clive Summerfield
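    One simple confidence measure derivable from the systematic impostor comparisons described above (20110224986) is the false-accept rate: the fraction of impostor scores that would clear the engine's acceptance threshold. This is an illustrative metric, not necessarily the patented analysis.

    ```python
    import numpy as np

    def false_accept_rate(impostor_scores, threshold):
        """Fraction of impostor attempts the authentication engine would
        accept at the given score threshold. Lower is better; sweeping the
        threshold over many impostor scores characterizes the system's
        confidence margin."""
        scores = np.asarray(impostor_scores, dtype=float)
        return float(np.mean(scores >= threshold))
    ```

    In practice the threshold would be swept to trade the false-accept rate against the false-reject rate measured on the legitimate speaker's own samples.
    
    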
  • Publication number: 20110218798
    Abstract: Techniques implemented as systems, methods, and apparatuses, including computer program products, for obfuscating sensitive content in an audio source representative of an interaction between a contact center caller and a contact center agent. The techniques include performing, by an analysis engine of a contact center system, a context-sensitive content analysis of the audio source to identify each audio source segment that includes content determined by the analysis engine to be sensitive content based on its context; and processing, by an obfuscation engine of the contact center system, one or more identified audio source segments to generate corresponding altered audio source segments each including obfuscated sensitive content.
    Type: Application
    Filed: March 5, 2010
    Publication date: September 8, 2011
    Applicant: Nexidia Inc.
    Inventor: Marsal Gavalda
  • Publication number: 20110196677
    Abstract: According to one illustrative embodiment, a method is provided for analyzing an audio interaction. At least one change in an emotion of a speaker in an audio interaction and at least one aspect of the audio interaction are identified. The at least one change in an emotion is analyzed in conjunction with the at least one aspect to determine a relationship between the at least one change in an emotion and the at least one aspect, and a result of the analysis is provided.
    Type: Application
    Filed: February 11, 2010
    Publication date: August 11, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Om D. Deshmukh, Chitra Dorai, Shailesh Joshi, Maureen E. Rzasa, Ashish Verma, Karthik Visweswariah, Gary J. Wright, Sai Zeng
  • Publication number: 20110166859
    Abstract: A voice recognition unit creates, for each language, a voice label string for a voice inputted by a user, on the basis of a feature vector time series of the inputted voice and data about a sound standard model, and registers the voice label string into a voice label memory 2. Using a first language switching unit SW1 and a second language switching unit SW2, the unit automatically switches among languages both for the sound standard model memory 1 used to create the voice label string and for the voice label memory 2 holding the created voice label string.
    Type: Application
    Filed: October 20, 2009
    Publication date: July 7, 2011
    Inventors: Tadashi Suzuki, Yasushi Ishikawa, Yuzo Maruta
  • Publication number: 20110137635
    Abstract: The present disclosure describes a system and method of transliterating Semitic languages with support for diacritics. An input module receives and pre-processes Romanized characters and forwards them to a transliteration engine. The transliteration engine selects candidate transliteration rules, applies the rules, and scores and ranks the results for output. To optimize the search for candidate transliteration rules, the transliteration engine may apply word-stemming strategies to process inflections indicated by affixes. The present disclosure further describes optimizations such as pre-processing emphasis text, caching, dynamic transliteration rule pruning, and buffering/throttling input. The system and methods are suitable for multiple applications including but not limited to web applications, Windows applications, client-server applications, and input method editors such as those via the Microsoft Text Services Framework (TSF™).
    Type: Application
    Filed: December 8, 2009
    Publication date: June 9, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Achraf Chalabi, Hany Grees, Mostafa Ashour, Roaa Mohammed
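    The select-apply-score-rank pipeline in the abstract above (20110137635) can be sketched in miniature. The rule table and scoring function below are hypothetical stand-ins; the real engine uses context-sensitive rules, diacritic handling, and word-stemming, all omitted here.

    ```python
    from itertools import product

    def transliterate(word, rules, score_fn):
        """Toy transliteration: each Romanized character maps to one or more
        target-script candidates (unknown characters pass through), and the
        resulting candidate strings are scored and ranked, best first."""
        options = [rules.get(ch, [ch]) for ch in word]
        candidates = {"".join(combo) for combo in product(*options)}
        return sorted(candidates, key=score_fn, reverse=True)
    ```

    A real engine would prune this Cartesian expansion dynamically (one of the optimizations the abstract names), since candidate counts grow multiplicatively with ambiguous characters.
    
    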
  • Publication number: 20110135069
    Abstract: New functions are added to the existing telephone network to provide telecommunications-carrier services intended to deter frauds and crimes committed by telephone. The telephonic circumstances surrounding a fraud or crime are also preserved, to help prevent repeat offenses. A voice announcement issued to the caller in advance, indicating that the telephone conversation now starting will be recorded, deters frauds and crimes by creating psychological resistance. A warning is issued to the recipient after a voiceprint check is performed. The recorded contents of telephone conversations during the commission of a fraud or crime can be played back to provide the information necessary to take countermeasures.
    Type: Application
    Filed: July 27, 2010
    Publication date: June 9, 2011
    Inventor: Kazuki Yoshida
  • Publication number: 20110131044
    Abstract: An apparatus, program product and method are provided for separating a target voice from a plurality of other voices having different directions of arrival. A first and a second voice input device are disposed at a predetermined distance from one another. Upon receipt of voice signals at the devices, discrete Fourier transforms are calculated for the signals, and a CSP (cross-power spectrum phase) coefficient is calculated by superposing multiple frequency-bin components based on the correlation of the two spectral signals; a weighted CSP coefficient is then calculated from the two discrete Fourier-transformed speech signals. Using the calculated weighted CSP coefficient, the target voice received by the devices is separated from the other voice signals in the spectrum.
    Type: Application
    Filed: November 29, 2010
    Publication date: June 2, 2011
    Applicant: International Business Machines Corporation
    Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
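    The CSP computation described above (20110131044) follows a standard pattern: DFT both channels, form the cross-power spectrum, keep only its phase, and transform back to find the inter-channel delay. The per-bin weighting below is an illustrative choice (emphasizing high cross-power bins), not the patented formulation.

    ```python
    import numpy as np

    def weighted_csp(x1, x2, n_fft=512):
        """Weighted CSP (cross-power spectrum phase) time-delay estimate
        between two microphone signals. Returns the CSP coefficient array
        (zero delay at the center) and the peak delay in samples."""
        X1 = np.fft.rfft(x1, n_fft)
        X2 = np.fft.rfft(x2, n_fft)
        cross = X2 * np.conj(X1)
        # Phase transform: discard magnitudes, keep only inter-channel phase.
        phat = cross / np.maximum(np.abs(cross), 1e-12)
        # Illustrative weights: emphasize frequency bins with high cross-power.
        w = np.abs(cross) / np.abs(cross).sum()
        csp = np.fft.fftshift(np.fft.irfft(w * phat, n_fft))
        delay = int(np.argmax(csp)) - n_fft // 2  # positive: x2 lags x1
        return csp, delay
    ```

    The delay at the CSP peak corresponds to a direction of arrival given the known microphone spacing, which is what lets the apparatus isolate the target voice from interferers arriving from other directions.
    
    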
  • Publication number: 20110125498
    Abstract: One embodiment of the invention provides a computer-implemented method of handling a telephone call. The method comprises monitoring a conversation between an agent and a customer on a telephone line as part of the telephone call to extract the audio signal therefrom. Real-time voice analytics are performed on the extracted audio signal while the telephone call is in progress. The results from the voice analytics are then passed to a computer-telephony integration system responsible for the call for use by the computer-telephony integration system for determining future handling of the call.
    Type: Application
    Filed: June 19, 2009
    Publication date: May 26, 2011
    Applicant: NEWVOICEMEDIA LTD
    Inventors: Richard Pickering, Joseph Moussalli, Ashley Unitt
  • Publication number: 20110119060
    Abstract: A method and system for speaker diarization are provided. Pre-trained acoustic models of individual speakers and/or groups of speakers are obtained. Speech data with multiple speakers is received and divided into frames. For each frame, an acoustic feature vector is determined and extended to include log-likelihood ratios of the pre-trained models in relation to a background population model. The extended acoustic feature vector is used in segmentation and clustering algorithms.
    Type: Application
    Filed: November 15, 2009
    Publication date: May 19, 2011
    Applicant: International Business Machines Corporation
    Inventor: Hagai Aronowitz
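    The feature extension described above (20110119060) appends, per pre-trained speaker model, a log-likelihood ratio against the background population model. The sketch below uses single diagonal Gaussians as stand-ins for the full acoustic models (typically GMMs); the model form and parameterization here are assumptions for illustration.

    ```python
    import numpy as np

    def log_gauss(x, mean, var):
        """Log-density of a diagonal-covariance Gaussian (a simplified
        stand-in for a full acoustic model such as a GMM)."""
        return -0.5 * float(np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

    def extend_features(frame, speaker_models, background):
        """Append one log-likelihood ratio per pre-trained speaker model,
        each taken against the background population model, to the frame's
        base acoustic feature vector. Models are (mean, var) array pairs."""
        llrs = [log_gauss(frame, m, v) - log_gauss(frame, *background)
                for m, v in speaker_models]
        return np.concatenate([frame, llrs])
    ```

    The appended LLR dimensions let downstream segmentation and clustering exploit speaker-identity evidence directly, rather than relying on the raw acoustic features alone.
    
    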
  • Publication number: 20110102142
    Abstract: A system and associated method verify the attendance and/or identity of viewers of audio/video/data streams transmitted over the internet. The system and method capture various types of interaction with the viewers and either take appropriate action, as configured by a webcast program administrator, or simply log this interaction to a database where audience attention and identity can be validated at a later date.
    Type: Application
    Filed: November 4, 2009
    Publication date: May 5, 2011
    Inventors: Ian J. Widger, Steven J. Silves, Jeremy M. Knight
  • Publication number: 20110099011
    Abstract: A method and system for determining and communicating biometrics of a recorded speaker in a voice transcription process. An interactive voice response system receives a request from a user for a transcription of a voice file. A profile associated with the requesting user is obtained, wherein the profile comprises biometric parameters and preferences defined by the user. The requested voice file is analyzed for biometric elements according to the parameters specified in the user's profile. Responsive to detecting biometric elements in the voice file that conform to the parameters specified in the user's profile, a transcription output of the voice file is modified according to the preferences specified in the user's profile for the detected biometric elements to form a modified transcription output file. The modified transcription output file may then be provided to the requesting user.
    Type: Application
    Filed: October 26, 2009
    Publication date: April 28, 2011
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Peeyush Jaiswal
  • Publication number: 20110093267
    Abstract: A device may be configured to provide a query to a user. Voice data may be received from the user responsive to the query. Voice recognition may be performed on the voice data to identify a query answer. A confidence score associated with the query answer may be calculated, wherein the confidence score represents the likelihood that the query answer has been accurately identified. A likely age range associated with the user may be determined based on the confidence score. The device that calculates the confidence score may be tuned to increase the likelihood of recognizing voice data for a particular age range of callers.
    Type: Application
    Filed: December 22, 2010
    Publication date: April 21, 2011
    Applicant: VERIZON PATENT AND LICENSING INC.
    Inventor: Kevin R. Witzman
  • Publication number: 20110093261
    Abstract: Systems and methods are operable to: associate each of a plurality of stored audio patterns with at least one of a plurality of digital tokens; identify a user based on user identification input; access a plurality of stored audio patterns associated with the user based on the user identification input; receive from the user at least one audio input from a custom language made up of custom language elements, wherein the elements include at least one monosyllabic representation of a number, letter or word; select one of the plurality of stored audio patterns associated with the identified user, in the case that the audio input received from the identified user corresponds with one of the plurality of stored audio patterns; determine the digital token associated with the selected one of the plurality of stored audio patterns; and generate the output signal for use in a device based on the determined digital token.
    Type: Application
    Filed: October 15, 2010
    Publication date: April 21, 2011
    Inventor: Paul Angott
  • Publication number: 20110080289
    Abstract: A device may include a sensor configured to detect when a user is wearing or holding the device. The device may also include a display and a communication interface. The communication interface may be configured to forward an indication to a media playing device when the user is wearing or holding the device and receive content from the media playing device, where the content is received in response to the indication that the user is wearing or holding the device. The communication interface may also output the content to the display.
    Type: Application
    Filed: October 22, 2009
    Publication date: April 7, 2011
    Applicant: SONY ERICSSON MOBILE COMMUNICATIONS AB
    Inventor: Wayne Christopher Minton
  • Publication number: 20110071831
    Abstract: The present invention relates to a method for localizing a person, comprising the following steps carried out in a computing system (1): determining (20) the localization of a telecommunication means (3, 6, 8), or determining a telecommunication means (3, 6, 8) at a specific location (this can be implemented using ANI or the received calling number together with a database lookup of a fixed telephone's address; for a cellular device, cell-ID or triangulation can be used); receiving (21) a voice utterance of a person via the telecommunication means; and verifying (22) the identity of that person based on the received voice utterance using biometric voice data (speech/speaker recognition). The invention further relates to a corresponding system and computer-readable medium.
    Type: Application
    Filed: May 9, 2008
    Publication date: March 24, 2011
    Applicant: AGNITIO, S.L.
    Inventors: Marta Garcia Gomar, Marta Sanchez Asenjo