Patents by Inventor Wael Hamza
Wael Hamza has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11869490
Abstract: Techniques for tuning parameters for machine learning models are described. Different values for a parameter are tested to determine the value that results in an optimized model. A parameter value may be selected for testing using a search algorithm based on how the model performs with respect to other values for the parameter. Different values may be tested until a stopping criterion (such as time for testing, number of trials, amount of improvement in performance, etc.) is met. In some embodiments, the techniques may be used to determine parameter values for natural language processing models.
Type: Grant
Filed: August 14, 2020
Date of Patent: January 9, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Rahul Gupta, Jwala Dhamala, Melanie C B Gens, Sachin Midha, Jennifer Yuen, Dewan Muhammed Ibtesham, Wael Hamza, Xinhong Zhang, Md Humayun Arafat
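The search loop this abstract describes can be sketched in a few lines. This is an illustrative simplification, not the patented method: the function name, the candidate list, and the toy quadratic objective are all hypothetical stand-ins, and only two of the abstract's stopping criteria (trial budget and minimum improvement) are modeled.

```python
def tune_parameter(evaluate, candidates, max_trials=10, min_gain=1e-3):
    """Test candidate parameter values, stopping when the trial budget is
    spent or the improvement between trials falls below min_gain."""
    best_value, best_score = None, float("-inf")
    for trial, value in enumerate(candidates):
        if trial >= max_trials:
            break                      # stopping criterion: number of trials
        score = evaluate(value)
        gain = score - best_score
        if score > best_score:
            best_value, best_score = value, score
        if trial > 0 and gain < min_gain:
            break                      # stopping criterion: too little improvement
    return best_value, best_score

# Toy objective: a quadratic peaking at 0.3, standing in for model accuracy.
best, score = tune_parameter(lambda v: 1 - (v - 0.3) ** 2,
                             candidates=[0.1, 0.2, 0.3, 0.4, 0.5])
```

A real search algorithm would propose each next candidate from past results (e.g., Bayesian optimization) rather than walk a fixed list.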
-
Publication number: 20230368796
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
Type: Application
Filed: May 26, 2023
Publication date: November 16, 2023
Inventors: Beiye Liu, Wael Hamza, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Subendhu Rongali
-
Publication number: 20230317066
Abstract: Techniques for using a shared encoder and multiple different decoders for natural language understanding (NLU) tasks are described. The individual decoders are configured to perform different tasks using the output from one shared encoder. The decoders can process with respect to different domains and different languages. Using the shared encoder can reduce computation time during runtime. Using the shared encoder can reduce training costs (e.g., time and resources) when the system is updated to incorporate additional intents and entities. The system employs an attention mechanism to extract encoded representation data that can be used by the different decoders for their specific tasks.
Type: Application
Filed: March 9, 2022
Publication date: October 5, 2023
Inventors: Jonathan Jakob Hueser, Fabian Triefenbach, Chandana Satya Prakash, Jin Cao, Wael Hamza, Mariusz Momotko
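The shared-encoder pattern in this abstract can be illustrated with a minimal sketch. Everything here is a toy stand-in: the hash-free character embedding replaces a trained encoder, the query vectors replace learned decoder parameters, and the "intent" and "entity" decoders are hypothetical task names. Only the structure matches the abstract: tokens are encoded once, and each decoder extracts its own view via attention.

```python
import math

def shared_encoder(tokens):
    # Toy deterministic embedding standing in for a trained encoder:
    # one 4-dimensional vector per input token.
    return [[(ord(tok[0]) * (d + 1) % 97) / 97.0 for d in range(4)] for tok in tokens]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def decode(encodings, query):
    # Attention mechanism: each decoder pools the shared representation
    # with its own task-specific query vector.
    weights = softmax([sum(q * e for q, e in zip(query, enc)) for enc in encodings])
    return [sum(w * enc[d] for w, enc in zip(weights, encodings)) for d in range(4)]

encodings = shared_encoder(["play", "jazz", "music"])  # encoded once at runtime
intent_view = decode(encodings, query=[1, 0, 0, 0])    # consumed by an intent decoder
entity_view = decode(encodings, query=[0, 1, 0, 0])    # and by an entity decoder
```

The runtime saving the abstract claims comes from the first step running once while any number of decoders reuse its output.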
-
Patent number: 11682400
Abstract: Techniques for performing spoken language understanding (SLU) processing are described. An SLU component may include an audio encoder configured to perform an audio-to-text processing task and an audio-to-NLU processing task. The SLU component may also include a joint decoder configured to perform the audio-to-text processing task, the audio-to-NLU processing task and a text-to-NLU processing task. Input audio data, representing a spoken input, is processed by the audio encoder and the joint decoder to determine NLU data corresponding to the spoken input.
Type: Grant
Filed: November 30, 2020
Date of Patent: June 20, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Beiye Liu, Wael Hamza, Liwei Cai, Konstantine Arkoudas, Chengwei Su, Subendhu Rongali
-
Publication number: 20120095676
Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system ("GPS") receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
Type: Application
Filed: October 24, 2011
Publication date: April 19, 2012
Applicant: Nuance Communications, Inc.
Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza
-
Patent number: 8046213
Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system ("GPS") receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
Type: Grant
Filed: August 6, 2004
Date of Patent: October 25, 2011
Assignee: Nuance Communications, Inc.
Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza
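The load-and-evict behavior this abstract describes amounts to a distance-bounded cache. A minimal sketch, with hypothetical names and flat Euclidean distance standing in for real geodesic math and real audio files:

```python
import math

class NameCache:
    """Keeps in run-time memory only the prerecorded geographic names whose
    location lies within `radius` of the vehicle's current position."""

    def __init__(self, store, radius):
        self.store = store      # all entries on mass media storage: name -> (x, y)
        self.radius = radius
        self.loaded = {}        # the limited in-memory subset

    def update(self, position):
        px, py = position
        for name, (x, y) in self.store.items():
            if math.hypot(x - px, y - py) <= self.radius:
                self.loaded[name] = (x, y)       # load newly in-range names
            else:
                self.loaded.pop(name, None)      # overwrite names now out of range

store = {"Elm St": (0, 0), "Oak Ave": (3, 4), "Far Rd": (40, 0)}
cache = NameCache(store, radius=5)
cache.update((0, 0))   # Elm St and Oak Ave are within 5 units; Far Rd is not
```

Calling `update` again as the vehicle moves evicts names that fall out of range and pulls in the ones that enter it, mirroring the overwrite step in the abstract.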
-
Patent number: 7472065
Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream, wherein the marked-up text includes a normal text and a paralinguistic text; and wherein the normal text is differentiated from the paralinguistic text by using a grammar constraint, and wherein the paralinguistic text is associated with more than one audio segment, wherein the retrieving of the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.
Type: Grant
Filed: June 4, 2004
Date of Patent: December 30, 2008
Assignee: International Business Machines Corporation
Inventors: Andrew S. Aaron, Raimo Bakis, Ellen M. Eide, Wael Hamza
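The key distinction in this abstract, normal text mapping to one audio segment while paralinguistic text selects one of several, can be sketched as follows. The bracket syntax, file names, and lexicon are all hypothetical; the brackets merely stand in for the patent's grammar constraint.

```python
import random

# Paralinguistic events have several recorded takes; normal words have one.
PARALINGUISTIC = {"[sigh]": ["sigh_a.wav", "sigh_b.wav"],
                  "[cough]": ["cough_a.wav"]}
LEXICON = {"well": "well.wav", "okay": "okay.wav"}

def synthesize(marked_up_text, seed=0):
    """Concatenate audio segments for a marked-up input: bracketed tokens
    (the paralinguistic text) pick one of their associated segments."""
    rng = random.Random(seed)
    segments = []
    for token in marked_up_text.split():
        if token in PARALINGUISTIC:
            segments.append(rng.choice(PARALINGUISTIC[token]))
        else:
            segments.append(LEXICON[token])
    return segments

stream = synthesize("well [sigh] okay")
```

The random choice here is a placeholder; a production system would select among the takes by context or cost rather than at random.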
-
Publication number: 20080167876
Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.
Type: Application
Filed: January 4, 2007
Publication date: July 10, 2008
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza, Michael A. Picheny
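The synthesize-score-select loop in this abstract is compact enough to sketch directly. The lambdas below are toy stand-ins for a real TTS front end and a real quality model; nothing about them comes from the patent.

```python
def select_best_synthesis(paraphrases, synthesize, score):
    """Synthesize each paraphrase, assign each result a score, and return
    the top-scoring synthesized speech."""
    scored = [(score(audio), audio) for audio in map(synthesize, paraphrases)]
    return max(scored, key=lambda pair: pair[0])[1]

best = select_best_synthesis(
    ["What is the time?", "What time is it?"],
    synthesize=lambda text: "audio:" + text,   # stand-in for a TTS engine
    score=lambda audio: -len(audio),           # stand-in quality score: prefer shorter
)
```

The point of the design is that scoring happens on the synthesized speech, not the text, so a paraphrase that happens to synthesize more naturally can win.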
-
Publication number: 20070073542
Abstract: Embodiments of the present invention provide a method, system and computer program product for synthesizing concatenative speech by allocating speech segments based upon their frequency of access during speech synthesis and storing frequently used speech segments in memory where they can be easily and quickly accessed. Speech data is recorded in separate files from which individual speech units are identified. The method and system of the present invention analyzes the frequency of access of each speech unit during synthesis and uses this data to sort the speech units according to their frequency of access. Those speech units that are accessed more frequently than others are loaded into memory where they can be accessed quickly during subsequent speech synthesis. Other speech units that are not used as frequently can be stored on a data storage disk.
Type: Application
Filed: September 23, 2005
Publication date: March 29, 2007
Applicant: International Business Machines Corporation
Inventors: Hari Chittaluru, Wael Hamza, Brennan Monteiro, Maria Smith
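The count-sort-promote scheme above is a frequency-based cache. A minimal sketch, with the class name, slot budget, and byte-string "audio" all hypothetical:

```python
from collections import Counter

class SegmentStore:
    """Counts segment accesses during synthesis and keeps the most
    frequently used segments resident in memory; the rest stay on disk."""

    def __init__(self, segments, memory_slots):
        self.disk = dict(segments)     # all speech units: id -> audio data
        self.memory_slots = memory_slots
        self.counts = Counter()
        self.memory = {}

    def fetch(self, seg_id):
        self.counts[seg_id] += 1
        if seg_id in self.memory:
            return self.memory[seg_id]  # fast path: segment already in memory
        return self.disk[seg_id]        # slow path: read from data storage disk

    def rebalance(self):
        # Sort units by access frequency and load the hottest into memory.
        hottest = [sid for sid, _ in self.counts.most_common(self.memory_slots)]
        self.memory = {sid: self.disk[sid] for sid in hottest}

store = SegmentStore({"AH": b"\x01", "T": b"\x02", "S": b"\x03"}, memory_slots=1)
for sid in ["AH", "T", "AH", "AH", "S"]:
    store.fetch(sid)
store.rebalance()   # "AH" was fetched most often, so it is promoted
```

Here rebalancing is explicit; a real system might do it periodically or after an offline analysis pass, as the abstract suggests.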
-
Publication number: 20060229876
Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector comprising at least one element that identifies the speaker from which the speech segment was derived.
Type: Application
Filed: April 7, 2005
Publication date: October 12, 2006
Inventors: Andrew Aaron, Ellen Eide, Wael Hamza, Michael Picheny, Charles Rutherfoord, Zhi Shuang, Maria Smith
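One simple instance of cost-based selection over a multi-speaker inventory is a Viterbi-style search whose join cost penalizes switching speakers between consecutive segments. This is an illustrative reduction, not the patent's cost function; the phone labels, file names, and flat speaker-change penalty are all hypothetical.

```python
def select_segments(targets, inventory, speaker_change_cost=1.0):
    """For each target unit, extend the cheapest path of segments so far,
    charging a join cost whenever consecutive segments come from
    different speakers (read from each segment's attribute vector)."""
    prev = None   # list of (total_cost, path) per candidate of previous unit
    for unit in targets:
        current = []
        for seg in inventory[unit]:
            if prev is None:
                current.append((0.0, [seg]))
                continue
            options = [
                (cost + (speaker_change_cost
                         if path[-1]["speaker"] != seg["speaker"] else 0.0), path)
                for cost, path in prev
            ]
            cost, path = min(options, key=lambda o: o[0])
            current.append((cost, path + [seg]))
        prev = current
    return min(prev, key=lambda o: o[0])[1]

# Hypothetical two-speaker inventory: two takes of "HH", one of "AY".
inventory = {
    "HH": [{"speaker": "A", "audio": "hh_A.wav"},
           {"speaker": "B", "audio": "hh_B.wav"}],
    "AY": [{"speaker": "B", "audio": "ay_B.wav"}],
}
path = select_segments(["HH", "AY"], inventory)
```

Because "AY" only exists for speaker B, the search prefers speaker B's "HH" take as well, avoiding the speaker-change penalty.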
-
Publication number: 20060229873
Abstract: A technique for producing speech output in an automatic dialog system is provided. Communication is received from a user at the automatic dialog system. A context of the communication from the user is detected in a context detector of the automatic dialog system. A message is provided to the user from a text-to-speech system of the automatic dialog system in communication with the context detector, wherein the message is provided in accordance with the detected context of the communication.
Type: Application
Filed: March 29, 2005
Publication date: October 12, 2006
Applicant: International Business Machines Corporation
Inventors: Ellen Eide, Wael Hamza, Michael Picheny
-
Publication number: 20060229872
Abstract: A technique for producing speech output in a text-to-speech system is provided. A message is created for communication to a user in a natural language generator of the text-to-speech system. The message is annotated in the natural language generator with a synthetic speech output style. The message is conveyed to the user through a speech synthesis system in communication with the natural language generator, wherein the message is conveyed in accordance with the synthetic speech output style.
Type: Application
Filed: March 29, 2005
Publication date: October 12, 2006
Applicant: International Business Machines Corporation
Inventors: Ellen Eide, Wael Hamza
-
Publication number: 20060031062
Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system ("GPS") receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
Type: Application
Filed: August 6, 2004
Publication date: February 9, 2006
Applicant: International Business Machines Corporation
Inventors: Raimo Bakis, Ellen Eide, Wael Hamza
-
Publication number: 20050273338
Abstract: Examples of paralinguistic events (e.g., breaths, coughs, sighs, etc.) are recorded. A text-to-speech ("TTS") engine may insert the examples into a stream of synthetic speech using, for example, markup. The synthetic speech may include a combination of normal text and paralinguistic text.
Type: Application
Filed: June 4, 2004
Publication date: December 8, 2005
Inventors: Andrew Aaron, Raimo Bakis, Ellen Eide, Wael Hamza
-
Publication number: 20050071163
Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
Type: Application
Filed: September 26, 2003
Publication date: March 31, 2005
Applicant: International Business Machines Corporation
Inventors: Andy Aaron, Raimo Bakis, Ellen Eide, Wael Hamza
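The "prosodic parameters to markup" step above can be sketched with a toy rule. Everything here is illustrative: the per-word pitch values would come from a real pitch tracker, the 25%-above-baseline threshold is invented, and the SSML-like `<prosody>` tag merely stands in for whatever markup format the TTS system consumes.

```python
def prosody_to_markup(words_with_pitch, baseline=120.0):
    """Derive markup from extracted prosody: words spoken well above the
    speaker's baseline pitch (in Hz) get a pitch-raising tag."""
    parts = []
    for word, f0 in words_with_pitch:
        if f0 > baseline * 1.25:
            pct = int(round((f0 / baseline - 1) * 100))
            parts.append(f'<prosody pitch="+{pct}%">{word}</prosody>')
        else:
            parts.append(word)
    return " ".join(parts)

# Hypothetical pitch-tracker output for the spoken example "I never said".
markup = prosody_to_markup([("I", 118.0), ("never", 180.0), ("said", 122.0)])
```

Feeding this markup back into synthesis is what lets the output mimic the emphasis of the spoken example rather than a flat default contour.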