Patents by Inventor Kim Silverman
Kim Silverman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20100082349
Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Type: Application
Filed: September 29, 2008
Publication date: April 1, 2010
Applicant: Apple Inc.
Inventors: Jerome Bellegarda, Devang Naik, Kim Silverman
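The pipeline this abstract describes (normalize the text string, determine its native language, then look up target phonemes) can be sketched as follows. All tables, names, and the language heuristic here are illustrative assumptions, not Apple's implementation:

```python
# Illustrative sketch: normalize a media-asset text string, guess its
# language, and map words to target phonemes. The phoneme tables and
# language detector are toy stand-ins for a real system's components.
import re
import unicodedata

# Tiny grapheme-to-phoneme tables keyed by language (assumed for the example).
PHONEME_TABLES = {
    "en": {"hello": ["HH", "AH", "L", "OW"]},
    "de": {"hallo": ["h", "a", "l", "o"]},
}

def normalize(text: str) -> str:
    """Strip accents and punctuation and lowercase, standing in for full
    text normalization (numbers, abbreviations, etc.)."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    return re.sub(r"[^\w\s]", "", text).strip().lower()

def detect_language(text: str) -> str:
    """Crude native-language guess; a real system would classify over
    character n-grams or use metadata."""
    return "de" if "hallo" in text else "en"

def to_phonemes(text: str) -> list:
    """Normalize, detect language, and look up each word's phonemes."""
    norm = normalize(text)
    table = PHONEME_TABLES[detect_language(norm)]
    phonemes = []
    for word in norm.split():
        phonemes.extend(table.get(word, ["?"]))  # "?" for out-of-vocabulary
    return phonemes

print(to_phonemes("Hello!"))  # ['HH', 'AH', 'L', 'OW']
```

In a deployed system each stage would run on the dedicated render engines the abstract mentions, with the front end caching the synthesized audio per asset.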
-
Publication number: 20100082329
Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Type: Application
Filed: September 29, 2008
Publication date: April 1, 2010
Applicant: Apple Inc.
Inventors: Kim Silverman, Devang Naik, Kevin Lenzo, Caroline Henton
-
SYSTEMS AND METHODS FOR SELECTIVE RATE OF SPEECH AND SPEECH PREFERENCES FOR TEXT TO SPEECH SYNTHESIS
Publication number: 20100082344
Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Type: Application
Filed: September 29, 2008
Publication date: April 1, 2010
Applicant: Apple, Inc.
Inventors: Devang Naik, Kim Silverman, Jerome Bellegarda
-
Publication number: 20080155438
Abstract: Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
Type: Application
Filed: March 11, 2008
Publication date: June 26, 2008
Applicant: APPLE INC.
Inventors: Thomas Bonura, Kim Silverman
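The graduated-translucency behavior described above can be sketched as a small state model: an idle timer drives the window's alpha down, and once alpha crosses a threshold, input is routed to the objects beneath. The delay, fade rate, and threshold values are assumptions for illustration:

```python
# Sketch of graduated window translucency: a window whose contents are
# unchanged grows more translucent over time, and past a threshold,
# input "falls through" to underlying objects. Constants are illustrative.
from dataclasses import dataclass

IDLE_DELAY = 5.0           # seconds before fading begins (assumed)
FADE_RATE = 0.1            # alpha lost per second once fading (assumed)
CLICK_THROUGH_ALPHA = 0.4  # below this, input targets underlying objects (assumed)

@dataclass
class OverlayWindow:
    last_change: float = 0.0  # timestamp of last content update

    def alpha(self, now: float) -> float:
        """Fully opaque until IDLE_DELAY elapses, then fade linearly."""
        idle = now - self.last_change
        if idle <= IDLE_DELAY:
            return 1.0
        return max(0.0, 1.0 - (idle - IDLE_DELAY) * FADE_RATE)

    def routes_input_to_underlying(self, now: float) -> bool:
        """Manipulative translucency: a sufficiently faded window passes
        clicks through to whatever lies beneath it."""
        return self.alpha(now) < CLICK_THROUGH_ALPHA

w = OverlayWindow(last_change=0.0)
print(w.alpha(3.0))                        # 1.0 (still opaque)
print(round(w.alpha(10.0), 2))             # 0.5 (mid-fade)
print(w.routes_input_to_underlying(12.0))  # True (faded past threshold)
```

Any content update would reset `last_change` to the current time, restoring full opacity and normal input handling.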
-
Publication number: 20080091430
Abstract: A method and apparatus are provided for generating speech that sounds more natural. In one embodiment, word prominence and latent semantic analysis are used to generate more natural sounding speech. A method for generating speech that sounds more natural may comprise generating synthesized speech having certain word prominence characteristics and applying a semantically-driven word prominence assignment model to specify word prominence consistent with the way humans assign word prominence.
Type: Application
Filed: December 4, 2007
Publication date: April 17, 2008
Inventors: Jerome Bellegarda, Kim Silverman
-
Patent number: 7343562
Abstract: Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows according to the present invention also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
Type: Grant
Filed: November 5, 2003
Date of Patent: March 11, 2008
Assignee: Apple Inc.
Inventors: Thomas Bonura, Kim Silverman
-
Publication number: 20070294083
Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, a set of feature vectors are decomposed into speaker-specific recognition units. The speaker-specific recognition units are used to compute distribution values to train the voice signal. In addition, spectral feature vectors are decomposed into speaker-specific characteristic units which are compared to the speaker-specific distribution values. If the speaker-specific characteristic units are within a threshold limit of the speaker-specific distribution values, the speech signal is authenticated.
Type: Application
Filed: June 11, 2007
Publication date: December 20, 2007
Inventors: Jerome Bellegarda, Kim Silverman
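The core idea above (enrollment data defines speaker-specific distribution values, and a test utterance is authenticated if it falls within a threshold of them) can be sketched with simple per-dimension statistics. This replaces the patent's decomposition into recognition and characteristic units with plain mean/spread statistics, purely for illustration:

```python
# Sketch of threshold-based speaker verification in the spirit of the
# abstract: enrollment vectors yield per-dimension distribution values
# (mean and standard deviation), and a test vector is accepted if every
# dimension lies within a threshold of that distribution. The decomposition
# units in the patent are replaced here by simple statistics (assumption).
import math

def fit_distribution(enrollment):
    """Per-dimension mean and standard deviation of enrollment vectors."""
    dims = len(enrollment[0])
    n = len(enrollment)
    means = [sum(v[d] for v in enrollment) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((v[d] - means[d]) ** 2 for v in enrollment) / n) or 1.0
        for d in range(dims)
    ]
    return means, stds

def authenticate(vector, means, stds, threshold=2.0) -> bool:
    """Accept if every dimension is within `threshold` deviations."""
    return all(abs(x - m) <= threshold * s for x, m, s in zip(vector, means, stds))

# Toy enrollment data: three 2-dimensional spectral feature vectors.
enrollment = [[1.0, 2.0], [1.2, 2.1], [0.8, 1.9]]
means, stds = fit_distribution(enrollment)
print(authenticate([1.1, 2.0], means, stds))   # True  (close to enrollment)
print(authenticate([5.0, -3.0], means, stds))  # False (far from enrollment)
```

A real verifier would work on full cepstral feature sequences and would also time-align content units, as the related granted patent (6697779) below describes.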
-
Publication number: 20070106742
Abstract: A method and apparatus for filtering messages comprising determining a first semantic anchor corresponding to a first group of messages, for example, legitimate messages, and a second semantic anchor corresponding to a second group of messages, for example, unsolicited messages. Determining a vector corresponding to an incoming message; comparing the vector corresponding to the incoming message with at least one of the first semantic anchor and the second semantic anchor to obtain a first comparison value and a second comparison value; and filtering the incoming message based on the first comparison value and the second comparison value.
Type: Application
Filed: December 20, 2006
Publication date: May 10, 2007
Inventors: Jerome Bellegarda, Devang Naik, Kim Silverman
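The two-anchor scheme above can be sketched directly: vectorize the incoming message, compare it against a "legitimate" anchor and an "unsolicited" anchor, and filter on whichever comparison value is larger. Bag-of-words vectors and cosine similarity stand in for the semantic representation; the anchor corpora are made up for the example:

```python
# Sketch of semantic-anchor message filtering: each message becomes a
# term-frequency vector, and cosine similarity against two anchors
# (legitimate vs. unsolicited) decides the filtering outcome.
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Term-frequency vector (a stand-in for a latent semantic vector)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Anchors built from tiny example corpora (illustrative only).
legit_anchor = vectorize("meeting agenda project report schedule review")
spam_anchor = vectorize("free winner prize click offer money")

def classify(message: str) -> str:
    """Filter by comparing the message vector to both semantic anchors."""
    v = vectorize(message)
    legit_score = cosine(v, legit_anchor)
    spam_score = cosine(v, spam_anchor)
    return "legitimate" if legit_score >= spam_score else "unsolicited"

print(classify("project review meeting tomorrow"))  # legitimate
print(classify("click now free prize winner"))      # unsolicited
```

In practice the anchors would be derived from large labeled corpora, typically in a reduced latent semantic space rather than raw term space.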
-
Publication number: 20060206574
Abstract: A method and apparatus for filtering messages comprising determining a first semantic anchor corresponding to a first group of messages, for example, legitimate messages, and a second semantic anchor corresponding to a second group of messages, for example, unsolicited messages. Determining a vector corresponding to an incoming message; comparing the vector corresponding to the incoming message with at least one of the first semantic anchor and the second semantic anchor to obtain a first comparison value and a second comparison value; and filtering the incoming message based on the first comparison value and the second comparison value.
Type: Application
Filed: May 9, 2006
Publication date: September 14, 2006
Inventors: Jerome Bellegarda, Devang Naik, Kim Silverman
-
Publication number: 20060168150
Abstract: Improved techniques for providing supplementary media for media items are disclosed. The media items are typically fixed media items. The supplementary media is one or more of audio, video, image, or text that is provided by a user to supplement (e.g., personalize, customize, annotate, etc.) the fixed media items. In one embodiment, the supplementary media can be provided by user interaction with an on-line media store where media items can be browsed, searched, purchased and/or acquired via a computer network. In another embodiment, the supplementary media can be generated on a playback device.
Type: Application
Filed: March 6, 2006
Publication date: July 27, 2006
Inventors: Devang Naik, Kim Silverman, Guy Tribble
-
Publication number: 20050038650
Abstract: A method and apparatus to use semantic inference with speech recognition systems includes recognizing at least one spoken word, processing the spoken word using a context-free grammar, deriving an output from the context-free grammar, and translating the output to a predetermined command.
Type: Application
Filed: September 21, 2004
Publication date: February 17, 2005
Inventors: Jerome Bellegarda, Kim Silverman
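The four steps in this abstract (recognize words, process them with a context-free grammar, derive an output, translate it to a command) can be sketched with a minimal grammar and command table. The grammar, vocabulary, and command names are all assumptions for illustration:

```python
# Sketch of CFG-based semantic inference: a recognized utterance is parsed
# against a tiny <command> ::= <verb> <object> grammar, and the derivation
# is translated into a predetermined command. Everything here is a toy.
GRAMMAR = {
    "verbs": {"open", "close", "play"},
    "objects": {"mail", "browser", "music"},
}

# Mapping from grammar derivations to predetermined commands (assumed names).
COMMANDS = {
    ("open", "mail"): "LAUNCH_MAIL_APP",
    ("play", "music"): "START_PLAYBACK",
}

def parse(utterance: str):
    """Return the (verb, object) derivation if the utterance matches the
    <verb> <object> production, else None."""
    words = utterance.lower().split()
    if (len(words) == 2
            and words[0] in GRAMMAR["verbs"]
            and words[1] in GRAMMAR["objects"]):
        return (words[0], words[1])
    return None

def to_command(utterance: str) -> str:
    """Derive an output from the grammar and translate it to a command."""
    derivation = parse(utterance)
    if derivation is None:
        return "UNRECOGNIZED"
    return COMMANDS.get(derivation, "NO_MAPPING")

print(to_command("open mail"))   # LAUNCH_MAIL_APP
print(to_command("close door"))  # UNRECOGNIZED
```

The point of routing recognition output through a grammar is that only well-formed derivations ever reach the command table, so ill-formed recognizer output is rejected before any action is taken.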
-
Patent number: 6785652
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Grant
Filed: December 19, 2002
Date of Patent: August 31, 2004
Assignee: Apple Computer, Inc.
Inventors: Jerome R. Bellegarda, Kim Silverman
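The key property of the transformation described above is that it maps the additive model's output into the duration range observed in training data. The patent's exact root-sinusoidal form is not reproduced here; the following is an illustrative stand-in that shares the bounded, sine-plus-root shape:

```python
# Illustrative sketch (not the patented formula): map an additive-model
# score into [d_min, d_max] with a root-sine warp, so predicted phoneme
# durations never fall outside the bounds observed in training data.
import math

def bounded_duration(score: float, d_min: float, d_max: float,
                     root: float = 2.0) -> float:
    """Warp a score in [0, 1] to a duration between d_min and d_max.
    `root` controls the curvature of the warp (assumed parameter)."""
    score = min(1.0, max(0.0, score))              # clamp out-of-range scores
    warp = math.sin(0.5 * math.pi * score) ** (1.0 / root)
    return d_min + (d_max - d_min) * warp

# Durations stay within the observed bounds (here, 30-200 ms, assumed).
print(bounded_duration(0.0, 30.0, 200.0))  # 30.0
print(bounded_duration(1.0, 30.0, 200.0))  # 200.0
```

The practical benefit over an exponential transformation is exactly this boundedness: no combination of contextual factors can drive a phoneme's predicted duration to an implausible extreme.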
-
Publication number: 20040090467
Abstract: Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows according to the present invention also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
Type: Application
Filed: November 5, 2003
Publication date: May 13, 2004
Applicant: Apple Computer, Inc.
Inventors: Thomas Bonura, Kim Silverman
-
Patent number: 6697779
Abstract: A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
Type: Grant
Filed: September 29, 2000
Date of Patent: February 24, 2004
Assignee: Apple Computer, Inc.
Inventors: Jerome Bellegarda, Devang Naik, Matthias Neeracher, Kim Silverman
-
Patent number: 6670970
Abstract: Methods and systems for providing graphical user interfaces are described. Overlaid, information-bearing windows whose contents remain unchanged for a predetermined period of time become translucent. The translucency can be graduated so that, over time, if the window's contents remain unchanged, the window becomes more translucent. In addition to visual translucency, windows according to the present invention also have a manipulative translucent quality. Upon reaching a certain level of visual translucency, user input in the region of the window is interpreted as an operation on the underlying objects rather than the contents of the overlaying window.
Type: Grant
Filed: December 20, 1999
Date of Patent: December 30, 2003
Assignee: Apple Computer, Inc.
Inventors: Thomas Bonura, Kim Silverman
-
Publication number: 20030093277
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Application
Filed: December 19, 2002
Publication date: May 15, 2003
Inventors: Jerome R. Bellegarda, Kim Silverman
-
Patent number: 6553344
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Grant
Filed: February 22, 2002
Date of Patent: April 22, 2003
Assignee: Apple Computer, Inc.
Inventors: Jerome R. Bellegarda, Kim Silverman
-
Publication number: 20020138270
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Application
Filed: February 22, 2002
Publication date: September 26, 2002
Applicant: Apple Computer, Inc.
Inventors: Jerome R. Bellegarda, Kim Silverman
-
Patent number: 6366884
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Grant
Filed: November 8, 1999
Date of Patent: April 2, 2002
Assignee: Apple Computer, Inc.
Inventors: Jerome R. Bellegarda, Kim Silverman
-
Patent number: 6064960
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Grant
Filed: December 18, 1997
Date of Patent: May 16, 2000
Assignee: Apple Computer, Inc.
Inventors: Jerome R. Bellegarda, Kim Silverman