Patents by Inventor Slava Shechtman
Slava Shechtman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10418025
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an in…
Type: Grant
Filed: December 6, 2017
Date of Patent: September 17, 2019
Assignee: International Business Machines Corporation
Inventors: Slava Shechtman, Zvi Kons
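The neutralization step described above can be illustrated with a minimal Python sketch. This is not the patented method, only a toy reading of it: each prosody vector at a non-neutral (expression-labeled) time is replaced by a simple linear combination (here, the mean) of vectors at nearby neutral times, selected by a proximity test. All names and the window parameter are illustrative.

```python
def neutralize_prosody(vectors, times, neutral_times, window=2.0):
    """Return a 'neutral' copy of `vectors` (one per time instance).

    A vector at a non-neutral time is replaced by the average of the
    vectors whose times are neutral and within `window` of it.
    """
    neutral_set = set(neutral_times)
    out = []
    for v, t in zip(vectors, times):
        if t in neutral_set:
            out.append(v)                      # already neutral: keep as-is
            continue
        # proximity test: neutral-time vectors close to this time instance
        nearby = [u for u, s in zip(vectors, times)
                  if s in neutral_set and abs(s - t) <= window]
        if nearby:
            dim = len(v)
            out.append([sum(u[i] for u in nearby) / len(nearby)
                        for i in range(dim)])
        else:
            out.append(v)                      # no neutral neighbors: unchanged
    return out

# Toy data: (pitch, energy) vectors at times 0..3; times 1 and 2 carry
# an expression label, so only times 0 and 3 are neutral.
vectors = [[100.0, 1.0], [180.0, 2.0], [170.0, 1.8], [105.0, 1.1]]
times = [0.0, 1.0, 2.0, 3.0]
neutral_times = [0.0, 3.0]
neutral_seq = neutralize_prosody(vectors, times, neutral_times, window=3.0)
```

The non-neutral vectors are pulled back toward the neutral context, which is the kind of paired (expressive, neutralized) training data the abstract's model-training step consumes.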
-
Publication number: 20190172443
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an in…
Type: Application
Filed: December 6, 2017
Publication date: June 6, 2019
Inventors: Slava Shechtman, Zvi Kons
-
Patent number: 10226702
Abstract: A computer-implemented method, computerized apparatus and computer program product. The method comprises capturing one or more images of a scene in which a driver is driving a vehicle; analyzing the images to retrieve an event or detail; conveying to the driver a question or a challenge related to the event or detail; receiving a response from the driver; analyzing the response; and determining a score related to the driver.
Type: Grant
Filed: May 25, 2015
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Ron Hoory, Mattias Marder, Slava Shechtman
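The question/response loop in this abstract can be sketched as a tiny pipeline. This is a hypothetical illustration, not the patented system: the scene is assumed to be pre-analyzed into labeled objects, and all function names, fields, and the scoring rule are invented for the example.

```python
def extract_event(scene):
    """Pick the most salient detail from a (pre-analyzed) scene description."""
    return max(scene["objects"], key=lambda o: o["salience"])

def make_question(event):
    """Turn the extracted detail into a challenge for the driver."""
    return f"What color was the {event['name']} you just passed?"

def score_response(event, response):
    """1.0 if the driver's answer matches the detail, else 0.0."""
    return 1.0 if response.strip().lower() == event["color"] else 0.0

# A toy pre-analyzed scene with two detected objects.
scene = {"objects": [
    {"name": "truck", "color": "red", "salience": 0.9},
    {"name": "sign", "color": "blue", "salience": 0.4},
]}
event = extract_event(scene)
question = make_question(event)
score = score_response(event, "Red")
```

A real system would accumulate such per-challenge scores over a drive to estimate the driver's attentiveness.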
-
Patent number: 9892335
Abstract: Embodiments of the present invention may provide the capability to identify a specific object being interacted with that may be cheaply and easily included in mass-produced objects. In an embodiment, a computer-implemented method for object identification may comprise receiving a signal produced by a physical interaction with an object to be identified, the signal produced by an identification structure coupled to the object during physical interaction with the object, processing the signal to form digital data representing the signal, and accessing a database using the digital data to retrieve information identifying the object.
Type: Grant
Filed: June 5, 2016
Date of Patent: February 13, 2018
Assignee: International Business Machines Corporation
Inventors: Ophir Azulai, Udi Barzelay, Mattias Marder, Dror Porat, Slava Shechtman
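The signal-to-database flow can be sketched as follows. This is a toy stand-in, not the patented method: here the "digital data" is a crude dominant-frequency estimate from zero crossings, and the database maps each object's characteristic frequency to its identity. All names and the tolerance are assumptions.

```python
import math

def dominant_freq(signal, sample_rate):
    """Rough frequency estimate from the zero-crossing count."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    duration = len(signal) / sample_rate
    return crossings / (2 * duration)          # two crossings per cycle

def identify(signal, sample_rate, database, tol=10.0):
    """Digitize the interaction signal and look it up in the database."""
    f = dominant_freq(signal, sample_rate)
    for freq_key, info in database.items():
        if abs(freq_key - f) <= tol:
            return info
    return None

# Each object's identification structure "rings" at its own frequency.
database = {440.0: "mug", 880.0: "bottle"}
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(8000)]
label = identify(tone, sr, database)
```

A production system would use richer acoustic features than a single frequency, but the lookup structure is the same.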
-
Publication number: 20170351930
Abstract: Embodiments of the present invention may provide the capability to identify a specific object being interacted with that may be cheaply and easily included in mass-produced objects. In an embodiment, a computer-implemented method for object identification may comprise receiving a signal produced by a physical interaction with an object to be identified, the signal produced by an identification structure coupled to the object during physical interaction with the object, processing the signal to form digital data representing the signal, and accessing a database using the digital data to retrieve information identifying the object.
Type: Application
Filed: June 5, 2016
Publication date: December 7, 2017
Applicant: International Business Machines Corporation
Inventors: Ophir Azulai, Udi Barzelay, Mattias Marder, Dror Porat, Slava Shechtman
-
Patent number: 9685170
Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
Type: Grant
Filed: October 21, 2015
Date of Patent: June 20, 2017
Assignee: International Business Machines Corporation
Inventor: Slava Shechtman
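The correction step can be illustrated with a minimal sketch. This is not the patented algorithm: here the lower/upper limits are simply a quarter pitch period around each mark, and the "best" new position inside those limits is the local waveform peak, a simpler stand-in for the cross-correlation criterion the abstract describes. All names are illustrative.

```python
import math

def correct_pitch_marks(signal, marks, period):
    """Snap each pitch mark to the best position within derived limits."""
    corrected = []
    half = period // 4                       # limits: mark +/- quarter period
    for m in marks:
        lo = max(0, m - half)                # lower limit temporal value
        hi = min(len(signal) - 1, m + half)  # upper limit temporal value
        best = max(range(lo, hi + 1), key=lambda i: signal[i])
        corrected.append(best)
    return corrected

sr, f0 = 8000, 200
period = sr // f0                            # 40 samples per pitch cycle
sig = [math.sin(2 * math.pi * f0 * n / sr) for n in range(400)]
# True peaks of this sinusoid sit at n = 10, 50, 90, ...
noisy_marks = [12, 48, 93]                   # marks jittered off the peaks
fixed = correct_pitch_marks(sig, noisy_marks, period)
```

On real speech the waveform peak is an unreliable anchor, which is exactly why the patent replaces it with a cross-correlation-based limit computation; the control flow, though, is the same.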
-
Publication number: 20170117001
Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
Type: Application
Filed: October 21, 2015
Publication date: April 27, 2017
Inventor: Slava Shechtman
-
Patent number: 9564140
Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
Type: Grant
Filed: April 7, 2015
Date of Patent: February 7, 2017
Assignee: Nuance Communications, Inc.
Inventors: Slava Shechtman, Alexander Sorin
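The envelope/residual/codeword chain can be sketched in a few lines. This is a hedged toy, not the patented coder: the "phase envelope" is approximated by a least-squares linear phase fit, and the residual is encoded by picking the nearest codeword from a tiny hand-made codebook. All names and values are illustrative.

```python
def fit_linear_phase(phases):
    """Least-squares line through phase vs. bin index (a crude stand-in
    for the patent's phase-envelope estimate, evaluated at each bin)."""
    n = len(phases)
    xs = list(range(n))
    mx = sum(xs) / n
    my = sum(phases) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, phases)) / \
            sum((x - mx) ** 2 for x in xs)
    return [my + slope * (x - mx) for x in xs]

def encode_residual(residual, codebook):
    """Index of the codeword with the smallest squared error."""
    def err(cw):
        return sum((r - c) ** 2 for r, c in zip(residual, cw))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

phases = [0.0, 0.55, 0.95, 1.6, 2.0]         # measured phases, one per bin
envelope = fit_linear_phase(phases)           # primary representation
residual = [p - e for p, e in zip(phases, envelope)]
codebook = [[0.0] * 5,
            [0.05, -0.05, 0.05, -0.05, 0.05],
            [0.2, 0.2, 0.2, 0.2, 0.2]]
index = encode_residual(residual, codebook)
```

The point of the split is that the residual is small and well-behaved, so it quantizes with far fewer bits than the raw phases would.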
-
Publication number: 20160346695
Abstract: A computer-implemented method, computerized apparatus and computer program product. The method comprises capturing one or more images of a scene in which a driver is driving a vehicle; analyzing the images to retrieve an event or detail; conveying to the driver a question or a challenge related to the event or detail; receiving a response from the driver; analyzing the response; and determining a score related to the driver.
Type: Application
Filed: May 25, 2015
Publication date: December 1, 2016
Inventors: Ron Hoory, Mattias Marder, Slava Shechtman
-
Patent number: 9484045
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type: Grant
Filed: September 7, 2012
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
-
Patent number: 9484036
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
Type: Grant
Filed: August 28, 2013
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
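The "periodic variation" test can be illustrated with a minimal sketch: measure how periodic a per-segment feature track is via its normalized autocorrelation, and flag tracks whose periodicity exceeds a threshold. This is an assumption-laden toy, not the patented detector; the feature, threshold, and function names are all invented for the example.

```python
import math

def periodicity(track, max_lag=None):
    """Peak normalized autocorrelation (lag >= 1) of a mean-removed
    per-frame feature track; near 1.0 means strongly periodic variation."""
    n = len(track)
    mean = sum(track) / n
    x = [v - mean for v in track]
    energy = sum(v * v for v in x) or 1.0
    max_lag = max_lag or n // 2
    return max(
        sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        for lag in range(1, max_lag + 1)
    )

def is_synthetic(track, threshold=0.6):
    """Flag a feature track whose variation is suspiciously periodic."""
    return periodicity(track) > threshold

# A track that oscillates regularly vs. one with a single transient.
periodic_track = [math.sin(0.5 * n) for n in range(64)]
impulse_track = [1.0] + [0.0] * 63
```

A real detector would run this over several features jointly; the single-track version just shows the decision structure.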
-
Publication number: 20160300580
Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
Type: Application
Filed: April 7, 2015
Publication date: October 13, 2016
Applicant: Nuance Communications, Inc.
Inventors: Slava Shechtman, Alexander Sorin
-
Patent number: 9224402
Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
Type: Grant
Filed: September 30, 2013
Date of Patent: December 29, 2015
Assignee: International Business Machines Corporation
Inventor: Slava Shechtman
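The first stage of the pipeline above (harmonic modeling, reconstruction, subtraction) can be sketched as follows. This is a simplified illustration, not the patented coder: harmonic amplitudes are estimated by projecting the frame onto sines and cosines at multiples of a known f0 over an integer number of pitch periods; the later residual-modeling and analysis-by-synthesis stages are omitted.

```python
import math

def harmonic_fit(frame, f0, sr, n_harm):
    """Project `frame` onto sin/cos at k*f0 and return the reconstruction
    (the 'estimated frame signal' built from the harmonic parameters)."""
    n = len(frame)
    recon = [0.0] * n
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / sr
        # least-squares amplitudes over an integer number of cycles
        c = sum(frame[i] * math.cos(w * i) for i in range(n)) * 2 / n
        s = sum(frame[i] * math.sin(w * i) for i in range(n)) * 2 / n
        for i in range(n):
            recon[i] += c * math.cos(w * i) + s * math.sin(w * i)
    return recon

sr, f0, n = 8000, 200, 80                    # 80 samples = two pitch periods
frame = [math.sin(2 * math.pi * 200 * i / sr)
         + 0.3 * math.sin(2 * math.pi * 400 * i / sr) for i in range(n)]
recon = harmonic_fit(frame, f0, sr, 2)
residual = [x - y for x, y in zip(frame, recon)]
rms = math.sqrt(sum(r * r for r in residual) / n)
```

For this purely harmonic toy frame the residual is essentially zero; on real speech the residual carries the noise-like part, which is what the later codebook stages encode.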
-
Patent number: 9159323
Abstract: A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.
Type: Grant
Filed: July 29, 2013
Date of Patent: October 13, 2015
Assignee: Nuance Communications, Inc.
Inventors: Slava Shechtman, Raphael Steinberg
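The aggregation step can be sketched minimally: each call contributes non-verbal features, a condition is deduced via a predefined relation, and conditions are tallied per location. This is a toy reading of the abstract; the feature names, the deduction rule, and the locations are all invented for the example.

```python
from collections import defaultdict

def deduce_condition(features):
    """Toy predefined relation: high mean pitch plus a fast speaking
    rate is taken as 'stressed'; anything else as 'calm'."""
    if features["mean_pitch"] > 200 and features["rate"] > 5.0:
        return "stressed"
    return "calm"

def geo_distribution(records):
    """Tally deduced conditions per device location."""
    dist = defaultdict(lambda: defaultdict(int))
    for rec in records:
        dist[rec["location"]][deduce_condition(rec["features"])] += 1
    return {loc: dict(counts) for loc, counts in dist.items()}

records = [
    {"location": "Haifa", "features": {"mean_pitch": 230, "rate": 6.1}},
    {"location": "Haifa", "features": {"mean_pitch": 150, "rate": 3.2}},
    {"location": "Tel Aviv", "features": {"mean_pitch": 245, "rate": 5.8}},
]
dist = geo_distribution(records)
```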
-
Publication number: 20150095035
Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
Type: Application
Filed: September 30, 2013
Publication date: April 2, 2015
Applicant: International Business Machines Corporation
Inventor: Slava Shechtman
-
Publication number: 20150066512
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
Type: Application
Filed: August 28, 2013
Publication date: March 5, 2015
Applicant: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
-
Patent number: 8786659
Abstract: A method for responding to media conference deficiencies, the method includes: monitoring, by at least one receiver, a quality of media conference signals being received by at least one receiver during the media conference; sending, in response to the monitoring, to at least an end user transmitter that transmitted the media conference signals, a quality indication representative of a quality of the received media conference signals; recording media conference signals that were inadequately received by a certain end user receiver; and participating in an activity related to a transmission, to the certain end user receiver, of the inadequately received media conference signals or of a representation of the inadequately received media conference signals.
Type: Grant
Filed: May 29, 2012
Date of Patent: July 22, 2014
Assignee: International Business Machines Corporation
Inventors: Ron Hoory, Michael Rodeh, Slava Shechtman
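The receiver-side bookkeeping can be sketched as follows. This is a hypothetical illustration, not the patented protocol: the class, the packet-loss quality metric, and the threshold are all assumptions standing in for whatever quality measure a real conferencing stack would use.

```python
class ConferenceReceiver:
    """Toy receiver: monitors per-segment quality, emits a quality
    indication, and records segment ids that arrived inadequately so
    they (or a representation of them) can be re-delivered later."""

    def __init__(self, loss_threshold=0.1):
        self.loss_threshold = loss_threshold
        self.inadequate = []                 # segment ids to re-deliver

    def receive(self, segment_id, packets_expected, packets_received):
        loss = 1.0 - packets_received / packets_expected
        adequate = loss <= self.loss_threshold
        if not adequate:
            self.inadequate.append(segment_id)
        # the quality indication sent back to the transmitter
        return {"segment": segment_id, "loss": loss, "adequate": adequate}

rx = ConferenceReceiver()
r1 = rx.receive("seg-1", 100, 98)            # 2% loss: adequate
r2 = rx.receive("seg-2", 100, 70)            # 30% loss: record for replay
```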
-
Patent number: 8682670
Abstract: A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors.
Type: Grant
Filed: July 7, 2011
Date of Patent: March 25, 2014
Assignee: International Business Machines Corporation
Inventors: Slava Shechtman, Alexander Sorin
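The pairing of a parametric corrective transformation with a distortion indicator can be sketched minimally. This is not the patented method: the one-parameter transformation here (variance expansion around the vector mean, a crude counter to statistical over-smoothing) and the Euclidean distortion indicator are invented stand-ins to show how the indicator bounds the enhancement.

```python
def enhance(vec, alpha):
    """Corrective transformation: expand the vector's spread around its
    mean by a factor (1 + alpha), alpha being the enhancing parameter."""
    mean = sum(vec) / len(vec)
    return [mean + (1 + alpha) * (v - mean) for v in vec]

def distortion(orig, new):
    """Distortion indicator: Euclidean distance from the original vector."""
    return sum((a - b) ** 2 for a, b in zip(orig, new)) ** 0.5

def enhance_bounded(vec, alpha, max_distortion):
    """Apply the transformation only if the indicator stays in bounds."""
    out = enhance(vec, alpha)
    if distortion(vec, out) > max_distortion:
        return vec                            # too distorted: keep original
    return out

vec = [1.0, 3.0, 2.0]
mild = enhance_bounded(vec, 0.1, max_distortion=1.0)   # accepted
harsh = enhance_bounded(vec, 5.0, max_distortion=1.0)  # rejected
```

The design point is that the distortion indicator lets the system push the enhancing parameter aggressively while guaranteeing the output never strays too far from the statistically generated features.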
-
Publication number: 20140074468
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type: Application
Filed: September 7, 2012
Publication date: March 13, 2014
Applicant: Nuance Communications, Inc.
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
-
Publication number: 20130317825
Abstract: A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.
Type: Application
Filed: July 29, 2013
Publication date: November 28, 2013
Applicant: Nuance Communications, Inc.
Inventors: Slava Shechtman, Raphael Steinberg