Patents by Inventor Alexander Sorin

Alexander Sorin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11556782
    Abstract: In a trained attentive decoder of a trained Sequence-to-Sequence (seq2seq) Artificial Neural Network (ANN): obtaining an encoded input vector sequence; generating, using a trained primary attention mechanism of the trained attentive decoder, a primary attention vectors sequence; for each primary attention vector of the primary attention vectors sequence: (a) generating a set of attention vector candidates corresponding to the respective primary attention vector, (b) evaluating, for each attention vector candidate of the set of attention vector candidates, a structure fit measure that quantifies a similarity of the respective attention vector candidate to a desired attention vector structure, (c) generating, using a trained soft-selection ANN, a secondary attention vector based on said evaluation and on state variables of the trained attentive decoder; and generating, using the trained attentive decoder, an output sequence based on the encoded input vector sequence and the secondary attention vectors.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Vyacheslav Shechtman, Alexander Sorin
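The soft-selection step described in the abstract above can be sketched minimally in NumPy. The cosine-similarity structure-fit measure and the plain softmax selector below are illustrative stand-ins: the patent uses a trained soft-selection ANN that also conditions on decoder state variables, which this sketch omits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def structure_fit(candidate, desired):
    # stand-in structure-fit measure: cosine similarity between the
    # candidate attention vector and the desired attention structure
    num = float(candidate @ desired)
    den = np.linalg.norm(candidate) * np.linalg.norm(desired) + 1e-9
    return num / den

def secondary_attention(candidates, desired):
    # soft-select: weight each attention vector candidate by its
    # structure-fit score and blend them into one secondary vector
    scores = np.array([structure_fit(c, desired) for c in candidates])
    weights = softmax(scores)
    return weights @ np.stack(candidates)
```

Because the result is a convex combination of the candidates, it remains a valid attention distribution whenever the candidates are.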
  • Patent number: 11062691
    Abstract: Embodiments of the present systems and methods may provide techniques that provide the capability to automatically generate allowance intervals for voice personas that meet desired requirements for realism and fidelity. For example, a method for voice persona generation may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, the method comprising: displaying to a user, a plurality of user-selectable voice persona parameters that control features of a synthesized voice signal, and displaying, in conjunction with each of at least some of the plurality of user-selectable voice persona parameters, voice transformation allowance intervals of the voice persona parameters, accepting from a user, a selection of at least one user-selectable voice persona parameter, and generating a synthesized voice signal based on the selected at least one user-selectable voice persona parameter.
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: July 13, 2021
    Assignee: International Business Machines Corporation
    Inventors: Vyacheslav Shechtman, Alexander Sorin
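The core idea above — persona parameters constrained to allowance intervals chosen to keep the synthesized voice realistic — reduces to simple clamping. The parameter names and interval values below are hypothetical, not taken from the patent.

```python
def clamp_to_interval(value, interval):
    # constrain one persona parameter to its voice-transformation
    # allowance interval
    lo, hi = interval
    return min(max(value, lo), hi)

# hypothetical persona parameters and their allowance intervals
INTERVALS = {"pitch_shift": (0.8, 1.2), "breathiness": (0.0, 0.5)}

def validate_persona(persona):
    # clamp each user-selected parameter into its displayed interval
    return {name: clamp_to_interval(v, INTERVALS[name])
            for name, v in persona.items()}
```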
  • Publication number: 20210089877
    Abstract: In a trained attentive decoder of a trained Sequence-to-Sequence (seq2seq) Artificial Neural Network (ANN): obtaining an encoded input vector sequence; generating, using a trained primary attention mechanism of the trained attentive decoder, a primary attention vectors sequence; for each primary attention vector of the primary attention vectors sequence: (a) generating a set of attention vector candidates corresponding to the respective primary attention vector, (b) evaluating, for each attention vector candidate of the set of attention vector candidates, a structure fit measure that quantifies a similarity of the respective attention vector candidate to a desired attention vector structure, (c) generating, using a trained soft-selection ANN, a secondary attention vector based on said evaluation and on state variables of the trained attentive decoder; and generating, using the trained attentive decoder, an output sequence based on the encoded input vector sequence and the secondary attention vectors.
    Type: Application
    Filed: September 19, 2019
    Publication date: March 25, 2021
    Inventors: Vyacheslav Shechtman, Alexander Sorin
  • Publication number: 20200365135
    Abstract: Embodiments of the present systems and methods may provide techniques that provide the capability to automatically generate allowance intervals for voice personas that meet desired requirements for realism and fidelity. For example, a method for voice persona generation may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, the method comprising: displaying to a user, a plurality of user-selectable voice persona parameters that control features of a synthesized voice signal, and displaying, in conjunction with each of at least some of the plurality of user-selectable voice persona parameters, voice transformation allowance intervals of the voice persona parameters, accepting from a user, a selection of at least one user-selectable voice persona parameter, and generating a synthesized voice signal based on the selected at least one user-selectable voice persona parameter.
    Type: Application
    Filed: May 13, 2019
    Publication date: November 19, 2020
    Inventors: Vyacheslav Shechtman, Alexander Sorin
  • Patent number: 10726826
    Abstract: A computer system receives a training set of voice data captured from one or more speakers and classified with one or more categorical prosodic labels according to one or more prosodic categories. The computer system transforms the voice data of the training set while preserving one or more portions of the voice data that determine the one or more categorical prosodic labels to automatically form a new training set of voice data automatically classified with the one or more categorical prosodic labels according to the one or more prosodic categories. The computer system augments a database comprising the training set with the new training set for training a speech model of an artificial intelligence system.
    Type: Grant
    Filed: March 4, 2018
    Date of Patent: July 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Raul Fernandez, Andrew Rosenberg, Alexander Sorin
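The augmentation scheme above can be sketched as a label-preserving transform: new examples inherit their categorical prosodic labels automatically because the transform leaves the label-determining portion of the data untouched. Here an energy-scaling transform and a pitch contour as the label-determining feature are illustrative assumptions.

```python
import numpy as np

def augment(example, rng):
    # label-neutral transformation: scale overall energy while leaving the
    # pitch contour (assumed here to determine the prosodic label) untouched
    signal, pitch_contour, prosodic_label = example
    gain = rng.uniform(0.8, 1.2)
    return (signal * gain, pitch_contour, prosodic_label)

def augment_set(training_set, n_copies=2, seed=0):
    # the new training set is automatically classified: each copy carries
    # over its original categorical prosodic label with no re-annotation
    rng = np.random.default_rng(seed)
    return [augment(ex, rng) for ex in training_set for _ in range(n_copies)]
```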
  • Publication number: 20190272818
    Abstract: A computer system receives a training set of voice data captured from one or more speakers and classified with one or more categorical prosodic labels according to one or more prosodic categories. The computer system transforms the voice data of the training set while preserving one or more portions of the voice data that determine the one or more categorical prosodic labels to automatically form a new training set of voice data automatically classified with the one or more categorical prosodic labels according to the one or more prosodic categories. The computer system augments a database comprising the training set with the new training set for training a speech model of an artificial intelligence system.
    Type: Application
    Filed: March 4, 2018
    Publication date: September 5, 2019
    Inventors: Raul Fernandez, Andrew Rosenberg, Alexander Sorin
  • Patent number: 10134009
    Abstract: At least one analytical operation from a set of different analytical operations may be determined based on at least one input. The input(s) may comprise contextual information of working content being displayed to a user on a device and comprising numerical data. Supplemental information for the working content may be generated using the determined analytical operation(s), may comprise a numerical-based analysis of the numerical data, and may be caused to be displayed to the user concurrently with the working content. The contextual information may comprise structured data. The input(s) may further comprise at least one of a history of the user's interactions with the working content, a history of the user's interactions with recommendations of supplemental information for the working content, a history of other users' interactions with the working content, and a history of other users' interactions with recommendations of supplemental information for the working content.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: November 20, 2018
    Assignee: SAP SE
    Inventors: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper
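One input named above — the user's history of interactions with recommendations — can drive the choice of analytical operation directly. The catalog of operations and the history-count heuristic below are illustrative assumptions, not the patent's mechanism.

```python
import statistics

# a small illustrative catalog of analytical operations
CATALOG = {
    "total": sum,
    "average": statistics.mean,
    "maximum": max,
}

def supplemental_info(numbers, interaction_history):
    # pick the operation the user has accepted most often for similar
    # content; with no history, dict order falls back to "total"
    op_name = max(CATALOG, key=lambda op: interaction_history.get(op, 0))
    return op_name, CATALOG[op_name](numbers)
```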
  • Publication number: 20180330713
    Abstract: Text-to-speech synthesis performed by deriving from a voice dataset a sequence of speech frames corresponding to a text, wherein any of the speech frames is represented in the voice dataset by a parameterized vocal tract component, glottal pulse parameters, and an aspiration noise level, transforming the speech frames in the sequence by applying a voice transformation to any of the parameterized vocal tract component, glottal pulse parameters, and aspiration noise level representing the speech frames, wherein the voice transformation is applied in accordance with a virtual voice specification that includes at least one voice control parameter indicating a value for at least one of timbre, glottal tension and breathiness, and producing a digital audio signal of synthesized speech from the transformed sequence of speech frames.
    Type: Application
    Filed: May 14, 2017
    Publication date: November 15, 2018
    Inventors: Ron Hoory, Maria E. Smith, Alexander Sorin
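The per-frame transformation above can be sketched as a small data model: each frame carries a vocal tract component, glottal pulse parameters, and an aspiration noise level, and the virtual voice specification scales them. Mapping breathiness to the aspiration noise level and glottal tension to an open-quotient parameter is a plausible assumption here, not the patent's exact mapping.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    vocal_tract: tuple       # parameterized vocal tract component
    glottal: dict            # glottal pulse parameters
    aspiration_noise: float  # aspiration noise level

def transform_frame(frame, spec):
    # apply a virtual voice specification: breathiness scales the aspiration
    # noise level; glottal tension scales a glottal pulse parameter
    glottal = dict(frame.glottal)
    glottal["open_quotient"] *= spec.get("glottal_tension", 1.0)
    return replace(frame,
                   glottal=glottal,
                   aspiration_noise=frame.aspiration_noise
                                    * spec.get("breathiness", 1.0))
```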
  • Patent number: 9922661
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: March 20, 2018
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
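The per-frame computation above (this invention recurs in several of the entries below) can be sketched as follows. The power-law coupling between the pitch ratio and the intensity factor, controlled by `alpha`, is a hypothetical choice; only the structure — one factor per frame, derived from the original and target pitch contours, with unvoiced frames left unscaled — follows the abstract.

```python
import numpy as np

def intensity_modification_factors(orig_pitch, target_pitch, alpha=0.5):
    # one intensity modification factor per frame, from the ratio of the
    # target pitch contour to the original; unvoiced frames (pitch 0)
    # are left unscaled
    ratio = np.divide(target_pitch, np.maximum(orig_pitch, 1e-9))
    return np.where(orig_pitch > 0, ratio ** alpha, 1.0)

def coherent_intensity(orig_intensity, factors):
    # final intensity contour: factors applied to the original contour
    # frame by frame (time-dependent scaling)
    return orig_intensity * factors
```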
  • Patent number: 9922662
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: March 20, 2018
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Patent number: 9685169
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Grant
    Filed: April 15, 2015
    Date of Patent: June 20, 2017
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Publication number: 20170092285
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Application
    Filed: December 14, 2016
    Publication date: March 30, 2017
    Inventor: Alexander Sorin
  • Publication number: 20170092286
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Application
    Filed: December 14, 2016
    Publication date: March 30, 2017
    Inventor: Alexander Sorin
  • Patent number: 9564140
    Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
    Type: Grant
    Filed: April 7, 2015
    Date of Patent: February 7, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Slava Shechtman, Alexander Sorin
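The encoding pipeline above can be sketched per frame: fit a phase envelope, evaluate it at the discrete frequencies, and code the residual against a codebook. Using a linear fit (a delay plus offset) as the envelope estimate and a nearest-codeword search are simplifying assumptions; the patent does not commit to these specifics.

```python
import numpy as np

def encode_frame_phases(phases, freqs, codebook):
    # primary representation: a linear phase envelope fitted to the
    # unwrapped phases and evaluated at the discrete set of frequencies
    unwrapped = np.unwrap(phases)
    slope, intercept = np.polyfit(freqs, unwrapped, 1)
    envelope = slope * freqs + intercept
    # residual representation, encoded as the index of the nearest codeword
    residual = unwrapped - envelope
    idx = int(np.argmin([np.sum((residual - c) ** 2) for c in codebook]))
    return (slope, intercept), idx
```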
  • Patent number: 9484045
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Grant
    Filed: September 7, 2012
    Date of Patent: November 1, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
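The template-versus-model decision above can be sketched as a thresholded favorability score. The proxy used here — low frame-to-frame spread suggests the segment models well statistically — is a hypothetical stand-in for the patent's trained predictor.

```python
import statistics

def modeling_favorability(frame_features):
    # hypothetical proxy score in (0, 1]: acoustically stable segments
    # (low frame-to-frame spread) tend to be captured well by a
    # statistical model
    spread = statistics.pstdev(frame_features)
    return 1.0 / (1.0 + spread)

def choose_segment_form(frame_features, threshold=0.5):
    # acoustic-driven template-versus-model decision: render the segment
    # from the statistical model only when it is predicted to model well
    return "model" if modeling_favorability(frame_features) >= threshold else "template"
```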
  • Publication number: 20160307560
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Application
    Filed: April 15, 2015
    Publication date: October 20, 2016
    Inventor: Alexander Sorin
  • Publication number: 20160300580
    Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
    Type: Application
    Filed: April 7, 2015
    Publication date: October 13, 2016
    Applicant: Nuance Communications, Inc.
    Inventors: Slava Shechtman, Alexander Sorin
  • Patent number: 9213472
    Abstract: A user interface for providing supplemental information is disclosed. Working content may be caused to be displayed in a display zone of a graphical user interface of a device. The working content may comprise a plurality of graphical objects. A data overlay may be caused to be displayed in the display zone. The data overlay may overlay and at least partially obscure a corresponding portion of the working content. The data overlay may include unchanged graphical objects and at least one additional graphical object. The unchanged graphical objects in the data overlay may be aligned with and overlay corresponding graphical objects of the plurality of graphical objects in the corresponding portion of the working content. The additional graphical object(s) may represent a modification to the corresponding portion of the working content.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: December 15, 2015
    Assignee: SAP SE
    Inventor: Alexander Sorin
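The overlay behavior above can be sketched as a mapping from object identifiers to displayed values: the overlay obscures the region it covers, unchanged objects stay aligned with their originals, and any additional object represents a proposed modification. The id/value representation is an illustrative simplification of the graphical objects the patent describes.

```python
def render_with_overlay(objects, overlay):
    # objects and overlay map object ids to displayed values; unchanged
    # objects in the overlay stay aligned with (and obscure) the originals,
    # while additional objects represent modifications to the content
    view = dict(objects)
    view.update(overlay)
    return view
```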
  • Publication number: 20140282230
    Abstract: A user interface for providing supplemental information is disclosed. Working content may be caused to be displayed in a display zone of a graphical user interface of a device. The working content may comprise a plurality of graphical objects. A data overlay may be caused to be displayed in the display zone. The data overlay may overlay and at least partially obscure a corresponding portion of the working content. The data overlay may include unchanged graphical objects and at least one additional graphical object. The unchanged graphical objects in the data overlay may be aligned with and overlay corresponding graphical objects of the plurality of graphical objects in the corresponding portion of the working content. The additional graphical object(s) may represent a modification to the corresponding portion of the working content.
    Type: Application
    Filed: March 12, 2013
    Publication date: September 18, 2014
    Applicant: SAP AG
    Inventor: Alexander Sorin
  • Publication number: 20140281846
    Abstract: At least one analytical operation from a set of different analytical operations may be determined based on at least one input. The input(s) may comprise contextual information of working content being displayed to a user on a device and comprising numerical data. Supplemental information for the working content may be generated using the determined analytical operation(s), may comprise a numerical-based analysis of the numerical data, and may be caused to be displayed to the user concurrently with the working content. The contextual information may comprise structured data. The input(s) may further comprise at least one of a history of the user's interactions with the working content, a history of the user's interactions with recommendations of supplemental information for the working content, a history of other users' interactions with the working content, and a history of other users' interactions with recommendations of supplemental information for the working content.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: SAP AG
    Inventors: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper