Patents by Inventor Alexander Sorin

Alexander Sorin has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11556782
    Abstract: In a trained attentive decoder of a trained Sequence-to-Sequence (seq2seq) Artificial Neural Network (ANN): obtaining an encoded input vector sequence; generating, using a trained primary attention mechanism of the trained attentive decoder, a primary attention vectors sequence; for each primary attention vector of the primary attention vectors sequence: (a) generating a set of attention vector candidates corresponding to the respective primary attention vector, (b) evaluating, for each attention vector candidate of the set of attention vector candidates, a structure fit measure that quantifies a similarity of the respective attention vector candidate to a desired attention vector structure, (c) generating, using a trained soft-selection ANN, a secondary attention vector based on said evaluation and on state variables of the trained attentive decoder; and generating, using the trained attentive decoder, an output sequence based on the encoded input vector sequence and the secondary attention vectors.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: January 17, 2023
    Assignee: International Business Machines Corporation
    Inventors: Vyacheslav Shechtman, Alexander Sorin
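The soft-selection step described in the abstract above can be sketched minimally in NumPy. The cosine-similarity structure-fit measure and the plain softmax selector below are illustrative stand-ins: the patent uses a trained soft-selection ANN that also conditions on decoder state variables, which this sketch omits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def structure_fit(candidate, desired):
    # stand-in structure-fit measure: cosine similarity between the
    # candidate attention vector and the desired attention structure
    num = float(candidate @ desired)
    den = np.linalg.norm(candidate) * np.linalg.norm(desired) + 1e-9
    return num / den

def secondary_attention(candidates, desired):
    # soft-select: weight each attention vector candidate by its
    # structure-fit score and blend them into one secondary vector
    scores = np.array([structure_fit(c, desired) for c in candidates])
    weights = softmax(scores)
    return weights @ np.stack(candidates)
```

Because the result is a convex combination of the candidates, it remains a valid attention distribution whenever the candidates are.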
  • Patent number: 11062691
    Abstract: Embodiments of the present systems and methods may provide techniques that provide the capability to automatically generate allowance intervals for voice personas that meet desired requirements for realism and fidelity. For example, a method for voice persona generation may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, the method comprising: displaying to a user, a plurality of user-selectable voice persona parameters that control features of a synthesized voice signal, and displaying, in conjunction with each of at least some of the plurality of user-selectable voice persona parameters, voice transformation allowance intervals of the voice persona parameters, accepting from a user, a selection of at least one user-selectable voice persona parameter, and generating a synthesized voice signal based on the selected at least one user-selectable voice persona parameter.
    Type: Grant
    Filed: May 13, 2019
    Date of Patent: July 13, 2021
    Assignee: International Business Machines Corporation
    Inventors: Vyacheslav Shechtman, Alexander Sorin
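The core idea above — persona parameters constrained to allowance intervals chosen to keep the synthesized voice realistic — reduces to simple clamping. The parameter names and interval values below are hypothetical, not taken from the patent.

```python
def clamp_to_interval(value, interval):
    # constrain one persona parameter to its voice-transformation
    # allowance interval
    lo, hi = interval
    return min(max(value, lo), hi)

# hypothetical persona parameters and their allowance intervals
INTERVALS = {"pitch_shift": (0.8, 1.2), "breathiness": (0.0, 0.5)}

def validate_persona(persona):
    # clamp each user-selected parameter into its displayed interval
    return {name: clamp_to_interval(v, INTERVALS[name])
            for name, v in persona.items()}
```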
  • Publication number: 20210089877
    Abstract: In a trained attentive decoder of a trained Sequence-to-Sequence (seq2seq) Artificial Neural Network (ANN): obtaining an encoded input vector sequence; generating, using a trained primary attention mechanism of the trained attentive decoder, a primary attention vectors sequence; for each primary attention vector of the primary attention vectors sequence: (a) generating a set of attention vector candidates corresponding to the respective primary attention vector, (b) evaluating, for each attention vector candidate of the set of attention vector candidates, a structure fit measure that quantifies a similarity of the respective attention vector candidate to a desired attention vector structure, (c) generating, using a trained soft-selection ANN, a secondary attention vector based on said evaluation and on state variables of the trained attentive decoder; and generating, using the trained attentive decoder, an output sequence based on the encoded input vector sequence and the secondary attention vectors.
    Type: Application
    Filed: September 19, 2019
    Publication date: March 25, 2021
    Inventors: Vyacheslav Shechtman, Alexander Sorin
  • Publication number: 20200365135
    Abstract: Embodiments of the present systems and methods may provide techniques that provide the capability to automatically generate allowance intervals for voice personas that meet desired requirements for realism and fidelity. For example, a method for voice persona generation may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, the method comprising: displaying to a user, a plurality of user-selectable voice persona parameters that control features of a synthesized voice signal, and displaying, in conjunction with each of at least some of the plurality of user-selectable voice persona parameters, voice transformation allowance intervals of the voice persona parameters, accepting from a user, a selection of at least one user-selectable voice persona parameter, and generating a synthesized voice signal based on the selected at least one user-selectable voice persona parameter.
    Type: Application
    Filed: May 13, 2019
    Publication date: November 19, 2020
    Inventors: Vyacheslav Shechtman, Alexander Sorin
  • Patent number: 10726826
    Abstract: A computer system receives a training set of voice data captured from one or more speakers and classified with one or more categorical prosodic labels according to one or more prosodic categories. The computer system transforms the voice data of the training set while preserving one or more portions of the voice data that determine the one or more categorical prosodic labels to automatically form a new training set of voice data automatically classified with the one or more categorical prosodic labels according to the one or more prosodic categories. The computer system augments a database comprising the training set with the new training set for training a speech model of an artificial intelligence system.
    Type: Grant
    Filed: March 4, 2018
    Date of Patent: July 28, 2020
    Assignee: International Business Machines Corporation
    Inventors: Raul Fernandez, Andrew Rosenberg, Alexander Sorin
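The augmentation scheme above can be sketched as a label-preserving transform: new examples inherit their categorical prosodic labels automatically because the transform leaves the label-determining portion of the data untouched. Here an energy-scaling transform and a pitch contour as the label-determining feature are illustrative assumptions.

```python
import numpy as np

def augment(example, rng):
    # label-neutral transformation: scale overall energy while leaving the
    # pitch contour (assumed here to determine the prosodic label) untouched
    signal, pitch_contour, prosodic_label = example
    gain = rng.uniform(0.8, 1.2)
    return (signal * gain, pitch_contour, prosodic_label)

def augment_set(training_set, n_copies=2, seed=0):
    # the new training set is automatically classified: each copy carries
    # over its original categorical prosodic label with no re-annotation
    rng = np.random.default_rng(seed)
    return [augment(ex, rng) for ex in training_set for _ in range(n_copies)]
```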
  • Publication number: 20190272818
    Abstract: A computer system receives a training set of voice data captured from one or more speakers and classified with one or more categorical prosodic labels according to one or more prosodic categories. The computer system transforms the voice data of the training set while preserving one or more portions of the voice data that determine the one or more categorical prosodic labels to automatically form a new training set of voice data automatically classified with the one or more categorical prosodic labels according to the one or more prosodic categories. The computer system augments a database comprising the training set with the new training set for training a speech model of an artificial intelligence system.
    Type: Application
    Filed: March 4, 2018
    Publication date: September 5, 2019
    Inventors: Raul Fernandez, Andrew Rosenberg, Alexander Sorin
  • Patent number: 10134009
    Abstract: At least one analytical operation from a set of different analytical operations may be determined based on at least one input. The input(s) may comprise contextual information of working content being displayed to a user on a device and comprising numerical data. Supplemental information for the working content may be generated using the determined analytical operation(s), may comprise a numerical-based analysis of the numerical data, and may be caused to be displayed to the user concurrently with the working content. The contextual information may comprise structured data. The input(s) may further comprise at least one of a history of the user's interactions with the working content, a history of the user's interactions with recommendations of supplemental information for the working content, a history of other users' interactions with the working content, and a history of other users' interactions with recommendations of supplemental information for the working content.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: November 20, 2018
    Assignee: SAP SE
    Inventors: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper
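One input named above — the user's history of interactions with recommendations — can drive the choice of analytical operation directly. The catalog of operations and the history-count heuristic below are illustrative assumptions, not the patent's mechanism.

```python
import statistics

# a small illustrative catalog of analytical operations
CATALOG = {
    "total": sum,
    "average": statistics.mean,
    "maximum": max,
}

def supplemental_info(numbers, interaction_history):
    # pick the operation the user has accepted most often for similar
    # content; with no history, dict order falls back to "total"
    op_name = max(CATALOG, key=lambda op: interaction_history.get(op, 0))
    return op_name, CATALOG[op_name](numbers)
```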
  • Publication number: 20180330713
    Abstract: Text-to-speech synthesis performed by deriving from a voice dataset a sequence of speech frames corresponding to a text, wherein any of the speech frames is represented in the voice dataset by a parameterized vocal tract component, glottal pulse parameters, and an aspiration noise level, transforming the speech frames in the sequence by applying a voice transformation to any of the parameterized vocal tract component, glottal pulse parameters, and aspiration noise level representing the speech frames, wherein the voice transformation is applied in accordance with a virtual voice specification that includes at least one voice control parameter indicating a value for at least one of timbre, glottal tension and breathiness, and producing a digital audio signal of synthesized speech from the transformed sequence of speech frames.
    Type: Application
    Filed: May 14, 2017
    Publication date: November 15, 2018
    Inventors: Ron Hoory, Maria E. Smith, Alexander Sorin
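The per-frame transformation above can be sketched as a small data model: each frame carries a vocal tract component, glottal pulse parameters, and an aspiration noise level, and the virtual voice specification scales them. Mapping breathiness to the aspiration noise level and glottal tension to an open-quotient parameter is a plausible assumption here, not the patent's exact mapping.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    vocal_tract: tuple       # parameterized vocal tract component
    glottal: dict            # glottal pulse parameters
    aspiration_noise: float  # aspiration noise level

def transform_frame(frame, spec):
    # apply a virtual voice specification: breathiness scales the aspiration
    # noise level; glottal tension scales a glottal pulse parameter
    glottal = dict(frame.glottal)
    glottal["open_quotient"] *= spec.get("glottal_tension", 1.0)
    return replace(frame,
                   glottal=glottal,
                   aspiration_noise=frame.aspiration_noise
                                    * spec.get("breathiness", 1.0))
```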
  • Patent number: 9922661
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: March 20, 2018
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
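The per-frame computation above (this invention recurs in several of the entries below) can be sketched as follows. The power-law coupling between the pitch ratio and the intensity factor, controlled by `alpha`, is a hypothetical choice; only the structure — one factor per frame, derived from the original and target pitch contours, with unvoiced frames left unscaled — follows the abstract.

```python
import numpy as np

def intensity_modification_factors(orig_pitch, target_pitch, alpha=0.5):
    # one intensity modification factor per frame, from the ratio of the
    # target pitch contour to the original; unvoiced frames (pitch 0)
    # are left unscaled
    ratio = np.divide(target_pitch, np.maximum(orig_pitch, 1e-9))
    return np.where(orig_pitch > 0, ratio ** alpha, 1.0)

def coherent_intensity(orig_intensity, factors):
    # final intensity contour: factors applied to the original contour
    # frame by frame (time-dependent scaling)
    return orig_intensity * factors
```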
  • Patent number: 9922662
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: March 20, 2018
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Patent number: 9685169
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Grant
    Filed: April 15, 2015
    Date of Patent: June 20, 2017
    Assignee: International Business Machines Corporation
    Inventor: Alexander Sorin
  • Publication number: 20170092285
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Application
    Filed: December 14, 2016
    Publication date: March 30, 2017
    Inventor: Alexander Sorin
  • Publication number: 20170092286
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Application
    Filed: December 14, 2016
    Publication date: March 30, 2017
    Inventor: Alexander Sorin
  • Patent number: 9564140
    Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
    Type: Grant
    Filed: April 7, 2015
    Date of Patent: February 7, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Slava Shechtman, Alexander Sorin
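The encoding pipeline above can be sketched per frame: fit a phase envelope, evaluate it at the discrete frequencies, and code the residual against a codebook. Using a linear fit (a delay plus offset) as the envelope estimate and a nearest-codeword search are simplifying assumptions; the patent does not commit to these specifics.

```python
import numpy as np

def encode_frame_phases(phases, freqs, codebook):
    # primary representation: a linear phase envelope fitted to the
    # unwrapped phases and evaluated at the discrete set of frequencies
    unwrapped = np.unwrap(phases)
    slope, intercept = np.polyfit(freqs, unwrapped, 1)
    envelope = slope * freqs + intercept
    # residual representation, encoded as the index of the nearest codeword
    residual = unwrapped - envelope
    idx = int(np.argmin([np.sum((residual - c) ** 2) for c in codebook]))
    return (slope, intercept), idx
```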
  • Patent number: 9484045
    Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
    Type: Grant
    Filed: September 7, 2012
    Date of Patent: November 1, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
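The template-versus-model decision above can be sketched as a thresholded favorability score. The proxy used here — low frame-to-frame spread suggests the segment models well statistically — is a hypothetical stand-in for the patent's trained predictor.

```python
import statistics

def modeling_favorability(frame_features):
    # hypothetical proxy score in (0, 1]: acoustically stable segments
    # (low frame-to-frame spread) tend to be captured well by a
    # statistical model
    spread = statistics.pstdev(frame_features)
    return 1.0 / (1.0 + spread)

def choose_segment_form(frame_features, threshold=0.5):
    # acoustic-driven template-versus-model decision: render the segment
    # from the statistical model only when it is predicted to model well
    return "model" if modeling_favorability(frame_features) >= threshold else "template"
```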
  • Publication number: 20160307560
    Abstract: A method comprising: receiving an utterance, an original pitch contour of the utterance, and a target pitch contour for the utterance, wherein the utterance comprises a plurality of consecutive frames, and wherein at least one of said frames is a voiced frame; calculating an original intensity contour of said utterance; generating a pitch modified utterance based on the target pitch contour; calculating an intensity modification factor for each of said frames, based on said original pitch contour and on said target pitch contour, to produce a sequence of intensity modification factors corresponding to said plurality of consecutive frames; calculating a final intensity contour for said utterance by applying said intensity modification factors to said original intensity contour; and generating a coherently modified speech signal by time dependent scaling of the intensity of said pitch modified utterance according to said final intensity contour.
    Type: Application
    Filed: April 15, 2015
    Publication date: October 20, 2016
    Inventor: Alexander Sorin
  • Publication number: 20160300580
    Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
    Type: Application
    Filed: April 7, 2015
    Publication date: October 13, 2016
    Applicant: Nuance Communications, Inc.
    Inventors: Slava Shechtman, Alexander Sorin
  • Patent number: 9213472
    Abstract: A user interface for providing supplemental information is disclosed. Working content may be caused to be displayed in a display zone of a graphical user interface of a device. The working content may comprise a plurality of graphical objects. A data overlay may be caused to be displayed in the display zone. The data overlay may overlay and at least partially obscure a corresponding portion of the working content. The data overlay may include unchanged graphical objects and at least one additional graphical object. The unchanged graphical objects in the data overlay may be aligned with and overlay corresponding graphical objects of the plurality of graphical objects in the corresponding portion of the working content. The additional graphical object(s) may represent a modification to the corresponding portion of the working content.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: December 15, 2015
    Assignee: SAP SE
    Inventor: Alexander Sorin
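The overlay behavior above can be sketched as a mapping from object identifiers to displayed values: the overlay obscures the region it covers, unchanged objects stay aligned with their originals, and any additional object represents a proposed modification. The id/value representation is an illustrative simplification of the graphical objects the patent describes.

```python
def render_with_overlay(objects, overlay):
    # objects and overlay map object ids to displayed values; unchanged
    # objects in the overlay stay aligned with (and obscure) the originals,
    # while additional objects represent modifications to the content
    view = dict(objects)
    view.update(overlay)
    return view
```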
  • Publication number: 20140282230
    Abstract: A user interface for providing supplemental information is disclosed. Working content may be caused to be displayed in a display zone of a graphical user interface of a device. The working content may comprise a plurality of graphical objects. A data overlay may be caused to be displayed in the display zone. The data overlay may overlay and at least partially obscure a corresponding portion of the working content. The data overlay may include unchanged graphical objects and at least one additional graphical object. The unchanged graphical objects in the data overlay may be aligned with and overlay corresponding graphical objects of the plurality of graphical objects in the corresponding portion of the working content. The additional graphical object(s) may represent a modification to the corresponding portion of the working content.
    Type: Application
    Filed: March 12, 2013
    Publication date: September 18, 2014
    Applicant: SAP AG
    Inventor: Alexander Sorin
  • Publication number: 20140281846
    Abstract: At least one analytical operation from a set of different analytical operations may be determined based on at least one input. The input(s) may comprise contextual information of working content being displayed to a user on a device and comprising numerical data. Supplemental information for the working content may be generated using the determined analytical operation(s), may comprise a numerical-based analysis of the numerical data, and may be caused to be displayed to the user concurrently with the working content. The contextual information may comprise structured data. The input(s) may further comprise at least one of a history of the user's interactions with the working content, a history of the user's interactions with recommendations of supplemental information for the working content, a history of other users' interactions with the working content, and a history of other users' interactions with recommendations of supplemental information for the working content.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: SAP AG
    Inventors: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper