Patents by Inventor Slava Shechtman
Slava Shechtman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10418025
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an in…
Type: Grant
Filed: December 6, 2017
Date of Patent: September 17, 2019
Assignee: International Business Machines Corporation
Inventors: Slava Shechtman, Zvi Kons
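The neutralization step described above can be illustrated with a minimal Python sketch. This is not the patented method, only a toy reading of it: each prosody vector at a non-neutral (expression-labeled) time is replaced by a simple linear combination (here, the mean) of vectors at nearby neutral times, selected by a proximity test. All names and the window parameter are illustrative.

```python
def neutralize_prosody(vectors, times, neutral_times, window=2.0):
    """Return a 'neutral' copy of `vectors` (one per time instance).

    A vector at a non-neutral time is replaced by the average of the
    vectors whose times are neutral and within `window` of it.
    """
    neutral_set = set(neutral_times)
    out = []
    for v, t in zip(vectors, times):
        if t in neutral_set:
            out.append(v)                      # already neutral: keep as-is
            continue
        # proximity test: neutral-time vectors close to this time instance
        nearby = [u for u, s in zip(vectors, times)
                  if s in neutral_set and abs(s - t) <= window]
        if nearby:
            dim = len(v)
            out.append([sum(u[i] for u in nearby) / len(nearby)
                        for i in range(dim)])
        else:
            out.append(v)                      # no neutral neighbors: unchanged
    return out

# Toy data: (pitch, energy) vectors at times 0..3; times 1 and 2 carry
# an expression label, so only times 0 and 3 are neutral.
vectors = [[100.0, 1.0], [180.0, 2.0], [170.0, 1.8], [105.0, 1.1]]
times = [0.0, 1.0, 2.0, 3.0]
neutral_times = [0.0, 3.0]
neutral_seq = neutralize_prosody(vectors, times, neutral_times, window=3.0)
```

The non-neutral vectors are pulled back toward the neutral context, which is the kind of paired (expressive, neutralized) training data the abstract's model-training step consumes.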
-
Publication number: 20190172443
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an in…
Type: Application
Filed: December 6, 2017
Publication date: June 6, 2019
Inventors: Slava Shechtman, Zvi Kons
-
Patent number: 10226702
Abstract: A computer-implemented method, computerized apparatus and computer program product. The method comprises capturing one or more images of a scene in which a driver is driving a vehicle; analyzing the images to retrieve an event or detail; conveying to the driver a question or a challenge related to the event or detail; receiving a response from the driver; analyzing the response; and determining a score related to the driver.
Type: Grant
Filed: May 25, 2015
Date of Patent: March 12, 2019
Assignee: International Business Machines Corporation
Inventors: Ron Hoory, Mattias Marder, Slava Shechtman
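The question/response loop in this abstract can be sketched as a tiny pipeline. This is a hypothetical illustration, not the patented system: the scene is assumed to be pre-analyzed into labeled objects, and all function names, fields, and the scoring rule are invented for the example.

```python
def extract_event(scene):
    """Pick the most salient detail from a (pre-analyzed) scene description."""
    return max(scene["objects"], key=lambda o: o["salience"])

def make_question(event):
    """Turn the extracted detail into a challenge for the driver."""
    return f"What color was the {event['name']} you just passed?"

def score_response(event, response):
    """1.0 if the driver's answer matches the detail, else 0.0."""
    return 1.0 if response.strip().lower() == event["color"] else 0.0

# A toy pre-analyzed scene with two detected objects.
scene = {"objects": [
    {"name": "truck", "color": "red", "salience": 0.9},
    {"name": "sign", "color": "blue", "salience": 0.4},
]}
event = extract_event(scene)
question = make_question(event)
score = score_response(event, "Red")
```

A real system would accumulate such per-challenge scores over a drive to estimate the driver's attentiveness.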
-
Patent number: 9892335
Abstract: Embodiments of the present invention may provide the capability to identify a specific object being interacted with that may be cheaply and easily included in mass-produced objects. In an embodiment, a computer-implemented method for object identification may comprise receiving a signal produced by a physical interaction with an object to be identified, the signal produced by an identification structure coupled to the object during physical interaction with the object, processing the signal to form digital data representing the signal, and accessing a database using the digital data to retrieve information identifying the object.
Type: Grant
Filed: June 5, 2016
Date of Patent: February 13, 2018
Assignee: International Business Machines Corporation
Inventors: Ophir Azulai, Udi Barzelay, Mattias Marder, Dror Porat, Slava Shechtman
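The signal-to-database flow can be sketched as follows. This is a toy stand-in, not the patented method: here the "digital data" is a crude dominant-frequency estimate from zero crossings, and the database maps each object's characteristic frequency to its identity. All names and the tolerance are assumptions.

```python
import math

def dominant_freq(signal, sample_rate):
    """Rough frequency estimate from the zero-crossing count."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    duration = len(signal) / sample_rate
    return crossings / (2 * duration)          # two crossings per cycle

def identify(signal, sample_rate, database, tol=10.0):
    """Digitize the interaction signal and look it up in the database."""
    f = dominant_freq(signal, sample_rate)
    for freq_key, info in database.items():
        if abs(freq_key - f) <= tol:
            return info
    return None

# Each object's identification structure "rings" at its own frequency.
database = {440.0: "mug", 880.0: "bottle"}
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(8000)]
label = identify(tone, sr, database)
```

A production system would use richer acoustic features than a single frequency, but the lookup structure is the same.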
-
Publication number: 20170351930
Abstract: Embodiments of the present invention may provide the capability to identify a specific object being interacted with that may be cheaply and easily included in mass-produced objects. In an embodiment, a computer-implemented method for object identification may comprise receiving a signal produced by a physical interaction with an object to be identified, the signal produced by an identification structure coupled to the object during physical interaction with the object, processing the signal to form digital data representing the signal, and accessing a database using the digital data to retrieve information identifying the object.
Type: Application
Filed: June 5, 2016
Publication date: December 7, 2017
Applicant: International Business Machines Corporation
Inventors: Ophir Azulai, Udi Barzelay, Mattias Marder, Dror Porat, Slava Shechtman
-
Patent number: 9685170
Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
Type: Grant
Filed: October 21, 2015
Date of Patent: June 20, 2017
Assignee: International Business Machines Corporation
Inventor: Slava Shechtman
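The correction step can be illustrated with a minimal sketch. This is not the patented algorithm: here the lower/upper limits are simply a quarter pitch period around each mark, and the "best" new position inside those limits is the local waveform peak, a simpler stand-in for the cross-correlation criterion the abstract describes. All names are illustrative.

```python
import math

def correct_pitch_marks(signal, marks, period):
    """Snap each pitch mark to the best position within derived limits."""
    corrected = []
    half = period // 4                       # limits: mark +/- quarter period
    for m in marks:
        lo = max(0, m - half)                # lower limit temporal value
        hi = min(len(signal) - 1, m + half)  # upper limit temporal value
        best = max(range(lo, hi + 1), key=lambda i: signal[i])
        corrected.append(best)
    return corrected

sr, f0 = 8000, 200
period = sr // f0                            # 40 samples per pitch cycle
sig = [math.sin(2 * math.pi * f0 * n / sr) for n in range(400)]
# True peaks of this sinusoid sit at n = 10, 50, 90, ...
noisy_marks = [12, 48, 93]                   # marks jittered off the peaks
fixed = correct_pitch_marks(sig, noisy_marks, period)
```

On real speech the waveform peak is an unreliable anchor, which is exactly why the patent replaces it with a cross-correlation-based limit computation; the control flow, though, is the same.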
-
Publication number: 20170117001
Abstract: According to some embodiments of the present invention, there is provided a computerized method for selecting and correcting pitch marks in speech processing and modification. The method comprises an action of receiving a continuous speech signal representing audible speech recorded by a microphone, where a sequence of pitch values and two or more pitch mark temporal values are computed from the continuous speech signal. The method comprises an action of computing for each of the pitch mark temporal values a lower limit temporal value and an upper limit temporal value by a cross-correlation function of the continuous speech signal around the pitch mark temporal values associated with pairs of elements in the sequence and replacing one or more of the pitch mark temporal values with one or more new temporal values between the lower limit temporal value and the upper limit temporal value.
Type: Application
Filed: October 21, 2015
Publication date: April 27, 2017
Inventor: Slava Shechtman
-
Patent number: 9564140
Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
Type: Grant
Filed: April 7, 2015
Date of Patent: February 7, 2017
Assignee: Nuance Communications, Inc.
Inventors: Slava Shechtman, Alexander Sorin
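The envelope/residual/codeword chain can be sketched in a few lines. This is a hedged toy, not the patented coder: the "phase envelope" is approximated by a least-squares linear phase fit, and the residual is encoded by picking the nearest codeword from a tiny hand-made codebook. All names and values are illustrative.

```python
def fit_linear_phase(phases):
    """Least-squares line through phase vs. bin index (a crude stand-in
    for the patent's phase-envelope estimate, evaluated at each bin)."""
    n = len(phases)
    xs = list(range(n))
    mx = sum(xs) / n
    my = sum(phases) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, phases)) / \
            sum((x - mx) ** 2 for x in xs)
    return [my + slope * (x - mx) for x in xs]

def encode_residual(residual, codebook):
    """Index of the codeword with the smallest squared error."""
    def err(cw):
        return sum((r - c) ** 2 for r, c in zip(residual, cw))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

phases = [0.0, 0.55, 0.95, 1.6, 2.0]         # measured phases, one per bin
envelope = fit_linear_phase(phases)           # primary representation
residual = [p - e for p, e in zip(phases, envelope)]
codebook = [[0.0] * 5,
            [0.05, -0.05, 0.05, -0.05, 0.05],
            [0.2, 0.2, 0.2, 0.2, 0.2]]
index = encode_residual(residual, codebook)
```

The point of the split is that the residual is small and well-behaved, so it quantizes with far fewer bits than the raw phases would.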
-
Publication number: 20160346695
Abstract: A computer-implemented method, computerized apparatus and computer program product. The method comprises capturing one or more images of a scene in which a driver is driving a vehicle; analyzing the images to retrieve an event or detail; conveying to the driver a question or a challenge related to the event or detail; receiving a response from the driver; analyzing the response; and determining a score related to the driver.
Type: Application
Filed: May 25, 2015
Publication date: December 1, 2016
Inventors: Ron Hoory, Mattias Marder, Slava Shechtman
-
Patent number: 9484045
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type: Grant
Filed: September 7, 2012
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
-
Patent number: 9484036
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
Type: Grant
Filed: August 28, 2013
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
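The "periodic variation" test can be illustrated with a minimal sketch: measure how periodic a per-segment feature track is via its normalized autocorrelation, and flag tracks whose periodicity exceeds a threshold. This is an assumption-laden toy, not the patented detector; the feature, threshold, and function names are all invented for the example.

```python
import math

def periodicity(track, max_lag=None):
    """Peak normalized autocorrelation (lag >= 1) of a mean-removed
    per-frame feature track; near 1.0 means strongly periodic variation."""
    n = len(track)
    mean = sum(track) / n
    x = [v - mean for v in track]
    energy = sum(v * v for v in x) or 1.0
    max_lag = max_lag or n // 2
    return max(
        sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        for lag in range(1, max_lag + 1)
    )

def is_synthetic(track, threshold=0.6):
    """Flag a feature track whose variation is suspiciously periodic."""
    return periodicity(track) > threshold

# A track that oscillates regularly vs. one with a single transient.
periodic_track = [math.sin(0.5 * n) for n in range(64)]
impulse_track = [1.0] + [0.0] * 63
```

A real detector would run this over several features jointly; the single-track version just shows the decision structure.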
-
Publication number: 20160300580
Abstract: Some embodiments relate to techniques for encoding an audio signal represented by a plurality of frames including a first frame. The techniques include using at least one computer hardware processor to perform: obtaining an initial discrete spectral representation of the first frame; obtaining a primary discrete spectral representation of the initial discrete spectral representation at least in part by estimating a phase envelope of the initial discrete spectral representation and evaluating the estimated phase envelope at a discrete set of frequencies; calculating a residual discrete spectral representation of the initial discrete spectral representation based on the initial discrete spectral representation and the primary discrete spectral representation; and encoding the residual discrete spectral representation using a plurality of codewords.
Type: Application
Filed: April 7, 2015
Publication date: October 13, 2016
Applicant: Nuance Communications, Inc.
Inventors: Slava Shechtman, Alexander Sorin
-
Patent number: 9224402
Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
Type: Grant
Filed: September 30, 2013
Date of Patent: December 29, 2015
Assignee: International Business Machines Corporation
Inventor: Slava Shechtman
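The first stage of the pipeline above (harmonic modeling, reconstruction, subtraction) can be sketched as follows. This is a simplified illustration, not the patented coder: harmonic amplitudes are estimated by projecting the frame onto sines and cosines at multiples of a known f0 over an integer number of pitch periods; the later residual-modeling and analysis-by-synthesis stages are omitted.

```python
import math

def harmonic_fit(frame, f0, sr, n_harm):
    """Project `frame` onto sin/cos at k*f0 and return the reconstruction
    (the 'estimated frame signal' built from the harmonic parameters)."""
    n = len(frame)
    recon = [0.0] * n
    for k in range(1, n_harm + 1):
        w = 2 * math.pi * k * f0 / sr
        # least-squares amplitudes over an integer number of cycles
        c = sum(frame[i] * math.cos(w * i) for i in range(n)) * 2 / n
        s = sum(frame[i] * math.sin(w * i) for i in range(n)) * 2 / n
        for i in range(n):
            recon[i] += c * math.cos(w * i) + s * math.sin(w * i)
    return recon

sr, f0, n = 8000, 200, 80                    # 80 samples = two pitch periods
frame = [math.sin(2 * math.pi * 200 * i / sr)
         + 0.3 * math.sin(2 * math.pi * 400 * i / sr) for i in range(n)]
recon = harmonic_fit(frame, f0, sr, 2)
residual = [x - y for x, y in zip(frame, recon)]
rms = math.sqrt(sum(r * r for r in residual) / n)
```

For this purely harmonic toy frame the residual is essentially zero; on real speech the residual carries the noise-like part, which is what the later codebook stages encode.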
-
Patent number: 9159323
Abstract: A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.
Type: Grant
Filed: July 29, 2013
Date of Patent: October 13, 2015
Assignee: Nuance Communications, Inc.
Inventors: Slava Shechtman, Raphael Steinberg
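The aggregation step can be sketched minimally: each call contributes non-verbal features, a condition is deduced via a predefined relation, and conditions are tallied per location. This is a toy reading of the abstract; the feature names, the deduction rule, and the locations are all invented for the example.

```python
from collections import defaultdict

def deduce_condition(features):
    """Toy predefined relation: high mean pitch plus a fast speaking
    rate is taken as 'stressed'; anything else as 'calm'."""
    if features["mean_pitch"] > 200 and features["rate"] > 5.0:
        return "stressed"
    return "calm"

def geo_distribution(records):
    """Tally deduced conditions per device location."""
    dist = defaultdict(lambda: defaultdict(int))
    for rec in records:
        dist[rec["location"]][deduce_condition(rec["features"])] += 1
    return {loc: dict(counts) for loc, counts in dist.items()}

records = [
    {"location": "Haifa", "features": {"mean_pitch": 230, "rate": 6.1}},
    {"location": "Haifa", "features": {"mean_pitch": 150, "rate": 3.2}},
    {"location": "Tel Aviv", "features": {"mean_pitch": 245, "rate": 5.8}},
]
dist = geo_distribution(records)
```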
-
Publication number: 20150095035
Abstract: A method for speech parameterization and coding of a continuous speech signal. The method comprises dividing said speech signal into a plurality of speech frames, and for each one of the plurality of speech frames, modeling said speech frame by a first harmonic modeling to produce a plurality of harmonic model parameters, reconstructing an estimated frame signal from the plurality of harmonic model parameters, subtracting the estimated frame signal from the speech frame to produce a harmonic model residual, performing at least one second harmonic modeling analysis on the first harmonic model residual to determine at least one set of second harmonic model components, removing the at least one set of second harmonic model components from the first harmonic model residual to produce a harmonically-filtered residual signal, and processing the harmonically-filtered residual signal with analysis by synthesis techniques to produce vectors of codebook indices and corresponding gains.
Type: Application
Filed: September 30, 2013
Publication date: April 2, 2015
Applicant: International Business Machines Corporation
Inventor: Slava Shechtman
-
Publication number: 20150066512
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
Type: Application
Filed: August 28, 2013
Publication date: March 5, 2015
Applicant: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
-
Patent number: 8786659
Abstract: A method for responding to media conference deficiencies, the method includes: monitoring, by at least one receiver, a quality of media conference signals being received by at least one receiver during the media conference; sending, in response to the monitoring, to at least an end user transmitter that transmitted the media conference signals, a quality indication representative of a quality of the received media conference signals; recording media conference signals that were inadequately received by a certain end user receiver; and participating in an activity related to a transmission, to the certain end user receiver, of the inadequately received media conference signals or of a representation of the inadequately received media conference signals.
Type: Grant
Filed: May 29, 2012
Date of Patent: July 22, 2014
Assignee: International Business Machines Corporation
Inventors: Ron Hoory, Michael Rodeh, Slava Shechtman
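The receiver-side bookkeeping can be sketched as follows. This is a hypothetical illustration, not the patented protocol: the class, the packet-loss quality metric, and the threshold are all assumptions standing in for whatever quality measure a real conferencing stack would use.

```python
class ConferenceReceiver:
    """Toy receiver: monitors per-segment quality, emits a quality
    indication, and records segment ids that arrived inadequately so
    they (or a representation of them) can be re-delivered later."""

    def __init__(self, loss_threshold=0.1):
        self.loss_threshold = loss_threshold
        self.inadequate = []                 # segment ids to re-deliver

    def receive(self, segment_id, packets_expected, packets_received):
        loss = 1.0 - packets_received / packets_expected
        adequate = loss <= self.loss_threshold
        if not adequate:
            self.inadequate.append(segment_id)
        # the quality indication sent back to the transmitter
        return {"segment": segment_id, "loss": loss, "adequate": adequate}

rx = ConferenceReceiver()
r1 = rx.receive("seg-1", 100, 98)            # 2% loss: adequate
r2 = rx.receive("seg-2", 100, 70)            # 30% loss: record for replay
```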
-
Patent number: 8682670
Abstract: A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors.
Type: Grant
Filed: July 7, 2011
Date of Patent: March 25, 2014
Assignee: International Business Machines Corporation
Inventors: Slava Shechtman, Alexander Sorin
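The pairing of a parametric corrective transformation with a distortion indicator can be sketched minimally. This is not the patented method: the one-parameter transformation here (variance expansion around the vector mean, a crude counter to statistical over-smoothing) and the Euclidean distortion indicator are invented stand-ins to show how the indicator bounds the enhancement.

```python
def enhance(vec, alpha):
    """Corrective transformation: expand the vector's spread around its
    mean by a factor (1 + alpha), alpha being the enhancing parameter."""
    mean = sum(vec) / len(vec)
    return [mean + (1 + alpha) * (v - mean) for v in vec]

def distortion(orig, new):
    """Distortion indicator: Euclidean distance from the original vector."""
    return sum((a - b) ** 2 for a, b in zip(orig, new)) ** 0.5

def enhance_bounded(vec, alpha, max_distortion):
    """Apply the transformation only if the indicator stays in bounds."""
    out = enhance(vec, alpha)
    if distortion(vec, out) > max_distortion:
        return vec                            # too distorted: keep original
    return out

vec = [1.0, 3.0, 2.0]
mild = enhance_bounded(vec, 0.1, max_distortion=1.0)   # accepted
harsh = enhance_bounded(vec, 5.0, max_distortion=1.0)  # rejected
```

The design point is that the distortion indicator lets the system push the enhancing parameter aggressively while guaranteeing the output never strays too far from the statistically generated features.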
-
Publication number: 20140074468
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.
Type: Application
Filed: September 7, 2012
Publication date: March 13, 2014
Applicant: Nuance Communications, Inc.
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
-
Publication number: 20130317825
Abstract: A method including: obtaining, via a plurality of communication devices, a plurality of speech signals respectively associated with human speakers, the speech signals including verbal components and non-verbal components; identifying a plurality of geographical locations, each geographic location associated with a respective one of the plurality of the communication devices; extracting the non-verbal components from the obtained speech signals; deducing physiological or psychological conditions of the human speakers by analyzing, over a specified period, the extracted non-verbal components, using predefined relations between characteristics of the non-verbal components and physiological or psychological conditions of the human speakers; and providing a geographical distribution of the deduced physiological or psychological conditions of the human speakers by associating the deduced physiological or psychological conditions of the human speakers with geographical locations thereof.
Type: Application
Filed: July 29, 2013
Publication date: November 28, 2013
Applicant: Nuance Communications, Inc.
Inventors: Slava Shechtman, Raphael Steinberg