Patents by Inventor Zvi Kons
Zvi Kons has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240038216
Abstract: An example system includes a processor to receive encoded audio from an encoder of a pre-trained speech-to-text (STT) model. The processor is to further train a language identification (LID) classifier to detect a language of the encoded audio using training samples labeled by language.
Type: Application
Filed: July 27, 2022
Publication date: February 1, 2024
Inventor: Zvi Kons
-
Publication number: 20230177273
Abstract: A computer-implemented method, a computer system and a computer program product enhance an intent classifier through training data augmentation. The method includes selecting a target sample from a plurality of samples. The method also includes determining an ambiguity level for the target sample based on confidence scores of at least two intent labels associated with the target sample. The method further includes selecting a nearest neighboring sample from a group of neighboring samples when the ambiguity level is below a threshold. The nearest neighboring sample includes a confidence score associated with an intent label. The method also includes, for every intent label, merging the confidence scores of the two samples into an overall confidence score for the intent label and modifying the ambiguity level using the overall confidence score. Lastly, the method includes labeling the target sample with the intent label when the modified ambiguity level is above the threshold.
Type: Application
Filed: December 8, 2021
Publication date: June 8, 2023
Inventors: Zvi Kons, Aharon Satt
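One possible reading of the merge-and-relabel procedure above, with the ambiguity level taken as the margin between the top two confidence scores (low margin means the sample is ambiguous). The sample data, the threshold, and averaging as the merge operation are assumptions for illustration, not details from the patent.

```python
import numpy as np

# Hypothetical samples: an embedding plus per-intent confidence scores.
samples = {
    "target":   {"emb": np.array([0.0, 0.0]), "conf": {"greet": 0.48, "bye": 0.52}},
    "neighbor": {"emb": np.array([0.1, 0.0]), "conf": {"greet": 0.05, "bye": 0.95}},
    "far":      {"emb": np.array([5.0, 5.0]), "conf": {"greet": 0.90, "bye": 0.10}},
}
THRESHOLD = 0.2  # illustrative margin threshold

def margin(conf):
    """Separation between the top two confidence scores (the 'ambiguity level')."""
    a, b = sorted(conf.values(), reverse=True)[:2]
    return a - b

target = samples["target"]
label = None
if margin(target["conf"]) < THRESHOLD:  # target is ambiguous
    # select the nearest neighboring sample by embedding distance
    others = [s for k, s in samples.items() if k != "target"]
    nn = min(others, key=lambda s: np.linalg.norm(s["emb"] - target["emb"]))
    # merge confidences label-by-label (simple average as one possible merge)
    merged = {k: (target["conf"][k] + nn["conf"][k]) / 2 for k in target["conf"]}
    if margin(merged) > THRESHOLD:  # ambiguity resolved by the neighbor's evidence
        label = max(merged, key=merged.get)
```

Here the target's 0.48/0.52 split is too close to call, but its nearest neighbor is confidently "bye", so the merged scores cross the threshold and the target gets the "bye" label.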
-
Patent number: 11211053
Abstract: There is provided a computer-implemented method of presenting color-coded text generated from an audio track of a video, the color coding denoting respective speakers, comprising: receiving the audio track of the video divided into a plurality of audio-segments each representing speech spoken by a respective speaker of a plurality of speakers, and for each audio-segment of the plurality of audio-segments: receiving a text representation of the audio-segment, extracting a feature vector from the audio-segment, mapping the feature vector to a color space, coloring the text representation according to the color space, and presenting the colored text representation in association with a video-segment corresponding to the audio-segment.
Type: Grant
Filed: May 23, 2019
Date of Patent: December 28, 2021
Assignee: International Business Machines Corporation
Inventors: Hagai Aronowitz, Zvi Kons
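A minimal sketch of the feature-vector-to-color mapping, assuming a fixed random projection from a speaker embedding to RGB; the projection, the 16-dimensional embedding, and the HTML span output are illustrative choices, not the patented mapping.

```python
import numpy as np

rng = np.random.default_rng(1)
projection = rng.normal(size=(16, 3))  # fixed map from embedding space to color space

def embedding_to_rgb(vec):
    """Project a speaker feature vector to RGB so similar voices get similar colors."""
    rgb = vec @ projection
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-9)  # normalize to [0, 1]
    return tuple(int(255 * c) for c in rgb)

def colored_html(text, vec):
    """Wrap a text segment in a span colored by its speaker's embedding."""
    r, g, b = embedding_to_rgb(vec)
    return f'<span style="color: rgb({r},{g},{b})">{text}</span>'

speaker_a = rng.normal(size=16)  # stand-in feature vector for one audio-segment
html = colored_html("Hello there.", speaker_a)
```

Because the projection is fixed, two segments from the same speaker land on nearby colors, so a reader can follow turns in the transcript without speaker names.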
-
Publication number: 20210304783
Abstract: Method, system and computer program product, the method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source, wherein the first audio having embedded therein first information characterizing the first source of the audio; extracting from the first audio the first information of the first source embedded within the first audio; obtaining second information characterizing a third source; comparing the first information to the second information to obtain comparison results; and subject to the comparison results indicating that the first source is the same as the third source, initiating an action.
Type: Application
Filed: March 31, 2020
Publication date: September 30, 2021
Inventors: Zvi Kons, Vyacheslav Shechtman
-
Publication number: 20200372899
Abstract: There is provided a computer-implemented method of presenting color-coded text generated from an audio track of a video, the color coding denoting respective speakers, comprising: receiving the audio track of the video divided into a plurality of audio-segments each representing speech spoken by a respective speaker of a plurality of speakers, and for each audio-segment of the plurality of audio-segments: receiving a text representation of the audio-segment, extracting a feature vector from the audio-segment, mapping the feature vector to a color space, coloring the text representation according to the color space, and presenting the colored text representation in association with a video-segment corresponding to the audio-segment.
Type: Application
Filed: May 23, 2019
Publication date: November 26, 2020
Inventors: Hagai Aronowitz, Zvi Kons
-
Patent number: 10418025
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech system to produce an audio waveform from an in…
Type: Grant
Filed: December 6, 2017
Date of Patent: September 17, 2019
Assignee: International Business Machines Corporation
Inventors: Slava Shechtman, Zvi Kons
-
Publication number: 20190172443
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech system to produce an audio waveform from an in…
Type: Application
Filed: December 6, 2017
Publication date: June 6, 2019
Inventors: Slava Shechtman, Zvi Kons
-
Patent number: 10276166
Abstract: A method of detecting an occurrence of splicing in a test speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation S_T of the test speech signal, and calculating a frame-based spectral-like representation S_E of a reference speech signal corresponding to the test speech signal. The method further includes aligning S_T and S_E in time and frequency, calculating a distance function associated with the aligned S_T and S_E, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the test speech signal.
Type: Grant
Filed: July 22, 2014
Date of Patent: April 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
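The compare-distance-threshold pipeline could look roughly like this. The signals, the magnitude-spectrogram representation, the max-over-frames score, and the threshold are invented for illustration, and the time/frequency alignment step is assumed to have been done already.

```python
import numpy as np

def spectrogram(signal, frame=64):
    """Frame-based magnitude-spectrum representation (a stand-in for S_T / S_E)."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    return np.abs(np.fft.rfft(frames, axis=1))

rng = np.random.default_rng(2)
reference = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1024))
clean = reference + 0.01 * rng.normal(size=1024)   # same speech, mild channel noise

spliced = clean.copy()
spliced[512:768] = rng.normal(size=256)            # foreign segment pasted in

def splice_score(test, ref):
    """Per-frame spectral distance between aligned test and reference; score = worst frame."""
    d = np.linalg.norm(spectrogram(test) - spectrogram(ref), axis=1)
    return d.max()

THRESHOLD = 5.0  # illustrative decision threshold
is_spliced = splice_score(spliced, reference) > THRESHOLD
is_clean = splice_score(clean, reference) > THRESHOLD
```

The inserted segment has a very different spectrum from the reference in the frames it covers, so the worst-frame distance spikes well above the threshold, while the merely noisy copy stays below it.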
-
Patent number: 9996732
Abstract: A method, product and system for implementing a liveness detector for face verification. A method comprising: detecting a symmetry line of the face; and verifying that the subject moved the mouth by computing a score based on values of a pair of images along the symmetry lines, wherein the score is indicative of a difference in the shape of the mouth between the pair of images. Another method comprises: verifying the identity of a subject based on facial recognition and voice recognition, said verifying comprising determining that there is mouth movement in an image sequence, wherein said determining comprises: in each image of the sequence, detecting a symmetry line of the face; and verifying that the subject moved the mouth, wherein said verifying comprises: computing a score based on a comparison of symmetry lines of the face in different images of the set of images; and comparing the score with a threshold.
Type: Grant
Filed: July 20, 2015
Date of Patent: June 12, 2018
Assignee: International Business Machines Corporation
Inventor: Zvi Kons
-
Patent number: 9990537
Abstract: A method, system and product for locating facial features using the symmetry line of the face. The method comprises: obtaining an image of a face of a subject; automatically detecting a symmetry line of the face, wherein the symmetry line intersects at least a mouth region of the face; and automatically locating a facial feature of the face using the symmetry line. Optionally, a rotation of the symmetry line is used to select a template, to rotate a template or to rotate the image. Optionally, the facial feature is symmetrical and is searched for using a symmetrical template. Optionally, the automatic locating comprises performing a one-dimensional correlation of an intensity cross section defined by the symmetry line with a cross-section template. Optionally, the automatic locating comprises correlating a curve defined by the symmetry line with a template curve.
Type: Grant
Filed: July 20, 2015
Date of Patent: June 5, 2018
Assignee: International Business Machines Corporation
Inventor: Zvi Kons
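Detecting a vertical symmetry line can be sketched as a search over candidate columns, scoring each by how well the image mirrors about it. The synthetic Gaussian "face" and the mean-squared-error score are assumptions made for the sake of a runnable example; a real detector would work on actual face images and likely a more robust score.

```python
import numpy as np

def symmetry_column(image):
    """Find the column about which the image is most mirror-symmetric."""
    h, w = image.shape
    best, best_col = -np.inf, None
    for c in range(w // 4, 3 * w // 4):                 # search central columns only
        half = min(c, w - 1 - c)
        left = image[:, c - half:c]
        right = image[:, c + 1:c + 1 + half][:, ::-1]   # mirror the right half
        score = -np.mean((left - right) ** 2)           # higher = more symmetric
        if score > best:
            best, best_col = score, c
    return best_col

# Synthetic "face": a bright symmetric blob centered at column 30 of a 64-wide image.
y, x = np.mgrid[0:64, 0:64]
face = np.exp(-((x - 30) ** 2 + (y - 32) ** 2) / 200.0)
col = symmetry_column(face)
```

Once the column (or, more generally, a tilted line) is found, features like the mouth can be located relative to it, since they sit on or symmetrically about that line.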
-
Publication number: 20170024607
Abstract: A method, system and product for locating facial features using the symmetry line of the face. The method comprises: obtaining an image of a face of a subject; automatically detecting a symmetry line of the face, wherein the symmetry line intersects at least a mouth region of the face; and automatically locating a facial feature of the face using the symmetry line. Optionally, a rotation of the symmetry line is used to select a template, to rotate a template or to rotate the image. Optionally, the facial feature is symmetrical and is searched for using a symmetrical template. Optionally, the automatic locating comprises performing a one-dimensional correlation of an intensity cross section defined by the symmetry line with a cross-section template. Optionally, the automatic locating comprises correlating a curve defined by the symmetry line with a template curve.
Type: Application
Filed: July 20, 2015
Publication date: January 26, 2017
Inventor: Zvi Kons
-
Publication number: 20170024608
Abstract: A method, product and system for implementing a liveness detector for face verification. A method comprising: detecting a symmetry line of the face; and verifying that the subject moved the mouth by computing a score based on values of a pair of images along the symmetry lines, wherein the score is indicative of a difference in the shape of the mouth between the pair of images. Another method comprises: verifying the identity of a subject based on facial recognition and voice recognition, said verifying comprising determining that there is mouth movement in an image sequence, wherein said determining comprises: in each image of the sequence, detecting a symmetry line of the face; and verifying that the subject moved the mouth, wherein said verifying comprises: computing a score based on a comparison of symmetry lines of the face in different images of the set of images; and comparing the score with a threshold.
Type: Application
Filed: July 20, 2015
Publication date: January 26, 2017
Inventor: Zvi Kons
-
Patent number: 9484036
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection enhance the security of computer systems employing speaker verification.
Type: Grant
Filed: August 28, 2013
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
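One simple way to test a per-segment feature track for periodic variation is to look for a strong autocorrelation peak at a nonzero lag. The feature tracks below are synthetic stand-ins (a regular oscillation versus irregular noise), and the threshold is invented; the patent's actual features and decision rule are not reproduced here.

```python
import numpy as np

def periodicity_score(feature_track):
    """Strength of periodic variation in a feature track: largest normalized
    autocorrelation value at lags beyond the trivial short-lag region."""
    x = feature_track - feature_track.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac /= ac[0] + 1e-12
    return ac[2:len(x) // 2].max()

rng = np.random.default_rng(3)
n = 200
# Hypothetical per-segment feature tracks: a vocoder artifact might impose a
# regular oscillation, while natural speech varies irregularly (modeled as noise).
synthetic_track = np.sin(2 * np.pi * np.arange(n) / 10) + 0.1 * rng.normal(size=n)
natural_track = rng.normal(size=n)

THRESHOLD = 0.5  # illustrative decision threshold
flag_synth = periodicity_score(synthetic_track) > THRESHOLD
flag_natural = periodicity_score(natural_track) > THRESHOLD
```

The oscillating track autocorrelates strongly at its period and is flagged; the irregular track shows no comparable peak and passes as natural.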
-
Patent number: 9368102
Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
Type: Grant
Filed: October 10, 2014
Date of Patent: June 14, 2016
Assignee: Nuance Communications, Inc.
Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons
-
Publication number: 20160027444
Abstract: A method of detecting an occurrence of splicing in a test speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation S_T of the test speech signal, and calculating a frame-based spectral-like representation S_E of a reference speech signal corresponding to the test speech signal. The method further includes aligning S_T and S_E in time and frequency, calculating a distance function associated with the aligned S_T and S_E, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the test speech signal.
Type: Application
Filed: July 22, 2014
Publication date: January 28, 2016
Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
-
Patent number: 9105272
Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency-domain representation are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
Type: Grant
Filed: June 4, 2012
Date of Patent: August 11, 2015
Assignees: The Lithuanian University of Health Sciences, International Business Machines Corporation
Inventors: Aharon Satt, Zvi Kons, Ron Hoory, Virgilijus Ulozas
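The extract-a-cycle-then-factor idea can be illustrated on a synthetic voiced signal: estimate the pitch period from the autocorrelation peak, cut one cycle, treat its samples as polynomial coefficients, and classify the z-plane roots by magnitude relative to the unit circle (roots outside the circle make up the maximum-phase part). The pitch estimator and this root classification are simplifications; the patent's root-correction step is not reproduced here.

```python
import numpy as np

def pitch_period(signal):
    """Estimate the pitch period from the autocorrelation peak."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    return int(np.argmax(ac[8:len(x) // 2])) + 8   # skip very short lags

# Synthetic voiced signal: fundamental with period 25 plus one harmonic.
t = np.arange(400)
voice = np.sin(2 * np.pi * t / 25) + 0.3 * np.sin(4 * np.pi * t / 25)

period = pitch_period(voice)
cycle = voice[:period]                 # a single extracted pitch cycle

# Treat the cycle samples as polynomial coefficients and find the z-plane roots;
# roots with magnitude above 1 contribute the maximum-phase component.
roots = np.roots(cycle[::-1])
max_phase_roots = roots[np.abs(roots) > 1.0]
```

Root locations are numerically delicate in practice, which is why a correction step for misclassified roots (ones landing on the wrong side of the unit circle) matters in the patented method.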
-
Publication number: 20150066512
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection enhance the security of computer systems employing speaker verification.
Type: Application
Filed: August 28, 2013
Publication date: March 5, 2015
Applicant: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
-
Publication number: 20150025891
Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
Type: Application
Filed: October 10, 2014
Publication date: January 22, 2015
Applicant: Nuance Communications, Inc.
Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons
-
Patent number: 8930182
Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided, including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
Type: Grant
Filed: March 17, 2011
Date of Patent: January 6, 2015
Assignee: International Business Machines Corporation
Inventors: Shay Ben-David, Ron Hoory, Zvi Kons, David Nahamoo
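LSB steganography is one classic way to embed transformation parameters in the output samples so a receiver can invert the transformation. The float encoding, one-bit-per-sample layout, and integer "audio" below are all illustrative; the patent does not specify this particular scheme.

```python
import struct

def embed_params(samples, params):
    """Hide parameter bytes in the LSBs of 16-bit samples, one bit per sample."""
    payload = struct.pack(f"{len(params)}f", *params)          # float32 encoding
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    assert len(bits) <= len(samples)
    return [(s & ~1) | b for s, b in zip(samples, bits)] + samples[len(bits):]

def extract_params(samples, count):
    """Recover `count` float32 parameters from the sample LSBs."""
    bits = [s & 1 for s in samples[:count * 32]]
    data = bytes(sum(b << i for i, b in enumerate(bits[j:j + 8]))
                 for j in range(0, len(bits), 8))
    return list(struct.unpack(f"{count}f", data))

# Hypothetical transformation parameters (e.g., pitch and formant scaling factors).
params = [1.25, 0.8]
audio = list(range(100, 200))        # stand-in for 16-bit output-speech samples
stego = embed_params(audio, params)
recovered = extract_params(stego, 2)
```

Each carrier sample changes by at most 1, so the embedding is inaudible, yet the receiver recovers the exact parameters needed to run the inverse transformation and approximate the original source speech.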
-
Patent number: 8886537
Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
Type: Grant
Filed: March 20, 2007
Date of Patent: November 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons