Patents by Inventor Zvi Kons

Zvi Kons has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240038216
    Abstract: An example system includes a processor to receive encoded audio from an encoder of a pre-trained speech-to-text (STT) model. The processor is to further train a language identification (LID) classifier to detect a language of the encoded audio using training samples labeled by language.
    Type: Application
    Filed: July 27, 2022
    Publication date: February 1, 2024
    Inventor: Zvi Kons
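
The training setup in the abstract above can be sketched in a few lines. The sketch below is purely illustrative, not the patented method: the STT encoder output is stood in for by toy 2-D vectors, the classifier is a simple nearest-centroid model, and every name and number is an assumption.

```python
import numpy as np

def train_lid(embeddings, labels):
    """Compute one centroid per language from labeled encoder outputs."""
    centroids = {}
    for lang in set(labels):
        rows = [e for e, l in zip(embeddings, labels) if l == lang]
        centroids[lang] = np.mean(rows, axis=0)
    return centroids

def predict_lid(centroids, embedding):
    """Return the language whose centroid lies nearest to the embedding."""
    return min(centroids, key=lambda lang: np.linalg.norm(embedding - centroids[lang]))

# Toy "encoded audio": English vectors cluster near (1, 0), German near (0, 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([1.0, 0.0], 0.1, size=(20, 2)),
               rng.normal([0.0, 1.0], 0.1, size=(20, 2))])
y = ["en"] * 20 + ["de"] * 20

model = train_lid(X, y)
print(predict_lid(model, np.array([0.9, 0.1])))  # → en
```

A real system would replace the toy vectors with frozen encoder activations and the centroid model with a trained classifier head, but the labeled-training flow is the same.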
  • Publication number: 20230177273
    Abstract: A computer-implemented method, a computer system and a computer program product enhance an intent classifier through training data augmentation. The method includes selecting a target sample from a plurality of samples. The method also includes determining an ambiguity level for the target sample based on confidence scores of at least two intent labels associated with the target sample. The method further includes selecting a nearest neighboring sample from a group of neighboring samples when the ambiguity level is below a threshold. The nearest neighboring sample includes a confidence score associated with an intent label. The method also includes, for every intent label, merging the confidence scores of the two samples into an overall confidence score for the intent label and modifying the ambiguity level using the overall confidence score. Lastly, the method includes labeling the target sample with the intent label when the modified ambiguity level is above the threshold.
    Type: Application
    Filed: December 8, 2021
    Publication date: June 8, 2023
    Inventors: Zvi Kons, Aharon Satt
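
The merging step in the abstract above can be illustrated roughly as follows. The definitions here are hypothetical stand-ins for the patent's unspecified formulas: "ambiguity" is taken as the closeness of the two highest intent confidences, merging is a per-label average, and the 0.7 threshold is arbitrary.

```python
def ambiguity(scores):
    """Closeness of the two highest confidences (near 1.0 = highly ambiguous)."""
    top1, top2 = sorted(scores.values(), reverse=True)[:2]
    return 1.0 - (top1 - top2)

def merge_scores(target, neighbor):
    """Per-intent-label average of the two samples' confidence scores."""
    return {label: (target[label] + neighbor[label]) / 2 for label in target}

target = {"book_flight": 0.48, "cancel_flight": 0.46}    # ambiguous sample
neighbor = {"book_flight": 0.90, "cancel_flight": 0.05}  # confident nearest neighbor

merged = merge_scores(target, neighbor)
# Label the target only once the merged scores are unambiguous enough.
label = max(merged, key=merged.get) if ambiguity(merged) < 0.7 else None
print(label)  # → book_flight
```

The point of the example: an ambiguous target borrows confidence from a nearby well-classified neighbor, so it can be labeled and added to the training set.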
  • Patent number: 11211053
    Abstract: There is provided a computer implemented method of presenting color coded text generated from an audio track of a video, the color coding denoting respective speakers, comprising: receiving the audio track of the video divided into a plurality of audio-segments each representing speech spoken by a respective speaker of a plurality of speakers, for each audio-segment of the plurality of audio-segments: receiving a text representation of the audio-segment, extracting a feature vector from the audio-segment, mapping the feature vector to a color space, coloring the text representation according to the color space, and presenting the colored text representation in association with a video-segment corresponding to the audio-segment.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: December 28, 2021
    Assignee: International Business Machines Corporation
    Inventors: Hagai Aronowitz, Zvi Kons
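
The embedding-to-color mapping at the heart of the abstract above might look like the sketch below, where a linear projection reduces a speaker feature vector to an RGB triple so that similar voices get similar colors. The patent does not prescribe this particular mapping; the random projection and all names are illustrative.

```python
import numpy as np

def color_for_speaker(feature_vec, projection):
    """Project a speaker embedding to three channels and scale to RGB."""
    rgb = projection @ feature_vec
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-9)  # normalize to [0, 1]
    return tuple(int(round(255 * c)) for c in rgb)

rng = np.random.default_rng(1)
projection = rng.normal(size=(3, 8))   # 8-dim speaker embedding → 3 color channels
speaker = rng.normal(size=8)
print(color_for_speaker(speaker, projection))
```

Each transcript segment would then be rendered in the color computed from its segment's speaker embedding.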
  • Publication number: 20210304783
    Abstract: Method, system and computer program product, the method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source, wherein the first audio has embedded therein first information characterizing the first source of the audio; extracting from the first audio the first information of the first source embedded within the first audio; obtaining second information characterizing a third source; comparing the first information to the second information to obtain comparison results; and subject to the comparison results indicating that the first source is the same as the third source, initiating an action.
    Type: Application
    Filed: March 31, 2020
    Publication date: September 30, 2021
    Inventors: Zvi Kons, Vyacheslav Shechtman
  • Publication number: 20200372899
    Abstract: There is provided a computer implemented method of presenting color coded text generated from an audio track of a video, the color coding denoting respective speakers, comprising: receiving the audio track of the video divided into a plurality of audio-segments each representing speech spoken by a respective speaker of a plurality of speakers, for each audio-segment of the plurality of audio-segments: receiving a text representation of the audio-segment, extracting a feature vector from the audio-segment, mapping the feature vector to a color space, coloring the text representation according to the color space, and presenting the colored text representation in association with a video-segment corresponding to the audio-segment.
    Type: Application
    Filed: May 23, 2019
    Publication date: November 26, 2020
    Inventors: Hagai Aronowitz, Zvi Kons
  • Patent number: 10418025
    Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an input text.
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: September 17, 2019
    Assignee: International Business Machines Corporation
    Inventors: Slava Shechtman, Zvi Kons
  • Publication number: 20190172443
    Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech-System to produce an audio waveform from an input text.
    Type: Application
    Filed: December 6, 2017
    Publication date: June 6, 2019
    Inventors: Slava Shechtman, Zvi Kons
  • Patent number: 10276166
    Abstract: A method of detecting an occurrence of splicing in a test speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation ST of the speech signal, and calculating a frame-based spectral-like representation SE of a reference speech signal corresponding to the speech signal. The method further includes aligning ST and SE in time and frequency, calculating a distance function associated with aligned ST and SE, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the speech signal.
    Type: Grant
    Filed: July 22, 2014
    Date of Patent: April 30, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
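
The scoring portion of the abstract above can be sketched as follows, assuming the test and reference signals are already aligned in time and frequency (the abstract's alignment step is omitted). The frame size, distance function, and toy signals are all assumptions, not the patented details.

```python
import numpy as np

def spectral_frames(signal, frame=64):
    """Frame the signal and take a magnitude-spectrum per frame."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    return np.abs(np.fft.rfft(frames, axis=1))

def splice_score(test, reference):
    """Largest frame-wise spectral distance between test and reference."""
    st, se = spectral_frames(test), spectral_frames(reference)
    return float(np.max(np.linalg.norm(st - se, axis=1)))

t = np.linspace(0, 1, 1024)
reference = np.sin(2 * np.pi * 50 * t)
spliced = reference.copy()
spliced[512:] = np.sin(2 * np.pi * 120 * t[512:])  # inserted foreign segment

print(splice_score(spliced, reference) > splice_score(reference, reference))  # → True
```

A spliced-in segment shows up as frames whose spectra diverge from the reference, driving the score above a detection threshold.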
  • Patent number: 9996732
    Abstract: A method, product and system for implementing a liveness detector for face verification. A method comprising detecting a symmetry line of the face; and verifying that the subject moved the mouth by computing a score based on values of a pair of images in the symmetry lines, wherein the score is indicative of a difference in the shape of the mouth between the pair of images. Another method comprises: verifying identity of a subject based on facial recognition and voice recognition, said verifying comprises determining there is mouth movement in an image sequence, wherein said determining comprises: in each image of the sequence, detecting a symmetry line of the face; and verifying that the subject moved the mouth, wherein said verifying comprises: computing a score based on comparison of symmetry lines of the face in different images of the set of images; and comparing the score with a threshold.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: June 12, 2018
    Assignee: International Business Machines Corporation
    Inventor: Zvi Kons
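
The mouth-movement score in the abstract above can be caricatured with synthetic intensity profiles sampled along a (pre-detected) symmetry line in two frames. The mean-absolute-difference score and the 0.3 threshold are illustrative stand-ins, not the patented formulas.

```python
import numpy as np

def movement_score(profile_a, profile_b, mouth_slice):
    """Mean absolute intensity change along the symmetry line's mouth region."""
    return float(np.mean(np.abs(profile_a[mouth_slice] - profile_b[mouth_slice])))

# Synthetic symmetry-line profiles: 100 samples of pixel intensity.
closed = np.ones(100) * 0.2
open_mouth = closed.copy()
open_mouth[60:80] = 0.8          # mouth region changes when the mouth opens

score = movement_score(closed, open_mouth, slice(60, 80))
print(score > 0.3)  # exceeds the toy liveness threshold → True
```

A static photo presented to the camera would produce near-identical profiles across frames and a score near zero, failing the liveness check.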
  • Patent number: 9990537
    Abstract: A method, system and product for locating facial features using a symmetry line of the face. The method comprises: obtaining an image of a face of a subject; automatically detecting a symmetry line of the face, wherein the symmetry line intersects at least a mouth region of the face; and automatically locating a facial feature of the face using the symmetry line. Optionally, a rotation of the symmetry line is used to select a template, to rotate a template or to rotate the image. Optionally, the facial feature is symmetrical and the facial feature is searched for using a symmetrical template. Optionally, the automatic locating comprises performing a one-dimensional correlation of an intensity cross section defined by the symmetry line with a cross section template. Optionally, the automatic locating comprises correlating a curve defined by the symmetry line with a template curve.
    Type: Grant
    Filed: July 20, 2015
    Date of Patent: June 5, 2018
    Assignee: International Business Machines Corporation
    Inventor: Zvi Kons
  • Publication number: 20170024607
    Abstract: A method, system and product for locating facial features using a symmetry line of the face. The method comprises: obtaining an image of a face of a subject; automatically detecting a symmetry line of the face, wherein the symmetry line intersects at least a mouth region of the face; and automatically locating a facial feature of the face using the symmetry line. Optionally, a rotation of the symmetry line is used to select a template, to rotate a template or to rotate the image. Optionally, the facial feature is symmetrical and the facial feature is searched for using a symmetrical template. Optionally, the automatic locating comprises performing a one-dimensional correlation of an intensity cross section defined by the symmetry line with a cross section template. Optionally, the automatic locating comprises correlating a curve defined by the symmetry line with a template curve.
    Type: Application
    Filed: July 20, 2015
    Publication date: January 26, 2017
    Inventor: Zvi Kons
  • Publication number: 20170024608
    Abstract: A method, product and system for implementing a liveness detector for face verification. A method comprising detecting a symmetry line of the face; and verifying that the subject moved the mouth by computing a score based on values of a pair of images in the symmetry lines, wherein the score is indicative of a difference in the shape of the mouth between the pair of images. Another method comprises: verifying identity of a subject based on facial recognition and voice recognition, said verifying comprises determining there is mouth movement in an image sequence, wherein said determining comprises: in each image of the sequence, detecting a symmetry line of the face; and verifying that the subject moved the mouth, wherein said verifying comprises: computing a score based on comparison of symmetry lines of the face in different images of the set of images; and comparing the score with a threshold.
    Type: Application
    Filed: July 20, 2015
    Publication date: January 26, 2017
    Inventor: Zvi Kons
  • Patent number: 9484036
    Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
    Type: Grant
    Filed: August 28, 2013
    Date of Patent: November 1, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
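
The periodic-variation test in the abstract above can be sketched with an autocorrelation check on a per-segment feature sequence. The sequences below are synthetic stand-ins for real per-segment speech features, and the lag range and thresholds are assumptions.

```python
import numpy as np

def periodicity(feature_seq):
    """Max normalized autocorrelation of the feature sequence at lag >= 2."""
    x = feature_seq - np.mean(feature_seq)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]  # lags 0 .. len-1
    ac = ac / (ac[0] + 1e-12)                          # normalize by zero-lag energy
    return float(np.max(ac[2: len(x) // 2]))

natural = np.random.default_rng(2).normal(size=200)   # irregular variation
synthetic = np.sin(2 * np.pi * np.arange(200) / 10)   # period-10 regularity

print(periodicity(synthetic) > 0.9)  # → True
```

A strong autocorrelation peak at a nonzero lag signals the frame-synchronous regularity that some synthesizers impose, while natural speech features vary more irregularly.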
  • Patent number: 9368102
    Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
    Type: Grant
    Filed: October 10, 2014
    Date of Patent: June 14, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons
  • Publication number: 20160027444
    Abstract: A method of detecting an occurrence of splicing in a speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation ST of the speech signal, and calculating a frame-based spectral-like representation SE of a reference speech signal corresponding to the speech signal. The method further includes aligning ST and SE in time and frequency, calculating a distance function associated with aligned ST and SE, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the speech signal.
    Type: Application
    Filed: July 22, 2014
    Publication date: January 28, 2016
    Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
  • Patent number: 9105272
    Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency domain are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
    Type: Grant
    Filed: June 4, 2012
    Date of Patent: August 11, 2015
    Assignees: The Lithuanian University of Health Sciences, International Business Machines Corporation
    Inventors: Aharon Satt, Zvi Kons, Ron Hoory, Virgilijus Ulozas
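
One way to read the root-based decomposition hinted at in the abstract above: treat the samples of a single pitch cycle as polynomial coefficients, compute the polynomial's roots, and take roots outside the unit circle as the maximum-phase component. The root-correction step, which is the patent's contribution, is not modeled here; the tiny pitch cycle is a contrived example.

```python
import numpy as np

def maximum_phase_roots(pitch_cycle):
    """Roots of the cycle's z-transform polynomial that lie outside the unit circle."""
    roots = np.roots(pitch_cycle)
    return roots[np.abs(roots) > 1.0]

cycle = np.array([1.0, -2.5, 1.0])   # polynomial with roots at 2 and 0.5
outside = maximum_phase_roots(cycle)
print(len(outside))  # → 1 (only the root at 2 is maximum-phase)
```

Numerical root-finding on real speech frames is ill-conditioned, which is presumably why the patent concerns itself with identifying and correcting misclassified roots.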
  • Publication number: 20150066512
    Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection result in security enhancement of the computer system employing speaker verification.
    Type: Application
    Filed: August 28, 2013
    Publication date: March 5, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
  • Publication number: 20150025891
    Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
    Type: Application
    Filed: October 10, 2014
    Publication date: January 22, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons
  • Patent number: 8930182
    Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
    Type: Grant
    Filed: March 17, 2011
    Date of Patent: January 6, 2015
    Assignee: International Business Machines Corporation
    Inventors: Shay Ben-David, Ron Hoory, Zvi Kons, David Nahamoo
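
The embed/extract round trip in the abstract above can be illustrated with a toy least-significant-bit scheme: a gain transformation is applied to integer samples, and the gain parameter is hidden in the output's low bits so a receiver can recover it and invert the transform. Real steganographic embedding is far more robust than this sketch; the gain transform, bit width, and sample values are all assumptions.

```python
import numpy as np

def embed_param(samples, param_bits):
    """Hide parameter bits in the least significant bits of the samples."""
    out = samples.copy()
    out[: len(param_bits)] = (out[: len(param_bits)] & ~1) | param_bits
    return out

def extract_param(samples, nbits):
    """Read the hidden parameter bits back out of the sample LSBs."""
    return samples[:nbits] & 1

gain = 3                                   # the "transformation parameter"
source = np.array([100, 220, 340, 460, 580])
transformed = source * gain                # the voice transformation (a toy gain)

bits = np.array([int(b) for b in format(gain, "04b")])
stego = embed_param(transformed, bits)     # output speech carrying the parameter

recovered = int("".join(map(str, extract_param(stego, 4))), 2)
reconstructed = stego // recovered         # approximate inverse transformation
print(recovered)  # → 3
```

Because the LSB perturbation is at most 1 per sample, dividing by the recovered gain reconstructs the source to within rounding, which is the "approximation of an original source speech" the abstract describes.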
  • Patent number: 8886537
    Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
    Type: Grant
    Filed: March 20, 2007
    Date of Patent: November 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons