Patents by Inventor Zvi Kons
Zvi Kons has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240038216
Abstract: An example system includes a processor to receive encoded audio from an encoder of a pre-trained speech-to-text (STT) model. The processor is to further train a language identification (LID) classifier to detect a language of the encoded audio using training samples labeled by language.
Type: Application
Filed: July 27, 2022
Publication date: February 1, 2024
Inventor: Zvi Kons
-
Publication number: 20230177273
Abstract: A computer-implemented method, a computer system and a computer program product enhance an intent classifier through training data augmentation. The method includes selecting a target sample from a plurality of samples. The method also includes determining an ambiguity level for the target sample based on confidence scores of at least two intent labels associated with the target sample. The method further includes selecting a nearest neighboring sample from a group of neighboring samples when the ambiguity level is below a threshold. The nearest neighboring sample includes a confidence score associated with an intent label. The method also includes, for every intent label, merging the confidence scores of the two samples into an overall confidence score for the intent label and modifying the ambiguity level using the overall confidence score. Lastly, the method includes labeling the target sample with the intent label when the modified ambiguity level is above the threshold.
Type: Application
Filed: December 8, 2021
Publication date: June 8, 2023
Inventors: Zvi Kons, Aharon Satt
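One possible reading of the merge-and-relabel procedure above, with the ambiguity level taken as the margin between the top two confidence scores (low margin means the sample is ambiguous). The sample data, the threshold, and averaging as the merge operation are assumptions for illustration, not details from the patent.

```python
import numpy as np

# Hypothetical samples: an embedding plus per-intent confidence scores.
samples = {
    "target":   {"emb": np.array([0.0, 0.0]), "conf": {"greet": 0.48, "bye": 0.52}},
    "neighbor": {"emb": np.array([0.1, 0.0]), "conf": {"greet": 0.05, "bye": 0.95}},
    "far":      {"emb": np.array([5.0, 5.0]), "conf": {"greet": 0.90, "bye": 0.10}},
}
THRESHOLD = 0.2  # illustrative margin threshold

def margin(conf):
    """Separation between the top two confidence scores (the 'ambiguity level')."""
    a, b = sorted(conf.values(), reverse=True)[:2]
    return a - b

target = samples["target"]
label = None
if margin(target["conf"]) < THRESHOLD:  # target is ambiguous
    # select the nearest neighboring sample by embedding distance
    others = [s for k, s in samples.items() if k != "target"]
    nn = min(others, key=lambda s: np.linalg.norm(s["emb"] - target["emb"]))
    # merge confidences label-by-label (simple average as one possible merge)
    merged = {k: (target["conf"][k] + nn["conf"][k]) / 2 for k in target["conf"]}
    if margin(merged) > THRESHOLD:  # ambiguity resolved by the neighbor's evidence
        label = max(merged, key=merged.get)
```

Here the target's 0.48/0.52 split is too close to call, but its nearest neighbor is confidently "bye", so the merged scores cross the threshold and the target gets the "bye" label.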
-
Patent number: 11211053
Abstract: There is provided a computer-implemented method of presenting color-coded text generated from an audio track of a video, the color coding denoting respective speakers, comprising: receiving the audio track of the video divided into a plurality of audio-segments each representing speech spoken by a respective speaker of a plurality of speakers, and for each audio-segment of the plurality of audio-segments: receiving a text representation of the audio-segment, extracting a feature vector from the audio-segment, mapping the feature vector to a color space, coloring the text representation according to the color space, and presenting the colored text representation in association with a video-segment corresponding to the audio-segment.
Type: Grant
Filed: May 23, 2019
Date of Patent: December 28, 2021
Assignee: International Business Machines Corporation
Inventors: Hagai Aronowitz, Zvi Kons
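A minimal sketch of the feature-vector-to-color mapping, assuming a fixed random projection from a speaker embedding to RGB; the projection, the 16-dimensional embedding, and the HTML span output are illustrative choices, not the patented mapping.

```python
import numpy as np

rng = np.random.default_rng(1)
projection = rng.normal(size=(16, 3))  # fixed map from embedding space to color space

def embedding_to_rgb(vec):
    """Project a speaker feature vector to RGB so similar voices get similar colors."""
    rgb = vec @ projection
    rgb = (rgb - rgb.min()) / (rgb.max() - rgb.min() + 1e-9)  # normalize to [0, 1]
    return tuple(int(255 * c) for c in rgb)

def colored_html(text, vec):
    """Wrap a text segment in a span colored by its speaker's embedding."""
    r, g, b = embedding_to_rgb(vec)
    return f'<span style="color: rgb({r},{g},{b})">{text}</span>'

speaker_a = rng.normal(size=16)  # stand-in feature vector for one audio-segment
html = colored_html("Hello there.", speaker_a)
```

Because the projection is fixed, two segments from the same speaker land on nearby colors, so a reader can follow turns in the transcript without speaker names.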
-
Publication number: 20210304783
Abstract: Method, system and computer program product, the method comprising: receiving a first audio, wherein the first audio is a conversion of an audio by a first source to a second source, wherein the first audio having embedded therein first information characterizing the first source of the audio; extracting from the first audio the first information of the first source embedded within the first audio; obtaining second information characterizing a third source; comparing the first information to the second information to obtain comparison results; and subject to the comparison results indicating that the first source is the same as the third source, initiating an action.
Type: Application
Filed: March 31, 2020
Publication date: September 30, 2021
Inventors: Zvi Kons, Vyacheslav Shechtman
-
Publication number: 20200372899
Abstract: There is provided a computer-implemented method of presenting color-coded text generated from an audio track of a video, the color coding denoting respective speakers, comprising: receiving the audio track of the video divided into a plurality of audio-segments each representing speech spoken by a respective speaker of a plurality of speakers, and for each audio-segment of the plurality of audio-segments: receiving a text representation of the audio-segment, extracting a feature vector from the audio-segment, mapping the feature vector to a color space, coloring the text representation according to the color space, and presenting the colored text representation in association with a video-segment corresponding to the audio-segment.
Type: Application
Filed: May 23, 2019
Publication date: November 26, 2020
Inventors: Hagai Aronowitz, Zvi Kons
-
Patent number: 10418025
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech system to produce an audio waveform from an in…
Type: Grant
Filed: December 6, 2017
Date of Patent: September 17, 2019
Assignee: International Business Machines Corporation
Inventors: Slava Shechtman, Zvi Kons
-
Publication number: 20190172443
Abstract: A method for producing speech comprises: accessing an expressive prosody model, wherein the model is generated by: receiving a plurality of non-neutral prosody vector sequences, each vector associated with one of a plurality of time-instances; receiving a plurality of expression labels, each having a time-instance selected from a plurality of non-neutral time-instances of the plurality of time-instances; producing a plurality of neutral prosody vector sequences equivalent to the plurality of non-neutral sequences by applying a linear combination of a plurality of statistical measures to a plurality of sub-sequences selected according to an identified proximity test applied to a plurality of neutral time-instances of the plurality of time-instances; and training at least one machine learning module using the plurality of non-neutral sequences and the plurality of neutral sequences to produce an expressive prosodic model; and using the model within a Text-To-Speech system to produce an audio waveform from an in…
Type: Application
Filed: December 6, 2017
Publication date: June 6, 2019
Inventors: Slava Shechtman, Zvi Kons
-
Patent number: 10276166
Abstract: A method of detecting an occurrence of splicing in a test speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation S_T of the test speech signal, and calculating a frame-based spectral-like representation S_E of a reference speech signal corresponding to the test speech signal. The method further includes aligning S_T and S_E in time and frequency, calculating a distance function associated with the aligned S_T and S_E, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the test speech signal.
Type: Grant
Filed: July 22, 2014
Date of Patent: April 30, 2019
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
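The compare-distance-threshold pipeline could look roughly like this. The signals, the magnitude-spectrogram representation, the max-over-frames score, and the threshold are invented for illustration, and the time/frequency alignment step is assumed to have been done already.

```python
import numpy as np

def spectrogram(signal, frame=64):
    """Frame-based magnitude-spectrum representation (a stand-in for S_T / S_E)."""
    n = len(signal) // frame
    frames = signal[:n * frame].reshape(n, frame)
    return np.abs(np.fft.rfft(frames, axis=1))

rng = np.random.default_rng(2)
reference = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1024))
clean = reference + 0.01 * rng.normal(size=1024)   # same speech, mild channel noise

spliced = clean.copy()
spliced[512:768] = rng.normal(size=256)            # foreign segment pasted in

def splice_score(test, ref):
    """Per-frame spectral distance between aligned test and reference; score = worst frame."""
    d = np.linalg.norm(spectrogram(test) - spectrogram(ref), axis=1)
    return d.max()

THRESHOLD = 5.0  # illustrative decision threshold
is_spliced = splice_score(spliced, reference) > THRESHOLD
is_clean = splice_score(clean, reference) > THRESHOLD
```

The inserted segment has a very different spectrum from the reference in the frames it covers, so the worst-frame distance spikes well above the threshold, while the merely noisy copy stays below it.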
-
Patent number: 9996732
Abstract: A method, product and system for implementing a liveness detector for face verification. A method comprising: detecting a symmetry line of the face; and verifying that the subject moved the mouth by computing a score based on values of a pair of images along the symmetry lines, wherein the score is indicative of a difference in the shape of the mouth between the pair of images. Another method comprises: verifying the identity of a subject based on facial recognition and voice recognition, said verifying comprising determining that there is mouth movement in an image sequence, wherein said determining comprises: in each image of the sequence, detecting a symmetry line of the face; and verifying that the subject moved the mouth, wherein said verifying comprises: computing a score based on a comparison of symmetry lines of the face in different images of the set of images; and comparing the score with a threshold.
Type: Grant
Filed: July 20, 2015
Date of Patent: June 12, 2018
Assignee: International Business Machines Corporation
Inventor: Zvi Kons
-
Patent number: 9990537
Abstract: A method, system and product for locating facial features using the symmetry line of the face. The method comprises: obtaining an image of a face of a subject; automatically detecting a symmetry line of the face, wherein the symmetry line intersects at least a mouth region of the face; and automatically locating a facial feature of the face using the symmetry line. Optionally, a rotation of the symmetry line is used to select a template, to rotate a template or to rotate the image. Optionally, the facial feature is symmetrical and is searched for using a symmetrical template. Optionally, the automatic locating comprises performing a one-dimensional correlation of an intensity cross section defined by the symmetry line with a cross-section template. Optionally, the automatic locating comprises correlating a curve defined by the symmetry line with a template curve.
Type: Grant
Filed: July 20, 2015
Date of Patent: June 5, 2018
Assignee: International Business Machines Corporation
Inventor: Zvi Kons
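Detecting a vertical symmetry line can be sketched as a search over candidate columns, scoring each by how well the image mirrors about it. The synthetic Gaussian "face" and the mean-squared-error score are assumptions made for the sake of a runnable example; a real detector would work on actual face images and likely a more robust score.

```python
import numpy as np

def symmetry_column(image):
    """Find the column about which the image is most mirror-symmetric."""
    h, w = image.shape
    best, best_col = -np.inf, None
    for c in range(w // 4, 3 * w // 4):                 # search central columns only
        half = min(c, w - 1 - c)
        left = image[:, c - half:c]
        right = image[:, c + 1:c + 1 + half][:, ::-1]   # mirror the right half
        score = -np.mean((left - right) ** 2)           # higher = more symmetric
        if score > best:
            best, best_col = score, c
    return best_col

# Synthetic "face": a bright symmetric blob centered at column 30 of a 64-wide image.
y, x = np.mgrid[0:64, 0:64]
face = np.exp(-((x - 30) ** 2 + (y - 32) ** 2) / 200.0)
col = symmetry_column(face)
```

Once the column (or, more generally, a tilted line) is found, features like the mouth can be located relative to it, since they sit on or symmetrically about that line.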
-
Publication number: 20170024607
Abstract: A method, system and product for locating facial features using the symmetry line of the face. The method comprises: obtaining an image of a face of a subject; automatically detecting a symmetry line of the face, wherein the symmetry line intersects at least a mouth region of the face; and automatically locating a facial feature of the face using the symmetry line. Optionally, a rotation of the symmetry line is used to select a template, to rotate a template or to rotate the image. Optionally, the facial feature is symmetrical and is searched for using a symmetrical template. Optionally, the automatic locating comprises performing a one-dimensional correlation of an intensity cross section defined by the symmetry line with a cross-section template. Optionally, the automatic locating comprises correlating a curve defined by the symmetry line with a template curve.
Type: Application
Filed: July 20, 2015
Publication date: January 26, 2017
Inventor: Zvi Kons
-
Publication number: 20170024608
Abstract: A method, product and system for implementing a liveness detector for face verification. A method comprising: detecting a symmetry line of the face; and verifying that the subject moved the mouth by computing a score based on values of a pair of images along the symmetry lines, wherein the score is indicative of a difference in the shape of the mouth between the pair of images. Another method comprises: verifying the identity of a subject based on facial recognition and voice recognition, said verifying comprising determining that there is mouth movement in an image sequence, wherein said determining comprises: in each image of the sequence, detecting a symmetry line of the face; and verifying that the subject moved the mouth, wherein said verifying comprises: computing a score based on a comparison of symmetry lines of the face in different images of the set of images; and comparing the score with a threshold.
Type: Application
Filed: July 20, 2015
Publication date: January 26, 2017
Inventor: Zvi Kons
-
Patent number: 9484036
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection enhance the security of computer systems employing speaker verification.
Type: Grant
Filed: August 28, 2013
Date of Patent: November 1, 2016
Assignee: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
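One simple way to test a per-segment feature track for periodic variation is to look for a strong autocorrelation peak at a nonzero lag. The feature tracks below are synthetic stand-ins (a regular oscillation versus irregular noise), and the threshold is invented; the patent's actual features and decision rule are not reproduced here.

```python
import numpy as np

def periodicity_score(feature_track):
    """Strength of periodic variation in a feature track: largest normalized
    autocorrelation value at lags beyond the trivial short-lag region."""
    x = feature_track - feature_track.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    ac /= ac[0] + 1e-12
    return ac[2:len(x) // 2].max()

rng = np.random.default_rng(3)
n = 200
# Hypothetical per-segment feature tracks: a vocoder artifact might impose a
# regular oscillation, while natural speech varies irregularly (modeled as noise).
synthetic_track = np.sin(2 * np.pi * np.arange(n) / 10) + 0.1 * rng.normal(size=n)
natural_track = rng.normal(size=n)

THRESHOLD = 0.5  # illustrative decision threshold
flag_synth = periodicity_score(synthetic_track) > THRESHOLD
flag_natural = periodicity_score(natural_track) > THRESHOLD
```

The oscillating track autocorrelates strongly at its period and is flagged; the irregular track shows no comparable peak and passes as natural.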
-
Patent number: 9368102
Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
Type: Grant
Filed: October 10, 2014
Date of Patent: June 14, 2016
Assignee: Nuance Communications, Inc.
Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons
-
Publication number: 20160027444
Abstract: A method of detecting an occurrence of splicing in a test speech signal includes comparing one or more discontinuities in the test speech signal to one or more reference speech signals corresponding to the test speech signal. The method may further include calculating a frame-based spectral-like representation S_T of the test speech signal, and calculating a frame-based spectral-like representation S_E of a reference speech signal corresponding to the test speech signal. The method further includes aligning S_T and S_E in time and frequency, calculating a distance function associated with the aligned S_T and S_E, and evaluating the distance function to determine a score. The method also includes comparing the score to a threshold to detect if splicing occurs in the test speech signal.
Type: Application
Filed: July 22, 2014
Publication date: January 28, 2016
Inventors: Zvi Kons, Ron Hoory, Hagai Aronowitz
-
Patent number: 9105272
Abstract: Methods, apparatus and computer program products implement embodiments of the present invention that include receiving a time domain voice signal, and extracting a single pitch cycle from the received signal. The extracted single pitch cycle is transformed to a frequency domain, and the misclassified roots of the frequency-domain representation are identified and corrected. Using the corrected roots, an indication of a maximum phase of the frequency domain is generated.
Type: Grant
Filed: June 4, 2012
Date of Patent: August 11, 2015
Assignees: The Lithuanian University of Health Sciences, International Business Machines Corporation
Inventors: Aharon Satt, Zvi Kons, Ron Hoory, Virgilijus Ulozas
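The extract-a-cycle-then-factor idea can be illustrated on a synthetic voiced signal: estimate the pitch period from the autocorrelation peak, cut one cycle, treat its samples as polynomial coefficients, and classify the z-plane roots by magnitude relative to the unit circle (roots outside the circle make up the maximum-phase part). The pitch estimator and this root classification are simplifications; the patent's root-correction step is not reproduced here.

```python
import numpy as np

def pitch_period(signal):
    """Estimate the pitch period from the autocorrelation peak."""
    x = signal - signal.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    return int(np.argmax(ac[8:len(x) // 2])) + 8   # skip very short lags

# Synthetic voiced signal: fundamental with period 25 plus one harmonic.
t = np.arange(400)
voice = np.sin(2 * np.pi * t / 25) + 0.3 * np.sin(4 * np.pi * t / 25)

period = pitch_period(voice)
cycle = voice[:period]                 # a single extracted pitch cycle

# Treat the cycle samples as polynomial coefficients and find the z-plane roots;
# roots with magnitude above 1 contribute the maximum-phase component.
roots = np.roots(cycle[::-1])
max_phase_roots = roots[np.abs(roots) > 1.0]
```

Root locations are numerically delicate in practice, which is why a correction step for misclassified roots (ones landing on the wrong side of the unit circle) matters in the patented method.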
-
Publication number: 20150066512
Abstract: Computer systems employing speaker verification as a security approach to prevent unauthorized access by intruders may be tricked by synthetic speech with voice characteristics similar to those of an authorized user of the computer system. According to at least one example embodiment, a method and corresponding apparatus for detecting a synthetic speech signal include extracting a plurality of speech features from multiple segments of the speech signal; analyzing the plurality of speech features to determine whether the plurality of speech features exhibit periodic variation behavior; and determining whether the speech signal is a synthetic speech signal or a natural speech signal based on whether or not a periodic variation behavior of the plurality of speech features is detected. The embodiments of synthetic speech detection enhance the security of computer systems employing speaker verification.
Type: Application
Filed: August 28, 2013
Publication date: March 5, 2015
Applicant: Nuance Communications, Inc.
Inventors: Zvi Kons, Hagai Aronowitz, Slava Shechtman
-
Publication number: 20150025891
Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
Type: Application
Filed: October 10, 2014
Publication date: January 22, 2015
Applicant: Nuance Communications, Inc.
Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons
-
Patent number: 8930182
Abstract: Method, system, and computer program product for voice transformation are provided. The method includes transforming a source speech using transformation parameters, and encoding information on the transformation parameters in an output speech using steganography, wherein the source speech can be reconstructed using the output speech and the information on the transformation parameters. A method for reconstructing voice transformation is also provided, including: receiving an output speech of a voice transformation system wherein the output speech is transformed speech which has encoded information on the transformation parameters using steganography; extracting the information on the transformation parameters; and carrying out an inverse transformation of the output speech to obtain an approximation of an original source speech.
Type: Grant
Filed: March 17, 2011
Date of Patent: January 6, 2015
Assignee: International Business Machines Corporation
Inventors: Shay Ben-David, Ron Hoory, Zvi Kons, David Nahamoo
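LSB steganography is one classic way to embed transformation parameters in the output samples so a receiver can invert the transformation. The float encoding, one-bit-per-sample layout, and integer "audio" below are all illustrative; the patent does not specify this particular scheme.

```python
import struct

def embed_params(samples, params):
    """Hide parameter bytes in the LSBs of 16-bit samples, one bit per sample."""
    payload = struct.pack(f"{len(params)}f", *params)          # float32 encoding
    bits = [(byte >> i) & 1 for byte in payload for i in range(8)]
    assert len(bits) <= len(samples)
    return [(s & ~1) | b for s, b in zip(samples, bits)] + samples[len(bits):]

def extract_params(samples, count):
    """Recover `count` float32 parameters from the sample LSBs."""
    bits = [s & 1 for s in samples[:count * 32]]
    data = bytes(sum(b << i for i, b in enumerate(bits[j:j + 8]))
                 for j in range(0, len(bits), 8))
    return list(struct.unpack(f"{count}f", data))

# Hypothetical transformation parameters (e.g., pitch and formant scaling factors).
params = [1.25, 0.8]
audio = list(range(100, 200))        # stand-in for 16-bit output-speech samples
stego = embed_params(audio, params)
recovered = extract_params(stego, 2)
```

Each carrier sample changes by at most 1, so the embedding is inaudible, yet the receiver recovers the exact parameters needed to run the inverse transformation and approximate the original source speech.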
-
Patent number: 8886537
Abstract: A method and system are provided for text-to-speech synthesis with personalized voice. The method includes receiving an incidental audio input (403) of speech in the form of an audio communication from an input speaker (401) and generating a voice dataset (404) for the input speaker (401). The method includes receiving a text input (411) at the same device as the audio input (403) and synthesizing (312) the text from the text input (411) to synthesized speech including using the voice dataset (404) to personalize the synthesized speech to sound like the input speaker (401). In addition, the method includes analyzing (316) the text for expression and adding the expression (315) to the synthesized speech. The audio communication may be part of a video communication (453) and the audio input (403) may have an associated visual input (455) of an image of the input speaker.
Type: Grant
Filed: March 20, 2007
Date of Patent: November 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Itzhack Goldberg, Ron Hoory, Boaz Mizrachi, Zvi Kons