Pitch Patents (Class 704/207)
  • Patent number: 11348591
    Abstract: A speaker identification system and method to identify a speaker based on the speaker's voice is disclosed. In an exemplary embodiment, the speaker identification system comprises a Gaussian Mixture Model (GMM) for speaker accent and dialect identification for a given speech signal input by the speaker and an Artificial Neural Network (ANN) to identify the speaker based on the identified dialect, in which the output of the GMM is input to the ANN.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: May 31, 2022
    Assignee: King Abdulaziz University
    Inventors: Muhammad Moinuddin, Ubaid M. Al-Saggaf, Shahid Munir Shah, Rizwan Ahmed Khan, Zahraa Ubaid Al-Saggaf
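    Illustrative sketch (Python; not the patented implementation): the abstract describes feeding a GMM-based dialect/accent decision into an ANN speaker classifier. The sketch below assumes scikit-learn's GaussianMixture and MLPClassifier as stand-ins, with random placeholder features; all names, shapes, and data are invented for illustration.
      # Per-dialect GMMs score an utterance's feature frames; the resulting
      # log-likelihood vector is the GMM output that the small neural network
      # consumes to predict a speaker label.
      import numpy as np
      from sklearn.mixture import GaussianMixture
      from sklearn.neural_network import MLPClassifier

      rng = np.random.default_rng(0)

      # Toy 13-dimensional "feature frames" for 3 dialects (placeholder data).
      dialect_train = {d: rng.normal(loc=d, size=(200, 13)) for d in range(3)}
      gmms = {d: GaussianMixture(n_components=4, random_state=0).fit(X)
              for d, X in dialect_train.items()}

      def dialect_scores(frames):
          # Average per-frame log-likelihood under each dialect GMM -> fixed-length vector.
          return np.array([gmms[d].score(frames) for d in sorted(gmms)])

      # Toy speaker enrollment data: one frame matrix per speaker.
      speaker_frames = [rng.normal(loc=s % 3, scale=1.0 + 0.1 * s, size=(200, 13))
                        for s in range(4)]
      X_spk = np.stack([dialect_scores(f) for f in speaker_frames])
      y_spk = np.arange(4)

      ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                          random_state=0).fit(X_spk, y_spk)
      print(ann.predict(dialect_scores(speaker_frames[2]).reshape(1, -1)))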
  • Patent number: 11303489
    Abstract: A transmitting apparatus includes a first signal generating unit that generates, on the basis of data, a first signal transmitted by single-carrier block transmission; a second signal generating unit that generates, on the basis of an RS, a second signal transmitted by orthogonal frequency division multiplex transmission; a switching operator that selects and outputs the second signal in a first transmission period and selects and outputs the first signal in a second transmission period; an antenna that transmits the signal output from the switching operator; and a control-signal generating unit that controls the second signal generating unit such that, in the first transmission period, the RS is arranged in a frequency band allocated for transmission of the RS from the transmitting apparatus among frequency bands usable in OFDM.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: April 12, 2022
    Assignee: Mitsubishi Electric Corporation
    Inventors: Fumihiro Hasegawa, Akinori Taira
  • Patent number: 11289067
    Abstract: Methods and systems for generating voices based on characteristics of an avatar. One or more characteristics of an avatar are obtained and one or more parameters of a voice synthesizer for generating a voice corresponding to the one or more avatar characteristics are determined. The voice synthesizer is configured based on the one or more parameters and a voice is generated using the parameterized voice synthesizer.
    Type: Grant
    Filed: June 25, 2019
    Date of Patent: March 29, 2022
    Assignee: International Business Machines Corporation
    Inventors: Kristina Marie Brimijoin, Gregory Boland, Joseph Schwarz
  • Patent number: 11282534
    Abstract: Systems and methods for intelligent playback of media content may include an intelligent media playback system that, in response to determining the speech tempo of audio content by measuring the syllable density of speech in the audio content, automatically adjusts the playback speed of the audio content as it is being played based on the determined speech tempo. In some embodiments, the system may automatically and dynamically adjust the playback speed to reach a desired target speech tempo. In addition, the system may determine whether to automatically adjust the playback speed of the audio content, as the media is being played, based on the detected speech tempo of the speech in the audio content and the determined type of content of the media. Such automatic adjustments in playback speed result in more efficient playback of the audio content.
    Type: Grant
    Filed: August 3, 2018
    Date of Patent: March 22, 2022
    Assignee: Sling Media PVT Ltd
    Inventors: Yatish Jayant Naik Raikar, Varunkumar Tripathi, Karthik Mahabaleshwar Hegde
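    Illustrative sketch (Python; not Sling's implementation): the abstract ties playback speed to a measured speech tempo. The function below assumes syllable counting happens elsewhere and simply derives a clamped playback rate that pushes the measured syllables-per-second toward a target value; the target and clamp values are invented for illustration.
      def playback_rate(measured_syll_per_sec, target_syll_per_sec=4.5,
                        min_rate=0.75, max_rate=2.0):
          # No detected speech: leave the playback speed unchanged.
          if measured_syll_per_sec <= 0:
              return 1.0
          rate = target_syll_per_sec / measured_syll_per_sec
          return max(min_rate, min(max_rate, rate))

      print(playback_rate(3.0))   # slow narration sped up: 1.5
      print(playback_rate(6.0))   # fast speech slowed, clamped at 0.75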
  • Patent number: 11276412
    Abstract: A method and device allocates a bit-budget to a plurality of first parts of a CELP core module of (a) an encoder for encoding a sound signal or (b) a decoder for decoding the sound signal. In the method and device, bit-budget allocation tables assign, for each of a plurality of intermediate bit rates, respective bit-budgets to the first CELP core module parts. A CELP core module bit rate is determined and one of the intermediate bit rates is selected based on the determined CELP core module bit rate. The respective bit-budgets assigned by the bit-budget allocation tables for the selected intermediate bit rate are allocated to the first CELP core module parts.
    Type: Grant
    Filed: September 20, 2018
    Date of Patent: March 15, 2022
    Assignee: VOICEAGE CORPORATION
    Inventor: Vaclav Eksler
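    Illustrative sketch (Python; not VoiceAge's tables): the abstract describes bit-budget allocation tables keyed by intermediate bit rates. The sketch selects the highest intermediate rate not exceeding the CELP core bit rate and returns that row's per-part budgets; the rates, part names, and bit counts are placeholders, not values from the patent.
      BUDGET_TABLE = {            # intermediate bit rate (bit/s) -> bits per CELP core part
          7200:  {"lpc": 26, "adaptive_cb": 20, "fixed_cb": 48,  "gains": 18},
          9600:  {"lpc": 30, "adaptive_cb": 24, "fixed_cb": 72,  "gains": 22},
          13200: {"lpc": 34, "adaptive_cb": 28, "fixed_cb": 110, "gains": 26},
      }

      def allocate(core_bit_rate):
          eligible = [r for r in BUDGET_TABLE if r <= core_bit_rate]
          selected = max(eligible) if eligible else min(BUDGET_TABLE)
          return selected, dict(BUDGET_TABLE[selected])

      print(allocate(11000))      # selects the 9600 bit/s row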
  • Patent number: 11270071
    Abstract: Systems, apparatuses, and methods are described herein for providing language-level content recommendations to users based on an analysis of closed captions of content viewed by the users and other data. Language-level analysis of content viewed by a user may be performed to generate metrics that are associated with the user. The metrics may be used to provide recommendations for content, which may include advertising, that is closely aligned with the user's interests.
    Type: Grant
    Filed: December 28, 2017
    Date of Patent: March 8, 2022
    Assignee: Comcast Cable Communications, LLC
    Inventor: Richard Walsh
  • Patent number: 11270721
    Abstract: Pre-processing systems, methods of pre-processing, and speech processing systems for improved Automated Speech Recognition are provided. Some pre-processing systems for improved speech recognition of a speech signal are provided, which systems comprise a pitch estimation circuit; and a pitch equalization processor. The pitch estimation circuit is configured to receive the speech signal to determine a pitch index of the speech signal, and the pitch equalization processor is configured to receive the speech signal and pitch information, to equalize a speech pitch of the speech signal using the pitch information, and to provide a pitch-equalized speech signal.
    Type: Grant
    Filed: May 21, 2019
    Date of Patent: March 8, 2022
    Assignee: PLANTRONICS, INC.
    Inventors: Youhong Lu, Arun Rajasekaran
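    Illustrative sketch (Python; not Plantronics' method): one simple way to "equalize" pitch before recognition is to estimate a frame's pitch by autocorrelation and time-scale the frame so its pitch lands on a reference value. The reference pitch, search range, and toy sine-wave input below are assumptions for illustration only.
      import numpy as np

      def estimate_pitch_hz(frame, fs, fmin=60.0, fmax=400.0):
          frame = frame - np.mean(frame)
          ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
          lo, hi = int(fs / fmax), int(fs / fmin)
          lag = lo + int(np.argmax(ac[lo:hi]))
          return fs / lag

      def equalize_pitch(frame, fs, target_hz=150.0):
          pitch = estimate_pitch_hz(frame, fs)
          factor = pitch / target_hz                  # >1 stretches, <1 compresses
          n_out = max(2, int(round(len(frame) * factor)))
          x_old = np.linspace(0.0, 1.0, len(frame))
          x_new = np.linspace(0.0, 1.0, n_out)
          return np.interp(x_new, x_old, frame), pitch

      fs = 16000
      t = np.arange(0, 0.032, 1.0 / fs)
      frame = np.sin(2 * np.pi * 200.0 * t)           # toy 200 Hz "voiced" frame
      equalized, detected = equalize_pitch(frame, fs)
      print(round(detected, 1), len(frame), len(equalized))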
  • Patent number: 11263876
    Abstract: Data is collected for Self-Service Terminals (SSTs) including tallies, events, and outcomes associated with servicing the SSTs. Statistical correlations are derived from the tallies and events with respect to the outcomes. Subsequent collected data is processed with the statistical correlations and a probability for a failure of a component or a part of the component associated with a particular SST is reported for servicing the component or part before the failure.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: March 1, 2022
    Assignee: NCR Corporation
    Inventors: Claudio Cifarelli, Gardiner Arthur, Iain M. N. Cowan, Massimo Mastropietro, Callum Ellis Morton
  • Patent number: 11250826
    Abstract: Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: February 15, 2022
    Assignee: Smule, Inc.
    Inventors: Stefan Sullivan, John Shimmin, Dean Schaffer, Perry R. Cook
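    Illustrative sketch (Python; a simplified stand-in for the statistical-model step in the abstract): many noisy per-performance pitch tracks aligned to a common frame grid are reduced to one crowd-sourced track by per-frame voiced voting and a median over the voiced estimates. The HMM smoothing the abstract mentions is not shown; the threshold and toy data are invented.
      import numpy as np

      def aggregate_pitch_tracks(tracks, min_voiced_fraction=0.5):
          tracks = np.asarray(tracks, dtype=float)    # shape: (performances, frames), 0 = unvoiced
          voiced = tracks > 0
          enough = voiced.mean(axis=0) >= min_voiced_fraction
          out = np.zeros(tracks.shape[1])
          for f in np.nonzero(enough)[0]:
              out[f] = np.median(tracks[voiced[:, f], f])
          return out

      tracks = [
          [220.0, 220.0, 0.0,   330.0],
          [221.0, 219.0, 218.0, 329.0],
          [110.0, 220.0, 0.0,   331.0],               # first frame is an octave error
      ]
      print(aggregate_pitch_tracks(tracks))           # ~[220, 220, 0, 330]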
  • Patent number: 11250221
    Abstract: Methods, systems, and computer-readable storage media for contextual interpretation of a Japanese word are provided. A first set of characters representing Japanese words is received. The first set of characters is input to a neural network. The neural network is trained to process characters based on bi-directional context interpretation. The first set of characters is processed by the neural network through a plurality of learning layers that process the first set of characters in the order of the first set of characters and in the reverse order to determine semantical meanings of the characters in the first set of characters. An alphabet representation of at least one character of the first set of characters representing a Japanese word is output. The alphabet representation corresponds to a semantical meaning of the at least one character within the first set of characters.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: February 15, 2022
    Assignee: SAP SE
    Inventor: Sean Saito
  • Patent number: 11241635
    Abstract: The present disclosure, according to at least one embodiment, relates to a method and system for providing an interactive service by using a smart toy in the learning process of a child. The method and system provide a more accurately classified emotional state of the child based on at least one or more sensed data items among an optical image, a thermal image, and voice data of the child, and adaptively provide a flexible and versatile interactive service according to the classified emotions.
    Type: Grant
    Filed: November 15, 2019
    Date of Patent: February 8, 2022
    Inventors: Heui Yul Noh, Myeong Ho Roh, Chang Woo Ban, Oh Soung Kwon, Seung Pil Lee, Seung Min Shin
  • Patent number: 11244694
    Abstract: A method is described that processes an audio signal. A discontinuity between a filtered past frame and a filtered current frame of the audio signal is removed using linear predictive filtering.
    Type: Grant
    Filed: January 23, 2017
    Date of Patent: February 8, 2022
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Emmanuel Ravelli, Manuel Jander, Grzegorz Pietrzyk, Martin Dietz, Marc Gayer
  • Patent number: 11239859
    Abstract: A method for partitioning of input vectors for coding is presented. The method comprises obtaining of an input vector. The input vector is segmented, in a non-recursive manner, into an integer number, NSEG, of input vector segments. A representation of a respective relative energy difference between parts of the input vector on each side of each boundary between the input vector segments is determined, in a recursive manner. The input vector segments and the representations of the relative energy differences are provided for individual coding. Partitioning units and computer programs for partitioning of input vectors for coding, as well as positional encoders, are presented.
    Type: Grant
    Filed: June 5, 2020
    Date of Patent: February 1, 2022
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Tomas Jansson Toftgård, Volodya Grancharov, Jonas Svedberg
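    Illustrative sketch (Python; not Ericsson's codec): the abstract segments an input vector non-recursively and derives relative energy differences at segment boundaries recursively. The sketch splits the vector into NSEG roughly equal segments, then visits boundaries in a binary order and emits a log-energy ratio between the parts on each side; the binary visiting order and dB scale are assumptions.
      import numpy as np

      def segment(vec, nseg):
          return np.array_split(np.asarray(vec, dtype=float), nseg)

      def boundary_energy_diffs(segments):
          energies = [float(np.sum(s * s)) + 1e-12 for s in segments]
          diffs = {}

          def recurse(lo, hi):                        # boundaries among segments lo..hi
              if hi - lo < 1:
                  return
              mid = (lo + hi + 1) // 2                # boundary between segments mid-1 and mid
              left, right = sum(energies[lo:mid]), sum(energies[mid:hi + 1])
              diffs[mid] = 10.0 * np.log10(left / right)
              recurse(lo, mid - 1)
              recurse(mid, hi)

          recurse(0, len(segments) - 1)
          return diffs                                # boundary index -> relative energy (dB)

      segs = segment(np.concatenate([np.ones(40), 0.1 * np.ones(40)]), 4)
      print(boundary_energy_diffs(segs))              # middle boundary shows a ~20 dB drop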
  • Patent number: 11227586
    Abstract: Systems and methods for improving the performance of statistical model-based single-channel speech enhancement systems using a deep neural network (DNN) are disclosed. Embodiments include a DNN-trained system to predict speech presence in the input signal, and this information can be used to create frameworks for tracking noise and conducting a priori signal-to-noise ratio estimation. Example frameworks provide increased flexibility for various aspects of system design, such as gain estimation. Examples include training a DNN to detect speech in the presence of both noise and reverberation, enabling joint suppression of additive noise and reverberation. Example frameworks provide significant improvements in objective speech quality metrics relative to baseline systems.
    Type: Grant
    Filed: September 11, 2019
    Date of Patent: January 18, 2022
    Assignee: Massachusetts Institute of Technology
    Inventors: Bengt J. Borgstrom, Michael S. Brandstein, Robert B. Dunn
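    Illustrative sketch (Python; one possible framework of the kind the abstract alludes to, not MIT's system): a per-frame speech-presence probability (a placeholder array standing in for DNN output) gates noise-power tracking, and a decision-directed rule estimates the a priori SNR behind a Wiener-style gain. Smoothing constants, the gain floor, and the toy data are invented.
      import numpy as np

      def enhance_band(power_spec, speech_prob, alpha=0.98, noise_lr=0.1, gain_floor=0.1):
          noise = power_spec[0]                       # crude initial noise estimate
          prev_clean = power_spec[0]
          gains = []
          for p, q in zip(power_spec, speech_prob):
              # Update the noise estimate only to the extent speech is judged absent.
              noise = (1 - (1 - q) * noise_lr) * noise + (1 - q) * noise_lr * p
              post_snr = max(p / noise - 1.0, 0.0)
              prio_snr = alpha * (prev_clean / noise) + (1 - alpha) * post_snr
              gain = max(prio_snr / (1.0 + prio_snr), gain_floor)
              prev_clean = (gain ** 2) * p
              gains.append(gain)
          return np.array(gains)

      power = np.array([1.0, 1.1, 0.9, 8.0, 9.0, 1.0])      # toy single-band frame powers
      prob  = np.array([0.05, 0.05, 0.1, 0.95, 0.95, 0.1])  # stand-in DNN speech presence
      print(np.round(enhance_band(power, prob), 2))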
  • Patent number: 11217237
    Abstract: At least one exemplary embodiment is directed to a method and device for voice operated control with learning. The method can include measuring a first sound received from a first microphone, measuring a second sound received from a second microphone, detecting a spoken voice based on an analysis of measurements taken at the first and second microphone, learning from the analysis when the user is speaking and a speaking level in noisy environments, training a decision unit from the learning to be robust to a detection of the spoken voice in the noisy environments, mixing the first sound and the second sound to produce a mixed signal, and controlling the production of the mixed signal based on the learning of one or more aspects of the spoken voice and ambient sounds in the noisy environments.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: January 4, 2022
    Assignee: Staton Techiya, LLC
    Inventors: John Usher, Steven Goldstein, Marc Boillot
  • Patent number: 11216853
    Abstract: A method and system for advertising dynamic content in an immersive digital medium user experience operate a plurality of computer processors and databases in an associated network for receiving, processing, and communicating instructions and data relating to advertising content in the immersive digital medium user experience. The method and system execute instructions and process data relating to advertising objects, the advertising objects comprising images of objects, signs, labels, and related indicia of object origin for indicating sources for purchasing one or more objects being advertised. The method and system receive advertising instructions and data from a plurality of software applications and further respond to variations in said advertising instructions and data, whereby operation of said computer processors and databases enables swapping out various advertising messages and images according to the context of said immersive digital medium user experience.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: January 4, 2022
    Inventor: Quintan Ian Pribyl
  • Patent number: 11159589
    Abstract: Described are a system, method, and computer program product for task-based teleconference management. The method includes initiating a teleconference bridge and generating a teleconference session hosted by the bridge. The method also includes connecting teleconference participants of an organization to the bridge and receiving a participant identifier for each participant. The method further includes determining an association of an organization group with each participant based on a respective participant identifier. The method further includes generating display data configured to cause a computing device to display a control interface depicting: (i) the teleconference session having groups of participants, the groups selected from predetermined groups based on task data, and each participant visually associated with its group; and (ii) labels of each participant to identify the group associated therewith.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: October 26, 2021
    Assignee: Visa International Service Association
    Inventors: Yi Shen, Trinath Anaparthi, Sangram Pattanaik
  • Patent number: 11134330
    Abstract: Embodiments of the invention determine a speech estimate using a bone conduction sensor or accelerometer, without employing voice activity detection gating of speech estimation. Speech estimation is based either exclusively on the bone conduction signal, or is performed in combination with a microphone signal. The speech estimate is then used to condition an output signal of the microphone. There are multiple use cases for speech processing in audio devices.
    Type: Grant
    Filed: July 12, 2019
    Date of Patent: September 28, 2021
    Assignee: Cirrus Logic, Inc.
    Inventors: David Leigh Watts, Brenton Robert Steele, Thomas Ivan Harvey, Vitaliy Sapozhnykov
  • Patent number: 11127416
    Abstract: A method and an apparatus for voice activity detection provided in embodiments of the present disclosure allow for dividing a to-be-detected audio file into frames to obtain a first sequence of audio frames, extracting an acoustic feature of each audio frame in the first sequence of audio frames, and then inputting the acoustic feature of each audio frame to a noise-added VAD model in chronological order to obtain a probability value of each audio frame in the first sequence of audio frames; and then determining, by an electronic device, a start and an end of the voice signal according to the probability value of each audio frame. During the VAD detection, the start and the end of a voice signal in an audio file are recognized with a noise-added VAD model to realize the purpose of accurately recognizing the start and the end of the voice signal.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: September 21, 2021
    Inventor: Chao Li
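    Illustrative sketch (Python; not Baidu's model): given the per-frame probability values the abstract describes, the start of the voice signal can be taken as the first frame of a sufficiently long high-probability run and the end as the last speech frame before a sufficiently long low-probability run. The threshold and run lengths are invented for illustration.
      def find_endpoints(probs, threshold=0.5, min_speech_frames=3, min_silence_frames=5):
          start = end = None
          run = 0
          for i, p in enumerate(probs):
              if start is None:
                  run = run + 1 if p >= threshold else 0
                  if run >= min_speech_frames:
                      start = i - min_speech_frames + 1
                      run = 0
              else:
                  run = run + 1 if p < threshold else 0
                  if run >= min_silence_frames:
                      end = i - min_silence_frames
                      break
          if start is not None and end is None:
              end = len(probs) - 1
          return start, end

      probs = [0.1, 0.2, 0.7, 0.8, 0.9, 0.85, 0.3, 0.2, 0.1, 0.1, 0.05, 0.1]
      print(find_endpoints(probs))                    # (2, 5)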
  • Patent number: 11094328
    Abstract: Various embodiments herein each include at least one of systems, methods, and software for conference audio manipulation for inclusion and accessibility. One embodiment is in the form of a method that may be performed, for example, on a server or a participant computing device. This method includes receiving a voice signal via a network and modifying an audible characteristic of the voice signal that is perceptible when the voice signal is audibly output. The method further includes outputting the voice signal including the modified audible characteristic.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: August 17, 2021
    Assignee: NCR Corporation
    Inventor: Phil Noel Day
  • Patent number: 11087231
    Abstract: This disclosure is directed to an apparatus for intelligent matching of disparate input data received from disparate input data systems in a complex computing network for establishing targeted communication to a computing device associated with the intelligently matched disparate input data.
    Type: Grant
    Filed: December 9, 2019
    Date of Patent: August 10, 2021
    Assignee: Research Now Group, LLC
    Inventors: Melanie D. Courtright, Vincent P. Derobertis, Michael D. Bigby, William C. Robinson, Greg Ellis, Heidi D. E. Wilton, John R. Rothwell, Jeremy S. Antoniuk
  • Patent number: 11062094
    Abstract: A method of analyzing sentiments includes receiving one or more strings of text, identifying sentiments related to a first topic from the one or more strings of text, and assigning a sentiment score to each of the sentiments related to the first topic, where the sentiment score corresponds to a degree of positivity or negativity of a sentiment of the sentiments. The method further includes calculating an average sentiment score for the first topic based on the sentiment score for each of the sentiments related to the first topic, determining a percentile for the first topic based on a frequency of sentiments related to the first topic, where the percentile for the first topic is determined with respect to a maximum frequency of sentiments related to one or more other topics, and computing an X-Score based on the average sentiment score and the percentile of the first topic.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: July 13, 2021
    Assignee: LANGUAGE LOGIC, LLC
    Inventors: Rick Kieser, Charles Baylis, Serge Luyens
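    Illustrative sketch (Python; the combination formula is an assumption, since this abstract does not pin it down): average the per-mention sentiment scores for a topic, convert the topic's mention frequency into a percentile of the busiest topic, and combine the two into an X-Score, here simply by scaling the signed average by the percentile.
      def x_score(sentiment_scores, topic_frequency, max_topic_frequency):
          avg = sum(sentiment_scores) / len(sentiment_scores)          # e.g. -1.0 .. +1.0
          percentile = 100.0 * topic_frequency / max_topic_frequency   # vs. the busiest topic
          return avg * percentile

      # Hypothetical topic: mostly negative mentions, moderately discussed.
      print(round(x_score([-0.8, -0.5, 0.2, -0.6],
                          topic_frequency=40, max_topic_frequency=160), 1))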
  • Patent number: 11049492
    Abstract: Described herein are real-time musical translation devices (RETM) and methods of use thereof. Exemplary uses of RETMs include optimizing the understanding and/or recall of an input message for a user and improving a cognitive process in a user.
    Type: Grant
    Filed: November 10, 2020
    Date of Patent: June 29, 2021
    Assignee: YAO THE BARD, LLC
    Inventors: Leonardus H. T. Van Der Ploeg, Halley Young
  • Patent number: 11038787
    Abstract: In accordance with an example embodiment of the present invention, disclosed is a method and an apparatus thereof for selecting a packet loss concealment procedure for a lost audio frame of a received audio signal. A method for selecting a packet loss concealment procedure comprises detecting an audio type of a received audio frame and determining a packet loss concealment procedure based on the audio type. In the method, detecting an audio type comprises determining a stability of a spectral envelope of signals of received audio frames.
    Type: Grant
    Filed: October 1, 2019
    Date of Patent: June 15, 2021
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventor: Stefan Bruhn
  • Patent number: 11011160
    Abstract: A computerized system for transforming recorded speech into a derived expression of intent from the recorded speech includes: (1) a text classification module comparing a transcription of at least a portion of recorded speech against a text classifier to generate a first set of one or more of the representations of potential intents based upon such comparison; (2) a phonetics classification module comparing a phonetic transcription of at least a portion of the recorded speech against a phonetics classifier to generate a second set of one or more of the representations of potential intents based upon such comparison; (3) an audio classification module comparing an audio version of at least a portion of the recorded speech with an audio classifier to generate a third set of one or more of the representations of potential intents based upon such comparison; and a (4) discriminator module for receiving the first, second and third sets of the one or more representations of potential intents and generating at least
    Type: Grant
    Filed: January 19, 2018
    Date of Patent: May 18, 2021
    Assignee: OPEN WATER DEVELOPMENT LLC
    Inventor: Moshe Villaizan
  • Patent number: 10984813
    Abstract: A method and an apparatus for detecting correctness of a pitch period are provided, where the method for detecting correctness of a pitch period includes determining, according to an initial pitch period of an input signal in a time domain, a pitch frequency bin of the input signal, where the initial pitch period is obtained by performing open-loop detection on the input signal, determining, based on an amplitude spectrum of the input signal in a frequency domain, a pitch period correctness decision parameter, associated with the pitch frequency bin, of the input signal, and determining correctness of the initial pitch period according to the pitch period correctness decision parameter. Hence, the method and apparatus improve the accuracy of detecting correctness of the pitch period with a relatively low-complexity algorithm.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: April 20, 2021
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Fengyan Qi, Lei Miao
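    Illustrative sketch (Python; not Huawei's decision rule): map an open-loop pitch period to its expected frequency bin and form a simple correctness parameter by comparing the amplitude spectrum around that bin (and its first harmonic) with the average spectral amplitude; a small ratio suggests the initial pitch period is wrong. The FFT size, band width, and toy signal are invented.
      import numpy as np

      def pitch_correctness(signal, fs, pitch_period_samples, nfft=1024, band=2):
          spec = np.abs(np.fft.rfft(signal, nfft))
          f0 = fs / pitch_period_samples
          bins = [int(round(k * f0 * nfft / fs)) for k in (1, 2)]   # fundamental + 1st harmonic
          peak = sum(spec[max(0, b - band): b + band + 1].max() for b in bins) / 2.0
          return peak / (np.mean(spec) + 1e-12)                     # decision parameter

      fs = 8000
      t = np.arange(0, 0.064, 1.0 / fs)
      voiced = np.sin(2 * np.pi * 125.0 * t) + 0.5 * np.sin(2 * np.pi * 250.0 * t)
      print(round(pitch_correctness(voiced, fs, pitch_period_samples=64), 1))  # correct period: large value
      print(round(pitch_correctness(voiced, fs, pitch_period_samples=47), 1))  # wrong period: small value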
  • Patent number: 10971191
    Abstract: A generally diverse set of audiovisual clips is sourced from one or more repositories for use in preparing a coordinated audiovisual work. In some cases, audiovisual clips are retrieved using tags such as user-assigned hashtags or metadata. Pre-existing associations of such tags can be used as hints that certain audiovisual clips are likely to share correspondence with an audio signal encoding of a particular song or other audio baseline. Clips are evaluated for computationally determined correspondence with an audio baseline track. In general, comparisons of audio power spectra, of rhythmic features, tempo, pitch sequences and other extracted audio features may be used to establish correspondence. For clips exhibiting a desired level of correspondence, computationally determined temporal alignments of individual clips with the baseline audio track are used to prepare a coordinated audiovisual work that mixes the selected audiovisual clips with the audio track.
    Type: Grant
    Filed: June 15, 2015
    Date of Patent: April 6, 2021
    Inventors: Mark T. Godfrey, Turner Evan Kirk, Ian S. Simon, Nick Kruge
  • Patent number: 10938366
    Abstract: A volume level meter has a housing that is mounted on a microphone and is connected to a pop filter positioned in front of a vocalist and adjacent to the microphone. The display faces the vocalist and is arranged on the housing so that it indicates a volume level of audio signals received from the microphone. The vocalist can see indicators on the display and know the volume level of the audio signal from the microphone. This allows the vocalist to monitor the volume level indicators of the volume level display and control their vocal volume levels based on the indicators. In this way, the vocalist can reduce fluctuations in vocal volume levels that may lead to distortion of the audio signal by monitoring the volume level display.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: March 2, 2021
    Inventors: Joseph N Griffin, Corey D Chapman
  • Patent number: 10924193
    Abstract: Embodiments include techniques for transmitting and receiving radio frequency (RF) signals, where the techniques for generating, via a digital analog converter (DAC), a frequency signal, and filtering the frequency signal to produce a first filtered signal and a second filtered signal. The techniques also include transmitting the second filtered signal to a device under test, and filtering the second filtered signal into a sub-signal having one or more components. The techniques include mixing the first filtered signal with the sub-signal to produce a first mixed signal, subsequently mixing the first mixed signal with an output signal received from the device under test to produce a second mixed signal, and converting the second mixed signal for analysis.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: February 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Mohit Kapur, Muir Kumph
  • Patent number: 10902841
    Abstract: Systems, methods, and computer program products customize and deliver contextually relevant, artificially synthesized, voiced content that is targeted toward the individual user behaviors, viewing habits, experiences, and preferences of each individual user accessing the content of a content provider. A network-accessible profile service collects and analyzes user profile data and recommends contextually applicable voices based on the user's profile data. As user input accesses or triggers voiced content maintained by a content provider, the voiced content delivered to the user is a modified version comprising artificially synthesized human speech mimicking the recommended voice and delivering the dialogue of the voiced content in a manner that imitates the sounds and speech patterns of the recommended voice.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Su Liu, Eric J. Rozner, Inseok Hwang, Chungkuk Yoo
  • Patent number: 10878800
    Abstract: According to one aspect of the present disclosure, a computer-implemented method for changing a voice interacting with a user can be provided. Identity information for a user can be received. The identity information can be analyzed to identify the user. Voice change information for the user indicating help for the user to understand the voice can be retrieved. A change to be made to the voice based on retrieved user information can be made. The changed voice can be provided to the user.
    Type: Grant
    Filed: May 29, 2019
    Date of Patent: December 29, 2020
    Assignee: Capital One Services, LLC
    Inventors: Anh Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar
  • Patent number: 10867525
    Abstract: Computer-implemented systems and methods are provided for automatically generating recitation items. For example, a computer performing the recitation item generation can receive one or more text sets that each includes one or more texts. The computer can determine a value for each text set using one or more metrics, such as a vocabulary difficulty metric, a syntactic complexity metric, a phoneme distribution metric, a phonetic difficulty metric, and a prosody distribution metric. Then the computer can select a final text set based on the value associated with each text set. The selected final text set can be used as the recitation items for a speaking assessment test.
    Type: Grant
    Filed: February 13, 2018
    Date of Patent: December 15, 2020
    Assignee: Educational Testing Service
    Inventors: Su-Youn Yoon, Lei Chen, Keelan Evanini, Klaus Zechner
  • Patent number: 10861482
    Abstract: Temporal regions of a time-based media program that contain spoken dialog in a language that is dubbed from a primary language are identified automatically. A primary language audio track of the media program is compared with an alternate language audio track. Closely similar regions are assumed not to contain dubbed dialog, while the temporal inverse of the similar regions are candidate regions for containing dubbed speech. The candidate regions are provided to a dub validator to facilitate locating each region to be validated without having to play back or search the entire time-based media program. Corresponding regions of the primary and alternate language tracks that are closely similar and that contain voice activity are candidate regions of forced narrative, and the temporal locations of these regions may be used by a validator to facilitate rapid validation of forced narrative in the program.
    Type: Grant
    Filed: October 12, 2018
    Date of Patent: December 8, 2020
    Assignee: Avid Technology, Inc.
    Inventors: Jacob B. Garland, Vedantha G. Hothur
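    Illustrative sketch (Python; not Avid's pipeline): compare frame-aligned windows of the primary and alternate language tracks (here as per-window feature vectors), treat closely similar windows as "not dubbed", and return the temporal inverse as candidate dubbed regions merged into spans. The cosine-similarity measure, threshold, and toy features are assumptions.
      import numpy as np

      def candidate_dub_regions(primary, alternate, similarity_threshold=0.95):
          primary, alternate = np.asarray(primary, float), np.asarray(alternate, float)
          sims = []
          for a, b in zip(primary, alternate):         # per-window cosine similarity
              denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
              sims.append(float(np.dot(a, b) / denom))
          candidate = [s < similarity_threshold for s in sims]   # dissimilar -> maybe dubbed
          regions, start = [], None
          for i, flag in enumerate(candidate + [False]):
              if flag and start is None:
                  start = i
              elif not flag and start is not None:
                  regions.append((start, i - 1))
                  start = None
          return regions

      rng = np.random.default_rng(1)
      shared = rng.normal(size=(10, 8))
      alternate = shared.copy()
      alternate[3:6] = rng.normal(size=(3, 8))          # windows 3-5 carry different (dubbed) audio
      print(candidate_dub_regions(shared, alternate))   # [(3, 5)]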
  • Patent number: 10854217
    Abstract: A wind noise filtering device includes a mixer, an extraction unit, a decision unit, a wind noise filter and an output module. The mixer receives a source sound and outputs an input audio. The extraction unit is electrically connected to the mixer to receive the input audio, and the extraction unit performs feature extraction on the input audio to generate a plurality of feature data. The decision unit is electrically connected to the extraction unit to receive the feature data, and the decision unit outputs a decision signal according to the plurality of feature data. The wind noise filter is electrically connected to the decision unit to receive the decision signal and is controlled to be turned on or off by the decision signal. The output module is electrically connected to the wind noise filter and the mixer to output an output audio according to the input audio or the filtered audio.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: December 1, 2020
    Assignee: COMPAL ELECTRONICS, INC.
    Inventor: Chung-Han Lin
  • Patent number: 10847179
    Abstract: The present disclosure provides a method, an apparatus and a device for recognizing voice endpoints. In the method of the present disclosure, a start point recognition model and a finish point recognition model are obtained by training a cyclic neural network with a start point training set and a finish point training set, respectively, and a voice start point frame among audio frames is recognized according to each of the acoustic features of the audio frames and the start point recognition model, thereby keeping the accuracy of start point frame recognition as high as possible without affecting the delay time of finish point frame recognition; and a voice finish point frame among the audio frames is recognized according to the acoustic features of the audio frames and the finish point recognition model.
    Type: Grant
    Filed: December 28, 2018
    Date of Patent: November 24, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Chao Li, Weixin Zhu
  • Patent number: 10838881
    Abstract: A device management server computer (“server”) is programmed to manage a plurality of input devices and output devices in a physical room. The server is programmed to analyze media data capturing actions performed by a user in real time as a participant in the physical room, determine from the analysis how the user would like to connect at least one of the input devices and one of the output devices, and enable the connection. The server is programmed to interpret the actions and derive commands for connecting two or more devices based on predetermined data regarding the input devices and output devices and rules for referring to and connecting these devices.
    Type: Grant
    Filed: April 26, 2019
    Date of Patent: November 17, 2020
    Assignee: XIO RESEARCH, INC.
    Inventors: Aditya Vempaty, Robert Smith, Shom Ponoth, Sharad Sundararajan, Ravindranath Kokku, Robert Hutter, Satya Nitta
  • Patent number: 10818311
    Abstract: An auditory selection method based on a memory and attention model, including: step S1, encoding an original speech signal into a time-frequency matrix; step S2, encoding and transforming the time-frequency matrix to convert the matrix into a speech vector; step S3, using a long-term memory unit to store a speaker and a speech vector corresponding to the speaker; step S4, obtaining a speech vector corresponding to a target speaker, and separating a target speech from the original speech signal through an attention selection model. A storage device includes a plurality of programs stored in the storage device. The plurality of programs are configured to be loaded by a processor and execute the auditory selection method based on the memory and attention model. A processing unit includes the processor and the storage device.
    Type: Grant
    Filed: November 14, 2018
    Date of Patent: October 27, 2020
    Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
    Inventors: Jiaming Xu, Jing Shi, Bo Xu
  • Patent number: 10818308
    Abstract: Systems, devices, media, and methods are presented for converting sounds in an audio stream. The systems and methods receive an audio conversion request initiating conversion of one or more sound characteristics of an audio stream from a first state to a second state. The systems and methods access an audio conversion model associated with an audio signature for the second state. The audio stream is converted based on the audio conversion model and an audio construct is compiled from the converted audio stream and a base audio segment. The compiled audio construct is presented at a client device.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: October 27, 2020
    Assignee: Snap Inc.
    Inventor: Wei Chu
  • Patent number: 10789938
    Abstract: A speech synthesis method and device are provided. The method comprises: determining language types of a statement to be synthesized; determining base models corresponding to the language types; determining a target timbre, performing adaptive transformation on the spectrum parameter models based on the target timbre, and training the statement to be synthesized based on the spectrum parameter models subjected to adaptive transformation to generate spectrum parameters; training the statement to be synthesized based on the fundamental frequency parameter models to generate fundamental frequency parameters, and adjusting the fundamental frequency parameters based on the target timbre; and synthesizing the statement to be synthesized into a target speech based on the spectrum parameters and the adjusted fundamental frequency parameters.
    Type: Grant
    Filed: September 5, 2016
    Date of Patent: September 29, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD
    Inventors: Hao Li, Yongguo Kang
  • Patent number: 10777215
    Abstract: A method and system for enhancing a speech signal is provided herein. The method may include the following steps: obtaining an original video, wherein the original video includes a sequence of original input images showing a face of at least one human speaker, and an original soundtrack synchronized with said sequence of images; and processing, using a computer processor, the original video, to yield an enhanced speech signal of said at least one human speaker, by detecting sounds that are acoustically unrelated to the speech of the at least one human speaker, based on visual data derived from the sequence of original input images.
    Type: Grant
    Filed: November 11, 2019
    Date of Patent: September 15, 2020
    Assignee: Yissum Research Development Company of The Hebrew University of Jerusalem Ltd.
    Inventors: Shmuel Peleg, Asaph Shamir, Tavi Halperin, Aviv Gabbay, Ariel Ephrat
  • Patent number: 10762907
    Abstract: An apparatus for improving a transition from a concealed audio signal portion is provided. The apparatus includes a processor being configured to generate a decoded audio signal portion of the audio signal. The processor is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that for each sample of two or more samples of the second audio signal portion, the sample position of the sample of the two or more samples of the second audio signal portion is equal to the sample position of one of the samples of the decoded audio signal portion.
    Type: Grant
    Filed: July 27, 2018
    Date of Patent: September 1, 2020
    Assignee: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
    Inventors: Adrian Tomasek, Jérémie LeComte
  • Patent number: 10755718
    Abstract: A method for classifying speakers includes: receiving, by a speaker recognition system including a processor and memory, input audio including speech from a speaker; extracting, by the speaker recognition system, a plurality of speech frames containing voiced speech from the input audio; computing, by the speaker recognition system, a plurality of features for each of the speech frames of the input audio; computing, by the speaker recognition system, a plurality of recognition scores for the plurality of features; computing, by the speaker recognition system, a speaker classification result in accordance with the recognition scores; and outputting, by the speaker recognition system, the speaker classification result.
    Type: Grant
    Filed: December 7, 2017
    Date of Patent: August 25, 2020
    Inventors: Zhenhao Ge, Ananth N. Iyer, Srinath Cheluvaraja, Ram Sundaram, Aravind Ganapathiraju
  • Patent number: 10715173
    Abstract: A method for partitioning of input vectors for coding is presented. The method comprises obtaining of an input vector. The input vector is segmented, in a non-recursive manner, into an integer number, NSEG, of input vector segments. A representation of a respective relative energy difference between parts of the input vector on each side of each boundary between the input vector segments is determined, in a recursive manner. The input vector segments and the representations of the relative energy differences are provided for individual coding. Partitioning units and computer programs for partitioning of input vectors for coding, as well as positional encoders, are presented.
    Type: Grant
    Filed: May 7, 2019
    Date of Patent: July 14, 2020
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Tomas Jansson Toftgård, Volodya Grancharov, Jonas Svedberg
  • Patent number: 10686465
    Abstract: An improved mixed oscillator-and-external-excitation model and methods for estimating the model parameters, for evaluating model quality, and for combining it with methods known in the art are disclosed. The improvement over existing oscillators allows the model to receive, as an input, all except the most recent point in the acquired data. Model stability is achieved through a process which includes restoring data unavailable to the decoder from the optimal model parameters and by using metrics to select a stable restored model output. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech, audio, and image data, and for analysis, detection, and classification of signals. Operations can be performed in real time, and parameterization can be achieved at a user-specified level of compression.
    Type: Grant
    Filed: July 24, 2018
    Date of Patent: June 16, 2020
    Assignee: Luce Communications
    Inventors: Irina Gorodnitsky, Anton Yen
  • Patent number: 10684683
    Abstract: Technologies for natural language interactions with virtual personal assistant systems include a computing device configured to capture audio input, distort the audio input to produce a number of distorted audio variations, and perform speech recognition on the audio input and the distorted audio variants. The computing device selects a result from a large number of potential speech recognition results based on contextual information. The computing device may measure a user's engagement level by using an eye tracking sensor to determine whether the user is visually focused on an avatar rendered by the virtual personal assistant. The avatar may be rendered in a disengaged state, a ready state, or an engaged state based on the user engagement level. The avatar may be rendered as semitransparent in the disengaged state, and the transparency may be reduced in the ready state or the engaged state. Other embodiments are described and claimed.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: June 16, 2020
    Assignee: Intel Corporation
    Inventor: William C. Deleeuw
  • Patent number: 10679632
    Abstract: An apparatus for decoding an audio signal includes a receiving interface, wherein the receiving interface is configured to receive a first frame and a second frame. Moreover, the apparatus includes a noise level tracing unit for determining noise level information being represented in a tracing domain. Furthermore, the apparatus includes a first reconstruction unit for reconstructing a third audio signal portion of the audio signal depending on the noise level information and a second reconstruction unit for reconstructing a fourth audio signal portion depending on noise level information being represented in the second reconstruction domain.
    Type: Grant
    Filed: January 24, 2018
    Date of Patent: June 9, 2020
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Michael Schnabel, Goran Markovic, Ralph Sperschneider, Jérémie Lecomte, Christian Helmrich
  • Patent number: 10679644
    Abstract: A method, a system, and a computer program product are provided for interpreting low amplitude speech and transmitting amplified speech to a remote communication device. At least one computing device receives sensor data from multiple sensors. The sensor data is associated with the low amplitude speech. At least one of the at least one computing device analyzes the sensor data to map the sensor data to at least one syllable resulting in a string of one or more words. An electronic representation of the string of the one or more words may be generated and transmitted to a remote communication device for producing the amplified speech from the electronic representation.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: June 9, 2020
    Assignee: International Business Machines Corporation
    Inventors: Sarbajit K. Rakshit, Martin G. Keen, James E. Bostick, John M. Ganci, Jr.
  • Patent number: 10679256
    Abstract: A content server uses a form of artificial intelligence such as machine learning to identify audio content with musicological characteristics. The content server obtains an indication of a music item presented by a client device and obtains reference music features describing musicological characteristics of the music item. The content server identifies candidate audio content associated with candidate music features. The candidate music features are determined by analyzing acoustic features of the candidate audio content and mapping the acoustic features to music features according to a music feature model. Acoustic features quantify low-level properties of the candidate audio content. One of the candidate audio content items is selected according to comparisons between the candidate music features of the candidate audio advertisements and the reference music features of the music item. The selected audio content is provided to the client device for presentation.
    Type: Grant
    Filed: June 25, 2015
    Date of Patent: June 9, 2020
    Assignee: Pandora Media, LLC
    Inventors: Christopher Irwin, Shriram Bharath, Andrew J. Asman
  • Patent number: 10665253
    Abstract: Voice activity detection (VAD) is an enabling technology for a variety of speech-based applications. Herein disclosed is a robust VAD algorithm that is also language independent. Rather than classifying short segments of the audio as either “speech” or “silence”, the VAD as disclosed herein employs a soft-decision mechanism. The VAD outputs a speech-presence probability, which is based on a variety of characteristics.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: May 26, 2020
    Assignee: VERINT SYSTEMS LTD.
    Inventor: Ron Wein
  • Patent number: 10650837
    Abstract: Network communication speech handling systems are provided herein. In one example, a method of processing audio signals by a network communications handling node is provided. The method includes processing an audio signal to determine a pitch cycle property associated with the audio signal, determining transfer times for encoded segments of the audio signal based at least in part on the pitch cycle property, and transferring packets comprising one or more encoded segments for delivery to a target node in accordance with the transfer time.
    Type: Grant
    Filed: August 29, 2017
    Date of Patent: May 12, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Karsten Vandborg Sørensen, Sriram Srinivasan, Koen Bernard Vos
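    Illustrative sketch (Python; a loose reading of the abstract, not Microsoft's scheduler): with a pitch cycle length in hand, the signal can be cut into encoded segments spanning whole pitch cycles and a transfer time derived for the packet carrying each segment, so packet pacing follows the pitch cycle property. Segment sizing and pacing choices here are assumptions.
      def schedule_packets(total_ms, pitch_cycle_ms, cycles_per_segment=4, start_time_ms=0.0):
          seg_ms = pitch_cycle_ms * cycles_per_segment
          schedule, t, pos = [], start_time_ms, 0.0
          while pos < total_ms:
              dur = min(seg_ms, total_ms - pos)
              schedule.append({"segment_start_ms": pos, "duration_ms": dur,
                               "transfer_time_ms": t})
              pos += dur
              t += dur                                  # pace transmissions at the segment rate
          return schedule

      for packet in schedule_packets(total_ms=60.0, pitch_cycle_ms=6.25):
          print(packet)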