Pitch Patents (Class 704/207)
-
Patent number: 12260853
Abstract: A speech processing method includes obtaining first speech information from a user, determining one or more similar speech segments in the first speech information and deleting one or more similar frames from each of the one or more similar speech segments to obtain second speech information, and analyzing the second speech information to determine a user intent corresponding to the first speech information. A duration of the first speech information exceeds a preset analysis duration threshold, and a duration of the second speech information does not exceed the preset analysis duration threshold.
Type: Grant
Filed: March 10, 2022
Date of Patent: March 25, 2025
Assignee: LENOVO (BEIJING) LIMITED
Inventors: Yinping Zhang, Chenyu Zhang, Lili Guo
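The frame-deletion step can be illustrated with a minimal sketch. It assumes each frame is already a feature vector (for example a spectral or MFCC vector) and that "similar" means cosine similarity above a hypothetical threshold. The abstract does not say how similarity is measured or which frames go first, so `drop_similar_frames` and its parameters are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def drop_similar_frames(frames, frame_ms, max_ms, sim_threshold=0.95):
    """Repeatedly delete the frame that most closely repeats its predecessor
    until the total duration fits within max_ms, or no pair is similar enough.
    frame_ms, max_ms and sim_threshold are hypothetical parameters."""
    kept = list(frames)
    while len(kept) * frame_ms > max_ms:
        sims = [(cosine(kept[i - 1], kept[i]), i) for i in range(1, len(kept))]
        best_sim, best_i = max(sims, default=(0.0, None))
        if best_i is None or best_sim < sim_threshold:
            break  # nothing redundant left to remove
        del kept[best_i]
    return kept
```

Because only frames that repeat their neighbour are removed, a sustained vowel or a long pause shrinks while transitions between different sounds are kept.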
-
Patent number: 12223968
Abstract: Described herein is a method of encoding an audio signal. The method comprises: generating a plurality of subband audio signals based on the audio signal; determining a spectral envelope of the audio signal; for each subband audio signal, determining autocorrelation information for the subband audio signal based on an autocorrelation function of the subband audio signal; and generating an encoded representation of the audio signal, the encoded representation comprising a representation of the spectral envelope of the audio signal and a representation of the autocorrelation information for the plurality of subband audio signals. Further described are methods of decoding the audio signal from the encoded representation, as well as corresponding encoders, decoders, computer programs, and computer-readable recording media.
Type: Grant
Filed: August 18, 2020
Date of Patent: February 11, 2025
Assignee: Dolby International AB
Inventors: Lars Villemoes, Heidi-Maria Lehtonen, Heiko Purnhagen, Per Hedelin
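This sketch shows the "subband signals, then per-band autocorrelation" step. A one-level Haar split stands in for the filter bank, which the abstract does not specify. The envelope and quantization steps are omitted, and all function names are hypothetical.

```python
def haar_split(x):
    """One-level Haar analysis: (low band, high band), each half the length."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return low, high

def normalized_autocorr(x, max_lag):
    """Biased autocorrelation r[k] / r[0] for lags 0..max_lag."""
    r0 = sum(v * v for v in x)
    if r0 == 0:
        return [0.0] * (max_lag + 1)
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) / r0
            for k in range(max_lag + 1)]

def subband_autocorrelation(x, max_lag):
    """Autocorrelation information for each subband of x."""
    return [normalized_autocorr(band, max_lag) for band in haar_split(x)]
```

A strongly periodic subband shows values near +1 at multiples of its period. This is the kind of per-band periodicity information the encoded representation carries alongside the envelope.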
-
Patent number: 12190897
Abstract: An apparatus for processing an audio signal having associated therewith a pitch lag information and a gain information, includes a domain converter for converting a first domain representation of the audio signal into a second domain representation of the audio signal; and a harmonic post-filter for filtering the second domain representation of the audio signal, wherein the post-filter is based on a transfer function including a numerator and a denominator, wherein the numerator includes a gain value indicated by the gain information, and wherein the denominator includes an integer part of a pitch lag indicated by the pitch lag information and a multi-tap filter depending on a fractional part of the pitch lag.
Type: Grant
Filed: May 16, 2023
Date of Patent: January 7, 2025
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Emmanuel Ravelli, Christian Helmrich, Goran Markovic, Matthias Neusinger, Sascha Disch, Manuel Jander, Martin Dietz
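The following is a generic harmonic (pitch) post-filter of this shape, not the patented transfer function. It assumes H(z) = (1 - g) / (1 - g z^-T (1 - f + f z^-1)): the gain g appears in the numerator, T is the integer part of the lag, and a 2-tap linear interpolator stands in for the patent's multi-tap fractional-lag filter.

```python
def harmonic_postfilter(x, gain, lag):
    """y[n] = (1 - g) x[n] + g * ((1 - f) y[n - T] + f y[n - T - 1]),
    with T = int(lag) and f = lag - T. Assumed form, for illustration."""
    assert lag >= 1, "integer lag must be at least one sample"
    t_int = int(lag)                  # integer part of the pitch lag
    frac = lag - t_int                # fractional part of the pitch lag
    taps = (1.0 - frac, frac)         # 2-tap interpolator (multi-tap stand-in)
    y = []
    for n, xn in enumerate(x):
        feedback = 0.0
        for k, b in enumerate(taps):
            m = n - t_int - k
            if m >= 0:
                feedback += b * y[m]
        y.append((1.0 - gain) * xn + gain * feedback)
    return y
```

With an integer lag, this filter has unity gain at DC and at multiples of fs/T, and gain (1 - g)/(1 + g) midway between harmonics. It therefore emphasizes harmonics relative to the noise between them.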
-
Patent number: 12183353
Abstract: The present technology relates to a decoding apparatus, a decoding method and a program which make it possible to obtain sound with higher quality. A demultiplexing circuit demultiplexes an input code string into a gain code string and a signal code string. A signal decoding circuit decodes the signal code string to output a time series signal. A gain decoding circuit decodes the gain code string. That is, the gain decoding circuit reads out gain values and gain inclination values at predetermined gain sample positions of the time series signal and interpolation mode information. An interpolation processing unit obtains a gain value at each sample position between two gain sample positions through linear interpolation or non-linear interpolation according to the interpolation mode based on the gain values and the gain inclination values. A gain applying circuit adjusts a gain of the time series signal based on the gain values. The present technology can be applied to a decoding apparatus.
Type: Grant
Filed: March 31, 2023
Date of Patent: December 31, 2024
Assignee: Sony Group Corporation
Inventors: Yuki Yamamoto, Toru Chinen, Hiroyuki Honma, Runyu Shi
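Here is a sketch of interpolating gains between two gain sample positions. Linear mode uses only the gain values. For the non-linear mode, the sketch assumes cubic Hermite interpolation, which also honours the gain inclination (slope) at each end. The abstract does not say which non-linear curve is used, and slopes are assumed to be in gain units per sample.

```python
def interpolate_gain(n, p0, g0, s0, p1, g1, s1, mode="linear"):
    """Gain at sample n between gain sample positions p0 < p1.
    g0/g1 are gain values; s0/s1 are inclinations per sample (assumed)."""
    t = (n - p0) / (p1 - p0)
    if mode == "linear":
        return g0 + (g1 - g0) * t
    if mode != "hermite":
        raise ValueError(f"unknown interpolation mode: {mode}")
    h = p1 - p0
    h00 = 2 * t**3 - 3 * t**2 + 1     # cubic Hermite basis functions
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * g0 + h10 * h * s0 + h01 * g1 + h11 * h * s1

def apply_gains(signal, p0, g0, s0, p1, g1, s1, mode="linear"):
    """Scale samples p0..p1 of the time series signal by the interpolated gain."""
    out = list(signal)
    for n in range(p0, p1 + 1):
        out[n] = signal[n] * interpolate_gain(n, p0, g0, s0, p1, g1, s1, mode)
    return out
```

When both slopes equal the straight-line slope, the Hermite curve reduces to the linear one. The inclination values therefore only change the result where the gain trajectory bends.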
-
Patent number: 12142287
Abstract: A method for transforming an audio signal comprises obtaining a plurality of segmental original frequency-domain signal segments and a plurality of segmental target frequency-domain signal segments by segmenting and performing a Fourier transform on an original audio signal and an initial target audio signal obtained by pitch shifting on the original audio signal; obtaining a plurality of original formant envelopes by respectively filtering the plurality of segmental original frequency-domain signal segments according to a plurality of original segment window functions, and obtaining a plurality of target formant envelopes by respectively filtering the plurality of segmental target frequency-domain signal segments according to a plurality of target segment window functions; and determining a pitch-shifted audio signal based on the plurality of segmental target frequency-domain signal segments, the plurality of original formant envelopes, and the plurality of target formant envelopes.
Type: Grant
Filed: November 29, 2019
Date of Patent: November 12, 2024
Assignee: BIGO TECHNOLOGY PTE. LTD.
Inventor: Xiaojie Wu
-
Patent number: 12119843
Abstract: An entropy code encoder includes a register and first, second, third, and fourth arithmetic circuits. The first arithmetic circuit is configured to output, based on an input symbol, a first value corresponding to an appearance frequency of the input symbol and a second value corresponding to a cumulative distribution of the first value. The second arithmetic circuit is configured to output a third value corresponding to division of a value of bits in the register by the first value. The third arithmetic circuit is configured to output a fourth value obtained by adding the second value to a bit-shifted value of the third value, to update a value in the register. The fourth arithmetic circuit is configured to compare the value of upper bits in the register and the first value and output a value of lower bits in the register as a compressed data stream.
Type: Grant
Filed: March 3, 2023
Date of Patent: October 15, 2024
Assignee: Kioxia Corporation
Inventor: Takashi Takemoto
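The register arithmetic described above closely resembles a textbook range-variant asymmetric numeral systems (rANS) encoder. The following is a byte-oriented rANS round trip in software, not the patented circuit. The constants `SCALE_BITS` and `RANS_L` are assumed values, and textbook rANS also adds the remainder `x % f`, which the abstract does not mention.

```python
SCALE_BITS = 12        # symbol frequencies must sum to 1 << SCALE_BITS (assumed)
RANS_L = 1 << 23       # lower bound of the normalized state interval [L, 256 * L)

def cumulative(freqs):
    """Map each symbol to its cumulative start (the 'second value')."""
    starts, total = {}, 0
    for sym, f in freqs.items():
        starts[sym] = total
        total += f
    assert total == 1 << SCALE_BITS, "frequencies must sum to 2**SCALE_BITS"
    return starts

def rans_encode(symbols, freqs):
    starts = cumulative(freqs)
    x, out = RANS_L, []
    for sym in reversed(symbols):                  # rANS encodes back to front
        f, c = freqs[sym], starts[sym]
        x_max = ((RANS_L >> SCALE_BITS) << 8) * f  # compare state against f
        while x >= x_max:                          # renormalize: emit low byte
            out.append(x & 0xFF)
            x >>= 8
        x = ((x // f) << SCALE_BITS) + (x % f) + c  # divide, shift, add start
    return x, out

def rans_decode(x, out, freqs, count):
    starts = cumulative(freqs)
    stream, result = list(out), []
    mask = (1 << SCALE_BITS) - 1
    for _ in range(count):
        slot = x & mask
        sym = next(s for s, c in starts.items() if c <= slot < c + freqs[s])
        result.append(sym)
        x = freqs[sym] * (x >> SCALE_BITS) + slot - starts[sym]
        while x < RANS_L:                          # pull bytes back in, LIFO
            x = (x << 8) | stream.pop()
    return result
```

The encoder emits bytes in the reverse of the order the decoder needs them, so the byte list is consumed as a stack. A hardware design would typically write the output buffer backwards instead.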
-
Patent number: 12112736
Abstract: For example, a controller of an Active Acoustic Control (AAC) system may be configured to process input information including AAC configuration information, and a plurality of noise inputs representing acoustic noise at a plurality of noise sensing locations. For example, the controller may be configured to process the input information to determine a sound control pattern to control sound within a sound control zone based on the plurality of noise inputs. For example, the controller may include a Neural-Network (NN) trained to generate an NN output based on an NN input, wherein the NN input is based on the AAC configuration information. For example, the controller may be configured to generate the sound control pattern based on the NN output, and to output the sound control pattern to one or more acoustic transducers.
Type: Grant
Filed: June 27, 2023
Date of Patent: October 8, 2024
Assignee: SILENTIUM LTD.
Inventors: Tzvi Fridman, Yochai Edlitz, Noam Bar, Mordehay Elbaz
-
Patent number: 12088392
Abstract: Systems and methods for low-power auto-correlation antenna selection for multi-antenna systems are disclosed. In particular, a computing device, such as an Internet of Things (IoT) computing device, may include a transceiver operating with multiple antennas. For example, the computing device may operate under a low-power wireless standard such as Long Range BLUETOOTH LOW ENERGY (LR BLE). In an exemplary aspect, an antenna from amongst the multiple antennas may be selected based on which antenna is receiving a best copy of a periodic signal. The periodic signal is likely indicative of a preamble pattern and, as such, may be used to activate a cross-correlation circuit for signal detection confirmation. Power consumption is reduced by delaying activation of the cross-correlation circuit until a likely signal is detected by detection of the periodic signal.
Type: Grant
Filed: February 6, 2023
Date of Patent: September 10, 2024
Assignee: Qorvo US, Inc.
Inventor: Andrew Fort
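A minimal sketch of autocorrelation-based antenna selection follows. It assumes real-valued baseband samples and a known preamble repetition period, and both the `periodicity` score and the wake threshold are hypothetical. A real LR BLE receiver would work on complex I/Q samples.

```python
import math

def periodicity(samples, period):
    """Normalized autocorrelation of the stream at lag = preamble period."""
    a, b = samples[:-period], samples[period:]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def select_antenna(streams, period, threshold=0.6):
    """Pick the antenna whose samples are most periodic; report whether that
    score is high enough to justify waking the cross-correlation detector."""
    scores = [periodicity(s, period) for s in streams]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best] >= threshold
```

Only a multiply-accumulate per sample is needed per antenna. This is why a cheap periodicity check can gate the more expensive cross-correlator.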
-
Patent number: 12062379
Abstract: An audio coding method includes obtaining a current frame that includes a high-frequency band signal and a low-frequency band signal; performing first coding on the high-frequency band signal and the low-frequency band signal to obtain a first coding parameter; determining a spectrum reservation flag of each frequency bin of the high-frequency band signal, where the spectrum reservation flag indicates whether a first spectrum corresponding to the frequency bin is reserved in a second spectrum corresponding to the frequency bin; and performing second coding on the high-frequency band signal based on the spectrum reservation flag of each frequency bin of the high-frequency band signal to obtain a second coding parameter, where the second coding parameter indicates information about a target tonal component of the high-frequency band signal.
Type: Grant
Filed: November 30, 2022
Date of Patent: August 13, 2024
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Bingyin Xia, Jiawei Li, Zhe Wang
-
Patent number: 12014735
Abstract: An emotion adjustment system for determining a user's emotions based on a user's voice includes: a microphone configured to receive the user's voice; a controller configured to extract a plurality of sound quality factors in response to processing the user's voice, calculate a depression index of the user based on at least one sound quality factor among the plurality of sound quality factors, identify an emotional state of the user as a depressive state when the depression index is a preset value or more, determine the depressive state as a first state or a second state based on a correlation between at least two sound quality factors among the plurality of sound quality factors, and transmit a control command corresponding to the emotional state of the user identified as the first state or the second state; and a feedback device configured to perform an operation corresponding to the control command.
Type: Grant
Filed: July 30, 2021
Date of Patent: June 18, 2024
Assignees: Hyundai Motor Company, Kia Corporation
Inventors: Ki Chang Kim, Dong Chul Park, Tae Kun Yun, Jin Sung Lee
-
Patent number: 11990145
Abstract: A method performed by an encoder. The method comprises determining envelope representation residual coefficients as first compressed envelope representation coefficients subtracted from the input envelope representation coefficients. The method comprises transforming the envelope representation residual coefficients into a warped domain so as to obtain transformed envelope representation residual coefficients. The method comprises applying at least one of a plurality of gain-shape coding schemes on the transformed envelope representation residual coefficients in order to achieve gain-shape coded envelope representation residual coefficients, where the plurality of gain-shape coding schemes have mutually different trade-offs in one or more of gain resolution and shape resolution for one or more of the transformed envelope representation residual coefficients.
Type: Grant
Filed: August 22, 2022
Date of Patent: May 21, 2024
Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Inventors: Jonas Svedberg, Stefan Bruhn, Martin Sehlstedt
-
Patent number: 11900904
Abstract: Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.
Type: Grant
Filed: February 14, 2022
Date of Patent: February 13, 2024
Assignee: Smule, Inc.
Inventors: Stefan Sullivan, John Shimmin, Dean Schaffer, Perry R. Cook
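This toy model shows HMM-style aggregation of many per-performance pitch estimates. It uses hidden states for MIDI notes, a "stay on the same note" transition prior, and an emission score that counts how many performances land within a tolerance of each note, decoded with Viterbi. It is not a trained model: the stay probability, hit probability and ±0.5 semitone tolerance are all assumptions.

```python
import math

def viterbi_pitch_track(frames, notes, stay=0.9, hit=0.8, tol=0.5):
    """frames: per time step, a list of pitch estimates (MIDI note numbers,
    None for unvoiced) from different performances of the same song.
    Returns the most likely note per time step under a toy HMM."""
    n = len(notes)
    log_stay, log_move = math.log(stay), math.log((1 - stay) / (n - 1))
    log_hit, log_miss = math.log(hit), math.log((1 - hit) / (n - 1))

    def emission(note, estimates):
        # Independent "votes": each voiced estimate near the note is a hit.
        return sum(log_hit if abs(e - note) <= tol else log_miss
                   for e in estimates if e is not None)

    def transition(i, j):
        return log_stay if i == j else log_move

    score = [emission(note, frames[0]) for note in notes]
    backpointers = []
    for estimates in frames[1:]:
        new_score, ptr = [], []
        for j, note in enumerate(notes):
            best = max(range(n), key=lambda i: score[i] + transition(i, j))
            ptr.append(best)
            new_score.append(score[best] + transition(best, j) + emission(note, estimates))
        score = new_score
        backpointers.append(ptr)

    state = max(range(n), key=score.__getitem__)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return [notes[s] for s in reversed(path)]
```

With several performances per frame, a single off-key or octave-error singer is outvoted. The transition prior also suppresses one-frame note flicker.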
-
Patent number: 11900954
Abstract: A voice processing method includes: determining a historical voice frame corresponding to a target voice frame; determining a frequency-domain characteristic of the historical voice frame; invoking a network model to predict the frequency-domain characteristic of the historical voice frame, to obtain a parameter set of the target voice frame, the parameter set including a plurality of types of parameters, the network model including a plurality of neural networks (NNs), and a number of the types of the parameters in the parameter set being determined according to a number of the NNs; and reconstructing the target voice frame according to the parameter set.
Type: Grant
Filed: March 24, 2022
Date of Patent: February 13, 2024
Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Wei Xiao, Meng Wang, Shidong Shang, Zurong Wu
-
Patent number: 11900938
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device.
Type: Grant
Filed: July 18, 2022
Date of Patent: February 13, 2024
Assignee: GOOGLE LLC
Inventors: Johnny Chen, Thomas L. Dean, Qiangfeng Peter Lau, Sudeep Gandhe, Gabriel Schine
-
Patent number: 11887578
Abstract: A method and system for automatic dubbing are disclosed, comprising, responsive to receiving a selection of media content for playback on a user device by a user of the user device, processing extracted speeches of a first voice from the media content to generate replacement speeches using a set of phonemes of a second voice of the user of the user device, and replacing the extracted speeches of the first voice with the generated replacement speeches in the audio portion of the media content for playback on the user device.
Type: Grant
Filed: November 10, 2022
Date of Patent: January 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Henry Gabryjelski, Jian Luan, Dapeng Li
-
Patent number: 11887622
Abstract: The present disclosure generally relates to a system and method for obtaining a diagnosis of a mental health condition. An exemplary system can receive an audio input; convert the audio input into a text string; identify a speaker associated with the text string; based on at least a portion of the audio input, determine a predefined audio characteristic of a plurality of predefined audio characteristics; based on the determined audio characteristic, identify an emotion corresponding to the portion of the audio input; generate a set of structured data based on the text string, the speaker, the predefined audio characteristic, and the identified emotion; and provide an output for obtaining the diagnosis of the mental disorder or condition, wherein the output is indicative of at least a portion of the set of structured data.
Type: Grant
Filed: September 12, 2019
Date of Patent: January 30, 2024
Assignee: United States Department of Veteran Affairs
Inventors: Qian Hu, Brian P. Marx, Patricia D. King, Seth-David Donald Dworman, Matthew E. Coarr, Keith A. Crouch, Stelios Melachrinoudis, Cheryl Clark, Terence M. Keane
-
Patent number: 11848021
Abstract: An envelope sequence is provided that can improve approximation accuracy near peaks caused by the pitch period of an audio signal. A periodic-combined-envelope-sequence generation device according to the present invention takes, as an input audio signal, a time-domain audio digital signal in each frame, which is a predetermined time segment, and generates a periodic combined envelope sequence as an envelope sequence. The periodic-combined-envelope-sequence generation device according to the present invention comprises at least a spectral-envelope-sequence calculating part and a periodic-combined-envelope generating part. The spectral-envelope-sequence calculating part calculates a spectral envelope sequence of the input audio signal on the basis of time-domain linear prediction of the input audio signal.
Type: Grant
Filed: September 29, 2022
Date of Patent: December 19, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takehiro Moriya, Yutaka Kamamoto, Noboru Harada
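The spectral-envelope-sequence calculating part rests on time-domain linear prediction. This standard autocorrelation-method sketch (Levinson-Durbin recursion, then err / |A(e^jw)|^2 sampled on n_bins points from 0 to pi) illustrates that step only. The periodic combination that gives the patent its name is not modelled.

```python
import cmath
import math

def autocorr(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """LPC coefficients a (a[0] = 1) and prediction error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error power
    return a, err

def lpc_envelope(x, order, n_bins):
    """Power spectral envelope err / |A(e^jw)|^2 on n_bins points in [0, pi]."""
    a, err = levinson_durbin(autocorr(x, order), order)
    env = []
    for m in range(n_bins):
        w = math.pi * m / (n_bins - 1)
        A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        env.append(err / abs(A) ** 2)
    return a, err, env
```

An all-pole envelope like this is smooth across frequency. It misses the narrow harmonic peaks at multiples of the pitch frequency, which is the approximation error the abstract's periodic combined envelope is meant to reduce.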
-
Patent number: 11842719
Abstract: A sound processing method obtains note data representative of a note; obtains an audio signal to be processed; specifies, in accordance with the note, an expression sample representative of a sound expression to be imparted to the note and an expression period, of the audio signal, to which the sound expression is to be imparted to the note; and specifies, in accordance with the expression sample and the expression period, a processing parameter relating to an expression imparting processing for imparting the sound expression to a portion corresponding to the expression period in the audio signal. The method then generates a processed audio signal by performing the expression imparting processing in accordance with the expression sample, the expression period, and the processing parameter to the audio signal.
Type: Grant
Filed: September 21, 2020
Date of Patent: December 12, 2023
Assignee: YAMAHA CORPORATION
Inventors: Merlijn Blaauw, Jordi Bonada, Ryunosuke Daido, Yuji Hisaminato
-
Patent number: 11816577
Abstract: Generally, the present disclosure is directed to systems and methods that generate augmented training data for machine-learned models via application of one or more augmentation techniques to audiographic images that visually represent audio signals. In particular, the present disclosure provides a number of novel augmentation operations which can be performed directly upon the audiographic image (e.g., as opposed to the raw audio data) to generate augmented training data that results in improved model performance. As an example, the audiographic images can be or include one or more spectrograms or filter bank sequences.
Type: Grant
Filed: September 28, 2021
Date of Patent: November 14, 2023
Assignee: GOOGLE LLC
Inventors: Daniel Sung-Joon Park, Quoc Le, William Chan, Ekin Dogus Cubuk, Barret Zoph, Yu Zhang, Chung-Cheng Chiu
-
Patent number: 11763796
Abstract: A computer-implemented method for speech synthesis, a computer device, and a non-transitory computer readable storage medium are provided. The method includes: obtaining a speech text to be synthesized; obtaining a Mel spectrum corresponding to the speech text to be synthesized according to the speech text to be synthesized; inputting the Mel spectrum into a complex neural network, and obtaining a complex spectrum corresponding to the speech text to be synthesized, wherein the complex spectrum comprises real component information and imaginary component information; and obtaining a synthetic speech corresponding to the speech text to be synthesized, according to the complex spectrum. The method can efficiently and simply complete speech synthesis.
Type: Grant
Filed: December 10, 2020
Date of Patent: September 19, 2023
Assignee: UBTECH ROBOTICS CORP LTD
Inventors: Dongyan Huang, Leyuan Sheng, Youjun Xiong
-
Patent number: 11756530
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
Type: Grant
Filed: September 25, 2020
Date of Patent: September 12, 2023
Assignee: Google LLC
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
-
Patent number: 11749290
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing long-term prediction (LTP) are described. One example of the methods includes determining a pitch gain and a pitch lag of an input audio signal for at least a predetermined number of frames. It is determined that the pitch gain of the input audio signal has exceeded a predetermined threshold and that a change of the pitch lag of the input audio signal has been within a predetermined range for at least the predetermined number of frames. In response to determining that the pitch gain of the input audio signal has exceeded the predetermined threshold and that the change of the pitch lag has been within the predetermined range for at least the predetermined number of frames, a pitch gain is set for a current frame of the input audio signal.
Type: Grant
Filed: July 12, 2021
Date of Patent: September 5, 2023
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventor: Yang Gao
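The decision condition can be sketched as a small predicate over per-frame (pitch gain, pitch lag) pairs. Two points are assumptions. First, "change of the pitch lag" is read here as the frame-to-frame difference. Second, the abstract does not say what pitch gain is then set for the current frame, so the sketch stops at the decision.

```python
def ltp_condition_met(history, gain_threshold, max_lag_change, n_frames):
    """history: (pitch_gain, pitch_lag) per frame, oldest first.
    True when, over the last n_frames frames, every pitch gain exceeds
    gain_threshold and each frame-to-frame lag change stays within
    max_lag_change."""
    if len(history) < n_frames:
        return False
    recent = history[-n_frames:]
    if any(gain <= gain_threshold for gain, _ in recent):
        return False
    lags = [lag for _, lag in recent]
    return all(abs(b - a) <= max_lag_change for a, b in zip(lags, lags[1:]))
```

Requiring both a high gain and a stable lag over several frames restricts the special handling to steady voiced segments. It avoids frames where the pitch estimate is jumping.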
-
Patent number: 11727922
Abstract: A computerized system for deriving expression of intent from recorded speech includes: a text classification module comparing a transcription of recorded speech against a text classifier to generate a first set of representations of potential intents; a phonetics classification module comparing a phonetic transcription of the recorded speech against a phonetics classifier to generate a second set of representations; an audio classification module comparing an audio version of the recorded speech with an audio classifier to generate a third set of representations; and a discriminator module for receiving the first, second and third sets of the representations of potential intents and generating one derived expression of intent by processing the first, second and third sets together; where at least two of the text classification module, the phonetics classification module, and the audio classification module are asynchronous processes from one another.
Type: Grant
Filed: May 11, 2021
Date of Patent: August 15, 2023
Assignee: Verint Americas Inc.
Inventor: Moshe Villaizan
-
Patent number: 11711648
Abstract: Techniques are provided for audio-based detection and tracking of an acoustic source. A methodology implementing the techniques according to an embodiment includes generating acoustic signal spectra from signals provided by a microphone array, and performing beamforming on the acoustic signal spectra to generate beam signal spectra, using time-frequency masks to reduce noise. The method also includes detecting, by a deep neural network (DNN) classifier, an acoustic event, associated with the acoustic source, in the beam signal spectra. The DNN is trained on acoustic features associated with the acoustic event. The method further includes performing pattern extraction, in response to the detection, to identify time-frequency bins of the acoustic signal spectra that are associated with the acoustic event, and estimating a motion direction of the source relative to the array of microphones based on Doppler frequency shift of the acoustic event calculated from the time-frequency bins of the extracted pattern.
Type: Grant
Filed: March 10, 2020
Date of Patent: July 25, 2023
Assignee: Intel Corporation
Inventors: Kuba Lopatka, Adam Kupryjanow, Lukasz Kurylo, Karol Duzinkiewicz, Przemyslaw Maziewski, Marek Zabkiewicz
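The Doppler step can be illustrated with the textbook formula for a moving source and stationary microphones, f_obs = f_src · c / (c − v), solved for the radial velocity v = c · (1 − f_src / f_obs). This sketch assumes the emitted frequency f_src is known and gives only approach or recede along the line of sight. It does not reproduce the patent's full direction estimate from the extracted time-frequency pattern.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed)

def radial_velocity(f_emitted, f_observed, c=SPEED_OF_SOUND):
    """Moving source, stationary array: f_obs = f_src * c / (c - v).
    Returns v in m/s; positive means the source is approaching."""
    return c * (1.0 - f_emitted / f_observed)

def motion_direction(f_emitted, f_observed, tol=0.1):
    """Coarse direction label from the sign of the radial velocity (tol in m/s)."""
    v = radial_velocity(f_emitted, f_observed)
    if v > tol:
        return "approaching"
    if v < -tol:
        return "receding"
    return "stationary"
```

For example, a 1000 Hz siren approaching at 10 m/s is observed at about 1030 Hz, and about 972 Hz when receding at the same speed.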
-
Patent number: 11710492
Abstract: Methods, systems, and devices for encoding are described. A device, which may be otherwise known as user equipment (UE), may support standards-compatible audio encoding (e.g., speech encoding) using a pre-encoded database. The device may receive a digital representation of an audio signal and identify, based on receiving the digital representation of the audio signal, a database that is pre-encoded according to a coding standard and that includes a quantity of digital representations of other audio signals. The device may encode the digital representation of the audio signal using a machine learning scheme and information from the database pre-encoded according to the coding standard. The device may generate a bitstream of the digital representation that is compatible with the coding standard based on encoding the digital representation of the audio signal, and output a representation of the bitstream.
Type: Grant
Filed: October 2, 2019
Date of Patent: July 25, 2023
Assignee: QUALCOMM Incorporated
Inventors: Stephane Pierre Villette, Daniel Jared Sinder
-
Patent number: 11640824
Abstract: Systems, devices, and methods transcribe words recorded in audio data. A computer-generated transcript is provided. The transcript comprises records for each word in the computer-generated transcript. At least one confirmation input is received for each record. The at least one confirmation input modifies a selected record and automatically identifies a next record for receiving a next confirmation input. A sequence of confirmation inputs may rapidly modify and validate each record in a sequence of records in the computer-generated transcript. A validated transcript is generated from the modified records and is provided from an evidence management system.
Type: Grant
Filed: July 15, 2020
Date of Patent: May 2, 2023
Assignee: Axon Enterprise, Inc.
Inventors: Noah Spitzer-Williams, Choongyeun Cho, Thomas Crosley, Zachary Charles Goist, Daniel Michael Bellia, Vinh Hein Nguyen, Chelsea Alexander-Taylor
-
Patent number: 11636836
Abstract: Provided is a method for processing audio including: acquiring an accompaniment audio signal and a voice signal of a current to-be-processed musical composition; determining a target reverberation intensity parameter value of the acquired accompaniment audio signal, wherein the target reverberation intensity parameter value is configured to indicate a rhythm speed, an accompaniment type, and a performance score of a singer of the current to-be-processed musical composition; and reverberating the acquired vocal signal based on the target reverberation intensity parameter value.
Type: Grant
Filed: March 23, 2022
Date of Patent: April 25, 2023
Assignee: BEIJING DAJIA INTERNET INFORMATION TECHNOLOGY CO., LTD.
Inventors: Xiguang Zheng, Chen Zhang
-
Patent number: 11621725
Abstract: A method for partitioning of input vectors for coding is presented. The method comprises obtaining of an input vector. The input vector is segmented, in a non-recursive manner, into an integer number, NSEG, of input vector segments. A representation of a respective relative energy difference between parts of the input vector on each side of each boundary between the input vector segments is determined, in a recursive manner. The input vector segments and the representations of the relative energy differences are provided for individual coding. Partitioning units and computer programs for partitioning of input vectors for coding, as well as positional encoders, are presented.
Type: Grant
Filed: January 11, 2022
Date of Patent: April 4, 2023
Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
Inventors: Tomas Jansson Toftgård, Volodya Grancharov, Jonas Svedberg
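The two-stage structure (flat split, then recursive boundary energies) can be sketched under three assumptions. The vector length divides evenly by NSEG, each boundary is visited by a recursive binary split, and the "relative energy difference" is taken as the log2 ratio of energy on the left part to energy on the right part. None of these choices is confirmed by the abstract.

```python
import math

def split_segments(vec, nseg):
    """Non-recursive split into nseg equal-length segments."""
    assert len(vec) % nseg == 0, "sketch assumes len(vec) divisible by nseg"
    size = len(vec) // nseg
    return [vec[i * size:(i + 1) * size] for i in range(nseg)]

def energy_differences(segments, eps=1e-12):
    """Recursive binary split: {boundary index: log2(E_left / E_right)},
    where E_left/E_right cover the parts on each side within the current span."""
    diffs = {}

    def energy(lo, hi):
        return sum(v * v for seg in segments[lo:hi] for v in seg)

    def split(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        diffs[mid] = math.log2((energy(lo, mid) + eps) / (energy(mid, hi) + eps))
        split(lo, mid)
        split(mid, hi)

    split(0, len(segments))
    return diffs
```

Each boundary gets exactly one value. A decoder that knows the total energy can therefore descend the same tree and recover each segment's energy before decoding the segments individually.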
-
Patent number: 11610577
Abstract: Methods and systems for providing a change to a voice interacting with a user are described. Information indicating a change that can be made to the voice can be received. The voice can be changed based on the information.
Type: Grant
Filed: November 19, 2020
Date of Patent: March 21, 2023
Assignee: Capital One Services, LLC
Inventors: Anh Truong, Mark Watson, Jeremy Goodsitt, Vincent Pham, Fardin Abdi Taghi Abad, Kate Key, Austin Walters, Reza Farivar
-
Patent number: 11605377
Abstract: The dialog device according to the present invention includes a prediction unit 254 configured to predict an utterance length attribute of a user utterance in response to a machine utterance, a selection unit 256 configured to use the utterance length attribute to select, as a feature model for usage in an end determination of the user utterance, at least one of an acoustic feature model or a lexical feature model, and an estimation unit 258 configured to estimate an end point in the user utterance using the selected model. By using this dialog device, it is possible to shorten the waiting time until a response is output to a user utterance by a machine, and to realize a more natural conversation between a user and a machine.
Type: Grant
Filed: March 19, 2020
Date of Patent: March 14, 2023
Assignee: HITACHI, LTD.
Inventors: Amalia Istiqlali Adiba, Takeshi Homma, Takashi Sumiyoshi
-
Patent number: 11553235
Abstract: Techniques have been developed to facilitate the livestreaming of group audiovisual performances. Audiovisual performances including vocal music are captured and coordinated with performances of other users in ways that can create compelling user and listener experiences. For example, in some cases or embodiments, duets with a host performer may be supported in a sing-with-the-artist style audiovisual livestream in which aspiring vocalists request or queue particular songs for a live radio show entertainment format. The developed techniques provide a communications latency-tolerant mechanism for synchronizing vocal performances captured at geographically-separated devices (e.g., at globally-distributed, but network-connected mobile phones or tablets or at audiovisual capture devices geographically separated from a live studio).
Type: Grant
Filed: June 7, 2021
Date of Patent: January 10, 2023
Assignee: Smule, Inc.
Inventors: Anton Holmberg, Benjamin Hersh, Jeannie Yang, Perry R. Cook, Jeffrey C. Smith
-
Patent number: 11545153
Abstract: Provided are a device and a method that allow a remote terminal to perform a process on the basis of a local-terminal-side user utterance. There are a local terminal and a remote terminal. The local terminal performs a process of a semantic analysis of a user utterance input into the local terminal. On the basis of a result of the semantic analysis, the local terminal determines whether or not the user utterance is a request to the remote terminal for a process. Moreover, in a case where the user utterance is a request to the remote terminal for a process, the local terminal transmits the result of the semantic analysis by a semantic-analysis part to the remote terminal. The remote terminal receives the result of the semantic analysis of the local-terminal-side user utterance, and performs a process based on the received result of the semantic analysis of the local-terminal-side user utterance.
Type: Grant
Filed: March 12, 2019
Date of Patent: January 3, 2023
Assignee: Sony Corporation
Inventor: Keiichi Yamada
-
Patent number: 11488620
Abstract: The present invention is a computer program product and method for increasing the playback speed of audio or other media files. The computer program product and method identifies pedagogic media files and adds a flag to the metadata of the media file. The flag represents the number and type of pauses or silent sections in the pedagogic media file. Based on the flag, the computer program product and method may fast forward or remove a portion of the pauses and silent sections to provide a new playback speed.
Type: Grant
Filed: June 12, 2019
Date of Patent: November 1, 2022
Assignee: International Business Machines Corporation
Inventor: Deepa Jain
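This sketch covers pause counting and shortening. It assumes a fixed-length frame whose RMS level below a threshold marks silence, and it keeps only the first `keep_frames` silent frames of each pause. The metadata flag format is not described in the abstract, so the sketch just returns the pause count.

```python
import math

def shorten_pauses(samples, frame_len, threshold, keep_frames):
    """Drop silent frames beyond the first keep_frames of every pause.
    Returns (shortened samples, number of pauses found)."""
    out, run, pauses = [], 0, 0
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(v * v for v in frame) / len(frame))
        if rms < threshold:
            run += 1
            if run == 1:
                pauses += 1          # a new pause starts here
            if run > keep_frames:
                continue             # skip the excess silent frame
        else:
            run = 0
        out.extend(frame)
    return out, pauses
```

Keeping a short stub of each pause preserves phrasing cues. Dropping the rest raises the effective playback speed without resampling the speech itself.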
-
Patent number: 11443761Abstract: A technique, suitable for real-time processing, is disclosed for pitch tracking by detection of glottal excitation epochs in the speech signal. It uses the Hilbert envelope to enhance the saliency of the glottal excitation epochs and to reduce the ripples due to the vocal tract filter. The processing comprises the steps of dynamic range compression, calculation of the Hilbert envelope, and epoch marking. The Hilbert envelope is calculated using the output of an FIR-filter-based Hilbert transformer and the delay-compensated signal. The epoch marking uses a dynamic peak detector with fast rise and slow fall and nonlinear smoothing to further enhance the saliency of the epochs, followed by a differentiator or a Teager energy operator, and amplitude-duration thresholding. The technique is meant for use in speech codecs, voice conversion, speech and speaker recognition, diagnosis of voice disorders, speech training aids, and other applications involving pitch estimation.Type: GrantFiled: August 3, 2019Date of Patent: September 13, 2022Inventors: Prem Chand Pandey, Hirak Dasgupta, Nataraj Kathriki Shambulingappa
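The processing chain in this abstract (compression, FIR Hilbert envelope with delay compensation, fast-rise/slow-fall peak detection, differentiator, amplitude-duration thresholding) can be sketched in plain Python. Several details are assumptions rather than the inventors' choices: the power-law compression exponent, the Hamming-windowed Hilbert transformer length, the decay constant, and both thresholds. The nonlinear smoothing stage is omitted, a differentiator is used instead of the Teager energy operator, and the amplitude threshold is normalized over the whole signal for simplicity, which a real-time version would replace with a running estimate.

```python
import math


def compress(x, alpha=0.5):
    """Power-law dynamic range compression (assumed form): sign(v) * |v|**alpha."""
    return [math.copysign(abs(v) ** alpha, v) for v in x]


def fir_hilbert_taps(half_len):
    """Hamming-windowed ideal Hilbert transformer with 2*half_len + 1 taps."""
    taps = []
    for k in range(-half_len, half_len + 1):
        ideal = 0.0 if k % 2 == 0 else 2.0 / (math.pi * k)
        window = 0.54 + 0.46 * math.cos(math.pi * k / half_len)
        taps.append(ideal * window)
    return taps


def hilbert_envelope(x, half_len=15):
    """Envelope from the causal FIR Hilbert output and the input delayed by
    half_len samples, so both refer to the same time instant."""
    taps = fir_hilbert_taps(half_len)
    env = []
    for n in range(len(x)):
        xh = sum(taps[j] * x[n - j] for j in range(len(taps)) if n - j >= 0)
        xd = x[n - half_len] if n >= half_len else 0.0  # delay compensation
        env.append(math.hypot(xd, xh))
    return env


def peak_track(env, decay=0.99):
    """Dynamic peak detector: jump up to new peaks, decay slowly otherwise."""
    out, p = [], 0.0
    for e in env:
        p = e if e > p else p * decay
        out.append(p)
    return out


def mark_epochs(x, rel_threshold=0.3, min_gap=40, half_len=15):
    """Return sample indices of detected epochs (delayed by about half_len)."""
    p = peak_track(hilbert_envelope(compress(x), half_len))
    d = [0.0] + [p[n] - p[n - 1] for n in range(1, len(p))]  # differentiator
    threshold = rel_threshold * max(d)
    epochs, last = [], -min_gap
    for n, rise in enumerate(d):
        # Amplitude threshold on the rise, duration threshold on the spacing.
        if rise > threshold and n - last >= min_gap:
            epochs.append(n)
            last = n
    return epochs
```

For a synthetic input made of damped cosines starting every 80 samples, this sketch should report one epoch per excitation, roughly `half_len` samples after each onset and about 80 samples apart; the reciprocal of that spacing is the pitch estimate.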
-
Patent number: 11443751Abstract: Innovations in phase quantization during speech encoding and phase reconstruction during speech decoding are described. For example, to encode a set of phase values, a speech encoder omits higher-frequency phase values and/or represents at least some of the phase values as a weighted sum of basis functions. Or, as another example, to decode a set of phase values, a speech decoder reconstructs at least some of the phase values using a weighted sum of basis functions and/or reconstructs lower-frequency phase values then uses at least some of the lower-frequency phase values to synthesize higher-frequency phase values. In many cases, the innovations improve the performance of a speech codec in low bitrate scenarios, even when encoded data is delivered over a network that suffers from insufficient bandwidth or transmission quality problems.Type: GrantFiled: February 12, 2021Date of Patent: September 13, 2022Assignee: Microsoft Technology Licensing, LLCInventors: Soren Skak Jensen, Sriram Srinivasan, Koen Bernard Vos
-
Patent number: 11430461Abstract: A method for detecting voice activity in an input audio signal composed of frames includes determining a noise characteristic of the input audio signal based on a received frame of the input audio signal. A voice activity detection (VAD) parameter is derived based on the noise characteristic of the input audio signal using an adaptive function. The derived VAD parameter is compared with a threshold value to provide a voice activity detection decision. The input audio signal is processed according to the voice activity detection decision.Type: GrantFiled: September 21, 2020Date of Patent: August 30, 2022Assignee: HUAWEI TECHNOLOGIES CO., LTD.Inventor: Zhe Wang
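A minimal sketch of the decision flow described above, under stated assumptions: the noise characteristic is taken to be a smoothed noise energy in dB, the VAD parameter is the frame SNR minus a noise-dependent margin, and the shape of `adaptive_bias` (a linear ramp between two noise levels) is hypothetical. The actual adaptive function in the patent is not reproduced here.

```python
import math


def frame_energy_db(frame):
    """Mean-square frame energy in dB (small floor avoids log of zero)."""
    return 10 * math.log10(sum(s * s for s in frame) / len(frame) + 1e-12)


def adaptive_bias(noise_db, low_db=-60.0, high_db=-20.0, max_bias_db=6.0):
    """Hypothetical adaptive function: the noisier the background, the larger
    the SNR margin a frame needs before it counts as speech."""
    if noise_db <= low_db:
        return 0.0
    if noise_db >= high_db:
        return max_bias_db
    return max_bias_db * (noise_db - low_db) / (high_db - low_db)


def detect_voice_activity(frames, threshold_db=6.0, alpha=0.95, init_frames=5):
    """Return one True/False voice-activity decision per frame."""
    levels = [frame_energy_db(f) for f in frames]
    n0 = max(1, min(init_frames, len(levels)))
    noise_db = sum(levels[:n0]) / n0  # assumes the leading frames are noise only
    decisions = []
    for level in levels:
        vad_param = (level - noise_db) - adaptive_bias(noise_db)
        active = vad_param > threshold_db
        decisions.append(active)
        if not active:
            # Track the noise characteristic only during inactive frames.
            noise_db = alpha * noise_db + (1 - alpha) * level
    return decisions
```

Updating the noise estimate only on inactive frames keeps speech energy from leaking into the noise characteristic, which is one common way to make such a detector adapt without drifting during long utterances.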
-
Patent number: 11423902Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for handing off a user conversation between computer-implemented agents. One of the methods includes receiving, by a computer-implemented agent specific to a user device, a digital representation of speech encoding an utterance, determining, by the computer-implemented agent, that the utterance specifies a requirement to establish a communication with another computer-implemented agent, and establishing, by the computer-implemented agent, a communication between the other computer-implemented agent and the user device.Type: GrantFiled: July 27, 2020Date of Patent: August 23, 2022Assignee: GOOGLE LLCInventors: Johnny Chen, Thomas L. Dean, Qiangfeng Peter Lau, Sudeep Gandhe, Gabriel Schine
-
Patent number: 11380351Abstract: A method for pulmonary condition monitoring includes selecting a phrase from an utterance of a user of an electronic device, wherein the phrase matches an entry of multiple phrases. At least one speech feature within the phrase that is associated with one or more pulmonary conditions is identified. A pulmonary condition is determined based on analysis of the at least one speech feature.Type: GrantFiled: January 14, 2019Date of Patent: July 5, 2022Assignee: SAMSUNG ELECTRONICS CO., LTD.Inventors: Ebrahim Nematihosseinabadi, Md Mahbubur Rahman, Viswam Nathan, Korosh Vatanparvar, Jilong Kuang, Jun Gao
-
Patent number: 11348591Abstract: A speaker identification system and method to identify a speaker based on the speaker's voice is disclosed. In an exemplary embodiment, the speaker identification system comprises a Gaussian Mixture Model (GMM) for speaker accent and dialect identification for a given speech signal input by the speaker and an Artificial Neural Network (ANN) to identify the speaker based on the identified dialect, in which the output of the GMM is input to the ANN.Type: GrantFiled: September 23, 2021Date of Patent: May 31, 2022Assignee: King Abdulaziz UniversityInventors: Muhammad Moinuddin, Ubaid M. Al-Saggaf, Shahid Munir Shah, Rizwan Ahmed Khan, Zahraa Ubaid Al-Saggaf
-
Patent number: 11303489Abstract: A transmitting apparatus includes a first signal generating unit that generates, on the basis of data, a first signal transmitted by single carrier block transmission; a second signal generating unit that generates, on the basis of an RS (reference signal), a second signal transmitted by orthogonal frequency division multiplexing (OFDM) transmission; a switching operator that selects and outputs the second signal in a first transmission period and selects and outputs the first signal in a second transmission period; an antenna that transmits the signal output from the switching operator; and a control-signal generating unit that controls the second signal generating unit such that, in the first transmission period, the RS is arranged in a frequency band allocated for transmission of the RS from the transmitting apparatus among frequency bands usable in OFDM.Type: GrantFiled: January 30, 2020Date of Patent: April 12, 2022Assignee: Mitsubishi Electric CorporationInventors: Fumihiro Hasegawa, Akinori Taira
-
Patent number: 11289067Abstract: Methods and systems for generating voices based on characteristics of an avatar. One or more characteristics of an avatar are obtained and one or more parameters of a voice synthesizer for generating a voice corresponding to the one or more avatar characteristics are determined. The voice synthesizer is configured based on the one or more parameters and a voice is generated using the parameterized voice synthesizer.Type: GrantFiled: June 25, 2019Date of Patent: March 29, 2022Assignee: International Business Machines CorporationInventors: Kristina Marie Brimijoin, Gregory Boland, Joseph Schwarz
-
Patent number: 11282534Abstract: Systems and methods for intelligent playback of media content may include an intelligent media playback system that, in response to determining the speech tempo in audio content by measuring syllable density of speech in the audio content, automatically adjusts a playback speed of the audio content as the audio content is being played based on the determined speech tempo. In some embodiments, the system may automatically and dynamically adjust the playback speed to result in a desired target speech tempo. In addition, the system may determine whether to automatically adjust playback speed of the audio content, as the media is being played, based on the detected speech tempo of the speech in the audio content and the determined type of media content. Such automatic adjustments in playback speed result in more efficient playback of the audio content.Type: GrantFiled: August 3, 2018Date of Patent: March 22, 2022Assignee: Sling Media PVT LtdInventors: Yatish Jayant Naik Raikar, Varunkumar Tripathi, Karthik Mahabaleshwar Hegde
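The tempo-to-speed mapping described above can be illustrated with a short sketch. It assumes that a syllable-nucleus count can be approximated by local maxima of an energy envelope, that the target tempo is five syllables per second, and that speed is clamped to a fixed range; all three are illustrative assumptions, not details from the patent.

```python
def count_syllable_nuclei(envelope, threshold):
    """Crude syllable-nucleus detector (assumption): count local maxima of an
    energy envelope that exceed threshold; a flat-topped peak counts once."""
    count = 0
    for i in range(1, len(envelope) - 1):
        if (envelope[i] > threshold
                and envelope[i] >= envelope[i - 1]
                and envelope[i] > envelope[i + 1]):
            count += 1
    return count


def playback_speed(syllables, duration_s, target_rate=5.0,
                   min_speed=0.75, max_speed=2.0):
    """Speed factor that moves the measured tempo (syllables/s) toward
    target_rate, clamped to [min_speed, max_speed]."""
    if duration_s <= 0 or syllables == 0:
        return 1.0  # no measurable speech: leave playback unchanged
    measured_rate = syllables / duration_s
    return max(min_speed, min(max_speed, target_rate / measured_rate))
```

For example, 30 syllables in 10 seconds (3 syllables/s) yields a speed of about 1.67, while 70 syllables in 10 seconds would call for 0.71 and is clamped to 0.75.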
-
Patent number: 11276412Abstract: A method and device allocate a bit-budget to a plurality of first parts of a CELP core module of (a) an encoder for encoding a sound signal or (b) a decoder for decoding the sound signal. In the method and device, bit-budget allocation tables assign, for each of a plurality of intermediate bit rates, respective bit-budgets to the first CELP core module parts. A CELP core module bit rate is determined and one of the intermediate bit rates is selected based on the determined CELP core module bit rate. The respective bit-budgets assigned by the bit-budget allocation tables for the selected intermediate bit rate are allocated to the first CELP core module parts.Type: GrantFiled: September 20, 2018Date of Patent: March 15, 2022Assignee: VOICEAGE CORPORATIONInventor: Vaclav Eksler
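The table-driven allocation described above might look like the sketch below. The table contents, part names, 20 ms frame length, and the rule "pick the highest intermediate rate not above the core rate" are all hypothetical; the abstract only says that an intermediate rate is selected based on the core bit rate. Bits not covered by the table are returned as a leftover that, by assumption, other parts of the codec would use.

```python
FRAME_MS = 20  # assumed frame length

# Hypothetical allocation tables: bits per frame for each first part of the
# CELP core, indexed by intermediate bit rate (bit/s). Each row sums to
# rate * FRAME_MS / 1000.
ALLOCATION_TABLES = {
    7200: {"lp_filter": 36, "pitch": 20, "gains": 24, "fixed_codebook": 64},
    8000: {"lp_filter": 36, "pitch": 20, "gains": 24, "fixed_codebook": 80},
    9600: {"lp_filter": 41, "pitch": 24, "gains": 27, "fixed_codebook": 100},
}


def allocate_bit_budget(core_bitrate):
    """Select the highest intermediate rate not above the core rate and return
    (selected_rate, per-part budgets, bits left over for other parts)."""
    eligible = [rate for rate in ALLOCATION_TABLES if rate <= core_bitrate]
    if not eligible:
        raise ValueError(f"no allocation table for {core_bitrate} bit/s")
    selected = max(eligible)
    budgets = dict(ALLOCATION_TABLES[selected])  # copy so callers can't edit the table
    frame_bits = core_bitrate * FRAME_MS // 1000
    leftover = frame_bits - sum(budgets.values())
    return selected, budgets, leftover
```

With a core rate of 8800 bit/s, the sketch selects the 8000 bit/s table (160 bits per frame) and leaves 16 of the 176 available bits for the remaining parts. The benefit of such tables is that only a handful of intermediate rates need hand-tuned allocations.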
-
Patent number: 11270721Abstract: Pre-processing systems, methods of pre-processing, and speech processing systems for improved Automated Speech Recognition are provided. Some pre-processing systems for improved speech recognition of a speech signal are provided, which systems comprise a pitch estimation circuit and a pitch equalization processor. The pitch estimation circuit is configured to receive the speech signal and determine a pitch index of the speech signal, and the pitch equalization processor is configured to receive the speech signal and pitch information, to equalize a speech pitch of the speech signal using the pitch information, and to provide a pitch-equalized speech signal.Type: GrantFiled: May 21, 2019Date of Patent: March 8, 2022Assignee: PLANTRONICS, INC.Inventors: Youhong Lu, Arun Rajasekaran
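As a rough sketch of the two stages described above, the code below estimates pitch with a plain autocorrelation search (a stand-in technique; the patent's pitch estimation circuit may work differently) and computes the ratio an equalizer would apply to move that pitch to a target. The 150 Hz target and the 60 to 400 Hz search range are assumptions, and the actual pitch shifting (for example by resampling or PSOLA) is not shown.

```python
import math


def estimate_f0(x, fs, fmin=60.0, fmax=400.0):
    """Autocorrelation pitch estimate in Hz, or None if no positive peak is found."""
    min_lag = max(1, int(fs / fmax))
    max_lag = min(int(fs / fmin), len(x) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag if best_lag else None


def equalization_ratio(f0, target_f0=150.0):
    """Factor by which a pitch shifter would scale the pitch to reach target_f0."""
    return target_f0 / f0


fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]  # a 200 Hz test tone
f0 = estimate_f0(x, fs)
print(f0, equalization_ratio(f0))
```

For the 200 Hz test tone, the autocorrelation peaks at a 40-sample lag, so the sketch reports 200 Hz and a ratio of 0.75 toward the assumed 150 Hz target.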
-
Patent number: 11270071Abstract: Systems, apparatuses, and methods are described herein for providing language-level content recommendations to users based on an analysis of closed captions of content viewed by the users and other data. Language-level analysis of content viewed by a user may be performed to generate metrics that are associated with the user. The metrics may be used to provide recommendations for content, which may include advertising, that is closely aligned with the user's interests.Type: GrantFiled: December 28, 2017Date of Patent: March 8, 2022Assignee: Comcast Cable Communications, LLCInventor: Richard Walsh
-
Patent number: 11263876Abstract: Data is collected for Self-Service Terminals (SSTs) including tallies, events, and outcomes associated with servicing the SSTs. Statistical correlations are derived from the tallies and events with respect to the outcomes. Subsequent collected data is processed with the statistical correlations and a probability for a failure of a component or a part of the component associated with a particular SST is reported for servicing the component or part before the failure.Type: GrantFiled: September 28, 2017Date of Patent: March 1, 2022Assignee: NCR CorporationInventors: Claudio Cifarelli, Gardiner Arthur, Iain M. N. Cowan, Massimo Mastropietro, Callum Ellis Morton
-
Patent number: 11250826Abstract: Digital signal processing and machine learning techniques can be employed in a vocal capture and performance social network to computationally generate vocal pitch tracks from a collection of vocal performances captured against a common temporal baseline such as a backing track or an original performance by a popularizing artist. In this way, crowd-sourced pitch tracks may be generated and distributed for use in subsequent karaoke-style vocal audio captures or other applications. Large numbers of performances of a song can be used to generate a pitch track. Computationally determined pitch trackings from individual audio signal encodings of the crowd-sourced vocal performance set are aggregated and processed as an observation sequence of a trained Hidden Markov Model (HMM) or other statistical model to produce an output pitch track.Type: GrantFiled: October 28, 2019Date of Patent: February 15, 2022Assignee: Smule, Inc.Inventors: Stefan Sullivan, John Shimmin, Dean Schaffer, Perry R. Cook
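The aggregation step above (many crowd-sourced pitch tracks treated as an observation sequence for an HMM) can be illustrated with a small Viterbi decoder. This is a simplified stand-in, not Smule's trained model: the states are a fixed list of MIDI notes, the emission score is the smoothed share of performers within half a semitone of a note, and the transition model is a single assumed "stay" probability.

```python
import math


def aggregate_pitch_tracks(tracks, notes, stay_prob=0.9, floor=0.1):
    """Fuse frame-aligned pitch tracks (MIDI note numbers, None = unvoiced)
    from many performers into one note sequence with Viterbi decoding."""
    n_frames, n_states = len(tracks[0]), len(notes)
    log_stay = math.log(stay_prob)
    log_switch = math.log((1 - stay_prob) / (n_states - 1))

    def log_emission(frame, note):
        # Score a note by the share of performers within half a semitone of it.
        votes = [t[frame] for t in tracks if t[frame] is not None]
        hits = sum(1 for v in votes if abs(v - note) <= 0.5)
        return math.log((hits + floor) / (len(votes) + floor * n_states))

    scores = [log_emission(0, note) for note in notes]
    backpointers = []
    for f in range(1, n_frames):
        new_scores, pointers = [], []
        for s, note in enumerate(notes):
            # Best predecessor state for state s.
            prev = max(range(n_states),
                       key=lambda p: scores[p] + (log_stay if p == s else log_switch))
            step = log_stay if prev == s else log_switch
            new_scores.append(scores[prev] + step + log_emission(f, note))
            pointers.append(prev)
        scores = new_scores
        backpointers.append(pointers)

    # Trace the best path backwards from the highest-scoring final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [notes[s] for s in reversed(path)]
```

With three performers singing `[60, 60, 62, 62]`, `[60, 61.8, 62, 62.2]`, and `[64, 60, 62, None]` over the notes `[60, 62, 64]`, the decoder returns `[60, 60, 62, 62]`: the outlier 64 in the first frame is outvoted, and the transition penalty keeps the path from flickering between notes.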
-
Patent number: 11250221Abstract: Methods, systems, and computer-readable storage media for contextual interpretation of a Japanese word are provided. A first set of characters representing Japanese words is received. The first set of characters is input to a neural network. The neural network is trained to process characters based on bidirectional context interpretation. The first set of characters is processed by the neural network through a plurality of learning layers that process the first set of characters in the order of the first set of characters and in reverse order to determine semantic meanings of the characters in the first set of characters. An alphabet representation of at least one character of the first set of characters representing a Japanese word is output. The alphabet representation corresponds to a semantic meaning of the at least one character within the first set of characters.Type: GrantFiled: March 14, 2019Date of Patent: February 15, 2022Assignee: SAP SEInventor: Sean Saito
-
Patent number: 11244694Abstract: A method is described that processes an audio signal. A discontinuity between a filtered past frame and a filtered current frame of the audio signal is removed using linear predictive filtering.Type: GrantFiled: January 23, 2017Date of Patent: February 8, 2022Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.Inventors: Emmanuel Ravelli, Manuel Jander, Grzegorz Pietrzyk, Martin Dietz, Marc Gayer
-
Patent number: 11241635Abstract: The present disclosure according to at least one embodiment relates to a method and system for providing an interactive service by using a smart toy in the learning process of a child, which provide a more accurately classified emotional state of the child based on one or more sensed data items among an optical image, a thermal image, and voice data of the child, and adaptively provide a flexible and versatile interactive service according to the classified emotions.Type: GrantFiled: November 15, 2019Date of Patent: February 8, 2022Inventors: Heui Yul Noh, Myeong Ho Roh, Chang Woo Ban, Oh Soung Kwon, Seung Pil Lee, Seung Min Shin