Speech Signal Processing Patents (Class 704/200)
  • Patent number: 11086487
    Abstract: Methods and apparatuses are provided, comprising: a screen; an input device; at least one non-transitory memory storing instructions; and one or more processors in communication with the screen, the input device, and the at least one non-transitory memory, wherein the one or more processors execute the instructions to: display, utilizing the screen, a contactor window including: at least one contactor user interface element configured to have presented, in connection therewith, a plurality of contactor identifiers of a contactor communicant represented by a contactor email communications agent, at least one contactee user interface element configured to have presented, in connection therewith, a plurality of contactee identifiers of a plurality of contactee communicants each represented by a corresponding contactee email communications agent, a message user interface element configured to present a message addressed from one of the plurality of contactor identifiers of the contactor selected in connection with the at l…
    Type: Grant
    Filed: November 25, 2019
    Date of Patent: August 10, 2021
    Assignee: GRUS TECH, LLC
    Inventor: Robert Paul Morris
  • Patent number: 11081126
    Abstract: A method for processing sound data for separating N sound sources of a multichannel sound signal sensed in a real medium. The method includes: applying source separation to the sensed multichannel signal and obtaining a separation matrix and a set of M sound components, with M≥N; calculating a set of bi-variate first descriptors representative of statistical relations between the components of the pairs of the obtained set of M components; calculating a set of uni-variate second descriptors representative of characteristics of encoding of the components of the obtained set of M components; and classifying the components of the set of M components according to two classes of components, a first class of N direct components corresponding to the N direct sound sources and a second class of M−N reverberated components, by calculating probability of membership in one of the two classes, dependent on the sets of first and second descriptors.
    Type: Grant
    Filed: May 24, 2018
    Date of Patent: August 3, 2021
    Inventors: Mathieu Baque, Alexandre Guerin
  • Patent number: 11074917
    Abstract: A method of speaker identification, comprises: receiving an audio signal representing speech; removing effects of a channel and/or noise from the received audio signal to obtain a cleaned audio signal; obtaining an average spectrum of at least a part of the cleaned audio signal; and comparing the average spectrum with a long term average speaker model for an enrolled speaker. Based on the comparison, it can be determined whether the speech is the speech of the enrolled speaker.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: July 27, 2021
    Assignee: Cirrus Logic, Inc.
    Inventor: John Paul Lesso
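A minimal Python sketch of the comparison step described in patent 11074917 above: compute an average spectrum of the cleaned signal and compare it against an enrolled long-term average model. The framing parameters, the cosine-similarity metric, and the threshold are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def long_term_average_spectrum(signal, frame_len=512, hop=256):
    """Average magnitude spectrum over all frames of a (cleaned) signal.
    Assumes the signal is longer than one frame."""
    frames = [signal[i:i + frame_len] * np.hanning(frame_len)
              for i in range(0, len(signal) - frame_len, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames), axis=1))
    return spectra.mean(axis=0)

def matches_enrolled_speaker(cleaned_signal, enrolled_ltas, threshold=0.9):
    """Compare the average spectrum with an enrolled long-term average model;
    cosine similarity is one simple choice of distance (assumption)."""
    ltas = long_term_average_spectrum(cleaned_signal)
    score = np.dot(ltas, enrolled_ltas) / (
        np.linalg.norm(ltas) * np.linalg.norm(enrolled_ltas) + 1e-12)
    return score >= threshold, score
```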
  • Patent number: 11074297
    Abstract: Provided is a method for communication with an intelligent industrial assistant and industrial machine. The method may include receiving a first natural language input from a user. The first natural language input may be associated with a first command for an industrial machine to perform a first process. The industrial machine may be instructed to perform the first process based on the first natural language input. A second natural language input may be received from the user while the industrial machine is performing the first process. A first response may be determined based on the second natural language input. Communication of the first response to the user may be initiated while the industrial machine is performing the first process. A system and computer program product are also disclosed.
    Type: Grant
    Filed: July 16, 2019
    Date of Patent: July 27, 2021
    Assignee: iT SpeeX LLC
    Inventor: Ronald D. Bagley, Jr.
  • Patent number: 11017792
    Abstract: An audio system includes: a head unit comprising at least a first processor, the head unit being configured to generate a plurality of program content signals, one of the plurality of program content signals being a phone program content signal being received from a phone, wherein the plurality of program content signals are transduced by an acoustic transducer into an acoustic signal within a vehicle cabin; a microphone disposed within the vehicle cabin such that the microphone receives the acoustic signal and produces a microphone signal comprising a plurality of echo signals; and a multichannel echo-cancellation unit being implemented by a second processor, the multichannel echo-cancellation unit being configured to receive a plurality of reference signals and to minimize the plurality of echo signals, according to the plurality of reference signals, to produce an estimated voice signal, and to provide the estimated voice signal to the head unit.
    Type: Grant
    Filed: June 17, 2019
    Date of Patent: May 25, 2021
    Assignee: Bose Corporation
    Inventors: Cristian M. Hera, Elie Bou Daher, Jeffery R. Vautin, Vigneish Kathavarayan, Ankita D. Jain, Tobe Z. Barksdale
  • Patent number: 11011182
    Abstract: An audio processing system has multiple microphones that capture an audio signal. A noise suppression circuit analyses the audio signal to detect a type of noise present in the signal (e.g., stationary or non-stationary background noise). Based on the detected background noise type, the system operates in either a first or second mode of operation. In the first mode (stationary noise detected), one microphone is used to enhance a speech signal from the audio signal, and in the second mode (non-stationary noise detected), more than one microphone is used to enhance the speech signal. Processing more than one microphone input signal requires additional complexity and more processing power than one-microphone speech enhancement, so by classifying the background noise type and then switching between one microphone or N-microphones based speech enhancement, processing power is reduced during stationary noise conditions.
    Type: Grant
    Filed: March 25, 2019
    Date of Patent: May 18, 2021
    Assignee: NXP B.V.
    Inventors: Gunasekaran Shanmugam, Omkar Reddy, Vinoda Kumar Somashekhara
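A sketch, under assumptions, of the mode switch described in patent 11011182 above: classify the background noise as stationary or non-stationary and choose single-microphone or multi-microphone enhancement accordingly. The spectral-flux stationarity test and the placeholder enhancers are illustrative; the patent does not prescribe them.

```python
import numpy as np

def noise_is_stationary(noise_frames, flux_var_threshold=0.05):
    """Crude stationarity test: low variance of normalized spectral flux across
    noise-only frames (2-D array, frames x samples) is treated as stationary
    background noise (assumption)."""
    spectra = np.abs(np.fft.rfft(np.asarray(noise_frames, dtype=float), axis=1))
    flux = np.linalg.norm(np.diff(spectra, axis=0), axis=1)
    return np.var(flux / (flux.mean() + 1e-12)) < flux_var_threshold

def enhance(mic_signals, noise_frames):
    """First mode (stationary noise): cheap single-microphone path.
    Second mode (non-stationary noise): costlier multi-microphone path."""
    if noise_is_stationary(noise_frames):
        return np.asarray(mic_signals[0])                 # stand-in for a 1-mic enhancer
    return np.mean(np.asarray(mic_signals), axis=0)       # stand-in for an N-mic enhancer
```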
  • Patent number: 10991378
    Abstract: A method reduces noise in an audio signal. In the method, for each of a plurality of prediction times, a signal component subsequent to the prediction time is predicted with reference to signal components of the audio signal that are prior to that prediction time. A predicted audio signal is formed from the signal components that each follow a prediction time, and a noise-reduced audio signal is generated based on the predicted audio signal.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: April 27, 2021
    Assignee: Sivantos Pte. Ltd.
    Inventor: Tobias Daniel Rosenkranz
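The abstract of patent 10991378 above leaves the predictor open; the sketch below uses a plain least-squares linear predictor to illustrate predicting each sample from the samples before it and assembling a predicted (noise-reduced) signal.

```python
import numpy as np

def predict_signal(x, order=16):
    """Predict each sample from the `order` samples preceding it and return the
    predicted signal; components that are not predictable (noise) are attenuated.
    A single global least-squares predictor is an illustrative simplification."""
    x = np.asarray(x, dtype=float)
    rows = np.array([x[t - order:t] for t in range(order, len(x))])
    targets = x[order:]
    coeffs, *_ = np.linalg.lstsq(rows, targets, rcond=None)
    predicted = rows @ coeffs
    return np.concatenate([x[:order], predicted])
```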
  • Patent number: 10993048
    Abstract: A hearing device includes: an antenna for receiving a first wireless input signal from an external device and providing an antenna output signal; a transceiver configured to provide a transceiver input signal; an input module for provision of a first input signal, the input module comprising a first microphone; a processor; a receiver configured to provide an audio output signal; a pre-processor for provision of a pre-processor output signal based on the first input signal; and a controller comprising a speech intelligibility estimator for determining a speech intelligibility indicator indicative of speech intelligibility based on the transceiver input signal and a first controller input signal, wherein the controller is configured to provide a controller output signal based on the speech intelligibility indicator; wherein the pre-processor is configured to apply, based on the controller output signal, a pre-processing scheme to the first input signal and/or the transceiver input signal.
    Type: Grant
    Filed: May 8, 2018
    Date of Patent: April 27, 2021
    Assignee: GN Hearing A/S
    Inventors: Jesper B. Boldt, Charlotte Sørensen, Rene Burmand Johannesson
  • Patent number: 10978096
    Abstract: Disclosed are techniques for transmitting bundles of silence indicator (SID) frames during a voice call among a plurality of access terminals. In an aspect, a source access terminal detects a transition to a silence state, generates, in response to detection of the transition, at least a first bundle of SID frames, wherein each SID frame of the at least the first bundle of SID frames includes data representing comfort noise to be played at one or more target access terminals of the plurality of access terminals during the silence state, and transmits the at least the first bundle of SID frames to a base station serving the source access terminal. In an aspect, the base station receives the at least the first bundle of SID frames, and periodically forwards SID frames of the at least the first bundle of SID frames to the one or more target access terminals.
    Type: Grant
    Filed: April 23, 2018
    Date of Patent: April 13, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Srinivasan Balasubramanian, Neha Goel, Ramachandran Subramanian, Shailesh Maheshwari, Kirankumar Bhoja Anchan
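A small sketch of the bundling idea in patent 10978096 above: on a transition to silence the source terminal builds a bundle of SID frames, and the base station forwards them one at a time on a fixed period. The frame layout, bundle size, and 160 ms period are assumptions.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

@dataclass
class SidFrame:
    comfort_noise_params: bytes    # data describing comfort noise to play during silence

def make_sid_bundle(noise_params: bytes, bundle_size: int = 8) -> List[SidFrame]:
    """Source terminal: on detecting a transition to silence, build one bundle."""
    return [SidFrame(comfort_noise_params=noise_params) for _ in range(bundle_size)]

def forward_periodically(bundle: List[SidFrame],
                         period_ms: int = 160) -> Iterator[Tuple[int, SidFrame]]:
    """Base station: yield (send_offset_ms, frame) pairs spaced by `period_ms`."""
    for i, frame in enumerate(bundle):
        yield i * period_ms, frame
```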
  • Patent number: 10978074
    Abstract: A method for processing speech, comprising semantically parsing a received natural language speech input with respect to a plurality of predetermined command grammars in an automated speech processing system; determining if the parsed speech input unambiguously corresponds to a command and is sufficiently complete for reliable processing, then processing the command; if the speech input ambiguously corresponds to a single command or is not sufficiently complete for reliable processing, then prompting a user for further speech input to reduce ambiguity or increase completeness, in dependence on a relationship of previously received speech input and at least one command grammar of the plurality of predetermined command grammars, reparsing the further speech input in conjunction with previously parsed speech input, and iterating as necessary. The system also monitors abort, fail or cancel conditions in the speech input.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: April 13, 2021
    Assignee: Great Northern Research, LLC
    Inventors: Philippe Roy, Paul J. Lagassey
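A sketch of the parse-prompt-reparse loop described in patent 10978074 above, assuming each grammar is a callable returning (matched, complete) flags and that abort words end the session; all names and the abort list are illustrative.

```python
ABORT_WORDS = {"cancel", "abort", "never mind"}   # illustrative abort/fail conditions

def process_speech(get_input, grammars, execute, prompt):
    """Iteratively parse speech input against command grammars; execute when one
    command matches unambiguously and completely, otherwise prompt and reparse
    the further input together with what was said before."""
    history = []
    while True:
        text = get_input()
        if text.strip().lower() in ABORT_WORDS:
            return None
        history.append(text)
        joined = " ".join(history)                           # reparse with prior input
        matches = [name for name, g in grammars.items() if g(joined)[0]]
        if len(matches) == 1 and grammars[matches[0]](joined)[1]:
            return execute(matches[0], joined)               # unambiguous and complete
        prompt("Please clarify or complete the command.")
```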
  • Patent number: 10957320
    Abstract: Systems, computer-implemented methods, and computer program products that can facilitate predicting a source of a subsequent spoken dialogue are provided. According to an embodiment, a system can comprise a memory that stores computer executable components and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise a speech receiving component that can receive a spoken dialogue from a first entity. The computer executable components can further comprise a speech processing component that can employ a network that can concurrently process a transition type and a dialogue act of the spoken dialogue to predict a source of a subsequent spoken dialogue.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: March 23, 2021
    Assignees: INTERNATIONAL BUSINESS MACHINES CORPORATION, THE REGENTS OF THE UNIVERSITY OF MICHIGAN
    Inventors: Lazaros Polymenakos, Dimitrios B. Dimitriadis, Zakaria Aldeneh, Emily Mower Provost
  • Patent number: 10949458
    Abstract: Systems and methods include optimizing resource utilization of an automated content recognition (ACR) system by delaying the identification of certain large quantities of media cue data. The delayed identification of the media may be for the purpose of, for example, generating usage statistics or other non-time critical work flow, among other non-real-time uses. In addition, real-time identification of a certain subset of media cue data is performed for the purposes of video program substitution, interactive television opportunities or other time-specific events.
    Type: Grant
    Filed: July 15, 2016
    Date of Patent: March 16, 2021
    Assignee: INSCAPE DATA, INC.
    Inventors: Zeev Neumeier, Michael Collette
  • Patent number: 10938995
    Abstract: A system and method for associating an audio clip with an object are provided, wherein a voice-based system, such as a voicemail system, is used to record the audio clips.
    Type: Grant
    Filed: May 23, 2019
    Date of Patent: March 2, 2021
    Assignee: Quest Patent Research Corporation
    Inventors: Jarold Bowerman, David Mancini
  • Patent number: 10923123
    Abstract: A system receives a first voice input from a first user, such as a baby or a person who has had a stroke. Although the first user intends to communicate a particular meaning, the first voice input is not in a language that is known to the system and thus the system does not know the particular meaning that the first user intended. After receiving the first voice input, a second voice input is received from a second user. This second voice input is in a language that is known to the system. The system determines a meaning of the second voice input, associates this meaning with the first voice input, and uses this association to train a machine learning system. This machine learning system is used to attempt to understand the meaning of subsequent voice inputs received from the first user.
    Type: Grant
    Filed: March 8, 2019
    Date of Patent: February 16, 2021
    Assignee: Motorola Mobility LLC
    Inventors: Zhengping Ji, Rachid M. Alameh
  • Patent number: 10925167
    Abstract: A computing system includes a circuit board assembly and multiple expansion cards connected to one another and also connected to the circuit board assembly. The connected expansion cards form a modular expansion card bus that allows the expansion cards to communicate between each other without routing the communications through the circuit board assembly. In some embodiments, the expansion cards are mounted on a tray that includes mounting pins that engage mounting slots of the expansion cards, allowing for simple installation of various combinations of expansion cards connected together to form a modular expansion card bus.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: February 16, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Kevin Bailey, Priscilla Lam, Darin Lee Frink, Jason Alexander Harland, Felipe Enrique Ortega Gutierrez
  • Patent number: 10878829
    Abstract: A method for spectrum recovery in spectral decoding of an audio signal, comprises obtaining of an initial set of spectral coefficients representing the audio signal, and determining a transition frequency. The transition frequency is adapted to a spectral content of the audio signal. Spectral holes in the initial set of spectral coefficients below the transition frequency are noise filled and the initial set of spectral coefficients are bandwidth extended above the transition frequency. Decoders and encoders being arranged for performing part of or the entire method are also illustrated.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: December 29, 2020
    Assignee: TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)
    Inventors: Gustaf Ullberg, Manuel Briand, Anisse Taleb
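A rough sketch of the split recovery in patent 10878829 above: below the transition frequency, spectral holes are noise filled; above it, the spectrum is bandwidth extended. The scaling and the fold-up extension are assumptions, not the codec's actual rules.

```python
import numpy as np

def recover_spectrum(coeffs, transition_bin, rng=None):
    """Noise-fill holes below `transition_bin`, bandwidth-extend above it."""
    rng = rng or np.random.default_rng(0)
    out = np.asarray(coeffs, dtype=float).copy()
    low = out[:transition_bin]                  # view: edits write into `out`
    holes = low == 0.0
    low[holes] = 0.1 * np.abs(low).mean() * rng.standard_normal(holes.sum())
    high = out[transition_bin:]
    if len(high) == 0:
        return out
    src = low[-len(high):] if len(high) <= len(low) else np.resize(low, len(high))
    out[transition_bin:] = 0.5 * src            # fold the low band upward (illustrative)
    return out
```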
  • Patent number: 10847169
    Abstract: An audio signal encoding method is provided comprising: receiving first and second audio signal frames; processing a second portion of the first audio signal frame and a first portion of the second audio signal frame using an orthogonal transformation to determine in part a first intermediate encoding result; and processing the first intermediate encoding result using an orthogonal transformation to determine a set of spectral coefficients that corresponds to at least a portion of the first audio signal frame.
    Type: Grant
    Filed: April 30, 2018
    Date of Patent: November 24, 2020
    Assignee: DTS, Inc.
    Inventors: Michael M. Goodwin, Antonius Kalker, Albert Chau
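The two-stage structure in patent 10847169 above, sketched with an orthonormal DCT-II standing in for both orthogonal transformations (the patent does not name the transforms used here): the tail of the first frame and the head of the second are combined, transformed, and the intermediate result is transformed again.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, one convenient orthogonal transform."""
    k = np.arange(n)[:, None]
    m = np.cos(np.pi * (2 * np.arange(n)[None, :] + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

def encode_frames(frame1, frame2):
    """Process the second portion of frame1 and the first portion of frame2 with
    an orthogonal transform, then transform the intermediate result again to get
    spectral coefficients for (part of) frame1."""
    n = len(frame1) // 2
    overlap = np.concatenate([frame1[n:], frame2[:n]])
    T = dct_matrix(len(overlap))
    intermediate = T @ overlap      # first orthogonal transformation
    return T @ intermediate         # second orthogonal transformation
```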
  • Patent number: 10818307
    Abstract: Method, apparatus, and storage medium for voice imitation are provided. The voice imitation method, includes: separately obtaining a training voice of a source user and training voices of a plurality of imitation users including a target user; determining, according to the training voice of the source user and a training voice of the target user, a conversion rule for converting the training voice of the source user into the training voice of the target user; collecting voice information of the source user; and converting the voice information of the source user into an imitation voice of the target user according to the conversion rule.
    Type: Grant
    Filed: January 11, 2018
    Date of Patent: October 27, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Yuanyuan Liu, Guangjun Liu, Guoli Lu, Fen Fu
  • Patent number: 10817647
    Abstract: An apparatus, computer program product and method are provided for managing automatic generation of reports using computer program queries. Embodiments intelligently determine whether a report is needed based on application data and, if so, automatically generate the report. The application data may include data from a calendar application, communication application, social media application, user profile, and/or the like. The application data is analyzed to determine report parameters, including a type of report data, range of the report data, and grouping characteristic of the report data. The report parameters may be further based on a report request history and/or user profiles, in which a user may indicate generally what data may be desired in a report. In some examples, no user input is required for a particular report to be generated. Reports that are no longer needed may be automatically prevented from being generated.
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: October 27, 2020
    Assignee: Wells Fargo Bank, N.A.
    Inventors: Jeffrey H. Johnson, Debra L. Johnson, Gretchen W. Owens, Nadezhda V. Turner
  • Patent number: 10803867
    Abstract: A dialogue system, comprises an input unit configured to acquire utterance contents of a user in a dialogue; a mode determining unit configured to determine, based on the utterance contents acquired by the input unit, whether a mode of the dialogue is task-oriented or non-task-oriented; a plurality of intention understanding units each corresponding to a specific domain; and a domain determining unit configured to determine, when the mode of the dialogue is task-oriented, a domain of the dialogue based on a result of intention understanding of the utterance contents performed using each of the plurality of intention understanding units.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: October 13, 2020
    Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA
    Inventors: Sei Kato, Takuma Minemura
  • Patent number: 10796716
    Abstract: A speech-capture device can capture audio data during wakeword monitoring and use the audio data to determine if a user is present nearby the device, even if no wakeword is spoken. Audio such as speech, human-originating sounds (e.g., coughing, sneezing), or other human-related noises (e.g., footsteps, doors closing) can be used to detect presence. Audio frames are individually scored as to whether a human presence is detected in the particular audio frames. The scores are then smoothed relative to nearby frames to create a decision for a particular frame. Presence information can then be sent according to a periodic schedule to a remote device to create a presence “heartbeat” that regularly identifies whether a user is detected proximate to a speech-capture device.
    Type: Grant
    Filed: October 11, 2018
    Date of Patent: October 6, 2020
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Shiva Kumar Sundaram, Chao Wang, Shiv Naga Prasad Vitaladevuni, Spyridon Matsoukas, Arindam Mandal
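A minimal sketch of the scoring-and-heartbeat flow in patent 10796716 above: per-frame presence scores are smoothed relative to nearby frames, and one presence decision is reported per period. Window size, period, and threshold are assumptions.

```python
import numpy as np

def smooth_scores(frame_scores, window=5):
    """Smooth per-frame presence scores relative to nearby frames (moving average)."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(frame_scores, dtype=float), kernel, mode="same")

def presence_heartbeat(frame_scores, frames_per_beat=50, threshold=0.5):
    """Yield one presence decision per reporting period (the 'heartbeat')."""
    smoothed = smooth_scores(frame_scores)
    for start in range(0, len(smoothed), frames_per_beat):
        chunk = smoothed[start:start + frames_per_beat]
        yield bool((chunk > threshold).any())   # was a user detected in this period?
```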
  • Patent number: 10791412
    Abstract: Methods and systems are provided for visualizing spatial audio using determined properties for time segments of the spatial audio. Such properties include the position the sound is coming from, the intensity of the sound, the focus of the sound, and the color of the sound at a time segment of the spatial audio. These properties can be determined by analyzing the time segment of the spatial audio. Upon determining these properties, the properties are used in rendering a visualization of the sound with attributes based on the properties of the sound(s) at the time segment of the spatial audio.
    Type: Grant
    Filed: February 13, 2020
    Date of Patent: September 29, 2020
    Assignee: ADOBE INC.
    Inventors: Stephen Joseph DiVerdi, Yaniv De Ridder
  • Patent number: 10783903
    Abstract: A sound collection apparatus includes: a sound collection unit including a microphone configured to collect sound; a noise determination unit configured to determine noise in dictation based on voice collected by the sound collection unit; and a presentation unit configured to perform presentation based on a determination result by the noise determination unit. With this configuration, presentation is performed to indicate environmental noise at sound collection for dictation, which leads to improved efficiency of dictation work.
    Type: Grant
    Filed: May 2, 2018
    Date of Patent: September 22, 2020
    Assignee: OLYMPUS CORPORATION
    Inventors: Kazutaka Tanaka, Osamu Nonaka, Kazuhiko Osa
  • Patent number: 10771631
    Abstract: Systems and methods are described for modifying one of far-end signal playback and capture of local audio on an audio device. Frames of both a far-end audio stream and a near-end audio stream may be analyzed using a measure of voice activity, the analyzing producing voice data associated with each frame. Based on the voice data, a conference state may be determined, and one of playback of the far-end audio stream and capture of local audio on an audio device may be modified based on the determined conference state. By associating the likely intent with a predefined state, the device may further cull or remove unwanted or unlikely content from the device input and output. This may have a substantial advantage in allowing for full duplex operation in the case of more meaningful and continuing voice activity, particularly in the case where there are many connected endpoints.
    Type: Grant
    Filed: August 2, 2017
    Date of Patent: September 8, 2020
    Assignee: Dolby Laboratories Licensing Corporation
    Inventors: David Gunawan, Glenn N. Dickins
  • Patent number: 10770050
    Abstract: An audio data processing method and apparatus are provided. The method includes obtaining audio data. An overall spectrum of the audio data is obtained and separated into a singing voice spectrum and an accompaniment spectrum. An accompaniment binary mask of the audio data is calculated according to the audio data. The singing voice spectrum and the accompaniment spectrum are processed using the accompaniment binary mask, to obtain accompaniment data and singing voice data.
    Type: Grant
    Filed: June 2, 2017
    Date of Patent: September 8, 2020
    Assignee: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
    Inventors: Bi Lei Zhu, Ke Li, Yong Jian Wu, Fei Yue Huang
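The masking step of patent 10770050 above is easy to sketch once the accompaniment binary mask exists (how the mask is computed is outside this sketch): the mask selects the accompaniment bins and its complement selects the singing-voice bins.

```python
import numpy as np

def split_with_mask(overall_spectrum, accompaniment_mask):
    """Apply an accompaniment binary mask to the overall spectrum to obtain
    accompaniment and singing-voice spectra (magnitude-domain, illustrative)."""
    mask = np.asarray(accompaniment_mask, dtype=bool)
    spectrum = np.asarray(overall_spectrum, dtype=float)
    accompaniment = np.where(mask, spectrum, 0.0)
    singing_voice = np.where(mask, 0.0, spectrum)
    return singing_voice, accompaniment
```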
  • Patent number: 10762898
    Abstract: A voice input by a vehicle user is taken as a basis for determining at least one keyword from a set of prescribed keywords. The at least one keyword is taken as a basis for determining at least one event and/or at least one state from a set of events and/or states of the vehicle that are stored during a prescribed period of time. This involves the respective event and/or the respective state being stored in conjunction with at least one condition occurrence that characterizes a respective condition that needs to be met in order for the event to occur and/or the respective state to exist. In addition, a response is determined from a set of prescribed responses on the basis of the condition occurrence that is associated with the determined event and/or state. Furthermore, a signaling signal is determined on the basis of the ascertained response.
    Type: Grant
    Filed: January 30, 2015
    Date of Patent: September 1, 2020
    Assignee: Bayerische Motoren Werke Aktiengesellschaft
    Inventor: Andreas Haslinger
  • Patent number: 10732818
    Abstract: A mobile terminal including a terminal body; a touch screen; a plurality of magnetic sensors configured to detect a spatial position of an input device having a magnetic field generating unit; and a controller configured to display a first graphic object notifying an area corresponding to the detected spatial position on the touch screen when the spatial position of the input device is detected adjacent to an edge of the terminal body at an outside of the touch screen without contacting the touch screen, and display a second graphic object notifying a hidden function with respect to a displayed area of the first graphic object on the touch screen when the detected spatial position of the input device is fixed for a predetermined time.
    Type: Grant
    Filed: December 7, 2016
    Date of Patent: August 4, 2020
    Assignee: LG ELECTRONICS INC.
    Inventors: Yoonchan Won, Yongjae Kim, Suyoung Lee
  • Patent number: 10706867
    Abstract: A method and system for converting a source voice to a target voice is disclosed. The method comprises: recording source voice data and target voice data; extracting spectral envelope features from the source voice data and target voice data; time-aligning pairs of frames based on the extracted spectral envelope features; converting each pair of frames into a frequency domain; generating a plurality of frequency-warping factor candidates, wherein each of the plurality of frequency-warping factor candidates is associated with one of the pairs of frames; generating a single global frequency-warping factor based on the candidates; acquiring source speech; converting the source speech to target speech based on the global frequency-warping factor; generating a waveform comprising the target speech; and playing the waveform comprising the target speech to a user.
    Type: Grant
    Filed: March 5, 2018
    Date of Patent: July 7, 2020
    Assignee: OBEN, INC.
    Inventors: Fernando Villavicencio, Mark Harvilla
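For patent 10706867 above, the per-pair candidates and the single global factor can be sketched as below; using spectral-centroid ratios for the candidates and a median for the global factor is an illustrative choice, not the method claimed.

```python
import numpy as np

def spectral_centroid(mag_spectrum):
    """Centroid of a magnitude spectrum, in bins."""
    freqs = np.arange(len(mag_spectrum))
    return (freqs * mag_spectrum).sum() / (mag_spectrum.sum() + 1e-12)

def global_warping_factor(source_frames, target_frames):
    """One frequency-warping candidate per time-aligned frame pair, reduced to a
    single global frequency-warping factor."""
    candidates = [spectral_centroid(np.asarray(t)) / (spectral_centroid(np.asarray(s)) + 1e-12)
                  for s, t in zip(source_frames, target_frames)]
    return float(np.median(candidates))
```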
  • Patent number: 10622002
    Abstract: A method of building a new voice having a new timbre using a timbre vector space includes receiving timbre data filtered using a temporal receptive field. The timbre data is mapped in the timbre vector space. The timbre data is related to a plurality of different voices. Each of the plurality of different voices has respective timbre data in the timbre vector space. The method builds the new timbre using the timbre data of the plurality of different voices using a machine learning system.
    Type: Grant
    Filed: May 24, 2018
    Date of Patent: April 14, 2020
    Assignee: Modulate, Inc.
    Inventors: William Carter Huffman, Michael Pappas
  • Patent number: 10594817
    Abstract: A social network platform and method thereof for providing Internet of Things (I-o-T) devices with social behavior for communicating natural language (NL) text messages. An I-o-T device is provided with a social device application to form a unit capable of reading free form NL messages and responsively performing an action. The social device application generates NL text in response to reading a text message and/or in response to receiving readings from a set of sensors. Types of messages generated include messages for initiating social relationships with other devices which may communicate an acceptance/declination. The platform may be centralized with a server for ranking the importance of read messages based on the relationships and addressing NL text messages to other social units or groups of social units based on the relationships. The platform further enables direct messaging between social unit devices, brokering trust, and moderating information flow between devices.
    Type: Grant
    Filed: October 4, 2017
    Date of Patent: March 17, 2020
    Assignee: International Business Machines Corporation
    Inventors: Vincent Lonij, Bradley Eck, Amadou Ba
  • Patent number: 10587710
    Abstract: A social network platform and method thereof for providing Internet of Things (I-o-T) devices with social behavior for communicating natural language (NL) text messages. An I-o-T device is provided with a social device application to form a unit capable of reading free form NL messages and responsively performing an action. The social device application generates NL text in response to reading a text message and/or in response to receiving readings from a set of sensors. Types of messages generated include messages for initiating social relationships with other devices which may communicate an acceptance/declination. The platform may be centralized with a server for ranking the importance of read messages based on the relationships and addressing NL text messages to other social units or groups of social units based on the relationships. The platform further enables direct messaging between social unit devices, brokering trust, and moderating information flow between devices.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Vincent Lonij, Bradley Eck, Amadou Ba
  • Patent number: 10585554
    Abstract: A system, process and computer-readable media that incorporate teachings of the subject disclosure may include, for example, an interactive application delivering captions of an audio signal, such as a voicemail message or audio received concurrently during a telephone conversation. The application can be a television application, for example, receiving at a media processor associated with equipment of a first party, a textual interpretation of an audio signal of a second party, for example, during an active telephone call between the first party and the second party. A graphical image of the textual interpretation of the audio signal is rendered at the media processor and presented to a display device, such as a television display. Other embodiments are disclosed.
    Type: Grant
    Filed: April 8, 2016
    Date of Patent: March 10, 2020
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Darryl Cynthia Moore, Burt Joseph Bossi
  • Patent number: 10572538
    Abstract: According to an embodiment, a lattice finalization device finalizes a portion of a lattice that is generated by pattern recognition with respect to a signal on a frame-by-frame basis in chronological order. The device includes a detector and a finalizer. The detector is configured to detect, as a splitting position, a frame in the lattice in which the number of nodes and passing arcs is equal to or smaller than a reference value set in advance. The finalizer is configured to finalize nodes and arcs in paths from a start node to the splitting position in the lattice.
    Type: Grant
    Filed: April 22, 2016
    Date of Patent: February 25, 2020
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventor: Manabu Nagao
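A sketch of the splitting-position test in patent 10572538 above, assuming each lattice frame is summarized as a dict with 'nodes' and 'arcs' counts (an illustrative data shape): frames whose combined count is at or below the reference value are candidate splitting positions, and everything up to the earliest one can be finalized.

```python
def find_splitting_positions(lattice_frames, reference_value):
    """Frames where the number of nodes plus passing arcs is <= the reference value."""
    return [i for i, f in enumerate(lattice_frames)
            if f["nodes"] + f["arcs"] <= reference_value]

def finalize_prefix(lattice_frames, reference_value):
    """Finalize nodes and arcs on paths from the start node up to the earliest split."""
    positions = find_splitting_positions(lattice_frames, reference_value)
    if not positions:
        return []                           # nothing can be finalized yet
    return lattice_frames[:positions[0] + 1]
```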
  • Patent number: 10553200
    Abstract: A text-to-speech (TTS) computing device includes a processor and a memory. The TTS computing device is configured to generate a machine pronunciation of a text data according to at least one phonetic rule, and provide the machine pronunciation to a user interface of the TTS computing device such that the machine pronunciation is audibly communicated to a user of the TTS computing device. The TTS computing device is also configured to receive a pronunciation correction of the machine pronunciation from the user via the user interface, and store the pronunciation correction in a TTS data source. The TTS computing device is further configured to assign the pronunciation correction provided by the user to a user profile that corresponds to the text data.
    Type: Grant
    Filed: April 27, 2018
    Date of Patent: February 4, 2020
    Assignee: Mastercard International Incorporated
    Inventor: Jason Jay Lacoss-Arnold
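A toy sketch of the correction flow in patent 10553200 above: generate a rule-based pronunciation, let the user store a correction, and prefer the stored correction for that user afterwards. The dict-backed store stands in for the patent's TTS data source.

```python
class PronunciationStore:
    """User pronunciation corrections keyed by (user_id, text)."""

    def __init__(self):
        self._corrections = {}

    def correct(self, user_id, text, pronunciation):
        """Store a user's correction of the machine pronunciation."""
        self._corrections[(user_id, text)] = pronunciation

    def pronounce(self, user_id, text, rule_based_tts):
        """Prefer the user's stored correction; otherwise fall back to phonetic rules."""
        return self._corrections.get((user_id, text)) or rule_based_tts(text)
```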
  • Patent number: 10535336
    Abstract: A system and method of converting source speech to target speech using intermediate speech data is disclosed. The method comprises identifying intermediate speech data that match target voice training data based on acoustic features; performing dynamic time warping to match the second set of acoustic features of intermediate speech data and the first set of acoustic features of target voice training data; training a neural network to convert the intermediate speech data to target voice training data; receiving source speech data; converting the source speech data to an intermediate speech; converting the intermediate speech to a target speech sequence using the neural network; and converting the target speech sequence to target speech using the pitch from the target voice training data.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: January 14, 2020
    Assignee: OBEN, INC.
    Inventor: Seyed Hamidreza Mohammadi
  • Patent number: 10536775
    Abstract: An auditory signal processor includes a filter bank generating frequency components of a source audio signal; a spatial localization network operative in response to the frequency components to generate spike trains for respective spatially separated components of the source audio signal; a cortical network operative in response to the spike trains to generate a resultant spike train for selected spatially separated components of the source audio signal; and a stimulus reconstruction circuit that processes the resultant spike train to generate a reconstructed audio output signal for a target component of the source audio signal. The cortical network incorporates top-down attentional inhibitory modulation of respective spatial channels to produce the resultant spike train for the selected spatially separate components of the source audio signal, and the stimulus reconstruction circuit employs convolution of a reconstruction kernel with the resultant spike train to generate the reconstructed audio output.
    Type: Grant
    Filed: June 21, 2019
    Date of Patent: January 14, 2020
    Assignee: Trustees of Boston University
    Inventors: Kamal Sen, Harry Steven Colburn, Junzi Dong, Kenny Feng-Hsu Chou
  • Patent number: 10528867
    Abstract: A method for learning a neural network by adjusting a learning rate each time when an accumulated number of iterations reaches one of a first to an n-th specific values. The method includes steps of: (a) a learning device, while increasing k from 1 to (n−1), (b1) performing a k-th learning process of repeating the learning of the neural network at a k-th learning rate by using a part of the training data while the accumulated number of iterations is greater than a (k−1)-th specific value and is equal to or less than a k-th specific value, (b2) (i) changing a k-th gamma to a (k+1)-th gamma by referring to k-th losses of the neural network which are obtained by the k-th learning process and (ii) changing a k-th learning rate to a (k+1)-th learning rate by using the (k+1)-th gamma.
    Type: Grant
    Filed: October 8, 2018
    Date of Patent: January 7, 2020
    Assignee: StradVision, Inc.
    Inventors: Kye-Hyeon Kim, Yongjoong Kim, Insu Kim, Hak-Kyoung Kim, Woonhyun Nam, Sukhoon Boo, Myungchul Sung, Donghun Yeo, Wooju Ryu, Taewoong Jang, Kyungjoong Jeong, Hongmo Je, Hojin Cho
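A sketch of the milestone-based schedule in patent 10528867 above: at each milestone the k-th gamma is changed to the (k+1)-th gamma by referring to the k-th losses, and the learning rate is scaled by the new gamma. The specific rule for updating gamma here (halve it unless the loss improved) is an assumption.

```python
def next_gamma(prev_gamma, recent_losses):
    """Change the k-th gamma to the (k+1)-th gamma by referring to the k-th losses."""
    improving = recent_losses[-1] < 0.99 * recent_losses[0]
    return prev_gamma if improving else prev_gamma * 0.5

def learning_rates(initial_lr, initial_gamma, losses_per_stage):
    """Yield the learning rate used for each k-th learning process in turn."""
    lr, gamma = initial_lr, initial_gamma
    for losses in losses_per_stage:          # losses observed during the k-th process
        yield lr
        gamma = next_gamma(gamma, losses)    # k-th gamma -> (k+1)-th gamma
        lr = lr * gamma                      # k-th learning rate -> (k+1)-th learning rate
```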
  • Patent number: 10515156
    Abstract: Customer support, and other types of activities in which there is a dialogue between two humans can generate large volumes of conversation records. Automated analysis of these records can provide information about high-level features of, for example, the workings of a customer service department. Analysis of these conversations between a customer and a customer-support agent may also allow identification of customer support activities that can be provided by virtual agents instead of actual human agents. The analysis may evaluate conversations in terms of complexity, duration, and sentiment of the participants. Additionally, the conversations may also be analyzed to identify the existence of selected concepts or keywords. Workflow characteristics, the extent to which the conversation represents a multi-step process intended to accomplish a task, may also be determined for the conversations.
    Type: Grant
    Filed: July 8, 2019
    Date of Patent: December 24, 2019
    Assignee: Verint Americas Inc.
    Inventor: Charles C. Wooters
  • Patent number: 10504536
    Abstract: A technique for estimating and enhancing audio quality in a real-time communication session between parties over a computer network produces real-time measurements of factors that are known to impact audio quality, assigns a separate MOS value to each of the measured factors, and combines the MOS values for the various factors to produce an overall measure of audio quality. At least one party to the real-time communication session operates a computing device that runs a software program, and the technique further includes directing the software program to render an indication of the overall audio quality, thereby enabling the party operating the computing device to take remedial action to improve the audio quality.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: December 10, 2019
    Assignee: LogMeIn, Inc.
    Inventors: Bjorn Volcker, Matthieu Hodgkinson
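The combination step in patent 10504536 above can be sketched as a weighted mean of per-factor MOS values clipped to the conventional 1-5 scale; the weighting and clipping are assumptions, since the abstract only says the per-factor values are combined.

```python
def overall_mos(factor_mos, weights=None):
    """Combine per-factor MOS values (e.g. packet loss, jitter, latency) into one
    overall audio-quality score on the 1..5 MOS scale."""
    if weights is None:
        weights = {name: 1.0 for name in factor_mos}
    total_weight = sum(weights[name] for name in factor_mos)
    score = sum(weights[name] * mos for name, mos in factor_mos.items()) / total_weight
    return max(1.0, min(5.0, score))

# e.g. overall_mos({"packet_loss": 3.8, "jitter": 4.2, "latency": 4.5})
```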
  • Patent number: 10505672
    Abstract: A decoding apparatus and method. When all the code words in the to-be-decoded group satisfy the condition that the checksum is 0, forward error correction (FEC) decoding is not performed, and only the sign bit decision is performed. That is, in a process of performing multiple rounds of decoding on each code word, FEC decoding is not always performed every time. This reduces power consumption required by FEC decoding.
    Type: Grant
    Filed: July 27, 2017
    Date of Patent: December 10, 2019
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Zhiyu Xiao, Mo Li
  • Patent number: 10504513
    Abstract: A dock device connects participating devices such as a tablet device and an audio activated device, allowing them to operate as a single device. These participating devices may be associated with different accounts, each account being associated with particular “speechlets” or data processing functions. A natural language understanding (NLU) system uses NLU models to process text obtained from an automatic speech recognition (ASR) system to determine a set of possible intents. A second set of possible intents may then be generated that is limited to those possible intents that correspond to the speechlets associated with the docked device. The intents within the second set of possible intents are ranked, and the highest ranked intent may be deemed to be the intent of the user. Command data corresponding to the highest ranked intent may be generated and used to perform the action associated with that intent.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: December 10, 2019
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Timothy Thomas Gray, Michal Grzegorz Kurpanik, Jenny Toi Wah Lam, Sarveshwar Nigam, Shirin Saleem, Jonhenry A. Righter, Jeremy Richard Hill, Kavya Ravikumar, Joe Virgil Fernandez, Kynan Dylan Antos, Kelly James Vanee
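A sketch of the intent filtering in patent 10504513 above, assuming each NLU hypothesis is a dict with 'intent', 'speechlet' and 'score' keys (an illustrative shape): the first set of possible intents is reduced to those whose speechlet is available on the docked device, and the highest-ranked survivor wins.

```python
def resolve_intent(possible_intents, docked_speechlets):
    """Restrict NLU hypotheses to speechlets associated with the docked device,
    then return the highest-ranked remaining intent (or None)."""
    second_set = [h for h in possible_intents if h["speechlet"] in docked_speechlets]
    if not second_set:
        return None
    return max(second_set, key=lambda h: h["score"])["intent"]
```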
  • Patent number: 10484448
    Abstract: A method for buffer load management in a communication device includes storing, in a first buffer of the communication device, multimedia data comprised in data packets, determining an indication of the input rate at that first buffer and adding the indication to a second buffer containing information on the input rate over time, performing an autocorrelation on a signal comprising said information on the input rate over time, finding peaks in the autocorrelation and identifying a peak in a period to perform, for the peak, a cross-correlation of the signal comprising the information on the input rate over time with a periodic signal with a given phase, selecting a part of the information on the input rate stored in the second buffer, using a reference signal, determining a target latency for the first buffer, and applying the target latency to the first buffer.
    Type: Grant
    Filed: October 1, 2015
    Date of Patent: November 19, 2019
    Assignee: JACOTI BVBA
    Inventors: Jacques Kinsbergen, Nun Mendez, Nicolas Wack
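For patent 10484448 above, the periodicity analysis and latency step can be sketched roughly as follows; mapping one detected period to the target latency, and the 20 ms sampling of the rate history, are assumptions rather than the claimed rule.

```python
import numpy as np

def dominant_period(input_rate_history):
    """Autocorrelate the input-rate signal and return the lag (in samples) of its
    strongest off-zero peak, i.e. the period of the packet-arrival pattern."""
    x = np.asarray(input_rate_history, dtype=float)
    if len(x) < 3:
        return None
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    return int(np.argmax(ac[1:]) + 1)

def target_latency_ms(input_rate_history, ms_per_sample=20):
    """Derive a target latency for the first (media) buffer from the detected period."""
    period = dominant_period(input_rate_history)
    return None if period is None else period * ms_per_sample
```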
  • Patent number: 10482872
    Abstract: A speech recognition apparatus according to an embodiment includes a microphone that acquires an audio stream in which speech vocalized by a person is recorded, a camera that acquires an image data in which at least a mouth of the person is captured, and an operation element that recognizes speech including a consonant vocalized by the person, based on the audio stream, estimates the consonant vocalized by the person, based on the shape of the mouth of the person in the image data, and specifies the consonant based on the estimated consonant and the speech-recognized consonant.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: November 19, 2019
    Assignee: Olympus Corporation
    Inventors: Hiroyuki Tokiwa, Kenta Yumoto, Osamu Nonaka
  • Patent number: 10453445
    Abstract: Disclosed herein is a GPU-accelerated speech recognition engine optimized for faster than real time speech recognition on a scalable server-client heterogeneous CPU-GPU architecture, which is specifically optimized to simultaneously decode multiple users in real-time. In order to efficiently support real-time speech recognition for multiple users, a “producer/consumer” design pattern is applied to decouple speech processes that run at different rates in order to handle multiple processes at the same time. Furthermore, the speech recognition process is divided into multiple consumers in order to maximize hardware utilization. As a result, the platform architecture is able to process more than 45 real-time audio streams with an average latency of less than 0.3 seconds using one-million-word vocabulary language models.
    Type: Grant
    Filed: February 16, 2017
    Date of Patent: October 22, 2019
    Assignee: CARNEGIE MELLON UNIVERSITY
    Inventors: Ian Richard Lane, Jungsuk Kim
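The producer/consumer decoupling named in patent 10453445 above is a standard pattern; the sketch below shows it with Python threads and a bounded queue (two consumers, a None sentinel, and the in-memory result list are all illustrative choices).

```python
import queue
import threading

def run_pipeline(audio_chunks, decode_chunk):
    """Producer pushes audio at capture rate; consumers decode at their own pace,
    so several streams or stages can be serviced concurrently."""
    work = queue.Queue(maxsize=64)
    results = []

    def producer():
        for chunk in audio_chunks:
            work.put(chunk)
        work.put(None)                  # sentinel: no more audio

    def consumer():
        while True:
            chunk = work.get()
            if chunk is None:
                work.put(None)          # pass the sentinel on to the next consumer
                break
            results.append(decode_chunk(chunk))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```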
  • Patent number: 10438596
    Abstract: An audio encoder for providing an encoded audio information on the basis of an input audio information has a bandwidth extension information provider configured to provide bandwidth extension information using a variable temporal resolution and a detector configured to detect an onset of a fricative or affricate. The audio encoder is configured to adjust a temporal resolution used by the bandwidth extension information provider such that bandwidth extension information is provided with an increased temporal resolution at least for a predetermined period of time before a time at which an onset of a fricative or affricate is detected and for a predetermined period of time following the time at which the onset of the fricative or affricate is detected. Alternatively or in addition, the bandwidth extension information is provided with an increased temporal resolution in response to a detection of an offset of a fricative or affricate. Audio encoders and methods use a corresponding concept.
    Type: Grant
    Filed: July 29, 2015
    Date of Patent: October 8, 2019
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Sascha Disch, Christian Helmrich, Markus Multrus, Markus Schnell, Arthur Tritthart
  • Patent number: 10437929
    Abstract: Disclosed embodiments include systems and methods relevant to improvements to natural language processing used to determine an intent and one or more associated parameters from a given input string. In an example, an input string is received and first and second different n-grams are applied to the input string. Recurrent neural network models are then used to generate output data based in part on the first and second different n-grams. In particular embodiments, a recurrent neural network specific to unigrams is applied in both forward and backward directions. Intent detection and semantic labeling are applied to the output of the recurrent neural network models.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: October 8, 2019
    Assignee: MALUUBA INC.
    Inventors: Jing He, Jean Merheb-Harb, Zheng Ye, Kaheer Suleman
  • Patent number: 10430445
    Abstract: Methods and systems for indexing document passages are presented. In some embodiments, a computing device may identify a plurality of documents that comprise a plurality of passages. A passage index comprising a plurality of entries may be generated. Each entry may comprise keywords from a passage of the plurality of passages in one of the plurality of documents. Each entry may further comprise at least one annotation associated with the passage. A search query comprising at least one search keyword may be received. The passage index for each document of the plurality of documents may be analyzed using the at least one search keyword to identify at least one passage from the plurality of documents that matches the search query. In response to the query, the at least one passage may be presented.
    Type: Grant
    Filed: September 12, 2014
    Date of Patent: October 1, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Richard S. Crouch, Marisa F. Boston, Ali Erdem Ozcan, Peter R. Stubley
  • Patent number: 10418957
    Abstract: An audio event detection system that subsamples input audio data using a series of recurrent neural networks to create data of a coarser time scale than the audio data. Data frames corresponding to the coarser time scale may then be upsampled to data frames that match the finer time scale of the original audio data frames. The resulting data frames are then scored with a classifier to determine a likelihood that the individual frames correspond to an audio event. Each frame is then weighted by its score and a composite weighted frame is created by summing the weighted frames and dividing by the cumulative score. The composite weighted frame is then scored by the classifier. The resulting score is taken as an overall score indicating a likelihood that the input audio data includes an audio event.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: September 17, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Weiran Wang, Chao Wang, Chieh-Chi Kao
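The composite-frame scoring of patent 10418957 above reduces to a few lines once a frame classifier exists; the classifier itself (and the subsampling/upsampling recurrent networks) are outside this sketch.

```python
import numpy as np

def overall_event_score(frames, classifier):
    """Score each frame, weight frames by their scores, sum and normalize by the
    cumulative score to get a composite frame, then score the composite frame.
    `classifier` maps a 1-D frame array to an event likelihood in [0, 1]."""
    frames = np.asarray(frames, dtype=float)
    scores = np.array([classifier(f) for f in frames])
    composite = (scores[:, None] * frames).sum(axis=0) / (scores.sum() + 1e-12)
    return classifier(composite)
```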
  • Patent number: 10397749
    Abstract: A system for identifying audio signatures from user equipment based on a plurality of transmissions from a plurality of user equipment (UE). Each transmission of the plurality of transmissions comprises an audio signature, a geolocation of the audio signature, and a timestamp of the audio signature. The identification of audio signatures may be used to identify a geolocation of a UE or UEs based on geohashed areas. This identification may be further employed to determine whether the UE is in a location that permits data transmission, and to either transmit data to the UE based on a determination that the UE is in a location that permits data transmission, or not transmit data if the UE is associated with a DO NOT SEND state or if a location type associated with the geohashed area and the timestamp is associated with the DO NOT SEND state.
    Type: Grant
    Filed: November 9, 2017
    Date of Patent: August 27, 2019
    Assignee: Sprint Communications Company L.P.
    Inventors: Abhik Barua, Michael A. Gailloux, Vanessa Suwak
  • Patent number: 10373625
    Abstract: According to an aspect of the present invention an encoder for encoding an audio signal has an analyzer configured for deriving prediction coefficients and a residual signal from a frame of the audio signal. The encoder has a formant information calculator configured for calculating a speech related spectral shaping information from the prediction coefficients, a gain parameter calculator configured for calculating a gain parameter from an unvoiced residual signal and the spectral shaping information and a bitstream former configured for forming an output signal based on an information related to a voiced signal frame, the gain parameter or a quantized gain parameter and the prediction coefficients.
    Type: Grant
    Filed: April 18, 2016
    Date of Patent: August 6, 2019
    Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
    Inventors: Guillaume Fuchs, Markus Multrus, Emmanuel Ravelli, Markus Schnell