Patents Examined by Fariba Sirjani
-
Patent number: 11978444
Abstract: A method, system and apparatus to generate an augmented voice command, including identifying a plurality of sounds from a respective plurality of transducers to a smart speaker device, generating a visualization of the sounds using an augmented reality device, wherein one or more of the sounds can be selected using the visualization, and generating the augmented voice command for the smart speaker device, wherein the augmented voice command comprises the one or more sounds selected using the visualization of the augmented reality device.
Type: Grant
Filed: November 24, 2020
Date of Patent: May 7, 2024
Assignee: International Business Machines Corporation
Inventors: Clement Decrop, Tushar Agrawal, Jeremy R. Fox, Sarbajit K. Rakshit
-
Patent number: 11967340
Abstract: Disclosed is a method for detecting a voice from audio data, performed by a computing device according to an exemplary embodiment of the present disclosure. The method includes obtaining audio data; generating image data based on a spectrum of the obtained audio data; analyzing the generated image data by utilizing a pre-trained neural network model; and determining whether an automated response system (ARS) voice is included in the audio data, based on the analysis of the image data.
Type: Grant
Filed: June 23, 2023
Date of Patent: April 23, 2024
Assignee: ActionPower Corp.
Inventors: Subong Choi, Dongchan Shin, Jihwa Lee
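The first step of this pipeline, rendering an audio spectrum as image data for a neural classifier, can be sketched in a few lines. This is a toy illustration, not the patent's implementation; the function name, FFT size, hop length, and 0-255 grayscale scaling are all invented for the sketch:

```python
import numpy as np

def spectrogram_image(audio, n_fft=256, hop=128):
    """Turn a 1-D audio signal into a 2-D log-magnitude spectrogram scaled
    to 0-255, so it can be fed to an image model as a grayscale picture."""
    window = np.hanning(n_fft)
    frames = [np.abs(np.fft.rfft(audio[s:s + n_fft] * window))
              for s in range(0, len(audio) - n_fft + 1, hop)]
    spec = np.log(np.array(frames).T + 1e-9)      # rows: frequency, cols: time
    spec -= spec.min()                            # shift to start at zero
    spec = spec / (spec.max() + 1e-9) * 255.0     # normalize to 8-bit range
    return spec.astype(np.uint8)

img = spectrogram_image(np.random.randn(4000))
```

The resulting array has one row per frequency bin (`n_fft // 2 + 1`) and one column per analysis frame, which is the shape image models expect for a single-channel input.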
-
Patent number: 11929087
Abstract: Systems and methods for amplifying and/or attenuating audio signals are disclosed. In one implementation, a system for selectively amplifying audio signals may include at least one microphone for capturing sounds from an environment of the user and a processor. The processor may be programmed to receive an audio signal representative of sounds captured by the at least one microphone; determine whether the audio signal comprises speech by a user of the system; subject to the audio signal comprising speech by the user, modify the audio signal by attenuating a first part of the audio signal comprising the speech by the user; subject to the audio signal comprising audio other than speech by the user, modify the audio signal by amplifying a second part of the audio signal comprising audio other than the speech by the user; and transmit the modified audio signal to a hearing interface device.
Type: Grant
Filed: September 16, 2021
Date of Patent: March 12, 2024
Assignee: ORCAM TECHNOLOGIES LTD.
Inventors: Tal Rosenwein, Roi Nathan, Ronen Katsir, Yonatan Wexler, Amnon Shashua
-
Patent number: 11894012
Abstract: Disclosed are methods, systems, devices, and other implementations, including a method that includes receiving an audio signal representation, detecting in the received audio signal representation, using a first learning model, one or more silent intervals with reduced foreground sound levels, determining based on the detected one or more silent intervals an estimated full noise profile corresponding to the audio signal representation, and generating with a second learning model, based on the received audio signal representation and on the determined estimated full noise profile, a resultant audio signal representation with a reduced noise level.
Type: Grant
Filed: May 19, 2023
Date of Patent: February 6, 2024
Assignees: SoftBank Corp.
Inventors: Changxi Zheng, Ruilin Xu, Rundi Wu, Carl Vondrick, Yuko Ishiwaka
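The middle step of this pipeline, turning detected silent intervals into a noise profile and using it to denoise, can be illustrated without the learning models. A toy sketch with invented names; the patent's two neural models are replaced here by plain averaging and magnitude subtraction:

```python
import numpy as np

def noise_profile(spec, silent_frames):
    """Average the magnitude spectrogram over frames flagged as silent
    to estimate the stationary noise floor per frequency bin."""
    return spec[:, silent_frames].mean(axis=1, keepdims=True)

def spectral_subtract(spec, profile):
    """Subtract the noise profile from every frame, clamping at zero."""
    return np.maximum(spec - profile, 0.0)

# Columns 0-1 are silent (noise only); column 2 carries signal on top.
spec = np.array([[1.0, 1.0, 3.0],
                 [2.0, 2.0, 5.0]])
clean = spectral_subtract(spec, noise_profile(spec, [0, 1])) 
```

The key idea the abstract describes survives even in this crude form: frames with no foreground sound reveal the noise, and that estimate is applied to the whole signal.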
-
Patent number: 11881221
Abstract: Systems and methods are disclosed. A digitized human vocal expression of a user and digital images are received over a network from a remote device. The digitized human vocal expression is processed to determine characteristics of the human vocal expression, including pitch, volume, rapidity, a magnitude spectrum, and/or pauses in speech. Digital images are received and processed to detect characteristics of the user's face, including detecting whether any of the following is present: a sagging lip, a crooked smile, uneven eyebrows, and/or facial droop. Using the human vocal expression characteristics and face characteristics, a determination is made as to what action is to be taken. A cepstrum pitch may be determined using an inverse Fourier transform of a logarithm of a spectrum of a human vocal expression signal. The volume may be determined using peak heights in a power spectrum of the human vocal expression.
Type: Grant
Filed: June 30, 2022
Date of Patent: January 23, 2024
Assignee: The Notebook, LLC
Inventor: Karen Elaine Khaleghi
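The cepstrum pitch described in the abstract, an inverse Fourier transform of the log spectrum, is standard DSP and can be shown concretely. A minimal sketch; the function name, search band, and floor constant are choices made for the example, not taken from the patent:

```python
import numpy as np

def cepstrum_pitch(signal, sample_rate, f_min=150.0, f_max=400.0):
    """Estimate pitch from the real cepstrum: the inverse Fourier transform
    of the log magnitude spectrum, peak-picked over the quefrency range
    corresponding to the expected pitch band."""
    log_mag = np.log(np.abs(np.fft.rfft(signal)) + 1e-12)  # avoid log(0)
    cepstrum = np.fft.irfft(log_mag)
    q_min = int(sample_rate / f_max)   # shortest pitch period considered
    q_max = int(sample_rate / f_min)   # longest pitch period considered
    peak = q_min + np.argmax(cepstrum[q_min:q_max])
    return sample_rate / peak

# Quick check on a harmonic-rich 200 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
pitch = cepstrum_pitch(tone, sr)
```

A periodic voice signal produces regularly spaced harmonics in the log spectrum; the inverse transform concentrates that regularity into a peak at the quefrency equal to the pitch period, which is why peak-picking recovers the fundamental.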
-
Patent number: 11875817
Abstract: A technique is provided for detecting harmful behavior, such as power harassment, sexual harassment, or bullying, in a work environment and supporting its handling. A harmful behavior detecting system includes a computer that executes observation and detection of harmful behavior, including power harassment, sexual harassment, and bullying, among people in a work environment. The computer obtains voice data capturing speech around a target person; obtains voice information containing words and emotion information from the voice data; and obtains data such as vital data, date and time, or a location of the target person. The computer uses five elements, namely the words and emotion of the other person and the words, emotion, and vital data of the target person, to calculate an index value regarding the harmful behavior; estimates a state of the harmful behavior based on the index value; and outputs handling data for handling the harmful behavior in accordance with the estimated state.
Type: Grant
Filed: November 12, 2019
Date of Patent: January 16, 2024
Assignee: Hitachi Systems Ltd.
Inventors: Satoshi Iwagaki, Atsushi Shimada, Masumi Suehiro, Hidenori Chiba, Kouichi Horiuchi
-
Patent number: 11875813
Abstract: Disclosed are methods, systems, devices, and other implementations, including a method (performed by, for example, a hearing aid device) that includes obtaining a combined sound signal for signals combined from multiple sound sources in an area in which a person is located, and obtaining neural signals for the person, with the neural signals being indicative of one or more target sound sources, from the multiple sound sources, the person is attentive to. The method further includes determining a separation filter based, at least in part, on the neural signals obtained for the person, and applying the separation filter to a representation of the combined sound signal to derive a resultant separated signal representation associated with sound from the one or more target sound sources the person is attentive to.
Type: Grant
Filed: March 31, 2023
Date of Patent: January 16, 2024
Assignee: The Trustees of Columbia University in the City of New York
Inventors: Nima Mesgarani, Enea Ceolini, Cong Han
-
Patent number: 11869514
Abstract: An apparatus for decoding an audio signal includes a receiving interface, wherein the receiving interface is configured to receive a first frame and a second frame. Moreover, the apparatus includes a noise level tracing unit for determining noise level information being represented in a tracing domain. Furthermore, the apparatus includes a first reconstruction unit for reconstructing a third audio signal portion of the audio signal depending on the noise level information and a second reconstruction unit for reconstructing a fourth audio signal portion depending on noise level information being represented in the second reconstruction domain.
Type: Grant
Filed: April 15, 2020
Date of Patent: January 9, 2024
Assignee: Fraunhofer-Gesellschaft zur Foerderung der angewandten Forschung e.V.
Inventors: Michael Schnabel, Goran Markovic, Ralph Sperschneider, Jérémie Lecomte, Christian Helmrich
-
Patent number: 11861305
Abstract: Provided is a word processing system which includes: a first generation unit which generates, based on sentence information including a plurality of sentences, hierarchy data indicating syntax tree data for each hierarchy with regard to each sentence; a second generation unit which acquires, from a plurality of hierarchy data generated by the first generation unit, hierarchy data of a second sentence similar to hierarchy data of a first sentence generated by the first generation unit, extracts a difference between the hierarchy data of the first sentence and the hierarchy data of the second sentence, and generates, as paraphrasing rule data, first expression data as a difference in the first sentence and second expression data as a difference in the second sentence; and a storage unit which stores the paraphrasing rule data generated by the second generation unit.
Type: Grant
Filed: December 7, 2020
Date of Patent: January 2, 2024
Assignee: HITACHI, LTD.
Inventors: Yudai Kato, Noriko Takaya, Takahiro Hamada, Junya Sawazaki
-
Patent number: 11862178
Abstract: An electronic device and method are provided. The method includes identifying a speech section of a user and a speech section of a neighbor in a received audio signal, identifying a user utterance in the speech section of the user and a neighbor answer to the user utterance in the speech section of the neighbor, obtaining preference information associated with the user utterance, giving a first reliability to the neighbor answer and a second reliability to an agent answer of an artificial intelligence agent generated in response to the user utterance, based on the preference information, not responding to the user utterance when the second reliability is lower than the first reliability, and outputting the agent answer when the second reliability is equal to or higher than the first reliability.
Type: Grant
Filed: January 10, 2022
Date of Patent: January 2, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hoseon Shin, Chulmin Lee
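The response-gating rule at the end of this abstract reduces to a single comparison. A minimal sketch with invented names; how the two reliabilities are computed from preference information is the substantive part of the patent and is not modeled here:

```python
def agent_response(neighbor_reliability, agent_reliability, agent_answer):
    """Output the agent's answer only when its reliability is equal to or
    higher than the neighbor's; otherwise stay silent (return None)."""
    if agent_reliability < neighbor_reliability:
        return None              # defer to the human neighbor's answer
    return agent_answer
```

Note the asymmetry the abstract specifies: a tie goes to the agent ("equal to or higher"), so only a strictly lower agent reliability suppresses the response.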
-
Patent number: 11862141
Abstract: The present technology relates to a signal processing device, a signal processing method, and a program that allow for easier sound source separation. The signal processing device includes a sound source separation unit that recursively performs sound source separation on an input acoustic signal by using a predetermined sound source separation model learned in advance to separate a predetermined sound source from an acoustic signal for learning including the predetermined sound source. The present technology can be applied to a signal processing device.
Type: Grant
Filed: March 13, 2020
Date of Patent: January 2, 2024
Assignee: SONY GROUP CORPORATION
Inventor: Naoya Takahashi
-
Patent number: 11848025
Abstract: In a method for efficiently and accurately measuring the intelligibility of speech, a user may utter a sample text, and an automatic speech assessment (ASA) system may receive an acoustic signal encoding the utterance. An automatic speech recognition (ASR) module may generate an N-best output corresponding to the utterance and generate an intelligibility score representing the intelligibility of the utterance based on the N-best output and the sample text. Generating the intelligibility score may involve (1) calculating conditional intelligibility value(s) for the N recognition result(s), and (2) determining the intelligibility score based on the conditional intelligibility value of the most intelligible recognition result. Optionally, the process of generating the intelligibility score may involve adjusting the intelligibility score to account for environmental information (e.g., a pronunciation score for the user's speech and/or a confidence score assigned to the 1-best recognition result).
Type: Grant
Filed: January 15, 2021
Date of Patent: December 19, 2023
Assignee: ELSA, Corp.
Inventors: Jorge Daniel Leonardo Proença, Xavier Anguera Miro, Ganna Raboshchuk, Ângela Maria Pereira da Costa
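Steps (1) and (2) of the scoring process can be illustrated with a crude word-level alignment standing in for the ASR internals. Everything here is invented for the sketch: the function names, and the use of a longest-common-subsequence ratio as the "conditional intelligibility" measure:

```python
def conditional_intelligibility(reference, hypothesis):
    """Fraction of reference words recovered, via a word-level longest
    common subsequence (a crude stand-in for an ASR alignment score)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, rw in enumerate(ref):
        for j, hw in enumerate(hyp):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if rw == hw
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[-1][-1] / len(ref) if ref else 0.0

def intelligibility_score(sample_text, n_best):
    """Step (2): score the utterance by its most intelligible result."""
    return max(conditional_intelligibility(sample_text, h) for h in n_best)
```

Taking the maximum over the N-best list captures the abstract's idea that the utterance is as intelligible as its best-matching recognition hypothesis, even when the 1-best result is not the closest match.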
-
Patent number: 11837219
Abstract: In several aspects for creating a minute of a teleconference from a record thereof, a processor classifies portions of the record as relevant or non-relevant according to corresponding relevance indicators with respect to a topic of the teleconference. A processor removes the non-relevant portions from the record. A processor classifies pairs of relevant portions as similar or non-similar according to corresponding similarity indicators. A processor removes one of the relevant portions of each similar pair of relevant portions from the minute.
Type: Grant
Filed: November 18, 2021
Date of Patent: December 5, 2023
Assignee: International Business Machines Corporation
Inventors: Damiano Bassani, Alfonso D'Aniello, Andrea Tortosa, Roberto Giordani, Michela Melfa
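The two passes described here, a relevance filter followed by similar-pair removal, can be sketched with simple word-overlap indicators. This is a toy illustration; the function names, Jaccard similarity, and both thresholds are invented stand-ins for the patent's relevance and similarity indicators:

```python
def jaccard(a, b):
    """Word-set overlap as a stand-in similarity indicator."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def make_minute(portions, topic_words, rel_thresh=0.1, sim_thresh=0.6):
    """Keep portions relevant to the topic, then drop one member of each
    similar pair, mirroring the two classification passes."""
    topic = {w.lower() for w in topic_words}
    relevant = [p for p in portions
                if len(topic & set(p.lower().split())) / max(len(topic), 1) >= rel_thresh]
    minute = []
    for p in relevant:
        # Drop this portion if it is too similar to one already kept.
        if all(jaccard(p, kept) < sim_thresh for kept in minute):
            minute.append(p)
    return minute

portions = ["budget review for q3", "budget review for q3 please",
            "lunch plans", "q3 budget approved"]
minute = make_minute(portions, ["budget", "q3"])
```

Off-topic chatter is removed by the first pass and near-duplicate statements by the second, leaving one representative of each relevant point.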
-
Patent number: 11830481
Abstract: Methods are performed by one or more processing devices for correcting prosody in audio data. A method includes operations for accessing subject audio data in an audio edit region of the audio data. The subject audio data in the audio edit region potentially lacks prosodic continuity with unedited audio data in an unedited audio portion of the audio data. The operations further include predicting, based on a context of the unedited audio data, phoneme durations including a respective phoneme duration of each phoneme in the unedited audio data. The operations further include predicting, based on the context of the unedited audio data, a pitch contour comprising at least one respective pitch value of each phoneme in the unedited audio data. Additionally, the operations include correcting prosody of the subject audio data in the audio edit region by applying the phoneme durations and the pitch contour to the subject audio data.
Type: Grant
Filed: November 30, 2021
Date of Patent: November 28, 2023
Assignee: Adobe Inc.
Inventors: Maxwell Morrison, Zeyu Jin, Nicholas Bryan, Juan Pablo Caceres Chomali, Lucas Rencker
-
Patent number: 11817117
Abstract: In various examples, end of speech (EOS) for an audio signal is determined based at least in part on a rate of speech for a speaker. For a segment of the audio signal, EOS is indicated based at least in part on an EOS threshold determined based at least in part on the rate of speech for the speaker.
Type: Grant
Filed: January 29, 2021
Date of Patent: November 14, 2023
Assignee: NVIDIA CORPORATION
Inventors: Utkarsh Vaidya, Ravindra Yeshwant Lokhande, Viraj Gangadhar Karandikar, Niranjan Rajendra Wartikar, Sumit Kumar Bhattacharya
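The idea of a rate-adjusted EOS threshold can be sketched directly. A minimal illustration; the 800 ms base threshold, the 2.5 words/s nominal rate, and the inverse-proportional scaling are invented for the example, not taken from the patent:

```python
def eos_threshold(base_ms, words_per_second, nominal_wps=2.5):
    """Scale a base end-of-speech silence threshold by the speaker's
    rate: fast talkers leave shorter gaps, so the threshold shrinks."""
    rate = max(words_per_second, 1e-6)   # guard against division by zero
    return base_ms * (nominal_wps / rate)

def is_end_of_speech(silence_ms, words_per_second, base_ms=800.0):
    """Flag EOS once observed silence exceeds the rate-adjusted threshold."""
    return silence_ms >= eos_threshold(base_ms, words_per_second)
```

The payoff is latency: for a fast speaker the recognizer can commit to end-of-speech after a shorter pause, while a slow speaker's natural gaps are not mistaken for the end of the utterance.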
-
Patent number: 11805378
Abstract: Systems, apparatuses, and methods are described for a privacy blocking device configured to prevent receipt, by a listening device, of video and/or audio data until a trigger occurs. A blocker may be configured to prevent receipt of video and/or audio data by one or more microphones and/or one or more cameras of a listening device. The blocker may use the one or more microphones, the one or more cameras, and/or one or more second microphones and/or one or more second cameras to monitor for a trigger. The blocker may process the data. Upon detecting the trigger, the blocker may transmit data to the listening device. For example, the blocker may transmit all or a part of a spoken phrase to the listening device.
Type: Grant
Filed: October 29, 2020
Date of Patent: October 31, 2023
Inventor: Thomas Stachura
-
Patent number: 11769487
Abstract: A voice topic spotting system includes a learning module and a voice topic classifier module. The learning module receives training audio segments with topic labels, generates a fast keyword filter model based on a set of topic-indicative words, and generates a topic identification model based on a training set of topic keyword-containing lattices. The voice topic classifier module includes an automatic speech recognition engine arranged to identify one or more keywords included in a received audio segment and output the one or more keywords. A fast keyword filter implements the fast keyword filter model to output the received audio segment if a topic-indicative word is detected in the audio segment. A decoder generates a topic keyword-containing lattice associated with the audio segment. A voice topic classifier implements the topic identification model to determine a topic associated with the received audio segment.
Type: Grant
Filed: March 16, 2021
Date of Patent: September 26, 2023
Assignee: RAYTHEON APPLIED SIGNAL TECHNOLOGY, INC.
Inventor: Jonathan C. Wintrode
-
Patent number: 11769493
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models and using the trained acoustic models. A connectionist temporal classification (CTC) acoustic model is accessed, the CTC acoustic model having been trained using a context-dependent state inventory generated from approximate phonetic alignments determined by another CTC acoustic model trained without fixed alignment targets. Audio data for a portion of an utterance is received. Input data corresponding to the received audio data is provided to the accessed CTC acoustic model. Data indicating a transcription for the utterance is generated based on output that the accessed CTC acoustic model produced in response to the input data. The data indicating the transcription is provided as output of an automated speech recognition service.
Type: Grant
Filed: May 3, 2022
Date of Patent: September 26, 2023
Assignee: Google LLC
Inventors: Kanury Kanishka Rao, Andrew W. Senior, Hasim Sak
-
Patent number: 11763833
Abstract: Disclosed are a method, a device, and a computer-readable storage medium for reducing crosstalk when performing automatic speech translation between at least two users speaking different languages. The method for reducing crosstalk includes receiving a signal inputted to an out-ear microphone of a first user, wherein the first user is wearing a headset equipped with an in-ear microphone and the out-ear microphone and the signal includes a voice signal A of the first user and a voice signal B of a second user, receiving a voice signal B_in-ear inputted to an in-ear microphone of the second user, wherein the second user is wearing a headset equipped with the in-ear microphone and an out-ear microphone, and removing the voice signal B of the second user from the signal A+B inputted to the out-ear microphone of the first user, based on the voice signal B_in-ear inputted to the in-ear microphone of the second user.
Type: Grant
Filed: October 31, 2019
Date of Patent: September 19, 2023
Inventor: Jung Keun Kim
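The removal step, subtracting the second user's voice from the first user's out-ear signal using the second user's in-ear signal as a reference, can be sketched with a single least-squares gain. This is a deliberately simplified stand-in for a real adaptive or echo-cancelling filter, and all names are invented:

```python
import numpy as np

def remove_crosstalk(mixed, reference):
    """Remove a scaled copy of the reference (the second speaker's in-ear
    signal) from the mixed out-ear signal. A single least-squares gain
    stands in for a proper multi-tap adaptive filter."""
    g = np.dot(mixed, reference) / (np.dot(reference, reference) + 1e-12)
    return mixed - g * reference

a = np.array([1.0, 2.0, -1.0, 0.5])   # first user's own voice
b = np.array([0.5, -1.0, 2.0, 1.0])   # second user's leaked voice
out = remove_crosstalk(a + 0.8 * b, b)
```

By construction the residual is orthogonal to the reference signal, i.e. the component correlated with the second speaker's voice has been projected out; a practical system would use a time-varying filter to handle acoustic delay and coloration between the two microphones.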
-
Patent number: 11763836
Abstract: Disclosed is a hierarchical generated-audio detection system, comprising an audio preprocessing module, a CQCC feature extraction module, an LFCC feature extraction module, a first-stage lightweight coarse-level detection model, and a second-stage fine-level deep identification model. The audio preprocessing module preprocesses collected audio or video data to obtain an audio clip with a length not exceeding the limit; the audio clip is input into the CQCC feature extraction module and the LFCC feature extraction module respectively to obtain the CQCC feature and the LFCC feature; the CQCC feature or LFCC feature is input into the first-stage lightweight coarse-level detection model for first-stage screening to screen out the first-stage real audio and the first-stage generated audio; the CQCC feature or LFCC feature of the first-stage generated audio is input into the second-stage fine-level deep identification model to identify the second-stage real audio and the second-stage generated audio, and the second-stage generated audio …
Type: Grant
Filed: February 17, 2022
Date of Patent: September 19, 2023
Assignee: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES
Inventors: Jianhua Tao, Zhengkun Tian, Jiangyan Yi
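The cascade logic of the two-stage screening, where a cheap coarse model labels every clip and only clips it flags as generated reach the deeper model, can be sketched directly. A toy illustration; the stand-in models and their score thresholds are invented, and the feature extraction is not modeled:

```python
def cascade_detect(clips, coarse_model, fine_model):
    """Two-stage screening: clips the lightweight coarse model accepts as
    real stop there; the rest get a second look from the deep model."""
    labels = []
    for clip in clips:
        if coarse_model(clip) == "real":
            labels.append("real")            # accepted at stage one
        else:
            labels.append(fine_model(clip))  # stage-two deep check
    return labels

# Stand-in models over a synthetic "suspicion score" per clip:
# below 0.3 looks real to the coarse model; the fine model only calls
# scores above 0.7 generated.
coarse = lambda s: "real" if s < 0.3 else "generated"
fine = lambda s: "generated" if s > 0.7 else "real"
labels = cascade_detect([0.1, 0.5, 0.9], coarse, fine)
```

The design point of such a hierarchy is cost: most clips are dismissed by the lightweight first stage, so the expensive fine-level model only runs on the suspicious minority.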