Patents by Inventor Francesco Nesta

Francesco Nesta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Binary and multi-class classification systems and methods using connectionist temporal classification

Patent number: 10762891

Abstract: A classification training system for binary and multi-class classification comprises a neural network operable to perform classification of input data, a training dataset including pre-segmented, labeled training samples, and a classification training module operable to train the neural network using the training dataset. The classification training module includes a forward pass processing module, and a backward pass processing module. The backward pass processing module is operable to determine whether a current frame is in a region of target (ROT), determine ROT information such as beginning and length of the ROT and update weights and biases using a cross-entropy cost function and connectionist temporal classification cost function. The backward pass module further computes a soft target value using ROT information and computes a signal output error using the soft target value and network output value.

Type: Grant

Filed: February 12, 2018

Date of Patent: September 1, 2020

Assignee: SYNAPTICS INCORPORATED

Inventors: Saeed Mosayyebpour Kaskari, Trausti Thormundsson, Francesco Nesta
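The abstract above refers to determining whether a frame lies in a region of target (ROT) and to ROT information such as its beginning and length. As a minimal sketch only, assuming the pre-segmented labels are given as one 0/1 flag per frame and that a sample contains at most one ROT, the bookkeeping might look like the following. The function names `find_rot` and `in_rot` are hypothetical and do not come from the patent.

```python
from typing import List, Optional, Tuple


def find_rot(labels: List[int]) -> Optional[Tuple[int, int]]:
    """Return (start, length) of the first contiguous run of 1s, or None.

    Assumption: labels holds one 0/1 flag per frame (1 = target present).
    """
    start = None
    for i, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = i  # first target frame opens the ROT
        elif lab != 1 and start is not None:
            return start, i - start  # first non-target frame closes it
    if start is not None:
        return start, len(labels) - start  # ROT runs to the last frame
    return None


def in_rot(frame: int, rot: Optional[Tuple[int, int]]) -> bool:
    """True if the frame index falls inside the ROT (half-open interval)."""
    if rot is None:
        return False
    start, length = rot
    return start <= frame < start + length


labels = [0, 0, 1, 1, 1, 0, 0]
rot = find_rot(labels)
flags = [in_rot(t, rot) for t in range(len(labels))]
```

A training loop could use such per-frame flags to choose which cost function applies to a frame; how the patented method actually combines the two costs is not shown here.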

Connectionist temporal classification using segmented labeled sequence data

Patent number: 10762427

Abstract: Classification training systems and methods include a neural network for classification of input data, a training dataset providing segmented labeled training data, and a classification training module operable to train the neural network using the training data. A forward pass processing module is operable to generate neural network outputs for the training data using weights and bias for the neural network, and a backward pass processing module is operable to update the weights and biases in a backward pass, including obtaining Region of Target (ROT) information from the training data, generate a forward-backward masking based on the ROT information, the forward-backward masking placing at least one restriction on a neural network output path, compute modified forward and backward variables based on the neural network outputs and the forward-backward masking, and update the weights and biases.

Type: Grant

Filed: March 1, 2018

Date of Patent: September 1, 2020

Assignee: SYNAPTICS INCORPORATED

Inventors: Saeed Mosayyebpour Kaskari, Trausti Thormundsson, Francesco Nesta

Efficient connectionist temporal classification for binary classification

Patent number: 10762417

Abstract: A classification system and method for training a neural network includes receiving a stream of segmented, labeled training data having a sequence of frames, computing a stream of input features data for the sequence of frames, and generating neural network outputs for the sequence of frames in a forward pass through the training data and in accordance with weights and biases. The weights and biases are updated in a backward pass through the training data, including determining Region of Target (ROT) information from the segmented, labeled training data, computing modified forward and backward variables based on the neural network outputs and the ROT information, deriving a signal error for each frame within the sequence of frames based on the modified forward and backward variables, and updating the weights and biases based on the derived signal error. An adaptive learning module is provided to improve a convergence rate of the neural network.

Type: Grant

Filed: February 12, 2018

Date of Patent: September 1, 2020

Assignee: SYNAPTICS INCORPORATED

Inventors: Saeed Mosayyebpour Kaskari, Trausti Thormundsson, Francesco Nesta

ADAPTIVE SPATIAL VAD AND TIME-FREQUENCY MASK ESTIMATION FOR HIGHLY NON-STATIONARY NOISE SOURCES

Publication number: 20200219530

Abstract: Systems and methods include a first voice activity detector operable to detect speech in a frame of a multichannel audio input signal and output a speech determination, a constrained minimum variance adaptive filter operable to receive the multichannel audio input signal and the speech determination and minimize a signal variance at the output of the filter, thereby producing an equalized target speech signal, a mask estimator operable to receive the equalized target speech signal and the speech determination and generate a spectral-temporal mask to discriminate a target speech from noise and interference speech, and a second voice activity detector operable to detect voice in a frame of the speech discriminated signal. An audio input sensor array including a plurality of microphones, each microphone generating a channel of the multichannel audio input signal. A sub-band analysis module operable to decompose each of the channels into a plurality of frequency sub-bands.

Type: Application

Filed: January 6, 2020

Publication date: July 9, 2020

Inventors: Francesco Nesta, Alireza Masnadi-Shirazi

MULTI-STREAM TARGET-SPEECH DETECTION AND CHANNEL FUSION

Publication number: 20200184985

Abstract: Audio processing systems and methods include an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal and target-speech detection logic and an automatic speech recognition engine or VoIP application. An audio processing device includes a target speech enhancement engine configured to analyze a multichannel audio input signal and generate a plurality of enhanced target streams, a multi-stream target-speech detection generator comprising a plurality of target-speech detector engines each configured to determine a probability of detecting a specific target-speech of interest in the stream, wherein the multi-stream target-speech detection generator is configured to determine a plurality of weights associated with the enhanced target streams, and a fusion subsystem configured to apply the plurality of weights to the enhanced target streams to generate an enhancement output signal.

Type: Application

Filed: December 6, 2019

Publication date: June 11, 2020

Inventors: Francesco Nesta, Saeed Mosayyebpour Kaskari
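The fusion step described in the abstract above applies a set of weights to several enhanced target streams to produce one output signal. A minimal sketch of that idea, assuming the weights are simply the per-stream detection probabilities normalized to sum to one (an assumption; the abstract does not say how the weights are derived), could look like this. The names `normalize_weights` and `fuse_streams` are hypothetical.

```python
from typing import List, Sequence


def normalize_weights(probs: Sequence[float]) -> List[float]:
    """Scale detection probabilities so they sum to 1.

    Assumption: falls back to uniform weights when no stream has any
    detection probability, so the output is never silenced entirely.
    """
    total = sum(probs)
    if total <= 0.0:
        return [1.0 / len(probs)] * len(probs)
    return [p / total for p in probs]


def fuse_streams(streams: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Weighted sample-by-sample sum of equally long streams."""
    n = len(streams[0])
    return [sum(w * s[i] for w, s in zip(weights, streams)) for i in range(n)]


# Two toy "enhanced streams" and the probability that each holds the target.
streams = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
probs = [0.75, 0.25]
weights = normalize_weights(probs)
fused = fuse_streams(streams, weights)
```

With these toy values the first stream dominates the output because its detector reports the higher probability.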

Voice enhancement in audio signals through modified generalized eigenvalue beamformer

Patent number: 10679617

Abstract: A real-time audio signal processing system includes an audio signal processor configured to process audio signals using a modified generalized eigenvalue (GEV) beamforming technique to generate an enhanced target audio output signal. The digital signal processor includes a sub-band decomposition circuitry configured to decompose the audio signal into sub-band frames in the frequency domain and a target activity detector configured to detect whether a target audio is present in the sub-band frames. Based on information related to the sub-band frames and the determination of whether the target audio is present in the sub-band frames, the digital signal processor is configured to use the modified GEV technique to estimate the relative transfer function (RTF) of the target audio source, and generate a filter based on the estimated RTF. The filter may then be applied to the audio signals to generate the enhanced audio output signal.

Type: Grant

Filed: December 6, 2017

Date of Patent: June 9, 2020

Assignee: SYNAPTICS INCORPORATED

Inventors: Frederic Philippe Denis Mustiere, Francesco Nesta

ROBUST START-END POINT DETECTION ALGORITHM USING NEURAL NETWORK

Publication number: 20200126556

Abstract: An end detector configured to receive the feature data and detect an end point of a keyword, and a start detector configured to receive an indication of the detected end point and process the feature data associated with corresponding input frames to detect a start point of the keyword. The start detector and end detector comprise neural networks trained through a process using a cross-entropy cost function for non-Region of Target (ROT) frames and a One-Spike Connectionist Temporal Classification cost function for ROT frames.

Type: Application

Filed: December 20, 2019

Publication date: April 23, 2020

Inventors: Saeed Mosayyebpour, Francesco Nesta, Trausti Thormundsson

BINARY AND MULTI-CLASS CLASSIFICATION SYSTEMS AND METHODS USING ONE SPIKE CONNECTIONIST TEMPORAL CLASSIFICATION

Publication number: 20200125951

Abstract: A classification training system for binary and multi-class classification comprises a neural network operable to perform classification of input data, a training dataset including pre-segmented, labeled training samples, and a classification training module operable to train the neural network using the training dataset. The classification training module includes a forward pass processing module, and a backward pass processing module. The backward pass processing module is operable to determine whether a current frame is in a region of target (ROT), determine ROT information such as beginning and length of the ROT and update weights and biases using a cross-entropy cost function and One Spike Connectionist Temporal Classification (OSCTC) cost function. The backward pass module further computes a soft target value using ROT information and computes a signal output error using the soft target value and network output value.

Type: Application

Filed: December 20, 2019

Publication date: April 23, 2020

Inventors: Saeed Mosayyebpour, Trausti Thormundsson, Francesco Nesta

Two channel headset-based own voice enhancement

Patent number: 10614788

Abstract: Systems and methods for enhancing a headset user's own voice include an outside microphone, an inside microphone, audio input components operable to receive a plurality of time-domain microphone signals, including an outside microphone signal from the outside microphone and an inside microphone signal from the inside microphone, a subband decomposition module operable to transform the time-domain microphone signals to frequency domain subband signals, a voice activity detector operable to detect speech presence and absence in the subband signals, a speech extraction module operable to predict a clean speech signal in each of the inside microphone signal and the outside microphone signal, and cancel audio sources other than a headset user's own voice by combining the predicted clean speech signal from the inside microphone signal and the predicted clean speech signal from the outside microphone signal, and a postfiltering module operable to reduce residual noise.

Type: Grant

Filed: March 15, 2018

Date of Patent: April 7, 2020

Assignee: SYNAPTICS INCORPORATED

Inventors: Frederic Philippe Denis Mustiere, Francesco Nesta, Trausti Thormundsson

Voice activity detection systems and methods

Patent number: 10504539

Abstract: An audio processing device or method includes an audio transducer operable to receive audio input and generate an audio signal based on the audio input. The audio processing device or method also includes an audio signal processor operable to extract local features from the audio signal, such as Power-Normalized Cepstral Coefficients (PNCC) of the audio signal. The audio signal processor also is operable to extract global features from the audio signal, such as chroma features and harmonicity features. A neural network is provided to determine a probability that a target audio is present in the audio signal based on the local and global features. In particular, the neural network is trained to output a value indicating whether the target audio is present and locally dominant in the audio signal.

Type: Grant

Filed: December 5, 2017

Date of Patent: December 10, 2019

Assignee: SYNAPTICS INCORPORATED

Inventors: Saeed Mosayyebpour Kaskari, Francesco Nesta

360-DEGREE MULTI-SOURCE LOCATION DETECTION, TRACKING AND ENHANCEMENT

Publication number: 20190355373

Abstract: Audio processing systems and methods comprise an audio sensor array configured to receive a multichannel audio input and generate a corresponding multichannel audio signal and a target activity detector configured to identify audio target sources in the multichannel audio signal. The target activity detector includes a VAD, an instantaneous locations component configured to detect a location of a plurality of audio sources, a dominant locations component configured to selectively buffer a subset of the plurality of audio sources comprising dominant audio sources, a source tracker configured to track locations of the dominant audio sources over time, and a dominance selection component configured to select the dominant target sources for further audio processing. The instantaneous location component computes a discrete spatial map comprising the location of the plurality of audio sources, and the dominant location component selects N of the dominant sources from the discrete spatial map for source tracking.

Type: Application

Filed: May 16, 2019

Publication date: November 21, 2019

Inventors: Francesco Nesta, Saeed Mosayyebpour Kaskari, Dror Givon
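The abstract above mentions selecting N dominant sources from a discrete spatial map. As a rough illustration only, assuming the map is a flat list holding one activity score per angular bin (the actual map layout is not given in the abstract), the selection could be sketched as a top-N pick. The name `select_dominant` is hypothetical.

```python
import heapq
from typing import List


def select_dominant(spatial_map: List[float], n: int) -> List[int]:
    """Return the indices of the n highest-scoring bins, strongest first.

    Assumption: each index is one discrete location (for example an angular
    sector) and each value is that location's activity score.
    """
    return heapq.nlargest(n, range(len(spatial_map)),
                          key=spatial_map.__getitem__)


# Six hypothetical 60-degree sectors covering the full circle.
spatial_map = [0.1, 0.9, 0.2, 0.05, 0.7, 0.3]
dominant = select_dominant(spatial_map, 2)
```

A source tracker would then follow only the returned bins over time rather than every bin in the map.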

RECURRENT MULTIMODAL ATTENTION SYSTEM BASED ON EXPERT GATED NETWORKS

Publication number: 20190354797

Abstract: Systems and methods for multimodal classification include a plurality of expert modules, each expert module configured to receive data corresponding to one of a plurality of input modalities and extract associated features, a plurality of class prediction modules, each class prediction module configured to receive extracted features from a corresponding one of the expert modules and predict an associated class, a gate expert configured to receive the extracted features from the plurality of expert modules and output a set of weights for the input modalities, and a fusion module configured to generate a weighted prediction based on the class predictions and the set of weights. Various embodiments include one or more of an image expert, a video expert, an audio expert, class prediction modules, a gate expert, and a co-learning framework.

Type: Application

Filed: May 20, 2019

Publication date: November 21, 2019

Inventors: Francesco Nesta, Lijiang Guo, Minje Kim
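In the abstract above, a gate expert outputs a set of weights for the input modalities and a fusion module combines the per-modality class predictions into a weighted prediction. A minimal sketch of that combination, assuming the gate produces raw scores that are turned into weights with a softmax (an assumption; the abstract does not name the normalization), might be written as follows. The names `softmax` and `fuse_predictions` are illustrative.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_predictions(class_probs: Sequence[Sequence[float]],
                     gate_scores: Sequence[float]) -> List[float]:
    """Mix each expert's class distribution using the gate's weights."""
    weights = softmax(gate_scores)
    n_classes = len(class_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, class_probs))
            for c in range(n_classes)]


# Two hypothetical experts (say audio and video) voting over two classes.
class_probs = [[0.9, 0.1], [0.2, 0.8]]
gate_scores = [0.0, 0.0]  # equal scores give each expert equal weight
fused = fuse_predictions(class_probs, gate_scores)
```

Because the weights sum to one and each expert outputs a probability distribution, the fused vector is also a distribution over the classes.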

Online dereverberation algorithm based on weighted prediction error for noisy time-varying environments

Patent number: 10446171

Abstract: Systems and methods for processing multichannel audio signals include receiving a multichannel time-domain audio input, transforming the input signal to a plurality of multi-channel frequency domain, k-spaced under-sampled subband signals, buffering and delaying each channel, saving a subset of spectral frames for prediction filter estimation at each of the spectral frames, estimating a variance of the frequency domain signal at each of the spectral frames, adaptively estimating the prediction filter in an online manner using a recursive least squares (RLS) algorithm, linearly filtering each channel using the estimated prediction filter, nonlinearly filtering the linearly filtered output signal to reduce residual reverberation and the estimated variances, producing a nonlinearly filtered output signal, and synthesizing the nonlinearly filtered output signal to reconstruct a dereverberated time-domain multi-channel audio signal.

Type: Grant

Filed: December 22, 2017

Date of Patent: October 15, 2019

Assignee: SYNAPTICS INCORPORATED

Inventors: Saeed Mosayyebpour Kaskari, Francesco Nesta, Trausti Thormundsson

Semi-supervised system for multichannel source enhancement through configurable unsupervised adaptive transformations and supervised deep neural network

Patent number: 10347271

Abstract: Various techniques are provided to perform enhanced automatic speech recognition. For example, a subband analysis may be performed that transforms time-domain signals of multiple audio channels into subband signals. An adaptive configurable transformation may also be performed to produce single or multichannel-based features whose values are correlated to an Ideal Binary Mask (IBM). An unsupervised Gaussian Mixture Model (GMM) model fitting the distribution of the features and producing posterior probabilities may also be performed, and the posteriors may be combined to produce deep neural network (DNN) feature vectors. A DNN may be provided that predicts oracle spectral gains from the input feature vectors. Spectral processing may be performed to produce an estimate of the target source time-frequency magnitudes from the mixtures and the output of the DNN. Subband synthesis may be performed to transform signals back to time-domain.

Type: Grant

Filed: December 2, 2016

Date of Patent: July 9, 2019

Assignee: SYNAPTICS INCORPORATED

Inventors: Francesco Nesta, Xiangyuan Zhao, Trausti Thormundsson

VOICE ENHANCEMENT IN AUDIO SIGNALS THROUGH MODIFIED GENERALIZED EIGENVALUE BEAMFORMER

Publication number: 20190172450

Abstract: A real-time audio signal processing system includes an audio signal processor configured to process audio signals using a modified generalized eigenvalue (GEV) beamforming technique to generate an enhanced target audio output signal. The digital signal processor includes a sub-band decomposition circuitry configured to decompose the audio signal into sub-band frames in the frequency domain and a target activity detector configured to detect whether a target audio is present in the sub-band frames. Based on information related to the sub-band frames and the determination of whether the target audio is present in the sub-band frames, the digital signal processor is configured to use the modified GEV technique to estimate the relative transfer function (RTF) of the target audio source, and generate a filter based on the estimated RTF. The filter may then be applied to the audio signals to generate the enhanced audio output signal.

Type: Application

Filed: December 6, 2017

Publication date: June 6, 2019

Inventors: Frederic Philippe Denis Mustiere, Francesco Nesta

VOICE ACTIVITY DETECTION SYSTEMS AND METHODS

Publication number: 20190172480

Abstract: An audio processing device or method includes an audio transducer operable to receive audio input and generate an audio signal based on the audio input. The audio processing device or method also includes an audio signal processor operable to extract local features from the audio signal, such as Power-Normalized Cepstral Coefficients (PNCC) of the audio signal. The audio signal processor also is operable to extract global features from the audio signal, such as chroma features and harmonicity features. A neural network is provided to determine a probability that a target audio is present in the audio signal based on the local and global features. In particular, the neural network is trained to output a value indicating whether the target audio is present and locally dominant in the audio signal.

Type: Application

Filed: December 5, 2017

Publication date: June 6, 2019

Inventors: Saeed Mosayyebpour Kaskari, Francesco Nesta

Selective audio source enhancement

Patent number: 10123113

Abstract: A selective audio source enhancement system includes a processor and a memory, and a pre-processing unit configured to receive audio data including a target audio signal, and to perform sub-band domain decomposition of the audio data to generate buffered outputs. In addition, the system includes a target source detection unit configured to receive the buffered outputs, and to generate a target presence probability corresponding to the target audio signal, as well as a spatial filter estimation unit configured to receive the target presence probability, and to transform frames buffered in each sub-band into a higher resolution frequency-domain. The system also includes a spectral filtering unit configured to retrieve a multichannel image of the target audio signal and noise signals associated with the target audio signal, and an audio synthesis unit configured to extract an enhanced mono signal corresponding to the target audio signal from the multichannel image.

Type: Grant

Filed: May 15, 2017

Date of Patent: November 6, 2018

Assignee: SYNAPTICS INCORPORATED

Inventors: Francesco Nesta, Trausti Thormundsson, Willie Wu

REAL-TIME SINGLE-CHANNEL SPEECH ENHANCEMENT IN NOISY AND TIME-VARYING ENVIRONMENTS

Publication number: 20180308503

Abstract: Systems and methods for processing an audio signal include an audio input operable to receive an input signal comprising a time-domain, single-channel audio signal, a subband analysis block operable to transform the input signal to a frequency domain input signal comprising a plurality of k-spaced under-sampled subband signals, a reverberation reduction block operable to reduce reverberation effect, including late reverberation, in the plurality of k-spaced under-sampled subband signals, a noise reduction block operable to reduce background noise from the plurality of k-spaced under-sampled subband signals, and a subband synthesis block operable to transform the subband signals to the time-domain, thereby producing an enhanced output signal.

Type: Application

Filed: April 19, 2018

Publication date: October 25, 2018

Inventors: Saeed Mosayyebpour Kaskari, Francesco Nesta, Trausti Thormundsson, Thomas Aaron Gulliver

TWO CHANNEL HEADSET-BASED OWN VOICE ENHANCEMENT

Publication number: 20180268798

Abstract: Systems and methods for enhancing a headset user's own voice include an outside microphone, an inside microphone, audio input components operable to receive a plurality of time-domain microphone signals, including an outside microphone signal from the outside microphone and an inside microphone signal from the inside microphone, a subband decomposition module operable to transform the time-domain microphone signals to frequency domain subband signals, a voice activity detector operable to detect speech presence and absence in the subband signals, a speech extraction module operable to predict a clean speech signal in each of the inside microphone signal and the outside microphone signal, and cancel audio sources other than a headset user's own voice by combining the predicted clean speech signal from the inside microphone signal and the predicted clean speech signal from the outside microphone signal, and a postfiltering module operable to reduce residual noise.

Type: Application

Filed: March 15, 2018

Publication date: September 20, 2018

Inventors: Frederic Philippe Denis Mustiere, Francesco Nesta, Trausti Thormundsson

CONNECTIONIST TEMPORAL CLASSIFICATION USING SEGMENTED LABELED SEQUENCE DATA

Publication number: 20180253648

Abstract: Classification training systems and methods include a neural network for classification of input data, a training dataset providing segmented labeled training data, and a classification training module operable to train the neural network using the training data. A forward pass processing module is operable to generate neural network outputs for the training data using weights and bias for the neural network, and a backward pass processing module is operable to update the weights and biases in a backward pass, including obtaining Region of Target (ROT) information from the training data, generate a forward-backward masking based on the ROT information, the forward-backward masking placing at least one restriction on a neural network output path, compute modified forward and backward variables based on the neural network outputs and the forward-backward masking, and update the weights and biases.

Type: Application

Filed: March 1, 2018

Publication date: September 6, 2018

Inventors: Saeed Mosayyebpour Kaskari, Trausti Thormundsson, Francesco Nesta
