Patents by Inventor Marc Delcroix

Marc Delcroix has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240129666
    Abstract: An estimation apparatus 10 is a signal processing apparatus for processing an acoustic signal and estimates an observation signal of a virtual microphone arranged virtually from an input observation signal of a real microphone using a deep learning model having a neural network (NN) 11.
    Type: Application
    Filed: January 29, 2021
    Publication date: April 18, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tsubasa OCHIAI, Marc DELCROIX, Tomohiro NAKATANI, Rintaro IKESHITA, Keisuke KINOSHITA, Shoko ARAKI
  • Publication number: 20240062771
    Abstract: A learning device includes a conversion unit, a combination unit, an extraction unit, and an update unit. The conversion unit converts a mixed sound, of which sound sources for each component are known, into embedding vectors for each sound source using an embedding neural network. The combination unit combines the embedding vectors using a combination neural network to obtain a combined vector. The extraction unit extracts a target sound from the mixed sound and the combined vector using an extraction neural network. The update unit updates parameters of the embedding neural network such that a loss function calculated based on information regarding the sound sources for each component of the mixed sound and the target sound extracted by the extraction unit is optimized.
    Type: Application
    Filed: January 5, 2021
    Publication date: February 22, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Marc DELCROIX, Tsubasa OCHIAI, Tomohiro NAKATANI, Keisuke KINOSHITA
  • Publication number: 20240038254
    Abstract: A signal processing device includes processing circuitry configured to receive an input of extraction target information indicating which audio class of an audio signal is to be extracted from a mixture audio signal constituted by a mixture of audio signals of a plurality of audio classes, and output a result of extracting the audio signal of the audio class indicated by the extraction target information from the mixture audio signal, with a neural network by using a feature value of the mixture audio signal and the extraction target information.
    Type: Application
    Filed: August 13, 2020
    Publication date: February 1, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tsubasa OCHIAI, Marc DELCROIX, Yuma KOIZUMI, Hiroaki ITO, Keisuke KINOSHITA, Shoko ARAKI
  • Publication number: 20240005104
    Abstract: A data processing device includes processing circuitry configured to extract a second word corresponding to a first word included in first text from among a plurality of words belonging to a predetermined domain, repeat processing in the extraction for all words included in the first text to generate a confusion network that expresses a plurality of sentence possibilities with one network configuration and is an expression format of a word sequence, and search for a grammatically correct word string in the confusion network using a language model that evaluates grammatical correctness of the word string, and select a word string to be output.
    Type: Application
    Filed: October 7, 2020
    Publication date: January 4, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Atsunori OGAWA, Naohiro TAWARA, Marc DELCROIX
  • Patent number: 11837222
    Abstract: A determination device includes a memory, and processing circuitry coupled to the memory and configured to accept input of a plurality of sequences provided as candidates for a solution to one given input, and determine, for two sequences of the plurality of sequences, a sequence that has a higher accuracy than the other sequence of the two sequences, using a model expressed as a neural network.
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: December 5, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Atsunori Ogawa, Marc Delcroix, Shigeki Karita, Tomohiro Nakatani
  • Patent number: 11763834
    Abstract: Features are extracted from an observed speech signal including at least speech of multiple speakers including a target speaker. A mask is calculated for extracting speech of the target speaker based on the features of the observed speech signal and a speech signal of the target speaker serving as adaptation data of the target speaker. The signal of the speech of the target speaker is calculated from the observed speech signal based on the mask. Speech of the target speaker can be extracted from observed speech that includes speech of multiple speakers.
    Type: Grant
    Filed: July 18, 2018
    Date of Patent: September 19, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Higuchi, Tomohiro Nakatani
  • Publication number: 20230239616
    Abstract: Provided is a target sound extraction technique based on a steering vector generation method enabling instability in a calculation to be prevented when a neural network is trained by using an error back propagation method to reduce an estimation error of a beamformer. A target sound signal generation apparatus generates a target sound signal y_{t,f} corresponding to a target sound included in an observed sound from an observed signal vector x_{t,f} corresponding to the observed sound collected by using a plurality of microphones. The target sound signal generation apparatus includes a mask generation unit, a steering vector generation unit, a beamformer vector generation unit, and a target sound signal generation unit. The mask generation unit is configured as a neural network trained by using an error back propagation method.
    Type: Application
    Filed: June 19, 2020
    Publication date: July 27, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tomohiro NAKATANI, Keisuke KINOSHITA, Marc DELCROIX
  • Patent number: 11676619
    Abstract: A time-variant noise spatial covariance matrix is estimated effectively. Using time-frequency-divided observation signals based on observation signals acquired by collecting acoustic signals emitted from one or a plurality of sound sources and mask information expressing the occupancy probability of a component of each of the time-frequency-divided observation signals that corresponds to each noise source, a time-independent first noise spatial covariance matrix corresponding to the time-frequency-divided observation signals and the mask information belonging to a long time interval is acquired for each noise source. Further, using the mask information of each of a plurality of different short time intervals, a mixture weight corresponding to each noise source in each short time interval is acquired.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: June 13, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tomohiro Nakatani, Marc Delcroix, Keisuke Kinoshita, Shoko Araki, Yuki Kubo
  • Publication number: 20230067132
    Abstract: A signal processing apparatus includes a neural network (“NN”), a sorting unit, and a spatial covariance matrix calculation unit. The NN converts a mixed signal, in which sounds of a plurality of sound sources input by a plurality of channels are mixed, into a separated signal separated into a signal for each sound source as a signal in a time domain as it is and outputs the separated signal. The sorting unit sorts, for the separated signal of each channel output from the NN, the separated signal of each channel such that the plurality of sound sources of a plurality of the separated signals are aligned among the plurality of channels. The spatial covariance matrix calculation unit calculates a spatial covariance matrix corresponding to each sound source in accordance with the separated signal for each channel output from the sorting unit and sorted.
    Type: Application
    Filed: February 14, 2020
    Publication date: March 2, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tsubasa OCHIAI, Marc DELCROIX, Rintaro IKESHITA, Keisuke KINOSHITA, Tomohiro NAKATANI, Shoko ARAKI
  • Publication number: 20230032372
    Abstract: The extraction unit 132 extracts a second word corresponding to a first word included in a first text from among a plurality of words belonging to a predetermined domain. The determination unit 133 determines whether a predetermined condition for the word class of the first word is satisfied or not. When it is determined by the determination unit 133 that the condition is satisfied, the generation unit 134 generates a second text in which the first word of the first text is exchanged with the second word.
    Type: Application
    Filed: January 22, 2020
    Publication date: February 2, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Atsunori OGAWA, Naohiro TAWARA, Shigeki KARITA, Marc DELCROIX
  • Patent number: 11551667
    Abstract: A learning device (10) includes a feature extracting unit (11) that extracts features of speech from speech data for training, a probability calculating unit (12) that, on the basis of the features of speech, performs prefix searching using a speech recognition model of which a neural network is representative, and calculates a posterior probability of a recognition character string to obtain a plurality of hypothetical character strings, an error calculating unit (13) that calculates an error by word error rates of the plurality of hypothetical character strings and a correct character string for training, and obtains a parameter for the entire model that minimizes an expected value of summation of loss in the word error rates, and an updating unit (14) that updates a parameter of the model in accordance with the parameter obtained by the error calculating unit (13).
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: January 10, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Shigeki Karita, Atsunori Ogawa, Marc Delcroix, Tomohiro Nakatani
  • Publication number: 20220335965
    Abstract: An audio signal processing apparatus (10) includes a first auxiliary feature conversion unit (12) and a second auxiliary feature conversion unit (13) that convert a plurality of signals relating to processing of an audio signal of a target speaker into a plurality of auxiliary features for the plurality of signals using a plurality of auxiliary neural networks corresponding to the plurality of signals, and an audio signal processing unit (11) that estimates information regarding an audio signal of the target speaker included in a mixed audio signal using a main neural network based on an input feature of the mixed audio signal and the plurality of auxiliary features, wherein the plurality of signals relating to processing of the audio signal of the target speaker are two or more pieces of information of different modalities.
    Type: Application
    Filed: August 7, 2020
    Publication date: October 20, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hiroshi SATO, Tsubasa OCHIAI, Keisuke KINOSHITA, Marc DELCROIX, Tomohiro NAKATANI, Atsunori OGAWA
  • Patent number: 11456003
    Abstract: An estimation device includes a memory, and processing circuitry coupled to the memory and configured to receive an input of an input audio signal that is an audio signal in which sounds from a plurality of sound sources are mixed, and an input of supplemental information, and output an estimation result of mask information that identifies a mask for extracting a sound of any one of the sound sources included in an entire or a part of a signal included in the input audio signal, the signal being identified by the supplemental information, cause a neural network to iterate a process of outputting the estimation result of the mask information, and cause the neural network to output an estimation result of the mask information for a different sound source, by inputting a different piece of the supplemental information to the neural network at each iteration.
    Type: Grant
    Filed: January 29, 2019
    Date of Patent: September 27, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Lukas Drude, Thilo Christoph Von Neumann
  • Publication number: 20220262356
    Abstract: A reranking device includes a hypothesis input unit configured to receive input of N-best hypotheses associated with scores of speech recognition accuracy, and a hypothesis selection unit configured to select two hypotheses to be determined from among the input N-best hypotheses. The device further includes a determination unit configured to determine which of the two hypotheses has the higher accuracy by using: a plurality of auxiliary models, from a first auxiliary model to an M-th auxiliary model, each represented by a neural network capable of converting the two selected hypotheses, when given, into hidden state vectors and determining which of the two hypotheses has the higher accuracy based on those hidden state vectors; and a main model represented by a neural network capable of determining which of the two hypotheses has the higher accuracy based on the hidden state vectors of the two hypotheses.
    Type: Application
    Filed: August 8, 2019
    Publication date: August 18, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Atsunori OGAWA, Marc DELCROIX, Shigeki KARITA, Tomohiro NAKATANI
  • Publication number: 20220130406
    Abstract: A time-variant noise spatial covariance matrix is estimated effectively. Using time-frequency-divided observation signals based on observation signals acquired by collecting acoustic signals emitted from one or a plurality of sound sources and mask information expressing the occupancy probability of a component of each of the time-frequency-divided observation signals that corresponds to each noise source, a time-independent first noise spatial covariance matrix corresponding to the time-frequency-divided observation signals and the mask information belonging to a long time interval is acquired for each noise source. Further, using the mask information of each of a plurality of different short time intervals, a mixture weight corresponding to each noise source in each short time interval is acquired.
    Type: Application
    Filed: February 28, 2020
    Publication date: April 28, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tomohiro NAKATANI, Marc DELCROIX, Keisuke KINOSHITA, Shoko ARAKI, Yuki KUBO
  • Patent number: 11304000
    Abstract: A signal processing device includes a power estimating unit that treats the feature quantity of a signal including reverberation as the input; inputs an observation feature quantity corresponding to an observation signal to a neural network which is learnt in such a way that the estimate value of the feature quantity corresponding to the power of the signal having reduced reverberation, from among the input signal, is output; and estimates the estimate value of the feature quantity corresponding to the power of the signal having reduced reverberation and corresponding to the observation signal. Moreover, the signal processing device includes a regression coefficient estimating unit that uses the estimate value of the feature quantity corresponding to the power as obtained as the estimation result by the power estimating unit, and estimates a regression coefficient of the autoregressive process for generating the observation signal.
    Type: Grant
    Filed: August 1, 2018
    Date of Patent: April 12, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Keisuke Kinoshita, Tomohiro Nakatani, Marc Delcroix
  • Publication number: 20220076690
    Abstract: A signal processing device according to an embodiment of the present invention includes: a conversion unit configured to convert an input mixed acoustic signal into a plurality of first internal states, a weighting unit configured to generate a second internal state which is a weighted sum of the plurality of first internal states based on auxiliary information regarding an acoustic signal of a target sound source when the auxiliary information is input, and generate the second internal state by selecting one of the plurality of first internal states when the auxiliary information is not input, and a mask estimation unit configured to estimate a mask based on the second internal state.
    Type: Application
    Filed: February 12, 2020
    Publication date: March 10, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Tsubasa OCHIAI, Marc DELCROIX, Keisuke KINOSHITA, Atsunori OGAWA, Tomohiro NAKATANI
  • Patent number: 11264044
    Abstract: To begin with, an acoustic model training apparatus extracts speech features representing speech characteristics, and calculates an acoustic-condition feature representing a feature of an acoustic condition of the speech data using an acoustic-condition calculation model that is represented as a neural network, based on an acoustic-condition calculation model parameter characterizing the acoustic-condition calculation model. The acoustic model training apparatus then generates an adjusted parameter that is an acoustic model parameter adjusted based on the acoustic-condition feature, the acoustic model parameter characterizing an acoustic model represented as a neural network to which an output layer of the acoustic-condition calculation model is coupled. The acoustic model training apparatus then updates the acoustic model parameter based on the adjusted parameter and the speech features, and updates the acoustic-condition calculation model parameters based on the adjusted parameter and the speech features.
    Type: Grant
    Filed: January 26, 2017
    Date of Patent: March 1, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Tomohiro Nakatani
  • Publication number: 20210400383
    Abstract: A signal processing device includes a power estimating unit that treats the feature quantity of a signal including reverberation as the input; inputs an observation feature quantity corresponding to an observation signal to a neural network which is learnt in such a way that the estimate value of the feature quantity corresponding to the power of the signal having reduced reverberation, from among the input signal, is output; and estimates the estimate value of the feature quantity corresponding to the power of the signal having reduced reverberation and corresponding to the observation signal. Moreover, the signal processing device includes a regression coefficient estimating unit that uses the estimate value of the feature quantity corresponding to the power as obtained as the estimation result by the power estimating unit, and estimates a regression coefficient of the autoregressive process for generating the observation signal.
    Type: Application
    Filed: August 1, 2018
    Publication date: December 23, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Keisuke KINOSHITA, Tomohiro NAKATANI, Marc DELCROIX
  • Publication number: 20210366502
    Abstract: An estimation device includes a memory, and processing circuitry coupled to the memory and configured to receive an input of an input audio signal that is an audio signal in which sounds from a plurality of sound sources are mixed, and an input of supplemental information, and output an estimation result of mask information that identifies a mask for extracting a sound of any one of the sound sources included in an entire or a part of a signal included in the input audio signal, the signal being identified by the supplemental information, cause a neural network to iterate a process of outputting the estimation result of the mask information, and cause the neural network to output an estimation result of the mask information for a different sound source, by inputting a different piece of the supplemental information to the neural network at each iteration.
    Type: Application
    Filed: January 29, 2019
    Publication date: November 25, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Keisuke KINOSHITA, Marc DELCROIX, Tomohiro NAKATANI, Shoko ARAKI, Lukas DRUDE, Thilo Christoph VON NEUMANN
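
Several of the abstracts above describe estimating a mask and then using it to extract one source from a mixed signal. The sketch below illustrates only the final masking step, as a rough aid to reading those abstracts. It is not taken from any of the listed patents: the function names `apply_mask` and `ideal_ratio_mask` are hypothetical, the mask is computed from known sources (an "oracle" ratio mask) rather than by a neural network, and the spectrograms are random toy data.

```python
import numpy as np


def ideal_ratio_mask(target_spec, interference_spec, eps=1e-12):
    """Oracle ratio mask from known sources (an assumption for this sketch).

    In the abstracts the mask is estimated by a neural network; here it is
    computed directly from the target and interference power.
    """
    target_power = np.abs(target_spec) ** 2
    interference_power = np.abs(interference_spec) ** 2
    return target_power / (target_power + interference_power + eps)


def apply_mask(observed_spec, mask):
    """Element-wise masking of a complex spectrogram (frames x frequency bins)."""
    if observed_spec.shape != mask.shape:
        raise ValueError("mask and spectrogram must have the same shape")
    return mask * observed_spec


# Toy data: random complex "spectrograms" with 4 frames and 5 frequency bins.
rng = np.random.default_rng(0)
shape = (4, 5)
target = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
interference = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
observed = target + interference

mask = ideal_ratio_mask(target, interference)
estimate = apply_mask(observed, mask)
```

Each mask value lies between 0 and 1, so time-frequency points dominated by the interference are attenuated while points dominated by the target pass through largely unchanged.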