Abstract: A system configured to improve beamforming by using deep neural networks (DNNs). The system can use one trained DNN to focus on a first person speaking an utterance (e.g., target user) and one or more trained DNNs to focus on noise source(s) (e.g., wireless loudspeaker(s), a second person speaking, other localized sources of noise, or the like). The DNNs may generate time-frequency mask data that indicates individual frequency bands that correspond to the particular source detected by the DNN. Using this mask data, a beamformer can generate beamformed audio data that is specific to a source of noise. The system may perform noise cancellation to isolate first beamformed audio data associated with the target user by removing second beamformed audio data associated with noise source(s).
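The processing chain described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not name a beamformer type or a cancellation method, so everything below is an assumption made for illustration: the masks are taken as given (standing in for the DNN outputs), the beamformer is a mask-driven MVDR in the reference-channel formulation, and the noise cancellation is a single least-squares gain per frequency band rather than an adaptive filter. The function names `masked_covariance`, `mvdr_weights`, `apply_beamformer`, and `cancel_noise` are hypothetical and do not come from the patent.

```python
import numpy as np


def masked_covariance(stft, mask):
    """Mask-weighted spatial covariance for each frequency band.

    stft: complex array, shape (channels, freqs, frames).
    mask: real array in [0, 1], shape (freqs, frames); assumed here to be
          the time-frequency mask a trained DNN would produce for one source.
    Returns an array of shape (freqs, channels, channels).
    """
    weighted = stft * mask[None, :, :]
    cov = np.einsum("cft,dft->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-8)
    return cov / norm[:, None, None]


def mvdr_weights(cov_keep, cov_reject, ref_channel=0, loading=1e-6):
    """MVDR weights that pass the 'keep' source and suppress the 'reject' one.

    Assumption: reference-channel MVDR, w = (R_reject^-1 R_keep) u / trace(.),
    with a small diagonal loading term for numerical stability.
    Returns an array of shape (freqs, channels).
    """
    n_ch = cov_keep.shape[-1]
    loaded = cov_reject + loading * np.eye(n_ch)
    numerator = np.linalg.solve(loaded, cov_keep)
    trace = np.trace(numerator, axis1=-2, axis2=-1)
    return numerator[..., ref_channel] / (trace[:, None] + 1e-12)


def apply_beamformer(weights, stft):
    """Compute y[f, t] = w[f]^H x[f, t] for every time-frequency bin."""
    return np.einsum("fc,cft->ft", weights.conj(), stft)


def cancel_noise(target_beam, noise_beam):
    """Remove the noise beam from the target beam.

    Assumption: one complex least-squares gain per frequency band; a deployed
    system would more likely use an adaptive filter with the noise beam as
    its reference signal.
    """
    cross = np.sum(noise_beam.conj() * target_beam, axis=1)
    power = np.sum(np.abs(noise_beam) ** 2, axis=1) + 1e-12
    gain = cross / power
    return target_beam - gain[:, None] * noise_beam
```

In use, one would compute a covariance per mask, derive one set of weights that keeps the target user and another that keeps the noise source (by swapping the two covariance arguments), beamform the same multichannel input with each, and pass the two beams to `cancel_noise`. Handling several noise sources, as the abstract allows, would repeat the noise-side steps once per additional mask.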
Type:
Grant
Filed:
February 13, 2018
Date of Patent:
December 31, 2019
Assignee:
AMAZON TECHNOLOGIES, INC.
Inventors:
Robert Ayrapetian, Philip Ryan Hilmes, Trausti Thor Kristjansson