Patents by Inventor James R. Glass

James R. Glass has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11961513
    Abstract: A decoder includes a feature extraction circuit for calculating one or more feature vectors. An acoustic model circuit is coupled to receive the one or more feature vectors from the feature extraction circuit and to assign one or more likelihood values to them. A memory architecture that utilizes an on-chip state lattice and an off-chip memory for storing states of transition of the decoder is used to reduce reading and writing to the off-chip memory. The on-chip state lattice is populated with at least one of the states of transition stored in the off-chip memory. An on-chip word lattice is generated from a snapshot of the on-chip state lattice. The on-chip state lattice and the on-chip word lattice act as an on-chip cache to reduce reading and writing to the off-chip memory.
    Type: Grant
    Filed: July 29, 2021
    Date of Patent: April 16, 2024
    Assignee: Massachusetts Institute of Technology
    Inventors: Michael R. Price, James R. Glass, Anantha P. Chandrakasan
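The memory hierarchy this entry describes lends itself to a small simulation. The Python sketch below is purely illustrative (toy eviction policy, made-up names; nothing is taken from the patent's circuit design) and shows how an on-chip lattice acting as a cache absorbs repeated reads that would otherwise go to off-chip memory:

```python
# Toy model of the on-chip lattice cache described in patent 11961513.
# All names, structures, and the eviction policy are illustrative only.

class LatticeCache:
    """Small 'on-chip' cache in front of a large 'off-chip' state store."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.on_chip = {}      # state_id -> payload (the on-chip lattice)
        self.off_chip = {}     # backing store (models external DRAM)
        self.off_chip_reads = 0
        self.off_chip_writes = 0

    def write(self, state_id, payload):
        # Writes land in the on-chip lattice; evictions spill off-chip.
        if len(self.on_chip) >= self.capacity and state_id not in self.on_chip:
            victim, data = self.on_chip.popitem()
            self.off_chip[victim] = data
            self.off_chip_writes += 1
        self.on_chip[state_id] = payload

    def read(self, state_id):
        # On-chip hits avoid an off-chip access entirely.
        if state_id in self.on_chip:
            return self.on_chip[state_id]
        self.off_chip_reads += 1
        payload = self.off_chip[state_id]
        self.write(state_id, payload)      # repopulate the on-chip lattice
        return payload

cache = LatticeCache(capacity=4)
for t in range(8):                         # decoder emits states over time
    cache.write(t, {"score": -0.1 * t})
for t in (7, 7, 7, 6, 6):                  # recent states are re-read repeatedly
    cache.read(t)
# Most reads were served on-chip: 1 off-chip read for 5 lattice reads here.
print(cache.off_chip_reads, cache.off_chip_writes)
```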
  • Patent number: 11830478
    Abstract: A learning device calculates a feature for each data item in a pair of datasets, each of which combines two modalities among a plurality of modalities, using a model that receives data of the corresponding modality and outputs a feature obtained by mapping the received data into an embedding space. The learning device then selects, for each piece of target data (data of a first modality in the first dataset), similar data from among the data of a second modality included in the second dataset. The learning device further updates the parameters of the model such that the features of the data within each pair in the first and second datasets are similar to one another, and such that the feature of the data paired with the target data is similar to the feature of the data paired with the similar data.
    Type: Grant
    Filed: April 1, 2021
    Date of Patent: November 28, 2023
    Assignees: Nippon Telegraph and Telephone Corporation, Massachusetts Institute of Technology
    Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
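A rough reading of this training objective in code, as a minimal numpy sketch: linear maps stand in for the modality encoders, and the three cosine terms mirror the pairwise constraints the abstract describes. All dimensions, dataset shapes, and the exact loss form are assumptions for illustration, not the patent's specification:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy modality encoder: a linear map into the shared embedding space."""
    v = x @ W
    return v / np.linalg.norm(v)

def cos(a, b):
    return float(a @ b)  # inputs are unit vectors, so this is cosine similarity

# Hypothetical modalities: A (e.g. audio) is 32-dim, B (e.g. image) is 64-dim.
W_a = rng.normal(size=(32, 16))
W_b = rng.normal(size=(64, 16))

# Two datasets of (modality A, modality B) pairs.
ds1 = [(rng.normal(size=32), rng.normal(size=64)) for _ in range(8)]
ds2 = [(rng.normal(size=32), rng.normal(size=64)) for _ in range(8)]

# Target data: a first-modality item in the first dataset.
target_a, target_b = ds1[0]
za = encode(target_a, W_a)

# Select "similar data": the second-modality item in ds2 closest to the target.
j = int(np.argmax([cos(za, encode(b, W_b)) for _, b in ds2]))
similar_a, similar_b = ds2[j]

# Loss terms, to be minimized over W_a and W_b by gradient descent in a real
# system: intra-pair agreement in both datasets, plus agreement between the
# partner of the target and the partner of the similar data.
loss = (
    (1 - cos(za, encode(target_b, W_b)))
    + (1 - cos(encode(similar_a, W_a), encode(similar_b, W_b)))
    + (1 - cos(encode(target_b, W_b), encode(similar_a, W_a)))
)
print(f"toy loss: {loss:.3f}")
```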
  • Patent number: 11817081
    Abstract: A learning device calculates an image feature using a model (image encoder) that receives an image and outputs the image feature obtained by mapping the image into a latent space. The learning device calculates an audio feature using a model (audio encoder) that receives speech in a predetermined language and outputs the audio feature obtained by mapping the speech into the latent space, and that includes a neural network provided with a self-attention mechanism. The learning device updates the parameters of the models used by an image feature calculation unit and an audio feature calculation unit such that the image feature of a first image is similar to the audio feature of the speech corresponding to the first image.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: November 14, 2023
    Assignees: Nippon Telegraph and Telephone Corporation, Massachusetts Institute of Technology
    Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
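As a rough illustration of the two-branch layout this abstract describes, the sketch below uses a single-head self-attention layer (numpy) for the audio branch and a linear map standing in for the image encoder; the dimensions, pooling, and similarity objective are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 16  # shared latent dimension (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence of audio frames."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return A @ V

def audio_encoder(frames, params):
    Wq, Wk, Wv = params
    H = self_attention(frames, Wq, Wk, Wv)
    z = H.mean(axis=0)                    # pool frames into one latent vector
    return z / np.linalg.norm(z)

def image_encoder(pixels, W):
    z = pixels @ W                        # toy stand-in for a vision network
    return z / np.linalg.norm(z)

params = tuple(rng.normal(size=(D, D)) for _ in range(3))
W_img = rng.normal(size=(256, D))

frames = rng.normal(size=(50, D))         # 50 audio frames of a spoken caption
pixels = rng.normal(size=256)             # flattened toy image

z_audio = audio_encoder(frames, params)
z_image = image_encoder(pixels, W_img)

# Training would push this similarity up for matched image/speech pairs
# and down for mismatched ones (e.g. with a contrastive or triplet loss).
print("image-speech similarity:", float(z_image @ z_audio))
```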
  • Publication number: 20230360642
    Abstract: One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. To finetune the initial subnetwork, the processors zero out the masked weights specified by the initial pruning mask, train a new subnetwork from the zeroed-out subnetwork, and prune the lowest-magnitude weights in the new subnetwork, regardless of network structure, to satisfy the target sparsity. The processors then classify an audio segment with the finetuned subnetwork.
    Type: Application
    Filed: May 9, 2022
    Publication date: November 9, 2023
    Inventors: Cheng-I Lai, Yang Zhang, Kaizhi Qian, Chuang Gan, James R. Glass, Alexander Haojan Liu
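The finetune-and-prune loop described here can be sketched compactly. In the toy below the unstructured magnitude pruning to a target sparsity is real, but the "training" step is only a placeholder update; the layer shape and number of rounds are arbitrary assumptions:

```python
import numpy as np

def prune_to_sparsity(weights, sparsity):
    """Zero the lowest-magnitude weights, ignoring structure, to hit `sparsity`."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)              # number of weights to remove
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.abs(weights) > threshold         # True = keep

rng = np.random.default_rng(2)
target_sparsity = 0.5

# Stand-in for one weight matrix of a pre-trained SSL speech model.
W = rng.normal(size=(8, 8))
mask = prune_to_sparsity(W, target_sparsity)   # initial pruning mask

for round_ in range(3):
    W = W * mask                               # zero out the masked weights
    # "Train a new subnetwork": placeholder update standing in for finetuning.
    W = W + 0.01 * rng.normal(size=W.shape) * mask
    mask = prune_to_sparsity(W, target_sparsity)   # re-prune by magnitude

print("sparsity:", 1.0 - mask.mean())          # holds at the target sparsity
```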
  • Publication number: 20220319495
    Abstract: A learning device calculates a feature for each data item in a pair of datasets, each of which combines two modalities among a plurality of modalities, using a model that receives data of the corresponding modality and outputs a feature obtained by mapping the received data into an embedding space. The learning device then selects, for each piece of target data (data of a first modality in the first dataset), similar data from among the data of a second modality included in the second dataset. The learning device further updates the parameters of the model such that the features of the data within each pair in the first and second datasets are similar to one another, and such that the feature of the data paired with the target data is similar to the feature of the data paired with the similar data.
    Type: Application
    Filed: April 1, 2021
    Publication date: October 6, 2022
    Applicants: Nippon Telegraph and Telephone Corporation, Massachusetts Institute of Technology
    Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
  • Publication number: 20220319493
    Abstract: A learning device calculates an image feature using a model (image encoder) that receives an image and outputs the image feature obtained by mapping the image into a latent space. The learning device calculates an audio feature using a model (audio encoder) that receives speech in a predetermined language and outputs the audio feature obtained by mapping the speech into the latent space, and that includes a neural network provided with a self-attention mechanism. The learning device updates the parameters of the models used by an image feature calculation unit and an audio feature calculation unit such that the image feature of a first image is similar to the audio feature of the speech corresponding to the first image.
    Type: Application
    Filed: March 31, 2021
    Publication date: October 6, 2022
    Applicants: Nippon Telegraph and Telephone Corporation, Massachusetts Institute of Technology
    Inventors: Yasunori Ohishi, Akisato Kimura, Takahito Kawanishi, Kunio Kashino, James R. Glass, David Harwath
  • Publication number: 20210358484
    Abstract: A decoder includes a feature extraction circuit for calculating one or more feature vectors. An acoustic model circuit is coupled to receive the one or more feature vectors from the feature extraction circuit and to assign one or more likelihood values to them. A memory architecture that utilizes an on-chip state lattice and an off-chip memory for storing states of transition of the decoder is used to reduce reading and writing to the off-chip memory. The on-chip state lattice is populated with at least one of the states of transition stored in the off-chip memory. An on-chip word lattice is generated from a snapshot of the on-chip state lattice. The on-chip state lattice and the on-chip word lattice act as an on-chip cache to reduce reading and writing to the off-chip memory.
    Type: Application
    Filed: July 29, 2021
    Publication date: November 18, 2021
    Inventors: Michael R. Price, James R. Glass, Anantha P. Chandrakasan
  • Patent number: 11107461
    Abstract: A decoder comprises a feature extraction circuit for calculating one or more feature vectors; an acoustic model circuit coupled to receive one or more feature vectors from said feature extraction circuit and assign one or more likelihood values to the one or more feature vectors; a memory for storing states of transition of the decoder; and a search circuit for receiving an input from said acoustic model circuit corresponding to the one or more likelihood values based upon the one or more feature vectors, and for choosing states of transition from the memory based on the input from said acoustic model.
    Type: Grant
    Filed: May 31, 2017
    Date of Patent: August 31, 2021
    Assignee: Massachusetts Institute of Technology
    Inventors: Michael R. Price, James R. Glass, Anantha P. Chandrakasan
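A software caricature of this pipeline, with a hand-written transition table standing in for the decoder's transition memory and a one-line toy in place of a real acoustic model; the beam search shows how likelihoods and stored transitions combine, but none of this reflects the patented circuit implementation:

```python
import numpy as np

rng = np.random.default_rng(3)

# Transition memory: state -> list of (next_state, label, transition_cost).
TRANSITIONS = {
    0: [(1, "a", 0.5), (2, "b", 0.7)],
    1: [(3, "c", 0.4)],
    2: [(3, "d", 0.3)],
    3: [],
}

def extract_features(samples, frame=4):
    """Toy feature extraction: one small vector per frame of audio."""
    frames = samples[: len(samples) // frame * frame].reshape(-1, frame)
    return frames.mean(axis=1, keepdims=True)

def acoustic_likelihood(feat, label):
    """Toy acoustic model: likelihood score for a label given a feature vector."""
    return -abs(float(feat[0]) - (ord(label) - ord("a")) * 0.1)

def decode(samples, beam=2):
    hyps = [(0.0, 0, [])]                      # (score, state, output labels)
    for feat in extract_features(samples):
        new = []
        for score, state, labels in hyps:
            for nxt, label, cost in TRANSITIONS[state]:
                s = score + acoustic_likelihood(feat, label) - cost
                new.append((s, nxt, labels + [label]))
        if not new:                            # no outgoing transitions left
            break
        hyps = sorted(new, key=lambda h: -h[0])[:beam]   # keep best in beam
    return max(hyps, key=lambda h: h[0])

samples = rng.normal(size=16)
score, state, labels = decode(samples)
print("best path:", labels, "score:", round(score, 3))
```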
  • Patent number: 10817509
    Abstract: A system for associating a string of natural language with items in a relational database includes a first subsystem having a pre-trained first artificial neural network configured to apply a semantic tag selected from a predefined set of semantic labels to a segment of a plurality of tokens representing the string of natural language. A second subsystem includes a second artificial neural network configured to convert the plurality of labeled tokens into a first multi-dimensional vector representing the string of natural language. A third subsystem is configured to rank the first multi-dimensional vector against a second multi-dimensional vector representing a plurality of items in the relational database.
    Type: Grant
    Filed: March 15, 2018
    Date of Patent: October 27, 2020
    Assignee: Massachusetts Institute of Technology
    Inventors: Mandy Barrett Korpusik, James R. Glass
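The three-subsystem flow might look like the following toy sketch, where a dictionary lookup stands in for the pre-trained tagger and random embeddings stand in for the trained networks; the vocabulary, label set, and database items are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
TAGS = ["Brand", "Quantity", "Food", "Other"]       # illustrative label set
VOCAB = {"two": 0, "slices": 1, "whole": 2, "wheat": 3, "toast": 4}
D = 8

# Subsystem 1 stand-in: a trained tagger would predict these labels per token.
def tag(tokens):
    lookup = {"two": "Quantity", "slices": "Quantity",
              "whole": "Food", "wheat": "Food", "toast": "Food"}
    return [(t, lookup.get(t, "Other")) for t in tokens]

# Subsystem 2: convert the labeled tokens into one query vector.
E_word = rng.normal(size=(len(VOCAB), D))
E_tag = rng.normal(size=(len(TAGS), D))

def embed(tagged):
    vecs = [E_word[VOCAB[w]] + E_tag[TAGS.index(t)] for w, t in tagged]
    v = np.mean(vecs, axis=0)
    return v / np.linalg.norm(v)

# Subsystem 3: rank database items by similarity to the query vector.
db_items = {name: rng.normal(size=D) for name in
            ["whole wheat bread", "white rice", "toast"]}
db_vecs = {k: v / np.linalg.norm(v) for k, v in db_items.items()}

q = embed(tag("two slices whole wheat toast".split()))
# With trained embeddings the matching item would rank first; here the
# vectors are random, so only the mechanics of the ranking are shown.
ranking = sorted(db_vecs, key=lambda k: -float(q @ db_vecs[k]))
print("ranked items:", ranking)
```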
  • Patent number: 10515292
    Abstract: An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrieval of images according to voice queries. A set of paired images and audio signals is processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.
    Type: Grant
    Filed: June 15, 2017
    Date of Patent: December 24, 2019
    Assignee: Massachusetts Institute of Technology
    Inventors: David F. Harwath, James R. Glass
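For the retrieval use case mentioned in the abstract, a minimal sketch: assuming encoders already trained on paired data so that embeddings of matched images and speech lie close together, retrieval reduces to ranking image embeddings by similarity to a spoken-query embedding. Random unit vectors stand in for the trained embeddings here:

```python
import numpy as np

rng = np.random.default_rng(5)
D = 12

def unit(v):
    return v / np.linalg.norm(v)

# Stand-ins for image embeddings produced by a trained image processor.
image_embeddings = np.stack([unit(rng.normal(size=D)) for _ in range(100)])

def retrieve(query_audio_embedding, k=5):
    """Rank images by cross-modal similarity to a spoken query."""
    scores = image_embeddings @ query_audio_embedding
    top = np.argsort(-scores)[:k]
    return list(zip(top.tolist(), scores[top].round(3).tolist()))

query = unit(rng.normal(size=D))   # stand-in for an audio-processor output
print(retrieve(query))             # top-5 (image index, similarity) pairs
```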
  • Publication number: 20190147856
    Abstract: A decoder comprises a feature extraction circuit for calculating one or more feature vectors; an acoustic model circuit coupled to receive one or more feature vectors from said feature extraction circuit and assign one or more likelihood values to the one or more feature vectors; a memory for storing states of transition of the decoder; and a search circuit for receiving an input from said acoustic model circuit corresponding to the one or more likelihood values based upon the one or more feature vectors, and for choosing states of transition from the memory based on the input from said acoustic model.
    Type: Application
    Filed: May 31, 2017
    Publication date: May 16, 2019
    Inventors: Michael R. Price, James R. Glass, Anantha P. Chandrakasan
  • Publication number: 20180268023
    Abstract: A system for associating a string of natural language with items in a relational database includes a first subsystem having a pre-trained first artificial neural network configured to apply a semantic tag selected from a predefined set of semantic labels to a segment of a plurality of tokens representing the string of natural language. A second subsystem includes a second artificial neural network configured to convert the plurality of labeled tokens into a first multi-dimensional vector representing the string of natural language. A third subsystem is configured to rank the first multi-dimensional vector against a second multi-dimensional vector representing a plurality of items in the relational database.
    Type: Application
    Filed: March 15, 2018
    Publication date: September 20, 2018
    Inventors: Mandy Barrett Korpusik, James R. Glass
  • Publication number: 20180039859
    Abstract: An approach to joint acoustic and visual processing associates images with corresponding audio signals, for example, for the retrieval of images according to voice queries. A set of paired images and audio signals is processed without requiring transcription, segmentation, or annotation of either the images or the audio. This processing of the paired images and audio is used to determine parameters of an image processor and an audio processor, with the outputs of these processors being comparable to determine a similarity across acoustic and visual modalities. In some implementations, the image processor and the audio processor make use of deep neural networks. Further embodiments associate parts of images with corresponding parts of audio signals.
    Type: Application
    Filed: June 15, 2017
    Publication date: February 8, 2018
    Inventors: David F. Harwath, James R. Glass
  • Patent number: 8386264
    Abstract: A speech data retrieval apparatus (10) includes a speech database (1), a speech recognition unit (2), a confusion network creation unit (3), an inverted index table creation unit (4), a query input unit (6), a query conversion unit (7) and a label string check unit (8). The speech recognition unit (2) reads speech data from the speech database (1), performs speech recognition on the read speech data, and outputs the result of the speech recognition process as a lattice in which a phoneme, a syllable, or a word is the base unit. The confusion network creation unit (3) creates a confusion network based on the output lattice and outputs the result of the speech recognition process as the confusion network. The inverted index table creation unit (4) creates an inverted index table based on the output confusion network.
    Type: Grant
    Filed: April 11, 2008
    Date of Patent: February 26, 2013
    Assignees: Nippon Telegraph and Telephone Corporation, Massachusetts Institute of Technology
    Inventors: Takaaki Hori, I. Lee Hetherington, Timothy J. Hazen, James R. Glass
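The index construction described here translates naturally to code. A minimal sketch, with hand-written confusion networks and a single-label query for brevity (the abstract's query conversion unit would handle full query strings):

```python
from collections import defaultdict

# A confusion network per utterance: a list of slots, where each slot holds
# (label, posterior) alternatives produced by the recognizer.
confusion_networks = {
    "utt1": [[("hello", 0.9), ("yellow", 0.1)], [("world", 0.8), ("word", 0.2)]],
    "utt2": [[("hello", 0.6), ("hollow", 0.4)]],
}

# Inverted index table: label -> list of (utterance, slot position, posterior).
index = defaultdict(list)
for utt, slots in confusion_networks.items():
    for pos, slot in enumerate(slots):
        for label, post in slot:
            index[label].append((utt, pos, post))

def search(label):
    """Look up a single query label; hits are ranked by posterior."""
    return sorted(index.get(label, []), key=lambda hit: -hit[2])

print(search("hello"))   # [('utt1', 0, 0.9), ('utt2', 0, 0.6)]
```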
  • Publication number: 20100121642
    Abstract: A speech data retrieval apparatus (10) includes a speech database (1), a speech recognition unit (2), a confusion network creation unit (3), an inverted index table creation unit (4), a query input unit (6), a query conversion unit (7) and a label string check unit (8). The speech recognition unit (2) reads speech data from the speech database (1), performs speech recognition on the read speech data, and outputs the result of the speech recognition process as a lattice in which a phoneme, a syllable, or a word is the base unit. The confusion network creation unit (3) creates a confusion network based on the output lattice and outputs the result of the speech recognition process as the confusion network. The inverted index table creation unit (4) creates an inverted index table based on the output confusion network.
    Type: Application
    Filed: November 4, 2008
    Publication date: May 13, 2010
    Applicant: Massachusetts Institute of Technology
    Inventors: Takaaki Hori, I. Lee Hetherington, Timothy J. Hazen, James R. Glass
  • Publication number: 20080261045
    Abstract: The invention provides a method of making composite particles for efficient delivery of polyelectrolytes to a target. Composite particles are made by one of two methods: 1) first forming disperse polyelectrolyte condensates by mixing the polyelectrolyte with a condensing agent, and then combining the disperse polyelectrolyte condensates with particles so that the condensates bind to the surfaces of the particles; or 2) combining particles with a polyelectrolyte of opposite charge to form polyelectrolyte-coated particles, followed by a subsequent polyelectrolyte of opposite charge to form a composite particle. The invention includes composite particles, where each composite particle is comprised of a particle with the polyelectrolyte from one or more polyelectrolyte condensates bound to that particle. One advantage of these composite particles is that they permit polyelectrolytes to be delivered to a target more efficiently and in increased amounts, in comparison to the prior art.
    Type: Application
    Filed: January 18, 2008
    Publication date: October 23, 2008
    Inventors: James R. Glass, David Schultz, Steven J. Oldenburg
  • Publication number: 20030215817
    Abstract: Polynucleotides, polypeptides, kits and methods are provided related to genes regulated by the formation of fatty atherosclerotic lesions, and by administration of a dihydropyridine calcium antagonist, lercanidipine.
    Type: Application
    Filed: February 3, 2003
    Publication date: November 20, 2003
    Inventors: Amedeo Leonardi, Abraham Sartani, James R. Glass, J. Gregor Sutcliffe, Karl W. Hasel
  • Patent number: 5677276
    Abstract: The present invention provides novel conjugates of a synthetic polypeptide containing RGD or (dR)GD and a biodegradable polymer, hyaluronate. The conjugates are prepared by any one of three different methods provided by the present invention: (1) an epoxide method, (2) a sodium periodate method, and (3) a tresyl chloride method. The conjugates prepared by these methods are useful to aid in wound healing and tissue regeneration by providing a temporary matrix for tissue repair. The invention also provides novel RGD-peptides.
    Type: Grant
    Filed: June 5, 1995
    Date of Patent: October 14, 1997
    Assignee: La Jolla Cancer Research Foundation
    Inventors: Kenneth T. Dickerson, James R. Glass, Lin-Shu Liu, James W. Polarek, William S. Craig, Daniel G. Mullen, Soan Cheng
  • Patent number: 5625749
    Abstract: Phonetic recognition is provided by capturing the dynamical behavior and statistical dependencies of the acoustic attributes used to represent a subject speech waveform. A segment-based framework is employed. Temporal behavior is modelled explicitly by creating dynamic templates, called tracks, of the acoustic attributes used to represent the speech waveform, and by estimating the acoustic spatio-temporal correlation structure. An error model represents this estimation as the temporal and spatial correlations between the input speech waveform and the track-generated speech segment. Models incorporating these two components (track and error estimation) are created both for phonetic units and for phonetic transitions. Phonetic contextual influences are accounted for by merging context-dependent tracks and pooling error statistics over the different contexts. This allows for a large number of contextual models without compromising the robustness of the statistical parameter estimates.
    Type: Grant
    Filed: August 22, 1994
    Date of Patent: April 29, 1997
    Assignee: Massachusetts Institute of Technology
    Inventors: William D. Goldenthal, James R. Glass
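A numpy sketch of the track idea from this last entry: segments are time-normalized to a fixed length, a track is the mean attribute trajectory over training segments, and a new segment is scored by its deviation from the track. For brevity the error model here is diagonal, whereas the patent estimates the full spatio-temporal correlation structure; all dimensions and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def resample(segment, length):
    """Time-normalize a variable-length segment to a fixed number of frames."""
    idx = np.linspace(0, len(segment) - 1, length)
    lo = np.floor(idx).astype(int)
    hi = np.ceil(idx).astype(int)
    w = idx - lo
    return (1 - w)[:, None] * segment[lo] + w[:, None] * segment[hi]

TRACK_LEN, DIM = 10, 4

# "Track": mean attribute trajectory for a phonetic unit, estimated from
# training segments; "error model": variance of deviations from the track.
train = [rng.normal(size=(rng.integers(8, 20), DIM)) + 0.5 for _ in range(30)]
normed = np.stack([resample(s, TRACK_LEN) for s in train])
track = normed.mean(axis=0)
error_var = normed.var(axis=0) + 1e-6

def score(segment):
    """Log-likelihood-style score of a segment under the track + error model."""
    e = resample(segment, TRACK_LEN) - track
    return float(-0.5 * np.sum(e**2 / error_var + np.log(error_var)))

test = rng.normal(size=(14, DIM)) + 0.5    # a 14-frame candidate segment
print("track score:", round(score(test), 2))
```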