Patents by Inventor Ian Richard LANE

Ian Richard LANE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10964326
    Abstract: Disclosed herein is a method of performing speech recognition using audio and visual information, where the visual information provides data related to a person's face. Image preprocessing identifies regions of interest, which are then combined with the audio data before being processed by a speech recognition engine.
    Type: Grant
    Filed: February 16, 2017
    Date of Patent: March 30, 2021
    Assignee: CARNEGIE MELLON UNIVERSITY, a Pennsylvania Non-Profit Corporation
    Inventor: Ian Richard Lane
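The abstract above describes fusing visual face data with audio before recognition. The sketch below is a minimal, hypothetical illustration of that idea using NumPy only: a mouth region of interest is cropped from a face image and its pixels are concatenated with audio features into one vector a recognition engine could consume. The function names, ROI coordinates, and feature sizes are illustrative assumptions, not details from the patent.

```python
import numpy as np

def extract_mouth_roi(frame, roi_box):
    """Crop a region of interest (e.g. the mouth) from a 2-D grayscale
    face image. `roi_box` is (top, left, height, width)."""
    top, left, h, w = roi_box
    return frame[top:top + h, left:left + w]

def fuse_features(audio_feats, roi):
    """Concatenate audio features with normalized, flattened visual ROI
    pixels, producing a single fused feature vector."""
    visual_feats = roi.astype(np.float32).ravel() / 255.0
    return np.concatenate([audio_feats, visual_feats])

# Toy inputs: 13 MFCC-like audio features and a 64x64 face image.
audio = np.random.rand(13).astype(np.float32)
face = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)

roi = extract_mouth_roi(face, (40, 16, 16, 32))  # 16x32 mouth crop
fused = fuse_features(audio, roi)
print(fused.shape)  # (13 + 16*32,) = (525,)
```

In a real audio-visual system the fused vector would feed a trained acoustic model rather than being inspected directly.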
  • Publication number: 20210019601
    Abstract: Disclosed herein is a deep learning model that can be used for performing speech or image processing tasks. The model uses multi-task training, in which the model is trained on at least two inter-related tasks. For face detection, for example, the first task is classifying face versus non-face and the second task is facial feature identification (e.g., mouth, eyes, nose). The multi-task model improves task accuracy over single-task models.
    Type: Application
    Filed: October 5, 2020
    Publication date: January 21, 2021
    Applicant: CARNEGIE MELLON UNIVERSITY
    Inventors: Ian Richard Lane, Bo Yu
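The multi-task idea in the abstract above, a shared representation feeding two inter-related task heads whose losses are combined, can be sketched in plain NumPy. The layer sizes, task weighting, and head definitions below are illustrative assumptions, not the patent's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk followed by two task heads:
# head 1 classifies face vs. non-face, head 2 identifies a facial
# feature (mouth / eyes / nose).
W_shared = rng.standard_normal((128, 32)) * 0.1
W_face = rng.standard_normal((32, 2)) * 0.1
W_feat = rng.standard_normal((32, 3)) * 0.1

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x):
    h = np.maximum(x @ W_shared, 0.0)  # shared representation (ReLU)
    return softmax(h @ W_face), softmax(h @ W_feat)

def multitask_loss(p_face, p_feat, y_face, y_feat, alpha=0.5):
    """Weighted sum of the cross-entropies of the two tasks; training on
    this joint loss is what makes the model 'multi-task'."""
    ce1 = -np.log(p_face[np.arange(len(y_face)), y_face]).mean()
    ce2 = -np.log(p_feat[np.arange(len(y_feat)), y_feat]).mean()
    return alpha * ce1 + (1 - alpha) * ce2

x = rng.standard_normal((4, 128))          # a batch of 4 input vectors
p_face, p_feat = forward(x)
loss = multitask_loss(p_face, p_feat,
                      np.array([0, 1, 0, 1]), np.array([0, 1, 2, 0]))
```

Because both heads backpropagate through `W_shared`, gradient updates from each task shape the shared features, which is the mechanism behind the accuracy gain the abstract claims.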
  • Patent number: 10453445
    Abstract: Disclosed herein is a GPU-accelerated speech recognition engine for faster-than-real-time speech recognition on a scalable server-client heterogeneous CPU-GPU architecture, specifically optimized to decode multiple users simultaneously in real time. To efficiently support real-time speech recognition for multiple users, a "producer/consumer" design pattern decouples speech processes that run at different rates, allowing multiple processes to be handled at the same time. Furthermore, the speech recognition process is divided among multiple consumers to maximize hardware utilization. As a result, the platform architecture is able to process more than 45 real-time audio streams with an average latency of less than 0.3 seconds using one-million-word-vocabulary language models.
    Type: Grant
    Filed: February 16, 2017
    Date of Patent: October 22, 2019
    Assignee: CARNEGIE MELLON UNIVERSITY
    Inventors: Ian Richard Lane, Jungsuk Kim
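The producer/consumer decoupling described in the abstract above can be illustrated with Python threads and a shared queue: producers (simulated audio capture, one per user stream) push chunks at their own rate, while a pool of consumers (simulated decoding stages) drains the queue to keep hardware busy. This is a generic sketch of the design pattern, not the patented CPU-GPU engine; all names and sizes are placeholders.

```python
import queue
import threading

audio_q = queue.Queue()   # decouples capture rate from decoding rate
results = []
lock = threading.Lock()

def producer(stream_id, n_chunks):
    # Simulated audio capture: pushes fixed-size chunks for one user stream.
    for i in range(n_chunks):
        audio_q.put((stream_id, i))

def consumer():
    # Simulated decoding stage: drains chunks from any stream, so idle
    # time in one stream never stalls the others.
    while True:
        item = audio_q.get()
        if item is None:          # stop sentinel
            break
        with lock:
            results.append(item)  # placeholder for actual decoding work

n_streams, n_consumers = 3, 2
producers = [threading.Thread(target=producer, args=(s, 5))
             for s in range(n_streams)]
consumers = [threading.Thread(target=consumer) for _ in range(n_consumers)]
for t in producers + consumers:
    t.start()
for t in producers:
    t.join()
for _ in range(n_consumers):
    audio_q.put(None)             # one stop sentinel per consumer
for t in consumers:
    t.join()

print(len(results))  # 15 chunks processed across 3 streams
```

In the patented system the consumers would be GPU decoding stages rather than Python threads, but the queueing structure that lets differently-paced processes coexist is the same.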
  • Publication number: 20170236518
    Abstract: Disclosed herein is a GPU-accelerated speech recognition engine for faster-than-real-time speech recognition on a scalable server-client heterogeneous CPU-GPU architecture, specifically optimized to decode multiple users simultaneously in real time. To efficiently support real-time speech recognition for multiple users, a "producer/consumer" design pattern decouples speech processes that run at different rates, allowing multiple processes to be handled at the same time. Furthermore, the speech recognition process is divided among multiple consumers to maximize hardware utilization. As a result, the platform architecture is able to process more than 45 real-time audio streams with an average latency of less than 0.3 seconds using one-million-word-vocabulary language models.
    Type: Application
    Filed: February 16, 2017
    Publication date: August 17, 2017
    Applicant: CARNEGIE MELLON UNIVERSITY, a Pennsylvania Non-Profit Corporation
    Inventors: Ian Richard Lane, Jungsuk Kim
  • Publication number: 20170236057
    Abstract: Disclosed herein is a deep learning model that can be used for performing speech or image processing tasks. The model uses multi-task training, in which the model is trained on at least two inter-related tasks. For face detection, for example, the first task is classifying face versus non-face and the second task is facial feature identification (e.g., mouth, eyes, nose). The multi-task model improves task accuracy over single-task models.
    Type: Application
    Filed: February 16, 2017
    Publication date: August 17, 2017
    Applicant: CARNEGIE MELLON UNIVERSITY, a Pennsylvania Non-Profit Corporation
    Inventors: Ian Richard Lane, Bo Yu
  • Publication number: 20170236516
    Abstract: Disclosed herein is a method of performing speech recognition using audio and visual information, where the visual information provides data related to a person's face. Image preprocessing identifies regions of interest, which are then combined with the audio data before being processed by a speech recognition engine.
    Type: Application
    Filed: February 16, 2017
    Publication date: August 17, 2017
    Applicant: CARNEGIE MELLON UNIVERSITY, a Pennsylvania Non-Profit Corporation
    Inventor: Ian Richard Lane
  • Patent number: 8886535
    Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
    Type: Grant
    Filed: January 23, 2014
    Date of Patent: November 11, 2014
    Assignee: Accumente, LLC
    Inventors: Jike Chong, Ian Richard Lane, Senaka Wimal Buthpitiya
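The abstract above describes selecting, for each phone state, the matrix cells whose matching probability meets a threshold and routing each set to a separate core before scoring. The NumPy sketch below illustrates only that selection-and-partition step under stated assumptions: the probability matrix, threshold value, and two-core partition are hypothetical, and `score` is a placeholder for a real acoustic matching computation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical matrix of probabilities that each cell (acoustic frame x
# phone state) matches a given phone state.
n_frames, n_states = 8, 6
match_prob = rng.random((n_frames, n_states))

THRESHOLD = 0.5  # illustrative cutoff

def cells_for_state(state_idx):
    """Cells whose probability of matching `state_idx` meets the
    threshold; these would be routed to one core's local cache."""
    frames = np.nonzero(match_prob[:, state_idx] >= THRESHOLD)[0]
    return [(f, state_idx) for f in frames]

# Partition the work: cells for state 0 go to "core 0",
# cells for state 1 go to "core 1".
core0_cells = cells_for_state(0)
core1_cells = cells_for_state(1)

def score(cell):
    f, s = cell
    return float(match_prob[f, s])  # placeholder matching score

core0_scores = [score(c) for c in core0_cells]
core1_scores = [score(c) for c in core1_cells]
```

The point of the partition is cache locality: each core scores only cells tied to the phone state held in its local cache, avoiding repeated fetches of the same state data.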
  • Publication number: 20140142942
    Abstract: A method of optimizing the calculation of matching scores between phone states and acoustic frames across a matrix of an expected progression of phone states aligned with an observed progression of acoustic frames within an utterance is provided. The matrix has a plurality of cells associated with a characteristic acoustic frame and a characteristic phone state. A first set and second set of cells that meet a threshold probability of matching a first phone state or a second phone state, respectively, are determined. The phone states are stored on a local cache of a first core and a second core, respectively. The first and second sets of cells are also provided to the first core and second core, respectively. Further, matching scores of each characteristic state and characteristic observation of each cell of the first set of cells and of the second set of cells are calculated.
    Type: Application
    Filed: January 23, 2014
    Publication date: May 22, 2014
    Applicant: Accumente, LLC
    Inventors: Jike CHONG, Ian Richard LANE, Senaka Wimal BUTHPITIYA