Patents by Inventor Huimeng ZHENG

Huimeng ZHENG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240211724
    Abstract: Modern deep neural network (DNN) models have many layers, with a single layer potentially involving large matrix multiplications. Such heavy computation makes it challenging to deploy such DNN models on a single edge device, which has relatively limited computation resources. Therefore, multiple and even heterogeneous edge devices may be required for applications with stringent latency requirements. Disclosed in the present patent documents are embodiments of a model scheduling framework that schedules multiple models on a heterogeneous platform. Multiple-model heterogeneous computing is partitioned into a neural computation optimizer (NCO) part and a neural computation accelerator (NCA) part. The migration, transition, or transformation of DNN models from cloud to edge is handled by the NCO, while the deployment of the transformed DNN models on the heterogeneous platform is handled by the NCA. Such a separation of implementation simplifies task execution and improves the flexibility of the overall framework.
    Type: Application
    Filed: August 11, 2021
    Publication date: June 27, 2024
    Applicants: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.
    Inventors: Haofeng KOU, Xing LI, Huimeng ZHENG, Lei WANG, Zhen CHEN
  • Publication number: 20240193002
    Abstract: A system obtains a performance profile corresponding to times taken to perform inferencing by a machine learning (ML) model using different numbers of processing resources from a plurality of processing resources. The system determines one or more groupings of processing resources from the plurality of processing resources, each grouping including one or more partitions. The system calculates performance speeds corresponding to each grouping based on the performance profile. The system determines a grouping having a best performance speed from the calculated performance speeds. The system partitions the processing resources based on the determined grouping to perform the inferencing.
    Type: Application
    Filed: June 10, 2022
    Publication date: June 13, 2024
    Inventors: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG
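A minimal sketch of the kind of grouping search this abstract describes, assuming the performance profile maps a resource count to the time for one inference and that partitions serve inferences concurrently (so a grouping's speed is the sum of its partitions' throughputs). The function names, the partition-enumeration strategy, and the speed metric are illustrative assumptions, not taken from the patent.

```python
def partitions(n):
    """Yield every way to split n identical processing resources into
    partition sizes, in non-increasing order (e.g. 4 -> [4], [3,1], ...)."""
    if n == 0:
        yield []
        return
    for first in range(n, 0, -1):
        for rest in partitions(n - first):
            if not rest or first >= rest[0]:
                yield [first] + rest

def best_grouping(profile, total_resources):
    """profile[k] = time for one inference on a partition of k resources.
    Score each grouping by aggregate throughput (inferences per unit time,
    summed over concurrent partitions) and return the fastest grouping."""
    best, best_speed = None, 0.0
    for grouping in partitions(total_resources):
        speed = sum(1.0 / profile[k] for k in grouping)
        if speed > best_speed:
            best, best_speed = grouping, speed
    return best, best_speed

# Example: a profile with diminishing returns beyond two resources, so
# two 2-resource partitions beat one 4-resource partition.
profile = {1: 100.0, 2: 40.0, 3: 35.0, 4: 30.0}
grouping, speed = best_grouping(profile, 4)
```

The exhaustive enumeration is fine for the small resource counts typical of a single device; a real implementation would likely prune or memoize.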
  • Publication number: 20240185098
    Abstract: A system determines a timing matrix corresponding to inference times taken for a number of machine learning (ML) models to be executed by a number of processing resources of a computing device. The processing resources include at least a first and a second type of processing resources. The system applies a service-specific model-first scheduling scheme or a service-specific hardware-first scheduling scheme to obtain corresponding service-specific mappings. The system determines a best mapping from the corresponding service-specific mappings. The system schedules each of the ML models to a corresponding processing resource from the processing resources according to the best mapping. The system executes the ML models using corresponding mapped processing resources.
    Type: Application
    Filed: April 15, 2022
    Publication date: June 6, 2024
    Inventors: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG
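The timing matrix and the two scheduling passes can be sketched as follows. Here timing[m][r] is the inference time of model m on resource r, and the quality of a mapping is its makespan (the load on the busiest resource). The greedy details of each pass are illustrative guesses at what "model-first" and "hardware-first" mean; the patent does not publish its exact heuristics in this abstract.

```python
def model_first(timing, n_resources):
    """Model-driven pass: assign each model, hardest first, to the
    resource whose load would grow the least."""
    load = [0.0] * n_resources
    mapping = {}
    order = sorted(range(len(timing)), key=lambda m: -min(timing[m]))
    for m in order:
        r = min(range(n_resources), key=lambda r: load[r] + timing[m][r])
        mapping[m] = r
        load[r] += timing[m][r]
    return mapping, max(load)

def hardware_first(timing, n_resources):
    """Hardware-driven pass: walk the resources round-robin, each time
    taking the unassigned model that runs fastest on the current one."""
    load = [0.0] * n_resources
    mapping, remaining = {}, set(range(len(timing)))
    r = 0
    while remaining:
        m = min(remaining, key=lambda m: timing[m][r])
        mapping[m] = r
        load[r] += timing[m][r]
        remaining.remove(m)
        r = (r + 1) % n_resources
    return mapping, max(load)

def best_mapping(timing, n_resources):
    """Run both schemes and keep whichever yields the smaller makespan."""
    return min(model_first(timing, n_resources),
               hardware_first(timing, n_resources),
               key=lambda result: result[1])
```

With a timing matrix like [[2, 10], [10, 2], [3, 3]] (two resource types, e.g. a CPU and a GPU), both passes send each specialized model to its fast resource and the system keeps the mapping with the lower makespan.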
  • Publication number: 20240185587
    Abstract: Modern deep neural network (DNN) models have many layers, with a single layer potentially involving large matrix multiplications. Such heavy computation makes it challenging to deploy such DNN models on a single edge device, which has relatively limited computation resources. Therefore, multiple and even heterogeneous edge devices may be required for applications with stringent latency requirements. Disclosed in the present patent documents are embodiments of a model scheduling framework that schedules multiple models on a heterogeneous platform. Two different approaches, model first scheduling (MFS) and hardware first scheduling (HFS), are presented to allocate a group of models for a service onto corresponding heterogeneous edge devices, including CPU, VPU, and GPU. Experimental results prove the effectiveness of the MFS and HFS methods for improving the inference speed of single and multiple AI-based services.
    Type: Application
    Filed: August 16, 2021
    Publication date: June 6, 2024
    Applicants: Baidu.com Times Technology (Beijing) Co., Ltd., Baidu USA LLC
    Inventors: Haofeng KOU, Xing LI, Huimeng ZHENG, Lei WANG, Zhen CHEN
  • Publication number: 20230229119
    Abstract: One application of deep learning methods and labelled data is in industrial production or work settings. For such machine learning applications, massive amounts of data are required to train, validate, and/or tune models to better fit the requirements. However, obtaining such data has typically been costly and difficult. Embodiments provide adaptable data labelling processes for work settings. Embodiments take advantage of the work or production processes to label and collect data, which saves time and money and improves accuracy. Embodiments prevent or reduce worker training costs and human mistake-triggered data labelling problems. Embodiments also improve data labelling quality and speed up the development cycle.
    Type: Application
    Filed: February 10, 2021
    Publication date: July 20, 2023
    Applicants: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.
    Inventors: Huimeng ZHENG, Haofeng KOU
  • Publication number: 20230229890
    Abstract: Embodiments presented herein facilitate improvement of a deployed neural network model's accuracy without significantly affecting its operation. In one or more embodiments, online training of the deployed model may be performed using a second neural network model that has higher accuracy than the deployed neural network model. In one or more embodiments, the second neural network model may also be improved online. Embodiments may be deployed in systems, such as edge computing environments, in which neural networks deployed at the edge can be centrally monitored and updated.
    Type: Application
    Filed: December 10, 2020
    Publication date: July 20, 2023
    Applicants: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.
    Inventors: Haofeng KOU, Huimeng ZHENG
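The online-refinement idea in this last abstract, a deployed model nudged toward a more accurate second model on live inputs, resembles online knowledge distillation. The sketch below uses two toy logistic regressors, with the teacher's soft predictions as training targets for the student; the model class, loss, and update rule are illustrative assumptions, not the patent's method.

```python
import math

def predict(weights, x):
    """Toy logistic model: probability that input x is the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def online_update(student, teacher, x, lr=0.5):
    """One online step: move the student's weights toward the teacher's
    soft label on x via a single cross-entropy gradient step."""
    target = predict(teacher, x)   # teacher's soft pseudo-label
    p = predict(student, x)
    grad = p - target              # d(loss)/d(logit) for cross-entropy
    return [w - lr * grad * xi for w, xi in zip(student, x)]

# Example: as live inputs stream past, the deployed (student) model's
# predictions drift toward those of the more accurate (teacher) model.
teacher = [2.0, -1.0]
student = [0.0, 0.0]
stream = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] * 100
for x in stream:
    student = online_update(student, teacher, x)
```

Because each step touches only one input, the student keeps serving inferences between updates, which matches the abstract's goal of improving accuracy without significantly affecting operation.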