Patents by Inventor Huimeng ZHENG

Huimeng ZHENG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240211724
    Abstract: Modern deep neural network (DNN) models have many layers, with a single layer potentially involving large matrix multiplications. Such heavy computation makes it challenging to deploy such DNN models on a single edge device, which has relatively limited computation resources. Therefore, multiple and even heterogeneous edge devices may be required for applications with stringent latency requirements. Disclosed in the present patent documents are embodiments of a model scheduling framework that schedules multiple models on a heterogeneous platform. Multiple-model heterogeneous computing is partitioned into a neural computation optimizer (NCO) part and a neural computation accelerator (NCA) part. The migration, transition, or transformation of DNN models from cloud to edge is handled by the NCO, while the deployment of the transformed DNN models on the heterogeneous platform is handled by the NCA. Such a separation of implementation simplifies task execution and improves the flexibility of the overall framework.
    Type: Application
    Filed: August 11, 2021
    Publication date: June 27, 2024
    Applicants: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.
    Inventors: Haofeng KOU, Xing LI, Huimeng ZHENG, Lei WANG, Zhen CHEN
  • Publication number: 20240193002
    Abstract: A system obtains a performance profile corresponding to times taken to perform inferencing by a machine learning (ML) model using different numbers of processing resources from a plurality of processing resources. The system determines one or more groupings of processing resources from the plurality of processing resources, each grouping including one or more partitions. The system calculates performance speeds corresponding to each grouping based on the performance profile. The system determines a grouping having a best performance speed from the calculated performance speeds. The system partitions the processing resources based on the determined grouping to perform the inferencing.
    Type: Application
    Filed: June 10, 2022
    Publication date: June 13, 2024
    Inventors: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG
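A minimal sketch of the kind of grouping search this abstract describes, assuming the performance profile maps a resource count to the time for one inference and that partitions serve inferences concurrently (so a grouping's speed is the sum of its partitions' throughputs). The function names, the partition-enumeration strategy, and the speed metric are illustrative assumptions, not taken from the patent.

```python
def partitions(n):
    """Yield every way to split n identical processing resources into
    partition sizes, in non-increasing order (e.g. 4 -> [4], [3,1], ...)."""
    if n == 0:
        yield []
        return
    for first in range(n, 0, -1):
        for rest in partitions(n - first):
            if not rest or first >= rest[0]:
                yield [first] + rest

def best_grouping(profile, total_resources):
    """profile[k] = time for one inference on a partition of k resources.
    Score each grouping by aggregate throughput (inferences per unit time,
    summed over concurrent partitions) and return the fastest grouping."""
    best, best_speed = None, 0.0
    for grouping in partitions(total_resources):
        speed = sum(1.0 / profile[k] for k in grouping)
        if speed > best_speed:
            best, best_speed = grouping, speed
    return best, best_speed

# Example: a profile with diminishing returns beyond two resources, so
# two 2-resource partitions beat one 4-resource partition.
profile = {1: 100.0, 2: 40.0, 3: 35.0, 4: 30.0}
grouping, speed = best_grouping(profile, 4)
```

The exhaustive enumeration is fine for the small resource counts typical of a single device; a real implementation would likely prune or memoize.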
  • Publication number: 20240185098
    Abstract: A system determines a timing matrix corresponding to inference times taken for a number of machine learning (ML) models to be executed by a number of processing resources of a computing device. The processing resources include at least a first and a second type of processing resources. The system applies a service-specific model-first scheduling scheme or a service-specific hardware-first scheduling scheme to obtain corresponding service-specific mappings. The system determines a best mapping from the corresponding service-specific mappings. The system schedules each of the ML models to a corresponding processing resource from the processing resources according to the best mapping. The system executes the ML models using corresponding mapped processing resources.
    Type: Application
    Filed: April 15, 2022
    Publication date: June 6, 2024
    Inventors: HAOFENG KOU, DAVY HUANG, MANJIANG ZHANG, XING LI, LEI WANG, HUIMENG ZHENG, ZHEN CHEN, RUICHANG CHENG
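The timing matrix and the two scheduling passes can be sketched as follows. Here timing[m][r] is the inference time of model m on resource r, and the quality of a mapping is its makespan (the load on the busiest resource). The greedy details of each pass are illustrative guesses at what "model-first" and "hardware-first" mean; the patent does not publish its exact heuristics in this abstract.

```python
def model_first(timing, n_resources):
    """Model-driven pass: assign each model, hardest first, to the
    resource whose load would grow the least."""
    load = [0.0] * n_resources
    mapping = {}
    order = sorted(range(len(timing)), key=lambda m: -min(timing[m]))
    for m in order:
        r = min(range(n_resources), key=lambda r: load[r] + timing[m][r])
        mapping[m] = r
        load[r] += timing[m][r]
    return mapping, max(load)

def hardware_first(timing, n_resources):
    """Hardware-driven pass: walk the resources round-robin, each time
    taking the unassigned model that runs fastest on the current one."""
    load = [0.0] * n_resources
    mapping, remaining = {}, set(range(len(timing)))
    r = 0
    while remaining:
        m = min(remaining, key=lambda m: timing[m][r])
        mapping[m] = r
        load[r] += timing[m][r]
        remaining.remove(m)
        r = (r + 1) % n_resources
    return mapping, max(load)

def best_mapping(timing, n_resources):
    """Run both schemes and keep whichever yields the smaller makespan."""
    return min(model_first(timing, n_resources),
               hardware_first(timing, n_resources),
               key=lambda result: result[1])
```

With a timing matrix like [[2, 10], [10, 2], [3, 3]] (two resource types, e.g. a CPU and a GPU), both passes send each specialized model to its fast resource and the system keeps the mapping with the lower makespan.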
  • Publication number: 20240185587
    Abstract: Modern deep neural network (DNN) models have many layers, with a single layer potentially involving large matrix multiplications. Such heavy computation makes it challenging to deploy such DNN models on a single edge device, which has relatively limited computation resources. Therefore, multiple and even heterogeneous edge devices may be required for applications with stringent latency requirements. Disclosed in the present patent documents are embodiments of a model scheduling framework that schedules multiple models on a heterogeneous platform. Two different approaches, model first scheduling (MFS) and hardware first scheduling (HFS), are presented to allocate a group of models for a service onto corresponding heterogeneous edge devices, including CPU, VPU, and GPU. Experimental results prove the effectiveness of the MFS and HFS methods for improving the inference speed of single and multiple AI-based services.
    Type: Application
    Filed: August 16, 2021
    Publication date: June 6, 2024
    Applicants: Baidu.com Times Technology (Beijing) Co., Ltd., Baidu USA LLC
    Inventors: Haofeng KOU, Xing LI, Huimeng ZHENG, Lei WANG, Zhen CHEN
  • Publication number: 20230229119
    Abstract: One application of deep learning methods and labelled data is in industrial production or work settings. For such machine learning applications, massive amounts of data are required to train, validate, and/or tune models to better fit the requirements. However, obtaining such data has typically been costly and difficult. Embodiments provide adaptable data labelling processes for work settings. Embodiments take advantage of the work or production processes to label and collect data, which saves time and money and improves accuracy. Embodiments prevent or reduce worker training costs and human mistake-triggered data labelling problems. Embodiments also improve data labelling quality and speed up the development cycle.
    Type: Application
    Filed: February 10, 2021
    Publication date: July 20, 2023
    Applicants: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.
    Inventors: Huimeng ZHENG, Haofeng KOU
  • Publication number: 20230229890
    Abstract: Embodiments presented herein facilitate improvement of a deployed neural network model's accuracy without significantly affecting its operation. In one or more embodiments, online training of the deployed model may be performed using a second neural network model that has higher accuracy than the deployed neural network model. In one or more embodiments, the second neural network model may also be improved online. Embodiments may be deployed in systems, such as edge computing environments, in which neural networks deployed at the edge can be centrally monitored and updated.
    Type: Application
    Filed: December 10, 2020
    Publication date: July 20, 2023
    Applicants: Baidu USA LLC, Baidu.com Times Technology (Beijing) Co., Ltd.
    Inventors: Haofeng KOU, Huimeng ZHENG
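The online-refinement idea in this last abstract, a deployed model nudged toward a more accurate second model on live inputs, resembles online knowledge distillation. The sketch below uses two toy logistic regressors, with the teacher's soft predictions as training targets for the student; the model class, loss, and update rule are illustrative assumptions, not the patent's method.

```python
import math

def predict(weights, x):
    """Toy logistic model: probability that input x is the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def online_update(student, teacher, x, lr=0.5):
    """One online step: move the student's weights toward the teacher's
    soft label on x via a single cross-entropy gradient step."""
    target = predict(teacher, x)   # teacher's soft pseudo-label
    p = predict(student, x)
    grad = p - target              # d(loss)/d(logit) for cross-entropy
    return [w - lr * grad * xi for w, xi in zip(student, x)]

# Example: as live inputs stream past, the deployed (student) model's
# predictions drift toward those of the more accurate (teacher) model.
teacher = [2.0, -1.0]
student = [0.0, 0.0]
stream = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]] * 100
for x in stream:
    student = online_update(student, teacher, x)
```

Because each step touches only one input, the student keeps serving inferences between updates, which matches the abstract's goal of improving accuracy without significantly affecting operation.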