Patents by Inventor Haihao SHEN

Haihao SHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250077861
    Abstract: The disclosure provides an apparatus, method, device and medium for label-balanced calibration in post-training quantization of DNNs. An apparatus includes interface circuitry configured to receive a training dataset and processor circuitry coupled to the interface circuitry. The processor circuitry is configured to generate a small ground truth dataset by selecting images with a ground truth number of 1 from the training dataset; generate a calibration dataset randomly from the training dataset; if any image in the calibration dataset has the ground truth number of 1, remove the image from the small ground truth dataset; generate a label-balanced calibration dataset by replacing an image with a ground truth number greater than a preset threshold in the calibration dataset with a replacing image selected randomly from the small ground truth dataset; and perform calibration using the label-balanced calibration dataset in post-training quantization. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: November 3, 2021
    Publication date: March 6, 2025
    Applicant: Intel Corporation
    Inventors: Haihao SHEN, Feng TIAN, Xi CHEN, Huma ABIDI, Yuwen ZHOU
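The calibration flow this abstract walks through can be sketched in a few lines of Python. This is an illustrative reconstruction from the abstract alone, not the claimed implementation: the dataset is abstracted to `(image_id, ground_truth_count)` pairs, and the function name and parameters are invented here.

```python
import random

def build_label_balanced_calibration_set(dataset, calib_size, threshold, seed=0):
    """Sketch of label-balanced calibration-set construction.

    `dataset` is a list of (image_id, ground_truth_count) pairs; real image
    data is abstracted away.
    """
    rng = random.Random(seed)
    # Step 1: small pool of images with exactly one ground-truth label.
    single_gt = [s for s in dataset if s[1] == 1]
    # Step 2: random calibration subset drawn from the full training set.
    calib = rng.sample(dataset, calib_size)
    # Step 3: drop calibration images from the single-ground-truth pool.
    calib_ids = {s[0] for s in calib}
    single_gt = [s for s in single_gt if s[0] not in calib_ids]
    # Step 4: replace over-labeled images with random single-label images.
    balanced = []
    for s in calib:
        if s[1] > threshold and single_gt:
            balanced.append(single_gt.pop(rng.randrange(len(single_gt))))
        else:
            balanced.append(s)
    return balanced
```

The result is a calibration set whose label counts are capped, which is the property the abstract says improves post-training-quantization calibration.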
  • Publication number: 20250045586
    Abstract: The application provides a method and apparatus for accelerating deep learning inference based on a hardware-aware sparsity pattern. The method may include determining a hardware-aware sparsity pattern based on a register width specified by an instruction set architecture (ISA) of a hardware unit for implementing a deep neural network (DNN) for deep learning inference, the sparsity pattern specifying a block size and a sparsity ratio for block-wise sparsification of a weight matrix of an operator in the DNN; performing the block-wise sparsification for the weight matrix based on the sparsity pattern to obtain a sparse weight matrix, during a training process of the DNN; compressing the sparse weight matrix into a concentrated weight matrix by removing all-zero blocks from the sparse weight matrix; and generating a mask to indicate an index of each row of non-zero blocks in the sparse weight matrix to enable extraction of corresponding elements from an activation matrix of the operator.
    Type: Application
    Filed: March 4, 2022
    Publication date: February 6, 2025
    Inventors: Hengyu MENG, Jiong GONG, Xudong LIU, Haihao SHEN
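The sparsify/compress/mask pipeline in the abstract can be illustrated with a minimal pure-Python sketch. Block selection by smallest L1 norm is an assumption for illustration (the abstract does not say how blocks are chosen), and all names are invented; matrices are plain lists of lists.

```python
def sparsify_blockwise(weight, block_rows, sparsity_ratio):
    """Zero out whole row-blocks of `weight` with the smallest L1 norm
    until `sparsity_ratio` of the blocks are all-zero."""
    n_blocks = len(weight) // block_rows
    norms = []
    for b in range(n_blocks):
        rows = weight[b * block_rows:(b + 1) * block_rows]
        norms.append((sum(abs(x) for r in rows for x in r), b))
    n_zero = int(n_blocks * sparsity_ratio)
    for _, b in sorted(norms)[:n_zero]:
        for r in range(b * block_rows, (b + 1) * block_rows):
            weight[r] = [0.0] * len(weight[r])
    return weight

def compress(weight, block_rows):
    """Drop all-zero blocks; return (concentrated_matrix, mask), where the
    mask records the original index of each kept block so matching rows of
    the activation matrix can be extracted later."""
    dense, mask = [], []
    n_blocks = len(weight) // block_rows
    for b in range(n_blocks):
        rows = weight[b * block_rows:(b + 1) * block_rows]
        if any(x != 0.0 for r in rows for x in r):
            dense.extend(rows)
            mask.append(b)
    return dense, mask
```

In the patent's setting the block size would be derived from the ISA register width so each kept block fills a vector register; here it is simply a parameter.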
  • Publication number: 20240289612
    Abstract: The application provides a hardware-aware cost model for optimizing inference of a deep neural network (DNN), comprising: a computation cost estimator configured to compute estimated computation cost based on the input tensor, weight tensor and output tensor of the DNN; and a memory/cache cost estimator configured to perform a memory/cache cost estimation strategy based on hardware specifications. The hardware-aware cost model is used to perform performance simulation on target hardware, providing dynamic quantization knobs to quantization as required for converting a conventional-precision inference model to an optimized inference model based on the result of the performance simulation.
    Type: Application
    Filed: October 26, 2021
    Publication date: August 29, 2024
    Inventors: Haihao SHEN, Hengyu MENG, Feng TIAN
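A cost model of this shape, computation cost from tensor shapes plus memory cost from hardware specifications, can be illustrated with a standard roofline-style estimate. This is a generic sketch, not the patented estimator; the function, its parameters, and the matmul-like operator it models are all assumptions made here for illustration.

```python
def roofline_cost(in_shape, w_shape, out_shape,
                  peak_gflops, mem_gbps, bytes_per_elem=4):
    """Toy roofline-style cost for a matmul-like operator.

    in_shape=(M, K), w_shape=(K, N), out_shape=(M, N). The operator's cost
    is bounded by whichever resource is slower: compute or memory traffic.
    """
    m, k = in_shape
    _, n = w_shape
    flops = 2.0 * m * k * n                          # multiply-accumulates
    elems = m * k + k * n + out_shape[0] * out_shape[1]
    compute_s = flops / (peak_gflops * 1e9)          # time if compute-bound
    memory_s = elems * bytes_per_elem / (mem_gbps * 1e9)  # if memory-bound
    return max(compute_s, memory_s)
```

Such an estimate is enough to drive quantization knobs: evaluating the same shapes at int8 element width and int8 peak throughput shows directly whether the conversion pays off on the target hardware.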
  • Publication number: 20230118802
    Abstract: Systems, apparatuses and methods may provide technology for optimizing an inference neural network model that performs asymmetric quantization by generating a quantized neural network, wherein model weights of the neural network are quantized as signed integer values, and wherein an input layer of the neural network is configured to quantize input values as unsigned integer values, generating a weights accumulation table based on the quantized model weights and a kernel size for the neural network, and generating an output restoration function for an output layer of the neural network based on the weights accumulation table and the kernel size. The technology may also perform per-input channel quantization. The technology may also perform mixed-precision auto-tuning.
    Type: Application
    Filed: March 13, 2020
    Publication date: April 20, 2023
    Inventors: Jiong Gong, Yong Wu, Haihao Shen, Xiao Dong Lin, Guoming Zhang, Feng Yuan
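The key identity behind a weights-accumulation table can be shown with a small worked example. With weights quantized symmetrically to signed int8 (w ≈ s_w·w_q) and inputs quantized asymmetrically to unsigned 8-bit (x ≈ s_x·(x_q − z)), a dot product satisfies x·w ≈ s_x·s_w·(Σ x_q·w_q − z·Σ w_q), where Σ w_q per output channel is exactly the precomputable table. The sketch below is a minimal reconstruction of that identity, not the patented system; all function names are invented.

```python
def quant_weights_sym(w):
    """Quantize a weight vector to signed int8 with a symmetric scale."""
    scale = (max(abs(v) for v in w) or 1.0) / 127.0
    return [round(v / scale) for v in w], scale

def quant_input_asym(x):
    """Quantize an input vector to unsigned 8-bit with scale and zero point."""
    lo, hi = min(x), max(x)
    scale = ((hi - lo) or 1.0) / 255.0
    zp = round(-lo / scale)
    return [round(v / scale) + zp for v in x], scale, zp

def restore_output(xq, wq, zp, sx, sw, wsum):
    """Integer accumulation plus the accumulation-table correction:
    x.w ~= sx * sw * (sum(xq*wq) - zp * sum(wq))."""
    acc = sum(a * b for a, b in zip(xq, wq))
    return sx * sw * (acc - zp * wsum)
```

Because `wsum` depends only on the quantized weights, the table is built once at model-conversion time and reused for every inference, which is what makes the unsigned-input scheme cheap at runtime.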
  • Publication number: 20230010142
    Abstract: A student model may be trained in two stages by using two teacher models, respectively. The first teacher model has been trained with a pretraining dataset. The second teacher model has been trained with a training dataset that is specific to a task to be performed by the student model. In the first stage, the student model may be generated based on a structure of the first teacher model. Internal parameters of the student model are adjusted through a pretraining process based on the first teacher model and the pretraining dataset. Weights of the student model may be pruned during the pretraining process. In the second stage, a sparsity mask is generated for the student model to lock the sparsity pattern generated from the first stage. Further, some of the internal parameters of the student model are modified based on the second teacher model and the training dataset.
    Type: Application
    Filed: September 22, 2022
    Publication date: January 12, 2023
    Inventors: Ofir Zafrir, Guy Boudoukh, Ariel Lahrey, Moshe Wasserblat, Haihao Shen
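The sparsity-locking idea from the second stage can be sketched independently of any framework: prune in stage one, record the mask, then apply every stage-two fine-tuning update through the mask so pruned weights stay zero. Magnitude-based pruning and the plain gradient step are illustrative assumptions; weights are flat lists here, and both functions are invented for this sketch.

```python
def magnitude_prune(w, ratio):
    """Stage 1 (sketch): zero the smallest-magnitude `ratio` of weights
    and return (pruned_weights, sparsity_mask)."""
    n_zero = int(len(w) * ratio)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    mask = [1.0] * len(w)
    for i in order[:n_zero]:
        mask[i] = 0.0
    return [wi * mi for wi, mi in zip(w, mask)], mask

def masked_update(w, grad, mask, lr=0.1):
    """Stage 2 (sketch): a fine-tuning step that respects the locked
    sparsity pattern - pruned positions are forced back to zero."""
    return [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
```

In the patent the stage-two gradients would come from distillation against the task-specific teacher; the mask multiplication is what keeps the stage-one sparsity pattern intact through that training.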
  • Publication number: 20210350210
    Abstract: A method and apparatus for keeping statistical inference accuracy with 8-bit Winograd convolution. A calibration dataset and a pretrained convolutional neural network (CNN) comprising 32-bit floating point weight values may be sampled to generate an input activation tensor and a weight tensor. A transformed input activation tensor may be generated by multiplying the input activation tensor and an input matrix. A transformed weight tensor may be generated by multiplying the weight tensor and a weight matrix. A scale factor may be computed for each transformed tensor. An 8-bit CNN model including the scale factors may be generated.
    Type: Application
    Filed: July 30, 2018
    Publication date: November 11, 2021
    Applicant: Intel Corporation
    Inventors: Jiong GONG, Haihao SHEN, Xiao Dong LIN, Xiaoli LIU
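The calibration step the abstract describes, scaling the transformed tensors rather than the original ones, can be illustrated with the 1-D Winograd F(2,3) transforms. The matrices B^T and G below are the standard F(2,3) transforms; the max-abs-to-127 scale rule and all function names are assumptions made for this sketch, not the claimed method.

```python
# Standard 1-D Winograd F(2,3) transform matrices:
# B^T transforms a 4-element data tile, G transforms a 3-element kernel.
BT = [[1,  0, -1,  0],
      [0,  1,  1,  0],
      [0, -1,  1,  0],
      [0,  1,  0, -1]]
G = [[1.0,  0.0, 0.0],
     [0.5,  0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0,  0.0, 1.0]]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def int8_scale(values):
    """Scale factor mapping the tensor's max |value| onto int8 range 127."""
    return (max(abs(x) for x in values) or 1.0) / 127.0

def calibrate_winograd(data_tiles, kernel):
    """Compute per-transformed-tensor scale factors from calibration data:
    transform first, then observe the value range of the *transformed*
    tensors, since the transforms change the dynamic range."""
    t_data = [matvec(BT, tile) for tile in data_tiles]
    t_kernel = matvec(G, kernel)
    s_d = int8_scale([x for t in t_data for x in t])
    s_g = int8_scale(t_kernel)
    return s_d, s_g
```

Calibrating on the transformed tensors is the crux: quantizing before the transform would clip the larger values the transforms produce, which is what costs accuracy in naive 8-bit Winograd.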
  • Publication number: 20180173614
    Abstract: Technologies for device-independent application testing include a host computing device and one or more test computing devices. The host computer device records user interface events generated by an application of the test computing device and video data indicative of the display interface of the application. The host computing device detects user interface objects in the video data that correspond to user interface events using a computer vision algorithm, which may include image feature detection or optical character recognition. The host computing device generates an object-based test script that identifies the user interface object and a user interaction. The host computing device may identify the user interface object in the display interface of an application executed by a different test computing device using the computer vision algorithm. The host computing device performs the specified user interaction on the detected user interface object. Other embodiments are described and claimed.
    Type: Application
    Filed: June 26, 2015
    Publication date: June 21, 2018
    Inventors: Jiong GONG, Yun WANG, Haihao SHEN
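The record-and-replay idea, identifying a UI object by its appearance rather than device-specific coordinates, can be shown with a deliberately naive sketch. Screens are 2-D pixel grids, matching is exact rather than a real computer-vision algorithm, and every name below is invented for illustration.

```python
def find_object(screen, template):
    """Naive exact template match on 2-D grids (lists of lists); returns
    the top-left (row, col) of the first match, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return r, c
    return None

def record_step(screen, template, name, interaction):
    """Emit one object-based script step: the object is identified by its
    appearance, not by coordinates on the recording device."""
    assert find_object(screen, template) is not None, "object not on screen"
    return {"object": name, "template": template, "interaction": interaction}

def replay_step(step, screen):
    """On a (possibly different) device, re-locate the object and return
    where to apply the recorded interaction."""
    loc = find_object(screen, step["template"])
    return step["interaction"], loc
```

The same recorded step replays on a screen with a different layout because the object is found again at runtime, which is the device independence the abstract claims; a real system would use feature detection or OCR instead of exact matching.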