Patents by Inventor Haihao SHEN

Haihao SHEN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250077861
    Abstract: The disclosure provides an apparatus, method, device and medium for label-balanced calibration in post-training quantization of DNNs. An apparatus includes interface circuitry configured to receive a training dataset and processor circuitry coupled to the interface circuitry. The processor circuitry is configured to generate a small ground truth dataset by selecting images with a ground truth number of 1 from the training dataset; generate a calibration dataset randomly from the training dataset; if any image in the calibration dataset has the ground truth number of 1, remove the image from the small ground truth dataset; generate a label-balanced calibration dataset by replacing an image with a ground truth number greater than a preset threshold in the calibration dataset with a replacing image selected randomly from the small ground truth dataset; and perform calibration using the label-balanced calibration dataset in post-training quantization. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: November 3, 2021
    Publication date: March 6, 2025
    Applicant: Intel Corporation
    Inventors: Haihao SHEN, Feng TIAN, Xi CHEN, Huma ABIDI, Yuwen ZHOU
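The calibration flow this abstract walks through can be sketched in a few lines of Python. This is an illustrative reconstruction from the abstract alone, not the claimed implementation: the dataset is abstracted to `(image_id, ground_truth_count)` pairs, and the function name and parameters are invented here.

```python
import random

def build_label_balanced_calibration_set(dataset, calib_size, threshold, seed=0):
    """Sketch of label-balanced calibration-set construction.

    `dataset` is a list of (image_id, ground_truth_count) pairs; real image
    data is abstracted away.
    """
    rng = random.Random(seed)
    # Step 1: small pool of images with exactly one ground-truth label.
    single_gt = [s for s in dataset if s[1] == 1]
    # Step 2: random calibration subset drawn from the full training set.
    calib = rng.sample(dataset, calib_size)
    # Step 3: drop calibration images from the single-ground-truth pool.
    calib_ids = {s[0] for s in calib}
    single_gt = [s for s in single_gt if s[0] not in calib_ids]
    # Step 4: replace over-labeled images with random single-label images.
    balanced = []
    for s in calib:
        if s[1] > threshold and single_gt:
            balanced.append(single_gt.pop(rng.randrange(len(single_gt))))
        else:
            balanced.append(s)
    return balanced
```

The result is a calibration set whose label counts are capped, which is the property the abstract says improves post-training-quantization calibration.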
  • Publication number: 20250045586
    Abstract: The application provides a method and apparatus for accelerating deep learning inference based on a hardware-aware sparsity pattern. The method may include determining a hardware-aware sparsity pattern based on a register width specified by an instruction set architecture (ISA) of a hardware unit for implementing a deep neural network (DNN) for deep learning inference, the sparsity pattern specifying a block size and a sparsity ratio for block-wise sparsification of a weight matrix of an operator in the DNN; performing the block-wise sparsification for the weight matrix based on the sparsity pattern to obtain a sparse weight matrix, during a training process of the DNN; compressing the sparse weight matrix into a concentrated weight matrix by removing all-zero blocks from the sparse weight matrix; and generating a mask to indicate an index of each row of non-zero blocks in the sparse weight matrix to enable extraction of corresponding elements from an activation matrix of the operator.
    Type: Application
    Filed: March 4, 2022
    Publication date: February 6, 2025
    Inventors: Hengyu MENG, Jiong GONG, Xudong LIU, Haihao SHEN
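The sparsify/compress/mask pipeline in the abstract can be illustrated with a minimal pure-Python sketch. Block selection by smallest L1 norm is an assumption for illustration (the abstract does not say how blocks are chosen), and all names are invented; matrices are plain lists of lists.

```python
def sparsify_blockwise(weight, block_rows, sparsity_ratio):
    """Zero out whole row-blocks of `weight` with the smallest L1 norm
    until `sparsity_ratio` of the blocks are all-zero."""
    n_blocks = len(weight) // block_rows
    norms = []
    for b in range(n_blocks):
        rows = weight[b * block_rows:(b + 1) * block_rows]
        norms.append((sum(abs(x) for r in rows for x in r), b))
    n_zero = int(n_blocks * sparsity_ratio)
    for _, b in sorted(norms)[:n_zero]:
        for r in range(b * block_rows, (b + 1) * block_rows):
            weight[r] = [0.0] * len(weight[r])
    return weight

def compress(weight, block_rows):
    """Drop all-zero blocks; return (concentrated_matrix, mask), where the
    mask records the original index of each kept block so matching rows of
    the activation matrix can be extracted later."""
    dense, mask = [], []
    n_blocks = len(weight) // block_rows
    for b in range(n_blocks):
        rows = weight[b * block_rows:(b + 1) * block_rows]
        if any(x != 0.0 for r in rows for x in r):
            dense.extend(rows)
            mask.append(b)
    return dense, mask
```

In the patent's setting the block size would be derived from the ISA register width so each kept block fills a vector register; here it is simply a parameter.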
  • Publication number: 20240289612
    Abstract: The application provides a hardware-aware cost model for optimizing inference of a deep neural network (DNN), comprising: a computation cost estimator configured to compute estimated computation cost based on the input tensor, weight tensor and output tensor of the DNN; and a memory/cache cost estimator configured to perform a memory/cache cost estimation strategy based on hardware specifications. The hardware-aware cost model is used to perform performance simulation on target hardware, providing dynamic quantization knobs to quantization as required for converting a conventional-precision inference model to an optimized inference model based on the result of the performance simulation.
    Type: Application
    Filed: October 26, 2021
    Publication date: August 29, 2024
    Inventors: Haihao SHEN, Hengyu MENG, Feng TIAN
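A cost model of this shape, computation cost from tensor shapes plus memory cost from hardware specifications, can be illustrated with a standard roofline-style estimate. This is a generic sketch, not the patented estimator; the function, its parameters, and the matmul-like operator it models are all assumptions made here for illustration.

```python
def roofline_cost(in_shape, w_shape, out_shape,
                  peak_gflops, mem_gbps, bytes_per_elem=4):
    """Toy roofline-style cost for a matmul-like operator.

    in_shape=(M, K), w_shape=(K, N), out_shape=(M, N). The operator's cost
    is bounded by whichever resource is slower: compute or memory traffic.
    """
    m, k = in_shape
    _, n = w_shape
    flops = 2.0 * m * k * n                          # multiply-accumulates
    elems = m * k + k * n + out_shape[0] * out_shape[1]
    compute_s = flops / (peak_gflops * 1e9)          # time if compute-bound
    memory_s = elems * bytes_per_elem / (mem_gbps * 1e9)  # if memory-bound
    return max(compute_s, memory_s)
```

Such an estimate is enough to drive quantization knobs: evaluating the same shapes at int8 element width and int8 peak throughput shows directly whether the conversion pays off on the target hardware.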
  • Publication number: 20230118802
    Abstract: Systems, apparatuses and methods may provide technology for optimizing an inference neural network model that performs asymmetric quantization by generating a quantized neural network, wherein model weights of the neural network are quantized as signed integer values, and wherein an input layer of the neural network is configured to quantize input values as unsigned integer values, generating a weights accumulation table based on the quantized model weights and a kernel size for the neural network, and generating an output restoration function for an output layer of the neural network based on the weights accumulation table and the kernel size. The technology may also perform per-input channel quantization. The technology may also perform mixed-precision auto-tuning.
    Type: Application
    Filed: March 13, 2020
    Publication date: April 20, 2023
    Inventors: Jiong Gong, Yong Wu, Haihao Shen, Xiao Dong Lin, Guoming Zhang, Feng Yuan
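The key identity behind a weights-accumulation table can be shown with a small worked example. With weights quantized symmetrically to signed int8 (w ≈ s_w·w_q) and inputs quantized asymmetrically to unsigned 8-bit (x ≈ s_x·(x_q − z)), a dot product satisfies x·w ≈ s_x·s_w·(Σ x_q·w_q − z·Σ w_q), where Σ w_q per output channel is exactly the precomputable table. The sketch below is a minimal reconstruction of that identity, not the patented system; all function names are invented.

```python
def quant_weights_sym(w):
    """Quantize a weight vector to signed int8 with a symmetric scale."""
    scale = (max(abs(v) for v in w) or 1.0) / 127.0
    return [round(v / scale) for v in w], scale

def quant_input_asym(x):
    """Quantize an input vector to unsigned 8-bit with scale and zero point."""
    lo, hi = min(x), max(x)
    scale = ((hi - lo) or 1.0) / 255.0
    zp = round(-lo / scale)
    return [round(v / scale) + zp for v in x], scale, zp

def restore_output(xq, wq, zp, sx, sw, wsum):
    """Integer accumulation plus the accumulation-table correction:
    x.w ~= sx * sw * (sum(xq*wq) - zp * sum(wq))."""
    acc = sum(a * b for a, b in zip(xq, wq))
    return sx * sw * (acc - zp * wsum)
```

Because `wsum` depends only on the quantized weights, the table is built once at model-conversion time and reused for every inference, which is what makes the unsigned-input scheme cheap at runtime.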
  • Publication number: 20230010142
    Abstract: A student model may be trained in two stages by using two teacher models, respectively. The first teacher model has been trained with a pretraining dataset. The second teacher model has been trained with a training dataset that is specific to a task to be performed by the student model. In the first stage, the student model may be generated based on a structure of the first teacher model. Internal parameters of the student model are adjusted through a pretraining process based on the first teacher model and the pretraining dataset. Weights of the student model may be pruned during the pretraining process. In the second stage, a sparsity mask is generated for the student model to lock the sparsity pattern generated from the first stage. Further, some of the internal parameters of the student model are modified based on the second teacher model and the training dataset.
    Type: Application
    Filed: September 22, 2022
    Publication date: January 12, 2023
    Inventors: Ofir Zafrir, Guy Boudoukh, Ariel Lahrey, Moshe Wasserblat, Haihao Shen
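The sparsity-locking idea from the second stage can be sketched independently of any framework: prune in stage one, record the mask, then apply every stage-two fine-tuning update through the mask so pruned weights stay zero. Magnitude-based pruning and the plain gradient step are illustrative assumptions; weights are flat lists here, and both functions are invented for this sketch.

```python
def magnitude_prune(w, ratio):
    """Stage 1 (sketch): zero the smallest-magnitude `ratio` of weights
    and return (pruned_weights, sparsity_mask)."""
    n_zero = int(len(w) * ratio)
    order = sorted(range(len(w)), key=lambda i: abs(w[i]))
    mask = [1.0] * len(w)
    for i in order[:n_zero]:
        mask[i] = 0.0
    return [wi * mi for wi, mi in zip(w, mask)], mask

def masked_update(w, grad, mask, lr=0.1):
    """Stage 2 (sketch): a fine-tuning step that respects the locked
    sparsity pattern - pruned positions are forced back to zero."""
    return [(wi - lr * gi) * mi for wi, gi, mi in zip(w, grad, mask)]
```

In the patent the stage-two gradients would come from distillation against the task-specific teacher; the mask multiplication is what keeps the stage-one sparsity pattern intact through that training.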
  • Publication number: 20210350210
    Abstract: A method and apparatus for keeping statistical inference accuracy with 8-bit Winograd convolution. A calibration dataset and a pretrained convolutional neural network (CNN) comprising 32-bit floating point weight values may be sampled to generate an input activation tensor and a weight tensor. A transformed input activation tensor may be generated by multiplying the input activation tensor and an input matrix. A transformed weight tensor may be generated by multiplying the weight tensor and a weight matrix. A scale factor may be computed for each transformed tensor. An 8-bit CNN model including the scale factors may be generated.
    Type: Application
    Filed: July 30, 2018
    Publication date: November 11, 2021
    Applicant: Intel Corporation
    Inventors: Jiong GONG, Haihao SHEN, Xiao Dong LIN, Xiaoli LIU
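The calibration step the abstract describes, scaling the transformed tensors rather than the original ones, can be illustrated with the 1-D Winograd F(2,3) transforms. The matrices B^T and G below are the standard F(2,3) transforms; the max-abs-to-127 scale rule and all function names are assumptions made for this sketch, not the claimed method.

```python
# Standard 1-D Winograd F(2,3) transform matrices:
# B^T transforms a 4-element data tile, G transforms a 3-element kernel.
BT = [[1,  0, -1,  0],
      [0,  1,  1,  0],
      [0, -1,  1,  0],
      [0,  1,  0, -1]]
G = [[1.0,  0.0, 0.0],
     [0.5,  0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0,  0.0, 1.0]]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def int8_scale(values):
    """Scale factor mapping the tensor's max |value| onto int8 range 127."""
    return (max(abs(x) for x in values) or 1.0) / 127.0

def calibrate_winograd(data_tiles, kernel):
    """Compute per-transformed-tensor scale factors from calibration data:
    transform first, then observe the value range of the *transformed*
    tensors, since the transforms change the dynamic range."""
    t_data = [matvec(BT, tile) for tile in data_tiles]
    t_kernel = matvec(G, kernel)
    s_d = int8_scale([x for t in t_data for x in t])
    s_g = int8_scale(t_kernel)
    return s_d, s_g
```

Calibrating on the transformed tensors is the crux: quantizing before the transform would clip the larger values the transforms produce, which is what costs accuracy in naive 8-bit Winograd.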
  • Publication number: 20180173614
    Abstract: Technologies for device-independent application testing include a host computing device and one or more test computing devices. The host computer device records user interface events generated by an application of the test computing device and video data indicative of the display interface of the application. The host computing device detects user interface objects in the video data that correspond to user interface events using a computer vision algorithm, which may include image feature detection or optical character recognition. The host computing device generates an object-based test script that identifies the user interface object and a user interaction. The host computing device may identify the user interface object in the display interface of an application executed by a different test computing device using the computer vision algorithm. The host computing device performs the specified user interaction on the detected user interface object. Other embodiments are described and claimed.
    Type: Application
    Filed: June 26, 2015
    Publication date: June 21, 2018
    Inventors: Jiong GONG, Yun WANG, Haihao SHEN
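The record-and-replay idea, identifying a UI object by its appearance rather than device-specific coordinates, can be shown with a deliberately naive sketch. Screens are 2-D pixel grids, matching is exact rather than a real computer-vision algorithm, and every name below is invented for illustration.

```python
def find_object(screen, template):
    """Naive exact template match on 2-D grids (lists of lists); returns
    the top-left (row, col) of the first match, or None."""
    th, tw = len(template), len(template[0])
    for r in range(len(screen) - th + 1):
        for c in range(len(screen[0]) - tw + 1):
            if all(screen[r + i][c + j] == template[i][j]
                   for i in range(th) for j in range(tw)):
                return r, c
    return None

def record_step(screen, template, name, interaction):
    """Emit one object-based script step: the object is identified by its
    appearance, not by coordinates on the recording device."""
    assert find_object(screen, template) is not None, "object not on screen"
    return {"object": name, "template": template, "interaction": interaction}

def replay_step(step, screen):
    """On a (possibly different) device, re-locate the object and return
    where to apply the recorded interaction."""
    loc = find_object(screen, step["template"])
    return step["interaction"], loc
```

The same recorded step replays on a screen with a different layout because the object is found again at runtime, which is the device independence the abstract claims; a real system would use feature detection or OCR instead of exact matching.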