Patents by Inventor Anbang Yao

Anbang Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

APPARATUS AND METHOD FOR 3D DYNAMIC SPARSE CONVOLUTION

Publication number: 20250148761

Abstract: The disclosure provides an apparatus, method, device and medium for 3D dynamic sparse convolution. The method includes: receiving an input feature map of a 3D data sample; performing input feature map partition to divide the input feature map into a plurality of disjoint input feature map groups; performing a shared 3D dynamic sparse convolution to the plurality of disjoint input feature map groups respectively to obtain a plurality of output feature maps corresponding to the plurality of disjoint input feature map groups, wherein the shared 3D dynamic sparse convolution comprises a shared 3D dynamic sparse convolutional kernel; and performing output feature map grouping to sequentially stack the plurality of output feature maps to obtain an output feature map corresponding to the input feature map. (FIG. 2).

Type: Application

Filed: March 3, 2022

Publication date: May 8, 2025

Inventors: Dongqi CAI, Anbang YAO, Chao LI, Shandong WANG, Yurong CHEN

LOSS-ERROR-AWARE QUANTIZATION OF A LOW-BIT NEURAL NETWORK

Publication number: 20250117639

Abstract: Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss. The example apparatus includes a weight updater to update the second group of network weights based on the difference. The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights.

Type: Application

Filed: September 16, 2024

Publication date: April 10, 2025

Applicant: Intel Corporation

Inventors: Anbang Yao, Aojun Zhou, Kuan Wang, Hao Zhao, Yurong Chen

COMPUTE OPTIMIZATIONS FOR LOW PRECISION MACHINE LEARNING OPERATIONS

Publication number: 20250117874

Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.

Type: Application

Filed: October 7, 2024

Publication date: April 10, 2025

Applicant: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland

INSTRUCTIONS AND LOGIC TO PERFORM FLOATING POINT AND INTEGER OPERATIONS FOR MACHINE LEARNING

Publication number: 20250094170

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

Type: Application

Filed: September 30, 2024

Publication date: March 20, 2025

Applicant: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
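
As a rough illustration of the arithmetic named in the abstract above (16-bit operands, a 32-bit intermediate product, a 32-bit sum), the following Python sketch emulates the rounding in software. It assumes IEEE 754 half- and single-precision floating point, which the abstract does not specify. The helper names `to_fp16`, `to_fp32`, and `mixed_precision_matmul_acc` are hypothetical, and nothing here reflects the claimed hardware or its instruction encoding.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_precision_matmul_acc(a, b, c):
    """Return a @ b + c using 16-bit operands and 32-bit products and sums.

    a, b, c are lists of rows (assumed layout for this sketch).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[to_fp32(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = out[i][j]
            for k in range(inner):
                # Operands are narrowed to 16 bits before the multiply.
                product = to_fp32(to_fp16(a[i][k]) * to_fp16(b[k][j]))
                # The running sum is kept at 32 bits.
                acc = to_fp32(acc + product)
            out[i][j] = acc
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[0.0, 0.0], [0.0, 0.0]]
    print(mixed_precision_matmul_acc(a, b, c))
```

Keeping the accumulator wider than the operands limits the rounding error that builds up over long dot products, which is the usual motivation for this kind of mixed-precision design.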

DYNAMIC TRIPLET CONVOLUTION FOR CONVOLUTIONAL NEURAL NETWORKS

Publication number: 20250068891

Abstract: Methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement dynamic triplet convolution for convolutional neural networks are disclosed. An example apparatus disclosed herein for a convolutional neural network is to calculate one or more scalar kernels based on an input feature map applied to a layer of the convolutional neural network, ones of the one or more scalar kernels corresponding to respective dimensions of a static multidimensional convolutional filter associated with the layer of the convolutional neural network. The disclosed example apparatus is also to scale elements of the static multidimensional convolutional filter along a first one of the dimensions based on a first one of the one or more scalar kernels corresponding to the first one of the dimensions to determine a dynamic multidimensional convolutional filter associated with the layer of the convolutional neural network.

Type: Application

Filed: February 18, 2022

Publication date: February 27, 2025

Inventors: Dongqi CAI, Anbang YAO, Chao LI, Yurong CHEN, Wenjian SHAO
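
The filter-scaling step described in the abstract above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented method: the scalar kernel is derived here by global average pooling followed by a linear projection and a sigmoid, the static filter is laid out as `[out_channel][in_channel][tap]`, and only the output-channel dimension is scaled. The names `channel_means`, `scalar_kernel`, `scale_along_output_channels`, and `projection` are all hypothetical.

```python
import math


def channel_means(feature_map):
    """Global average pooling: one mean per input channel."""
    return [sum(channel) / len(channel) for channel in feature_map]


def scalar_kernel(pooled, projection):
    """Map pooled features to one scale per index along a filter dimension.

    `projection` is a hypothetical weight matrix with one row per index
    of the dimension being scaled; a sigmoid keeps each scale in (0, 1).
    """
    return [
        1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(row, pooled))))
        for row in projection
    ]


def scale_along_output_channels(static_filter, scales):
    """Scale static_filter[o][i][k] by scales[o] to get a dynamic filter."""
    return [
        [[scales[o] * w for w in taps] for taps in in_channels]
        for o, in_channels in enumerate(static_filter)
    ]


if __name__ == "__main__":
    feature_map = [[1.0, 3.0], [2.0, 4.0]]      # 2 channels, flattened
    static = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
              [[-1.0, 0.0, 1.0], [2.0, 2.0, 2.0]]]
    projection = [[0.1, -0.2], [0.3, 0.0]]      # illustrative values only
    scales = scalar_kernel(channel_means(feature_map), projection)
    print(scale_along_output_channels(static, scales))
```

Because the scales depend on the input feature map, the same static filter yields a different effective filter for each input, which is what makes the convolution dynamic.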

SYSTEMS, APPARATUS, ARTICLES OF MANUFACTURE, AND METHODS FOR TEACHER-FREE SELF-FEATURE DISTILLATION TRAINING OF MACHINE LEARNING MODELS

Publication number: 20250068916

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for teacher-free self-feature distillation training of machine-learning (ML) models. An example apparatus includes at least one memory, instructions, and processor circuitry to at least one of execute or instantiate the instructions to perform a first comparison of (i) a first group of a first set of feature channels (FCs) of an ML model and (ii) a second group of the first set, perform a second comparison of (iii) a first group of a second set of FCs of the ML model and one of (iv) a third group of the first set or a first group of a third set of FCs of the ML model, adjust parameter(s) of the ML model based on the first and/or second comparisons, and, in response to an error value satisfying a threshold, deploy the ML model to execute a workload based on the parameter(s).

Type: Application

Filed: February 21, 2022

Publication date: February 27, 2025

Inventors: Yurong Chen, Anbang Yao, Yi Qian, Yu Zhang, Shandong Wang

METHOD AND APPARATUS OF SPATIALLY SPARSE CONVOLUTION MODULE FOR VISUAL RENDERING AND SYNTHESIS

Publication number: 20250061172

Abstract: Embodiments are generally directed to methods and apparatuses of spatially sparse convolution module for visual rendering and synthesis. An embodiment of a method for image processing comprises: receiving an input image by a convolution layer of a neural network to generate a plurality of feature maps; performing spatially sparse convolution on the plurality of feature maps to generate spatially sparse feature maps; and upsampling the spatially sparse feature maps to generate an output image.

Type: Application

Filed: September 12, 2024

Publication date: February 20, 2025

Applicant: Intel Corporation

Inventors: Anbang Yao, Ming Lu, Yikai Wang, Scott Janus, Sungye Kim

PROGRAMMABLE COARSE GRAINED AND SPARSE MATRIX COMPUTE HARDWARE WITH ADVANCED SCHEDULING

Publication number: 20250061534

Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.

Type: Application

Filed: August 29, 2024

Publication date: February 20, 2025

Applicant: Intel Corporation

Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao

Methods and apparatus for deep learning network execution pipeline on multi-processor platform

Patent number: 12229569

Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes a computation distribution of the workload, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that the respective processing core handles computation tasks of the received workload for the respective group.

Type: Grant

Filed: October 27, 2023

Date of Patent: February 18, 2025

Assignee: Intel Corporation

Inventors: Liu Yang, Anbang Yao
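
The analyze, group, and assign flow in the abstract above can be sketched as follows. The abstract does not say how groups are formed, so this example assumes one plausible policy: nodes are kept in execution order and split into contiguous groups of roughly equal estimated cost, one group per core. The function names `group_nodes` and `assign_to_cores` and the per-node cost list are hypothetical.

```python
def group_nodes(costs, num_cores):
    """Split nodes (in execution order) into num_cores contiguous groups.

    `costs` holds an estimated compute cost per node. A group is closed
    once the running total reaches its proportional share of the overall
    cost, or when the nodes left are only just enough to give every
    remaining group one node.
    """
    if not 1 <= num_cores <= len(costs):
        raise ValueError("need between 1 and len(costs) cores")
    total = sum(costs)
    groups, current, spent = [], [], 0.0
    for index, cost in enumerate(costs):
        current.append(index)
        spent += cost
        remaining_nodes = len(costs) - index - 1
        remaining_groups = num_cores - len(groups) - 1
        boundary = total * (len(groups) + 1) / num_cores
        if remaining_groups > 0 and (
            spent >= boundary or remaining_nodes == remaining_groups
        ):
            groups.append(current)
            current = []
    groups.append(current)
    return groups


def assign_to_cores(groups):
    """Map core id -> list of node indices handled by that core."""
    return dict(enumerate(groups))


if __name__ == "__main__":
    layer_costs = [4, 1, 1, 2, 4]          # illustrative per-node costs
    print(assign_to_cores(group_nodes(layer_costs, 3)))
```

Contiguous groups let each core act as one stage of a pipeline, so a core only hands activations to the next core rather than exchanging data with all of them.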

DECIMAL-BIT NETWORK QUANTIZATION OF CONVOLUTIONAL NEURAL NETWORK MODELS

Publication number: 20250045573

Abstract: The disclosure relates to decimal-bit network quantization of CNN models.

Type: Application

Filed: March 3, 2022

Publication date: February 6, 2025

Inventors: Anbang YAO, Yikai WANG, Zhaole SUN, Yi YANG, Feng CHEN, Zhuo WANG, Shandong WANG, Yurong CHEN

DYNAMIC NEURAL NETWORK SURGERY

Publication number: 20250045582

Abstract: Techniques related to compressing a pre-trained dense deep neural network to a sparsely connected deep neural network for efficient implementation are discussed. Such techniques may include iteratively pruning and splicing available connections between adjacent layers of the deep neural network and updating weights corresponding to both currently disconnected and currently connected connections between the adjacent layers.

Type: Application

Filed: August 14, 2024

Publication date: February 6, 2025

Applicant: Intel Corporation

Inventors: Anbang Yao, Yiwen Guo, Yan Li, Yurong Chen
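
The prune-and-splice loop in the abstract above can be sketched with a binary mask over the weights. This is a simplified illustration, not the patented procedure: the two magnitude thresholds (`prune_below`, `splice_above`), the flat weight list, and the function names are assumptions made for the example. The one property taken directly from the abstract is that weights are updated whether their connection is currently connected or disconnected.

```python
def update_mask(weights, mask, prune_below, splice_above):
    """Prune small weights and splice back weights that have regrown.

    Both thresholds are illustrative assumptions; a weight whose magnitude
    lies between them keeps its previous mask value.
    """
    new_mask = []
    for w, m in zip(weights, mask):
        if abs(w) < prune_below:
            new_mask.append(0)      # prune: disconnect
        elif abs(w) > splice_above:
            new_mask.append(1)      # splice: reconnect
        else:
            new_mask.append(m)
    return new_mask


def surgery_step(weights, mask, grad_fn, lr, prune_below, splice_above):
    """One iteration: refresh the mask, then update every weight."""
    mask = update_mask(weights, mask, prune_below, splice_above)
    # The forward pass only sees connections that are currently kept.
    effective = [w * m for w, m in zip(weights, mask)]
    grads = grad_fn(effective)
    # Disconnected weights are updated too, so they can be spliced back.
    weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, mask


if __name__ == "__main__":
    target = [1.0, 0.5, -0.2]

    def grad_fn(effective):
        # Gradient of 0.5 * sum((e - t) ** 2), a toy stand-in for a loss.
        return [e - t for e, t in zip(effective, target)]

    w, m = [0.05, 0.5, -0.2], [1, 1, 1]
    for _ in range(2):
        w, m = surgery_step(w, m, grad_fn, 0.5, 0.1, 0.3)
        print(w, m)
```

In the toy run, the first weight is pruned in the first step, grows under the gradient update, and is spliced back in the second step, which shows how an incorrect pruning decision can be reversed.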

Methods and systems for budgeted and simplified training of deep neural networks

Patent number: 12217163

Abstract: Methods and systems for budgeted and simplified training of deep neural networks (DNNs) are disclosed. In one example, a trainer is to train a DNN using a plurality of training sub-images derived from a down-sampled training image. A tester is to test the trained DNN using a plurality of testing sub-images derived from a down-sampled testing image. In another example, in a recurrent deep Q-network (RDQN) having a local attention mechanism located between a convolutional neural network (CNN) and a long short-term memory (LSTM), a plurality of feature maps are generated by the CNN from an input image. Hard attention is applied by the local attention mechanism to the generated plurality of feature maps by selecting a subset of the generated feature maps. Soft attention is applied by the local attention mechanism to the selected subset of generated feature maps by providing weights to the selected subset of generated feature maps in obtaining weighted feature maps.

Type: Grant

Filed: September 22, 2023

Date of Patent: February 4, 2025

Assignee: Intel Corporation

Inventors: Yiwen Guo, Yuqing Hou, Anbang Yao, Dongqi Cai, Lin Xu, Ping Hu, Shandong Wang, Wenhua Cheng, Yurong Chen, Libin Wang
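
The two attention stages in the second example of the abstract above (hard attention selects a subset of feature maps, soft attention weights that subset) can be sketched as below. The abstract does not say how relevance is computed, so the per-map `scores` are simply taken as input here, the selection rule is assumed to be top-k, and the weights are assumed to come from a softmax. The function names are hypothetical.

```python
import math


def hard_attention(scores, keep):
    """Return the indices of the `keep` highest-scoring feature maps."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def soft_attention(feature_maps, scores, chosen):
    """Weight each chosen map by a softmax over the chosen scores."""
    peak = max(scores[i] for i in chosen)
    # Subtracting the peak keeps exp() from overflowing on large scores.
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return [
        [(e / total) * value for value in feature_maps[i]]
        for i, e in zip(chosen, exps)
    ]


if __name__ == "__main__":
    feature_maps = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # flattened maps
    scores = [0.0, 1.0, 1.0]                               # assumed given
    chosen = hard_attention(scores, keep=2)
    print(chosen, soft_attention(feature_maps, scores, chosen))
```

Selecting first and weighting second means the softmax is computed only over the maps that survive the hard selection, so discarded maps contribute nothing downstream.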

Instructions and logic to perform floating point and integer operations for machine learning

Patent number: 12217053

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.

Type: Grant

Filed: December 4, 2023

Date of Patent: February 4, 2025

Assignee: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

COMPUTE OPTIMIZATION MECHANISM

Publication number: 20250005703

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core including mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations.

Type: Application

Filed: July 15, 2024

Publication date: January 2, 2025

Applicant: Intel Corporation

Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman

Concurrent multi-datatype execution within a processing resource

Patent number: 12175252

Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.

Type: Grant

Filed: June 14, 2022

Date of Patent: December 24, 2024

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma

Joint training of neural networks using multi-scale hard example mining

Patent number: 12154309

Abstract: An example apparatus for mining multi-scale hard examples includes a convolutional neural network to receive a mini-batch of sample candidates and generate basic feature maps. The apparatus also includes a feature extractor and combiner to generate concatenated feature maps based on the basic feature maps and extract the concatenated feature maps for each of a plurality of received candidate boxes. The apparatus further includes a sample scorer and miner to score the candidate samples with multi-task loss scores and select candidate samples with multi-task loss scores exceeding a threshold score.

Type: Grant

Filed: September 6, 2023

Date of Patent: November 26, 2024

Assignee: Intel Corporation

Inventors: Anbang Yao, Yun Ren, Hao Zhao, Tao Kong, Yurong Chen
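
The scoring and mining step in the abstract above (score candidates with a multi-task loss, keep those above a threshold) can be sketched as follows. The abstract does not define the loss, so this example assumes a common detection-style form: a classification loss plus a weighted box-regression loss. The dictionary keys, the `box_weight` parameter, and the function names are hypothetical.

```python
def multi_task_loss(cls_loss, box_loss, box_weight=1.0):
    """Combine per-task losses into one score (assumed weighted sum)."""
    return cls_loss + box_weight * box_loss


def mine_hard_examples(candidates, threshold, box_weight=1.0):
    """Return candidates whose score exceeds `threshold`, hardest first."""
    scored = [
        (multi_task_loss(c["cls_loss"], c["box_loss"], box_weight), c)
        for c in candidates
    ]
    hard = [(score, c) for score, c in scored if score > threshold]
    hard.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in hard]


if __name__ == "__main__":
    candidates = [
        {"id": "a", "cls_loss": 0.25, "box_loss": 0.25},
        {"id": "b", "cls_loss": 1.0, "box_loss": 0.5},
        {"id": "c", "cls_loss": 0.5, "box_loss": 0.75},
    ]
    print([c["id"] for c in mine_hard_examples(candidates, threshold=1.0)])
```

Only the selected candidates would then feed the next training update, which concentrates the gradient on samples the network currently handles poorly.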

Compute optimizations for low precision machine learning operations

Patent number: 12148063

Abstract: One embodiment provides a multi-chip module accelerator usable to execute tensor data processing operations. The multi-chip module may include a memory stack including multiple memory dies and parallel processor circuitry communicatively coupled to the memory stack. The parallel processor circuitry may include multiprocessor cores to execute matrix multiplication and accumulate operations. The matrix multiplication and accumulate operations may include floating-point operations that are configurable to include two-dimensional matrix multiply and accumulate operations involving inputs that have differing floating-point precisions. The floating-point operations may include a first operation at a first precision and a second operation at a second precision. The first operation may include a multiply having at least one 16-bit floating-point input and the second operation may include an accumulate having a 32-bit floating-point input.

Type: Grant

Filed: October 5, 2022

Date of Patent: November 19, 2024

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland

Instructions and logic to perform floating point and integer operations for machine learning

Patent number: 12141578

Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.

Type: Grant

Filed: December 9, 2020

Date of Patent: November 12, 2024

Assignee: Intel Corporation

Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar

METHODS AND APPARATUS FOR DISCRIMINATIVE SEMANTIC TRANSFER AND PHYSICS-INSPIRED OPTIMIZATION OF FEATURES IN DEEP LEARNING

Publication number: 20240370716

Abstract: Methods and apparatus for discriminative semantic transfer and physics-inspired optimization in deep learning are disclosed. A computation training method for a convolutional neural network (CNN) includes receiving a sequence of training images in the CNN of a first stage to describe objects of a cluttered scene as a semantic segmentation mask. The semantic segmentation mask is received in a semantic segmentation network of a second stage to produce semantic features. Using weights from the first stage as feature extractors and weights from the second stage as classifiers, edges of the cluttered scene are identified using the semantic features.

Type: Application

Filed: July 11, 2024

Publication date: November 7, 2024

Inventors: Anbang YAO, Hao ZHAO, Ming LU, Yiwen GUO, Yurong CHEN

Method and apparatus of spatially sparse convolution module for visual rendering and synthesis

Patent number: 12124533

Abstract: Embodiments are generally directed to methods and apparatuses of spatially sparse convolution module for visual rendering and synthesis. An embodiment of a method for image processing comprises: receiving an input image by a convolution layer of a neural network to generate a plurality of feature maps; performing spatially sparse convolution on the plurality of feature maps to generate spatially sparse feature maps; and upsampling the spatially sparse feature maps to generate an output image.

Type: Grant

Filed: September 23, 2021

Date of Patent: October 22, 2024

Assignee: Intel Corporation

Inventors: Anbang Yao, Ming Lu, Yikai Wang, Scott Janus, Sungye Kim
