Patents by Inventor Anbang Yao

Anbang Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250117639
    Abstract: Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss. The example apparatus includes a weight updater to update the second group of network weights based on the difference. The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights.
    Type: Application
    Filed: September 16, 2024
    Publication date: April 10, 2025
    Applicant: Intel Corporation
    Inventors: Anbang Yao, Aojun Zhou, Kuan Wang, Hao Zhao, Yurong Chen
  • Publication number: 20250117874
    Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.
    Type: Application
    Filed: October 7, 2024
    Publication date: April 10, 2025
    Applicant: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
  • Publication number: 20250094170
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
    Type: Application
    Filed: September 30, 2024
    Publication date: March 20, 2025
    Applicant: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20250068916
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for teacher-free self-feature distillation training of machine-learning (ML) models. An example apparatus includes at least one memory, instructions, and processor circuitry to at least one of execute or instantiate the instructions to perform a first comparison of (i) a first group of a first set of feature channels (FCs) of an ML model and (ii) a second group of the first set, perform a second comparison of (iii) a first group of a second set of FCs of the ML model and one of (iv) a third group of the first set or a first group of a third set of FCs of the ML model, adjust parameter(s) of the ML model based on the first and/or second comparisons, and, in response to an error value satisfying a threshold, deploy the ML model to execute a workload based on the parameter(s).
    Type: Application
    Filed: February 21, 2022
    Publication date: February 27, 2025
    Inventors: Yurong Chen, Anbang Yao, Yi Qian, Yu Zhang, Shandong Wang
  • Publication number: 20250068891
    Abstract: Methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement dynamic triplet convolution for convolutional neural networks are disclosed. An example apparatus disclosed herein for a convolutional neural network is to calculate one or more scalar kernels based on an input feature map applied to a layer of the convolutional neural network, ones of the one or more scalar kernels corresponding to respective dimensions of a static multidimensional convolutional filter associated with the layer of the convolutional neural network. The disclosed example apparatus is also to scale elements of the static multidimensional convolutional filter along a first one of the dimensions based on a first one of the one or more scalar kernels corresponding to the first one of the dimensions to determine a dynamic multidimensional convolutional filter associated with the layer of the convolutional neural network.
    Type: Application
    Filed: February 18, 2022
    Publication date: February 27, 2025
    Inventors: Dongqi CAI, Anbang YAO, Chao LI, Yurong CHEN, Wenjian SHAO
  • Publication number: 20250061172
    Abstract: Embodiments are generally directed to methods and apparatuses of spatially sparse convolution module for visual rendering and synthesis. An embodiment of a method for image processing, comprising: receiving an input image by a convolution layer of a neural network to generate a plurality of feature maps; performing spatially sparse convolution on the plurality of feature maps to generate spatially sparse feature maps; and upsampling the spatially sparse feature maps to generate an output image.
    Type: Application
    Filed: September 12, 2024
    Publication date: February 20, 2025
    Applicant: Intel Corporation
    Inventors: Anbang Yao, Ming Lu, Yikai Wang, Scott Janus, Sungye Kim
  • Publication number: 20250061534
    Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
    Type: Application
    Filed: August 29, 2024
    Publication date: February 20, 2025
    Applicant: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
  • Patent number: 12229569
    Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes a computation distribution of the workload, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that the respective processing core handle computation tasks of the received workload for the respective group.
    Type: Grant
    Filed: October 27, 2023
    Date of Patent: February 18, 2025
    Assignee: Intel Corporation
    Inventors: Liu Yang, Anbang Yao
  • Publication number: 20250045573
    Abstract: The disclosure relates to decimal-bit network quantization of CNN models.
    Type: Application
    Filed: March 3, 2022
    Publication date: February 6, 2025
    Inventors: Anbang YAO, Yikai WANG, Zhaole SUN, Yi YANG, Feng CHEN, Zhuo WANG, Shandong WANG, Yurong CHEN
  • Publication number: 20250045582
    Abstract: Techniques related to compressing a pre-trained dense deep neural network to a sparsely connected deep neural network for efficient implementation are discussed. Such techniques may include iteratively pruning and splicing available connections between adjacent layers of the deep neural network and updating weights corresponding to both currently disconnected and currently connected connections between the adjacent layers.
    Type: Application
    Filed: August 14, 2024
    Publication date: February 6, 2025
    Applicant: Intel Corporation
    Inventors: Anbang Yao, Yiwen Guo, Yan Li, Yurong Chen
  • Patent number: 12217053
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.
    Type: Grant
    Filed: December 4, 2023
    Date of Patent: February 4, 2025
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Patent number: 12217163
    Abstract: Methods and systems for budgeted and simplified training of deep neural networks (DNNs) are disclosed. In one example, a trainer is to train a DNN using a plurality of training sub-images derived from a down-sampled training image. A tester is to test the trained DNN using a plurality of testing sub-images derived from a down-sampled testing image. In another example, in a recurrent deep Q-network (RDQN) having a local attention mechanism located between a convolutional neural network (CNN) and a long-short time memory (LSTM), a plurality of feature maps are generated by the CNN from an input image. Hard-attention is applied by the local attention mechanism to the generated plurality of feature maps by selecting a subset of the generated feature maps. Soft attention is applied by the local attention mechanism to the selected subset of generated feature maps by providing weights to the selected subset of generated feature maps in obtaining weighted feature maps.
    Type: Grant
    Filed: September 22, 2023
    Date of Patent: February 4, 2025
    Assignee: Intel Corporation
    Inventors: Yiwen Guo, Yuqing Hou, Anbang Yao, Dongqi Cai, Lin Xu, Ping Hu, Shandong Wang, Wenhua Cheng, Yurong Chen, Libin Wang
  • Publication number: 20250005703
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core including mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations.
    Type: Application
    Filed: July 15, 2024
    Publication date: January 2, 2025
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
  • Patent number: 12175252
    Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.
    Type: Grant
    Filed: June 14, 2022
    Date of Patent: December 24, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma
  • Patent number: 12154309
    Abstract: An example apparatus for mining multi-scale hard examples includes a convolutional neural network to receive a mini-batch of sample candidates and generate basic feature maps. The apparatus also includes a feature extractor and combiner to generate concatenated feature maps based on the basic feature maps and extract the concatenated feature maps for each of a plurality of received candidate boxes. The apparatus further includes a sample scorer and miner to score the candidate samples with multi-task loss scores and select candidate samples with multi-task loss scores exceeding a threshold score.
    Type: Grant
    Filed: September 6, 2023
    Date of Patent: November 26, 2024
    Assignee: Intel Corporation
    Inventors: Anbang Yao, Yun Ren, Hao Zhao, Tao Kong, Yurong Chen
  • Patent number: 12148063
    Abstract: One embodiment provides a multi-chip module accelerator usable to execute tensor data processing operations a multi-chip module. The multi-chip module may include a memory stack including multiple memory dies and parallel processor circuitry communicatively coupled to the memory stack. The parallel processor circuitry may include multiprocessor cores to execute matrix multiplication and accumulate operations. The matrix multiplication and accumulate operations may include floating-point operations that are configurable to include two-dimensional matrix multiply and accumulate operations involving inputs that have differing floating-point precisions. The floating-point operations may include a first operation at a first precision and a second operation at a second precision. The first operation may include a multiply having at least one 16-bit floating-point input and the second operation may include an accumulate having a 32-bit floating-point input.
    Type: Grant
    Filed: October 5, 2022
    Date of Patent: November 19, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
  • Patent number: 12141578
    Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.
    Type: Grant
    Filed: December 9, 2020
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
  • Publication number: 20240370716
    Abstract: Methods and apparatus for discrimitive semantic transfer and physics-inspired optimization in deep learning are disclosed. A computation training method for a convolutional neural network (CNN) includes receiving a sequence of training images in the CNN of a first stage to describe objects of a cluttered scene as a semantic segmentation mask. The semantic segmentation mask is received in a semantic segmentation network of a second stage to produce semantic features. Using weights from the first stage as feature extractors and weights from the second stage as classifiers, edges of the cluttered scene are identified using the semantic features.
    Type: Application
    Filed: July 11, 2024
    Publication date: November 7, 2024
    Inventors: Anbang YAO, Hao ZHAO, Ming LU, Yiwen GUO, Yurong CHEN
  • Patent number: 12124533
    Abstract: Embodiments are generally directed to methods and apparatuses of spatially sparse convolution module for visual rendering and synthesis. An embodiment of a method for image processing, comprising: receiving an input image by a convolution layer of a neural network to generate a plurality of feature maps; performing spatially sparse convolution on the plurality of feature maps to generate spatially sparse feature maps; and upsampling the spatially sparse feature maps to generate an output image.
    Type: Grant
    Filed: September 23, 2021
    Date of Patent: October 22, 2024
    Assignee: INTEL CORPORATION
    Inventors: Anbang Yao, Ming Lu, Yikai Wang, Scott Janus, Sungye Kim
  • Patent number: 12112397
    Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.
    Type: Grant
    Filed: June 14, 2023
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao