Patents by Inventor Anbang Yao
Anbang Yao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250238895Abstract: The application relates to a multi-exit visual synthesis network (VSN) based on dynamic patch computing. A method for visual synthesis is provided and includes: splitting an input image into multiple input patches; performing a synthesis process on each input patch with a first layer to an ith exit layer of a multi-exit VSN to obtain an ith intermediate synthesis patch, where i is an index of an intermediate exit of the VSN and predetermined as an integer greater than or equal to 1; predicting an incremental improvement of a (i+1)th intermediate synthesis patch relative to the ith intermediate synthesis patch based on features in the ith intermediate synthesis patch; determining a final exit of the VSN and a final synthesis patch for the input patch based on the predicted incremental improvement; and merging respective final synthesis patches for the multiple input patches to generate an output image.Type: ApplicationFiled: May 6, 2022Publication date: July 24, 2025Inventors: Ming Lu, Anbang Yao, Yanjie Pan, Li Xu, Yurong Chen
-
Publication number: 20250231770Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes a computation distribution of the workload, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that the respective processing core handle computation tasks of the received workload for the respective group.Type: ApplicationFiled: December 24, 2024Publication date: July 17, 2025Inventors: Liu Yang, Anbang YAO
-
Patent number: 12346798Abstract: In an example, an apparatus comprises a compute engine comprising a high precision component and a low precision component; and logic, at least partially including hardware logic, to receive instructions in the compute engine; select at least one of the high precision component or the low precision component to execute the instructions; and apply a gate to at least one of the high precision component or the low precision component to execute the instructions. Other embodiments are also disclosed and claimed.Type: GrantFiled: July 12, 2023Date of Patent: July 1, 2025Assignee: INTEL CORPORATIONInventors: Kamal Sinha, Balaji Vembu, Eriko Nurvitadhi, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Farshad Akhbari, Narayan Srinivasa, Feng Chen, Dukhwan Kim, Nadathur Rajagopalan Satish, John C. Weast, Mike B. MacPherson, Linda L. Hurd, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 12322156Abstract: Techniques related to implementing and training image classification networks are discussed. Such techniques include applying shared convolutional layers to input images regardless of resolution and applying normalization selectively based on the input image resolution. Such techniques further include training using mixed image size parallel training and mixed image size ensemble distillation.Type: GrantFiled: June 15, 2020Date of Patent: June 3, 2025Assignee: Intel CorporationInventors: Anbang Yao, Yikai Wang, Ming Lu, Shandong Wang, Feng Chen
-
Patent number: 12299927Abstract: Apparatus and methods for three-dimensional pose estimation are disclosed herein. An example apparatus includes an image synchronizer to synchronize a first image generated by a first image capture device and a second image generated by a second image capture device, the first image and the second image including a subject; a two-dimensional pose detector to predict first positions of keypoints of the subject based on the first image and by executing a first neural network model to generate first two-dimensional data and predict second positions of the keypoints based on the second image and by executing the first neural network model to generate second two-dimensional data; and a three-dimensional pose calculator to generate a three-dimensional graphical model representing a pose of the subject in the first image and the second image based on the first two-dimensional data, the second two-dimensional data, and by executing a second neural network model.Type: GrantFiled: June 26, 2020Date of Patent: May 13, 2025Assignee: Intel CorporationInventors: Shandong Wang, Yangyuxuan Kang, Anbang Yao, Ming Lu, Yurong Chen
-
Publication number: 20250148761Abstract: The disclosure provides an apparatus, method, device and medium for 3D dynamic sparse convolution. The method includes: receiving an input feature map of a 3D data sample; performing input feature map partition to divide the input feature map into a plurality of disjoint input feature map groups; performing a shared 3D dynamic sparse convolution to the plurality of disjoint input feature map groups respectively to obtain a plurality of output feature maps corresponding to the plurality of disjoint input feature map groups, wherein the shared 3D dynamic sparse convolution comprises a shared 3D dynamic sparse convolutional kernel; and performing output feature map grouping to sequentially stack the plurality of output feature maps to obtain an output feature map corresponding to the input feature map. (FIG. 2).Type: ApplicationFiled: March 3, 2022Publication date: May 8, 2025Inventors: Dongqi CAI, Anbang YAO, Chao LI, Shandong WANG, Yurong CHEN
-
Publication number: 20250117874Abstract: One embodiment provides an apparatus comprising a memory stack including multiple memory dies and a parallel processor including a plurality of multiprocessors. Each multiprocessor has a single instruction, multiple thread (SIMT) architecture, the parallel processor coupled to the memory stack via one or more memory interfaces. At least one multiprocessor comprises a multiply-accumulate circuit to perform multiply-accumulate operations on matrix data in a stage of a neural network implementation to produce a result matrix comprising a plurality of matrix data elements at a first precision, precision tracking logic to evaluate metrics associated with the matrix data elements and indicate if an optimization is to be performed for representing data at a second stage of the neural network implementation, and a numerical transform unit to dynamically perform a numerical transform operation on the matrix data elements based on the indication to produce transformed matrix data elements at a second precision.Type: ApplicationFiled: October 7, 2024Publication date: April 10, 2025Applicant: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Sara S. Baghsorkhi, Anbang Yao, Kevin Nealis, Xiaoming Chen, Altug Koker, Abhishek R. Appu, John C. Weast, Mike B. Macpherson, Dukhwan Kim, Linda L. Hurd, Ben J. Ashbaugh, Barath Lakshmanan, Liwei Ma, Joydeep Ray, Ping T. Tang, Michael S. Strickland
-
Publication number: 20250117639Abstract: Methods, apparatus, systems and articles of manufacture for loss-error-aware quantization of a low-bit neural network are disclosed. An example apparatus includes a network weight partitioner to partition unquantized network weights of a first network model into a first group to be quantized and a second group to be retrained. The example apparatus includes a loss calculator to process network weights to calculate a first loss. The example apparatus includes a weight quantizer to quantize the first group of network weights to generate low-bit second network weights. In the example apparatus, the loss calculator is to determine a difference between the first loss and a second loss. The example apparatus includes a weight updater to update the second group of network weights based on the difference. The example apparatus includes a network model deployer to deploy a low-bit network model including the low-bit second network weights.Type: ApplicationFiled: September 16, 2024Publication date: April 10, 2025Applicant: Intel CorporationInventors: Anbang Yao, Aojun Zhou, Kuan Wang, Hao Zhao, Yurong Chen
-
Publication number: 20250094170Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute a 32-bit intermediate product of 16-bit operands and to compute a 32-bit sum based on the 32-bit intermediate product.Type: ApplicationFiled: September 30, 2024Publication date: March 20, 2025Applicant: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Publication number: 20250068891Abstract: Methods, apparatus, systems and articles of manufacture (e.g., physical storage media) to implement dynamic triplet convolution for convolutional neural networks are disclosed. An example apparatus disclosed herein for a convolutional neural network is to calculate one or more scalar kernels based on an input feature map applied to a layer of the convolutional neural network, ones of the one or more scalar kernels corresponding to respective dimensions of a static multidimensional convolutional filter associated with the layer of the convolutional neural network. The disclosed example apparatus is also to scale elements of the static multidimensional convolutional filter along a first one of the dimensions based on a first one of the one or more scalar kernels corresponding to the first one of the dimensions to determine a dynamic multidimensional convolutional filter associated with the layer of the convolutional neural network.Type: ApplicationFiled: February 18, 2022Publication date: February 27, 2025Inventors: Dongqi CAI, Anbang YAO, Chao LI, Yurong CHEN, Wenjian SHAO
-
Publication number: 20250068916Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed for teacher-free self-feature distillation training of machine-learning (ML) models. An example apparatus includes at least one memory, instructions, and processor circuitry to at least one of execute or instantiate the instructions to perform a first comparison of (i) a first group of a first set of feature channels (FCs) of an ML model and (ii) a second group of the first set, perform a second comparison of (iii) a first group of a second set of FCs of the ML model and one of (iv) a third group of the first set or a first group of a third set of FCs of the ML model, adjust parameter(s) of the ML model based on the first and/or second comparisons, and, in response to an error value satisfying a threshold, deploy the ML model to execute a workload based on the parameter(s).Type: ApplicationFiled: February 21, 2022Publication date: February 27, 2025Inventors: Yurong Chen, Anbang Yao, Yi Qian, Yu Zhang, Shandong Wang
-
Publication number: 20250061534Abstract: One embodiment provides a parallel processor comprising a hardware scheduler to schedule pipeline commands for compute operations to one or more of multiple types of compute units, a plurality of processing resources including a first sparse compute unit configured for input at a first level of sparsity and hybrid memory circuitry including a memory controller, a memory interface, and a second sparse compute unit configured for input at a second level of sparsity that is greater than the first level of sparsity.Type: ApplicationFiled: August 29, 2024Publication date: February 20, 2025Applicant: Intel CorporationInventors: Eriko Nurvitadhi, Balaji Vembu, Nicolas C. Galoppo Von Borries, Rajkishore Barik, Tsung-Han Lin, Kamal Sinha, Nadathur Rajagopalan Satish, Jeremy Bottleson, Farshad Akhbari, Altug Koker, Narayan Srinivasa, Dukhwan Kim, Sara S. Baghsorkhi, Justin E. Gottschlich, Feng Chen, Elmoustapha Ould-Ahmed-Vall, Kevin Nealis, Xiaoming Chen, Anbang Yao
-
Publication number: 20250061172Abstract: Embodiments are generally directed to methods and apparatuses of spatially sparse convolution module for visual rendering and synthesis. An embodiment of a method for image processing, comprising: receiving an input image by a convolution layer of a neural network to generate a plurality of feature maps; performing spatially sparse convolution on the plurality of feature maps to generate spatially sparse feature maps; and upsampling the spatially sparse feature maps to generate an output image.Type: ApplicationFiled: September 12, 2024Publication date: February 20, 2025Applicant: Intel CorporationInventors: Anbang Yao, Ming Lu, Yikai Wang, Scott Janus, Sungye Kim
-
Patent number: 12229569Abstract: Methods and systems are disclosed using an execution pipeline on a multi-processor platform for deep learning network execution. In one example, a network workload analyzer receives a workload, analyzes a computation distribution of the workload, and groups the network nodes into groups. A network executor assigns each group to a processing core of the multi-core platform so that the respective processing core handle computation tasks of the received workload for the respective group.Type: GrantFiled: October 27, 2023Date of Patent: February 18, 2025Assignee: Intel CorporationInventors: Liu Yang, Anbang Yao
-
Publication number: 20250045573Abstract: The disclosure relates to decimal-bit network quantization of CNN models.Type: ApplicationFiled: March 3, 2022Publication date: February 6, 2025Inventors: Anbang YAO, Yikai WANG, Zhaole SUN, Yi YANG, Feng CHEN, Zhuo WANG, Shandong WANG, Yurong CHEN
-
Publication number: 20250045582Abstract: Techniques related to compressing a pre-trained dense deep neural network to a sparsely connected deep neural network for efficient implementation are discussed. Such techniques may include iteratively pruning and splicing available connections between adjacent layers of the deep neural network and updating weights corresponding to both currently disconnected and currently connected connections between the adjacent layers.Type: ApplicationFiled: August 14, 2024Publication date: February 6, 2025Applicant: Intel CorporationInventors: Anbang Yao, Yiwen Guo, Yan Li, Yurong Chen
-
Patent number: 12217053Abstract: One embodiment provides for a graphics processing unit to accelerate machine-learning operations, the graphics processing unit comprising a multiprocessor having a single instruction, multiple thread (SIMT) architecture, the multiprocessor to execute at least one single instruction; and a first compute unit included within the multiprocessor, the at least one single instruction to cause the first compute unit to perform a two-dimensional matrix multiply and accumulate operation, wherein to perform the two-dimensional matrix multiply and accumulate operation includes to compute an intermediate product of 16-bit operands and to compute a 32-bit sum based on the intermediate product.Type: GrantFiled: December 4, 2023Date of Patent: February 4, 2025Assignee: Intel CorporationInventors: Himanshu Kaul, Mark A. Anders, Sanu K. Mathew, Anbang Yao, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Tatiana Shpeisman, Abhishek R. Appu, Altug Koker, Kamal Sinha, Balaji Vembu, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Rajkishore Barik, Tsung-Han Lin, Vasanth Ranganathan, Sanjeev Jahagirdar
-
Patent number: 12217163Abstract: Methods and systems for budgeted and simplified training of deep neural networks (DNNs) are disclosed. In one example, a trainer is to train a DNN using a plurality of training sub-images derived from a down-sampled training image. A tester is to test the trained DNN using a plurality of testing sub-images derived from a down-sampled testing image. In another example, in a recurrent deep Q-network (RDQN) having a local attention mechanism located between a convolutional neural network (CNN) and a long-short time memory (LSTM), a plurality of feature maps are generated by the CNN from an input image. Hard-attention is applied by the local attention mechanism to the generated plurality of feature maps by selecting a subset of the generated feature maps. Soft attention is applied by the local attention mechanism to the selected subset of generated feature maps by providing weights to the selected subset of generated feature maps in obtaining weighted feature maps.Type: GrantFiled: September 22, 2023Date of Patent: February 4, 2025Assignee: Intel CorporationInventors: Yiwen Guo, Yuqing Hou, Anbang Yao, Dongqi Cai, Lin Xu, Ping Hu, Shandong Wang, Wenhua Cheng, Yurong Chen, Libin Wang
-
Publication number: 20250005703Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a mixed precision core including mixed-precision execution circuitry to execute one or more of the mixed-precision instructions to perform a mixed-precision dot-product operation comprising to perform a set of multiply and accumulate operations.Type: ApplicationFiled: July 15, 2024Publication date: January 2, 2025Applicant: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Linda L. Hurd, Dukhwan Kim, Mike B. Macpherson, John C. Weast, Feng Chen, Farshad Akhbari, Narayan Srinivasa, Nadathur Rajagopalan Satish, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman
-
Patent number: 12175252Abstract: One embodiment provides for a graphics processing unit (GPU) to accelerate machine learning operations, the GPU comprising an instruction cache to store a first instruction and a second instruction, the first instruction to cause the GPU to perform a floating-point operation, including a multi-dimensional floating-point operation, and the second instruction to cause the GPU to perform an integer operation; and a general-purpose graphics compute unit having a single instruction, multiple thread architecture, the general-purpose graphics compute unit to concurrently execute the first instruction and the second instruction.Type: GrantFiled: June 14, 2022Date of Patent: December 24, 2024Assignee: Intel CorporationInventors: Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Tatiana Shpeisman, Joydeep Ray, Ping T. Tang, Michael Strickland, Xiaoming Chen, Anbang Yao, Ben J. Ashbaugh, Linda L. Hurd, Liwei Ma