Patents by Inventor Dhawal Srivastava

Dhawal Srivastava has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Convolutional neural network optimization mechanism

Patent number: 11934934

Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.

Type: Grant

Filed: April 17, 2017

Date of Patent: March 19, 2024

Assignee: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould- Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
EFFICIENT CONVOLUTION IN MACHINE LEARNING ENVIRONMENTS

Publication number: 20230419090

Abstract: A mechanism is described for facilitating smart convolution in machine learning environments. An apparatus of embodiments, as described herein, includes one or more processors including one or more graphics processors, and detection and selection logic to detect and select input images having a plurality of geometric shapes associated with an object for which a neural network is to be trained. The apparatus further includes filter generation and storage logic (“filter logic”) to generate weights providing filters based on the plurality of geometric shapes, where the filter logic is further to sort the filters in filter groups based on common geometric shapes of the plurality of geographic shapes, and where the filter logic is further to store the filter groups in bins based on the common geometric shapes, wherein each bin corresponds to a geometric shape.

Type: Application

Filed: May 24, 2023

Publication date: December 28, 2023

Applicant: Intel Corporation

Inventor: Dhawal Srivastava
Convolutional neural network optimization mechanism

Patent number: 11727246

Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.

Type: Grant

Filed: February 22, 2019

Date of Patent: August 15, 2023

Assignee: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
Efficient convolution in machine learning environments

Patent number: 11710028

Abstract: A mechanism is described for facilitating smart convolution in machine learning environments. An apparatus of embodiments, as described herein, includes one or more processors including one or more graphics processors, and detection and selection logic to detect and select input images having a plurality of geometric shapes associated with an object for which a neural network is to be trained. The apparatus further includes filter generation and storage logic (“filter logic”) to generate weights providing filters based on the plurality of geometric shapes, where the filter logic is further to sort the filters in filter groups based on common geometric shapes of the plurality of geographic shapes, and where the filter logic is further to store the filter groups in bins based on the common geometric shapes, wherein each bin corresponds to a geometric shape.

Type: Grant

Filed: December 30, 2017

Date of Patent: July 25, 2023

Assignee: INTEL CORPORATION

Inventor: Dhawal Srivastava
MACHINE LEARNING ACCELERATOR MECHANISM

Publication number: 20230053289

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Application

Filed: June 21, 2022

Publication date: February 16, 2023

Applicant: Intel Corporation

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION

Publication number: 20220391679

Abstract: One embodiment provides a graphics processor comprising an instruction cache to store an instruction and a compute block configured to perform multiply-accumulate operations in response to execution of the instruction. The compute block includes a scheduler to schedule a plurality of threads for execution of the instruction and multiply-accumulate circuitry configured to execute the instruction via the plurality of threads, wherein the multiply-accumulate circuitry includes a plurality of functional units configured to process, in parallel via the plurality of threads, a corresponding plurality of matrix elements to multiply a first matrix and a second matrix, and to multiply the first matrix and the second matrix includes to multiply data elements in a row of the first matrix by corresponding data elements in a column of the second matrix to generate a plurality of products.

Type: Application

Filed: August 11, 2022

Publication date: December 8, 2022

Applicant: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
Parallelism in disparity map generation

Patent number: 11508079

Abstract: Input images are partitioned into non-overlapping segments perpendicular to a disparity dimension of the input images. Each segment includes a contiguous region of pixels spanning from a first edge to a second edge of the image, with the two edges parallel to the disparity dimension. In some aspects, contiguous input image segments are assigned in a “round robin” manner to a set of sub-images. Each pair of input images generates a corresponding pair of sub-image sets. Semi-global matching processes are then performed on pairs of corresponding sub-images generated from each input image. The SGM processes may be run in parallel, reducing an elapsed time to generate respective disparity sub-maps. The disparity sub-maps are then combined to provide a single disparity map of equivalent size to the original two input images.

Type: Grant

Filed: June 28, 2019

Date of Patent: November 22, 2022

Assignee: Intel Corporation

Inventors: Wei-Yu Tsai, Amit Aneja, Maciej Adam Kaminski, Dhawal Srivastava, Jayaram Puttaswamy, Mithali Shivkumar
Specialized fixed function hardware for efficient convolution

Patent number: 11475286

Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.

Type: Grant

Filed: December 21, 2021

Date of Patent: October 18, 2022

Assignee: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
Machine learning accelerator mechanism

Patent number: 11373088

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Grant

Filed: December 30, 2017

Date of Patent: June 28, 2022

Assignee: INTEL CORPORATION

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION

Publication number: 20220114430

Abstract: One embodiment provides an apparatus comprising an instruction cache to store a plurality of instructions, a scheduler unit coupled to the instruction cache, the scheduler unit to schedule the plurality of instructions for execution, an instruction fetch and decode unit to decode the plurality of instructions to determine a set of operations to perform in response, one or more compute blocks to perform parallel multiply-accumulate operations based on the instruction fetch and decode unit decoding a first instruction of the plurality of instructions, and matrix multiplication logic to perform matrix multiplication operations based on the instruction fetch and decode unit decoding a second instruction of the plurality of instructions.

Type: Application

Filed: December 21, 2021

Publication date: April 14, 2022

Applicant: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
CONVOLUTIONAL NEURAL NETWORK OPTIMIZATION MECHANISM

Publication number: 20210397925

Abstract: A library of machine learning primitives is provided to optimize a machine learning model to improve the efficiency of inference operations. In one embodiment a trained convolutional neural network (CNN) model is processed into a trained CNN model via pruning, convolution window optimization, and quantization.

Type: Application

Filed: August 26, 2021

Publication date: December 23, 2021

Applicant: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION

Publication number: 20210081774

Abstract: One embodiment provides for a general-purpose graphics processing unit including a scheduler to schedule multiple matrix operations for execution by a general-purpose graphics processing unit. The multiple matrix operations are determined based on a single machine learning compute instruction. The single machine learning compute instruction is a convolution instruction and the multiple matrix operations are associated with a convolution operation.

Type: Application

Filed: October 28, 2020

Publication date: March 18, 2021

Applicant: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
Specialized fixed function hardware for efficient convolution

Patent number: 10824938

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.

Type: Grant

Filed: April 24, 2017

Date of Patent: November 3, 2020

Assignee: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
PARALLELISM IN DISPARITY MAP GENERATION

Publication number: 20190318494

Abstract: Input images are partitioned into non-overlapping segments perpendicular to a disparity dimension of the input images. Each segment includes a contiguous region of pixels spanning from a first edge to a second edge of the image, with the two edges parallel to the disparity dimension. In some aspects, contiguous input image segments are assigned in a “round robin” manner to a set of sub-images. Each pair of input images generates a corresponding pair of sub-image sets. Semi-global matching processes are then performed on pairs of corresponding sub-images generated from each input image. The SGM processes may be run in parallel, reducing an elapsed time to generate respective disparity sub-maps. The disparity sub-maps are then combined to provide a single disparity map of equivalent size to the original two input images.

Type: Application

Filed: June 28, 2019

Publication date: October 17, 2019

Inventors: Wei-Yu Tsai, Amit Aneja, Maciej Adam Kaminski, Dhawal Srivastava, Jayaram Puttaswamy, Mithali Shivkumar
MACHINE LEARNING ACCELERATOR MECHANISM

Publication number: 20190205737

Abstract: An apparatus to facilitate acceleration of machine learning operations is disclosed. The apparatus comprises at least one processor to perform operations to implement a neural network and accelerator logic to perform communicatively coupled to the processor to perform compute operations for the neural network.

Type: Application

Filed: December 30, 2017

Publication date: July 4, 2019

Applicant: Intel Corporation

Inventors: Amit Bleiweiss, Anavai Ramesh, Asit Mishra, Deborah Marr, Jeffrey Cook, Srinivas Sridharan, Eriko Nurvitadhi, Elmoustapha Ould-Ahmed-Vall, Dheevatsa Mudigere, Mohammad Ashraf Bhuiyan, Md Faijul Amin, Wei Wang, Dhawal Srivastava, Niharika Maheshwari
EFFICIENT CONVOLUTION IN MACHINE LEARNING ENVIRONMENTS

Publication number: 20190205747

Abstract: A mechanism is described for facilitating smart convolution in machine learning environments. An apparatus of embodiments, as described herein, includes one or more processors including one or more graphics processors, and detection and selection logic to detect and select input images having a plurality of geometric shapes associated with an object for which a neural network is to be trained. The apparatus further includes filter generation and storage logic (“filter logic”) to generate weights providing filters based on the plurality of geometric shapes, where the filter logic is further to sort the filters in filter groups based on common geometric shapes of the plurality of geographic shapes, and where the filter logic is further to store the filter groups in bins based on the common geometric shapes, wherein each bin corresponds to a geometric shape.

Type: Application

Filed: December 30, 2017

Publication date: July 4, 2019

Applicant: Intel Corporation

Inventor: DHAWAL SRIVASTAVA
COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS

Publication number: 20190205736

Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a at least one processor to perform operations to implement a neural network and compute logic to accelerate neural network computations.

Type: Application

Filed: December 29, 2017

Publication date: July 4, 2019

Applicant: Intel Corporation

Inventors: Amit Bleiweiss, Abhishek Venkatesh, Gokce Keskin, John Gierach, Oguz Elibol, Tomer Bar-On, Huma Abidi, Devan Burke, Jaikrishnan Menon, Eriko Nurvitadhi, Pruthvi Gowda Thorehosur Appajigowda, Travis T. Schluessler, Dhawal Srivastava, Nishant Patel, Anil Thomas
CONVOLUTIONAL NEURAL NETWORK OPTIMIZATION MECHANISM

Publication number: 20190188554

Abstract: Embodiments provide systems and methods which facilitate optimization of a convolutional neural network (CNN). One embodiment provides for a non-transitory machine-readable medium storing instructions that cause one or more processors to perform operations comprising processing a trained convolutional neural network (CNN) to generate a processed CNN, the trained CNN having weights in a floating-point format. Processing the trained CNN includes quantizing the weights in the floating-point format to generate weights in an integer format. Quantizing the weights includes generating a quantization table to enable non-uniform quantization of the weights and quantizing the weights from the floating-point format to the integer format using the quantization table. The operations additionally comprise performing an inference operation utilizing the processed CNN with the integer format weights.

Type: Application

Filed: February 22, 2019

Publication date: June 20, 2019

Applicant: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould-Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu
SPECIALIZED FIXED FUNCTION HARDWARE FOR EFFICIENT CONVOLUTION

Publication number: 20180307980

Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the apparatus comprising a decode unit to decode a single instruction into a decoded instruction, the decoded instruction to perform one or more machine learning operations, wherein the decode unit, based on parameters of the one or more machine learning operations, is to request a scheduler to schedule the one or more machine learning operations to one of an array of programmable compute units and a fixed function compute unit.

Type: Application

Filed: April 24, 2017

Publication date: October 25, 2018

Applicant: Intel Corporation

Inventors: Rajkishore Barik, Elmoustapha Ould-Ahmed-Vall, Xiaoming Chen, Dhawal Srivastava, Anbang Yao, Kevin Nealis, Eriko Nurvitadhi, Sara S. Baghsorkhi, Balaji Vembu, Tatiana Shpeisman, Ping T. Tang
CONVOLUTIONAL NEURAL NETWORK OPTIMIZATION MECHANISM

Publication number: 20180300600

Abstract: An apparatus to facilitate optimization of a convolutional neural network (CNN) is disclosed. The apparatus includes optimization logic to receive a CNN model having a list of instructions and including pruning logic to optimize the list of instructions by eliminating branches in the list of instructions that comprise a weight value of 0.

Type: Application

Filed: April 17, 2017

Publication date: October 18, 2018

Applicant: Intel Corporation

Inventors: Liwei Ma, Elmoustapha Ould- Ahmed-Vall, Barath Lakshmanan, Ben J. Ashbaugh, Jingyi Jin, Jeremy Bottleson, Mike B. Macpherson, Kevin Nealis, Dhawal Srivastava, Joydeep Ray, Ping T. Tang, Michael S. Strickland, Xiaoming Chen, Anbang Yao, Tatiana Shpeisman, Altug Koker, Abhishek R. Appu