Patents by Inventor Sasikanth Avancha

Sasikanth Avancha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

APPARATUSES, METHODS, AND SYSTEMS FOR NEURAL NETWORKS

Publication number: 20240118892

Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.

Type: Application

Filed: December 18, 2023

Publication date: April 11, 2024

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
MATRIX OPERATION OPTIMIZATION MECHANISM

Publication number: 20230289399

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.

Type: Application

Filed: February 2, 2023

Publication date: September 14, 2023

Applicant: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
Apparatuses, methods, and systems for access synchronization in a shared memory

Patent number: 11681529

Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.

Type: Grant

Filed: August 24, 2021

Date of Patent: June 20, 2023

Assignee: Intel Corporation

Inventors: Swagath Venkataramani, Dipankar Das, Sasikanth Avancha, Ashish Ranjan, Subarno Banerjee, Bharat Kaul, Anand Raghunathan
Instructions and logic for vector multiply add with zero skipping

Patent number: 11669329

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Grant

Filed: April 18, 2022

Date of Patent: June 6, 2023

Assignee: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
Matrix operation optimization mechanism

Patent number: 11593454

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.

Type: Grant

Filed: June 2, 2020

Date of Patent: February 28, 2023

Assignee: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
SYSTOLIC ARRAY HAVING SUPPORT FOR OUTPUT SPARSITY

Publication number: 20220413803

Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Jorge Parra, Fangwen Fu, Subramaniam Maiyuran, Varghese George, Mike Macpherson, Supratim Pal, Chandra Gurram, Sabareesh Ganapathy, Sasikanth Avancha, Dharma Teja Vooturi, Naveen Mellempudi, Dipankar Das
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING

Publication number: 20220326953

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Application

Filed: April 18, 2022

Publication date: October 13, 2022

Applicant: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
Instructions and logic for vector multiply add with zero skipping

Patent number: 11314515

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Grant

Filed: December 23, 2019

Date of Patent: April 26, 2022

Assignee: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
Circuitry for low-precision deep learning

Patent number: 11275998

Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.

Type: Grant

Filed: May 31, 2018

Date of Patent: March 15, 2022

Assignee: Intel Corporation

Inventors: Martin Langhammer, Sudarshan Srinivasan, Gregg William Baeckler, Duncan Moss, Sasikanth Avancha, Dipankar Das
APPARATUSES, METHODS, AND SYSTEMS FOR NEURAL NETWORKS

Publication number: 20220050683

Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.

Type: Application

Filed: October 26, 2021

Publication date: February 17, 2022

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
APPARATUSES, METHODS, AND SYSTEMS FOR ACCESS SYNCHRONIZATION IN A SHARED MEMORY

Publication number: 20210382719

Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.

Type: Application

Filed: August 24, 2021

Publication date: December 9, 2021

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Sasikanth AVANCHA, Ashish RANJAN, Subarno BANERJEE, Bharat KAUL, Anand RAGHUNATHAN
MATRIX OPERATION OPTIMIZATION MECHANISM

Publication number: 20210374209

Abstract: An apparatus to facilitate machine learning matrix processing is disclosed. The apparatus comprises a memory to store matrix data one or more processors to execute an instruction to examine a message descriptor included in the instruction to determine a type of matrix layout manipulation operation that is to be executed, examine a message header included in the instruction having a plurality of parameters that define a two-dimensional (2D) memory surface that is to be retrieved, retrieve one or more blocks of the matrix data from the memory based on the plurality of parameters and a register file including a plurality of registers, wherein the one or more blocks of the matrix data is stored within a first set of the plurality of registers.

Type: Application

Filed: June 2, 2020

Publication date: December 2, 2021

Applicant: Intel Corporation

Inventors: Joydeep Ray, Fangwen Fu, Dhiraj D. Kalamkar, Sasikanth Avancha
Apparatuses, methods, and systems for access synchronization in a shared memory

Patent number: 11106464

Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.

Type: Grant

Filed: September 27, 2016

Date of Patent: August 31, 2021

Assignee: Intel Corporation

Inventors: Swagath Venkataramani, Dipankar Das, Sasikanth Avancha, Ashish Ranjan, Subarno Banerjee, Bharat Kaul, Anand Raghunathan
INSTRUCTIONS AND LOGIC FOR VECTOR MULTIPLY ADD WITH ZERO SKIPPING

Publication number: 20210191724

Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.

Type: Application

Filed: December 23, 2019

Publication date: June 24, 2021

Applicant: Intel Corporation

Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
UTILIZING STRUCTURED SPARSITY IN SYSTOLIC ARRAYS

Publication number: 20210081201

Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.

Type: Application

Filed: November 30, 2020

Publication date: March 18, 2021

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
APPARATUSES, METHODS, AND SYSTEMS FOR NEURAL NETWORKS

Publication number: 20190303743

Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.

Type: Application

Filed: September 27, 2016

Publication date: October 3, 2019

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
APPARATUSES, METHODS, AND SYSTEMS FOR ACCESS SYNCHRONIZATION IN A SHARED MEMORY

Publication number: 20190243651

Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.

Type: Application

Filed: September 27, 2016

Publication date: August 8, 2019

Applicant: Intel Corporation

Inventors: Swagath VENKATARAMANI, Dipankar DAS, Sasikanth AVANCHA, Ashish RANJAN, Subarno BANERJEE, Bharat KAUL, Anand RAGHUNATHAN
CIRCUITRY FOR LOW-PRECISION DEEP LEARNING

Publication number: 20190042939

Abstract: The present disclosure relates generally to techniques for improving the implementation of certain operations on an integrated circuit. In particular, deep learning techniques, which may use a deep neural network (DNN) topology, may be implemented more efficiently using low-precision weights and activation values by efficiently performing down conversion of data to a lower precision and by preventing data overflow during suitable computations. Further, by more efficiently mapping multipliers to programmable logic on the integrated circuit device, the resources used by the DNN topology to perform, for example, inference tasks may be reduced, resulting in improved integrated circuit operating speeds.

Type: Application

Filed: May 31, 2018

Publication date: February 7, 2019

Inventors: Martin Langhammer, Sudarshan Srinivasan, Gregg William Baeckler, Duncan Moss, Sasikanth Avancha, Dipankar Das
Delivering data from a range of input devices over a secure path to trusted services in a secure element

Patent number: 9390251

Abstract: Systems and methods of delivering data from a range of input devices may involve detecting an availability of data from an input device, wherein the input device is associated with a default input path of a mobile platform. An input device driver can be invoked in a security engine in response to the availability of the data if a hardware component in the default input path is in a secure input mode, wherein the security engine it associated with a secure input path of the mobile platform. Additionally, the input device driver may be used to retrieve the data from the input device into the security engine.

Type: Grant

Filed: July 31, 2012

Date of Patent: July 12, 2016

Assignee: Intel Corporation

Inventors: Sasikanth Avancha, Ninad Kothari, Rajesh Banginwar, Taeho Kgil
DELIVERING DATA FROM A RANGE OF INPUT DEVICES OVER A SECURE PATH TO TRUSTED SERVICES IN A SECURE ELEMENT

Publication number: 20140310799

Abstract: Systems and methods of delivering data from a range of input devices may involve detecting an availability of data from an input device, wherein the input device is associated with a default input path of a mobile platform. An input device driver can be invoked in a security engine in response to the availability of the data if a hardware component in the default input path is in a secure input mode, wherein the security engine it associated with a secure input path of the mobile platform. Additionally, the input device driver may be used to retrieve the data from the input device into the security engine.

Type: Application

Filed: July 31, 2012

Publication date: October 16, 2014

Inventors: Sasikanth Avancha, Ninad Kothari, Rajesh Banginwar, Taeho Kgil

1 2 next