Patents by Inventor Krishna N. Vinod

Krishna N. Vinod has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF ASSOCIATED DATA ELEMENT VALUES

Publication number: 20230060900

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
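The behavior described in the abstract above can be sketched in software as follows. This is only an illustration under stated assumptions, not the claimed hardware design: the function name `indexed_reduce` is hypothetical, data elements are treated as "associated" when their indices are equal, the reduction defaults to addition, and each result is written to the output lane of the first element in its group while the remaining lanes are left as `None`.

```python
from operator import add


def indexed_reduce(values, indices, op=add):
    """Reduce values that share an index (assumed meaning of "associated").

    values  -- stands in for the input register's data elements
    indices -- stands in for the index register, one index per element
    op      -- binary reduction operation (addition is an assumption)
    """
    if len(values) != len(indices):
        raise ValueError("values and indices must have the same length")

    out = [None] * len(values)   # stands in for the output register
    first_pos = {}               # index value -> lane of its first occurrence

    for pos, (value, idx) in enumerate(zip(values, indices)):
        if idx in first_pos:
            # Fold this element into the running result for its group.
            lane = first_pos[idx]
            out[lane] = op(out[lane], value)
        else:
            # First element seen with this index starts a new group.
            first_pos[idx] = pos
            out[pos] = value
    return out


result = indexed_reduce([1, 2, 3, 4], [0, 1, 0, 1])
```

Here lanes 0 and 2 share index 0 and lanes 1 and 3 share index 1, so the two sums land in lanes 0 and 1.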

Type: Application

Filed: October 4, 2022

Publication date: March 2, 2023

Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR

METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF ASSOCIATED DATA ELEMENT VALUES

Publication number: 20220229661

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.

Type: Application

Filed: April 4, 2022

Publication date: July 21, 2022

Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR

Method and apparatus for performing reduction operations on a plurality of associated data element values

Patent number: 11294670

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.

Type: Grant

Filed: March 27, 2019

Date of Patent: April 5, 2022

Assignee: INTEL CORPORATION

Inventors: Christopher J. Hughes, Jonathan D. Pearce, Guei-Yuan Lueh, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar

Apparatuses, methods, and systems for memory interface circuit arbitration in a configurable spatial accelerator

Patent number: 11037050

Abstract: Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.

Type: Grant

Filed: June 29, 2019

Date of Patent: June 15, 2021

Assignee: Intel Corporation

Inventors: Krishna N. Vinod, Sujoyita Kaushikkar, Aniket S. Kakade, Kermin ChoFleming, Ping Zou, Alexey Suprun, Bhavya K. Daya

APPARATUSES, METHODS, AND SYSTEMS FOR MEMORY INTERFACE CIRCUIT ARBITRATION IN A CONFIGURABLE SPATIAL ACCELERATOR

Publication number: 20200410323

Abstract: Systems, methods, and apparatuses relating to arbitration among a plurality of memory interface circuits in a configurable spatial accelerator are described. In one embodiment, a configurable spatial accelerator (CSA) includes a plurality of processing elements; a plurality of request address file (RAF) circuits, and a circuit switched interconnect network between the plurality of processing elements and the RAF circuits. As a dataflow architecture, embodiments of CSA have a unique memory architecture where memory accesses are decoupled into an explicit request and response phase allowing pipelining through memory. Certain embodiments herein provide for improved memory sub-system design via arbitration and the improvements to arbitration discussed herein.

Type: Application

Filed: June 29, 2019

Publication date: December 31, 2020

Inventors: Krishna N. Vinod, Sujoyita Kaushikkar, Aniket S. Kakade, Kermin ChoFleming, Ping Zou, Alexey Suprun, Bhavya K. Daya

Method and apparatus for vector-matrix comparison

Patent number: 10817297

Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specifies an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
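One way to picture the comparison described in the abstract above is the sketch below. It is an illustration under assumptions rather than the patented instruction: the name `vector_matrix_compare` is hypothetical, element `r` of the vector is paired with row `r` of the matrix, "match" is taken to mean equality, and the per-column output is assumed to be true only when every row matches in that column. The abstract does not fix how per-element matches are combined, so the logical AND used here is a guess.

```python
def vector_matrix_compare(vector, matrix):
    """Compare vector[r] against every element of matrix row r.

    Returns one boolean per column. A column is reported as a match when
    all rows match in that column (combination rule is an assumption).
    """
    if len(vector) != len(matrix):
        raise ValueError("need one matrix row per vector element")
    if not matrix:
        return []

    n_cols = len(matrix[0])
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("all matrix rows must have the same length")

    # Per-element match results: match[r][c] is True when row r, column c
    # holds the same value as vector element r.
    match = [[cell == vector[r] for cell in row] for r, row in enumerate(matrix)]

    # Collapse the rows so that each output maps to one column position.
    return [all(match[r][c] for r in range(len(vector))) for c in range(n_cols)]


result = vector_matrix_compare([1, 2], [[1, 5, 1],
                                        [2, 2, 3]])
```

Under these assumptions only the first column, which reads (1, 2) top to bottom, matches the vector.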

Type: Grant

Filed: March 30, 2019

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar

METHOD AND APPARATUS FOR VECTOR-MATRIX COMPARISON

Publication number: 20200310804

Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specifies an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.

Type: Application

Filed: March 30, 2019

Publication date: October 1, 2020

Inventors: Christopher J. HUGHES, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR

METHOD AND APPARATUS FOR PERFORMING REDUCTION OPERATIONS ON A PLURALITY OF DATA ELEMENT VALUES

Publication number: 20200310809

Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a process comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.

Type: Application

Filed: March 27, 2019

Publication date: October 1, 2020

Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR

Method and apparatus for vector-matrix comparison

Patent number: 10782971

Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specifies an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.

Type: Grant

Filed: March 30, 2019

Date of Patent: September 22, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar

Minimizing snoop traffic locally and across cores on a chip multi-core fabric

Patent number: 10102129

Abstract: A processor includes a processing core, a L1 cache comprising a first processing core and a first L1 cache comprising a first L1 cache data entry of a plurality of L1 cache data entries to store data. The processor also includes an L2 cache comprising a first L2 cache data entry of a plurality of L2 cache data entries. The first L2 cache data entry corresponds to the first L1 cache data entry and each of the plurality of L2 cache data entries are associated with a corresponding presence bit (pbit) of a plurality of pbits. Each of the plurality of pbits indicates a status of a corresponding one of the plurality of L2 cache data entries. The processor also includes a cache controller, which in response to a first request among a plurality of requests to access the data at the first L1 cache data entry, determines that a copy of the data is stored in the first L2 cache data entry; and retrieves the copy of the data from the L2 cache data entry in view of the status of the pbit.

Type: Grant

Filed: December 21, 2015

Date of Patent: October 16, 2018

Assignee: Intel Corporation

Inventors: Krishna N. Vinod, Avinash Sodani, Zainulabedin J. Aurangabadwala

Mechanism to avoid hot-L1/cold-L2 events in an inclusive L2 cache using L1 presence bits for victim selection bias

Patent number: 9836399

Abstract: A processor includes a processing core, an L1 cache, operatively coupled to the processing core, the L1 cache comprising an L1 cache entry to store a data item, an L2 cache, inclusive with respect to the L1 cache, the L2 cache comprising an L2 cache entry corresponding to the L1 cache entry, an activity flag associated with the L2 cache entry, the activity flag indicating an activity status of the L1 cache entry, and a cache controller to, in response to detecting an access operation with respect to the L1 cache entry, set the flag to an active status.
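The idea of an activity flag that biases victim selection, as named in the title and abstract above, might be modeled roughly as follows. Everything here is a hypothetical sketch: the names `L2Entry`, `on_l1_access`, and `choose_victim` are invented for illustration, and falling back to least-recently-used ordering among candidates is an assumption that the abstract does not state.

```python
from dataclasses import dataclass


@dataclass
class L2Entry:
    tag: int
    last_used: int = 0        # L2-side recency timestamp (assumed LRU policy)
    l1_active: bool = False   # activity flag for the corresponding L1 entry


def on_l1_access(entry: L2Entry) -> None:
    """Model the controller setting the flag when the L1 entry is accessed."""
    entry.l1_active = True


def choose_victim(entries: list) -> L2Entry:
    """Pick an L2 entry to evict, preferring entries not active in L1.

    If every entry is flagged active, fall back to plain LRU over the set.
    """
    if not entries:
        raise ValueError("cannot choose a victim from an empty set")
    cold = [e for e in entries if not e.l1_active]
    candidates = cold or entries
    return min(candidates, key=lambda e: e.last_used)


cache_set = [L2Entry(tag=0xA, last_used=1), L2Entry(tag=0xB, last_used=5)]
on_l1_access(cache_set[0])          # 0xA looks old to L2 but is hot in L1
victim = choose_victim(cache_set)   # the bias steers eviction toward 0xB
```

Without the flag, plain LRU would evict tag 0xA even though it is still in active use in L1, which is the hot-L1/cold-L2 situation the title refers to.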

Type: Grant

Filed: March 27, 2015

Date of Patent: December 5, 2017

Assignee: Intel Corporation

Inventors: Krishna N. Vinod, Avinash Sodani, Zainulabedin Aurangabadwala

MINIMIZING SNOOP TRAFFIC LOCALLY AND ACROSS CORES ON A CHIP MULTI-CORE FABRIC

Publication number: 20170177483

Abstract: A processor includes a processing core, a L1 cache comprising a first processing core and a first L1 cache comprising a first L1 cache data entry of a plurality of L1 cache data entries to store data. The processor also includes an L2 cache comprising a first L2 cache data entry of a plurality of L2 cache data entries. The first L2 cache data entry corresponds to the first L1 cache data entry and each of the plurality of L2 cache data entries are associated with a corresponding presence bit (pbit) of a plurality of pbits. Each of the plurality of pbits indicates a status of a corresponding one of the plurality of L2 cache data entries. The processor also includes a cache controller, which in response to a first request among a plurality of requests to access the data at the first L1 cache data entry, determines that a copy of the data is stored in the first L2 cache data entry; and retrieves the copy of the data from the L2 cache data entry in view of the status of the pbit.

Type: Application

Filed: December 21, 2015

Publication date: June 22, 2017

Inventors: Krishna N. Vinod, Avinash Sodani, Zainulabedin J. Aurangabadwala

Instruction and logic for prefetcher throttling based on counts of memory accesses to data sources

Patent number: 9507596

Abstract: A processor includes a core, a prefetcher, and a prefetcher control module. The prefetcher includes logic to make speculative prefetch requests through a memory subsystem for an element for execution by the core, and logic to store prefetched elements in a cache. The prefetcher control module includes logic to determine counts of memory accesses to two types of memory and, based upon the counts and the type of memory, reduce the speculative prefetch requests of the prefetcher.
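The throttling described in the abstract above could be modeled along these lines. This is a loose sketch under assumptions: the class name `PrefetchThrottle`, the source labels `"near"` and `"far"` for the two memory types, the fixed sampling window, the share threshold, and the one-step changes in prefetch degree are all hypothetical. The abstract only says that prefetch requests are reduced based on the counts and the memory type; raising the degree again when pressure eases is an added assumption.

```python
class PrefetchThrottle:
    """Adjust a prefetch degree from counts of accesses to two memory types."""

    def __init__(self, max_degree=4, threshold=0.5, window=8):
        self.max_degree = max_degree
        self.threshold = threshold          # share of "far" accesses that triggers throttling
        self.window = window                # accesses sampled between adjustments
        self.degree = max_degree            # current number of speculative prefetches allowed
        self.counts = {"near": 0, "far": 0}

    def record_access(self, source):
        if source not in self.counts:
            raise ValueError(f"unknown memory source: {source!r}")
        self.counts[source] += 1
        if sum(self.counts.values()) >= self.window:
            self._adjust()

    def _adjust(self):
        total = sum(self.counts.values())
        far_share = self.counts["far"] / total
        if far_share > self.threshold:
            # Many accesses go to the slower source: reduce speculation.
            self.degree = max(0, self.degree - 1)
        else:
            # Assumption: relax the throttle when the far share drops.
            self.degree = min(self.max_degree, self.degree + 1)
        self.counts = {"near": 0, "far": 0}


throttle = PrefetchThrottle(max_degree=4, threshold=0.5, window=4)
for _ in range(4):
    throttle.record_access("far")
degree_after_far_burst = throttle.degree
```

A counter per data source keeps the decision cheap: the control logic only needs the ratio over a short window, not a history of individual accesses.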

Type: Grant

Filed: August 28, 2014

Date of Patent: November 29, 2016

Assignee: Intel Corporation

Inventors: Ashok Jagannathan, Prabhat Jain, Krishna N. Vinod, Avinash Sodani

Mechanism To Avoid Hot-L1/Cold-L2 Events In An Inclusive L2 Cache Using L1 Presence Bits For Victim Selection Bias

Publication number: 20160283380

Abstract: A processor includes a processing core, an L1 cache, operatively coupled to the processing core, the L1 cache comprising an L1 cache entry to store a data item, an L2 cache, inclusive with respect to the L1 cache, the L2 cache comprising an L2 cache entry corresponding to the L1 cache entry, an activity flag associated with the L2 cache entry, the activity flag indicating an activity status of the L1 cache entry, and a cache controller to, in response to detecting an access operation with respect to the L1 cache entry, set the flag to an active status.

Type: Application

Filed: March 27, 2015

Publication date: September 29, 2016

Inventors: Krishna N. Vinod, Avinash Sodani, Zainulabedin Aurangabadwala

INSTRUCTION AND LOGIC FOR PREFETCHER THROTTLING BASED ON DATA SOURCE

Publication number: 20160062768

Abstract: A processor includes a core, a prefetcher, and a prefetcher control module. The prefetcher includes logic to make speculative prefetch requests through a memory subsystem for an element for execution by the core, and logic to store prefetched elements in a cache. The prefetcher control module includes logic to determine counts of memory accesses to two types of memory and, based upon the counts and the type of memory, reduce the speculative prefetch requests of the prefetcher.

Type: Application

Filed: August 28, 2014

Publication date: March 3, 2016

Inventors: Ashok Jagannathan, Prabhat Jain, Krishna N. Vinod, Avinash Sodani