Patents by Inventor Rakesh Krishnaiyer

Rakesh Krishnaiyer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Methods and apparatus to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access

Patent number: 12032934

Abstract: Methods, apparatus, systems and articles of manufacture (e.g., computer readable storage media) to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access are disclosed. Example apparatus disclosed herein are to mark a store instruction in source program code as a transformation candidate when the store instruction is associated with a group of memory accesses that are unaligned with respect to a size of a cache line in a cache. Disclosed apparatus are also to transform the store instruction that is marked as the transformation candidate to form transformed program code when a non-temporal property is satisfied, the transformed program code to replace the store instruction with (i) a write to a buffer in the cache and (ii) a streaming-store instruction that is to write contents of the buffer to memory.

Type: Grant

Filed: September 23, 2021

Date of Patent: July 9, 2024

Assignee: Intel Corporation

Inventors: Charles Yount, Rakesh Krishnaiyer, Timothy Creech, Daniel Woodworth, Joshua Cranmer
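
The buffering idea in the abstract above can be illustrated with a small model. This is only a sketch under stated assumptions, not the patented transformation: `LINE_ELEMS` and `buffered_stream_write` are hypothetical names, memory is modeled as a Python list, and a whole-line slice write stands in for a real streaming-store instruction.

```python
LINE_ELEMS = 8  # assumed cache-line size, in elements (hypothetical value)


def buffered_stream_write(memory, start, values):
    """Store `values` at memory[start:], returning the number of whole-line stores.

    Models the idea only: a real compiler would emit non-temporal
    (streaming) store instructions where this sketch does a slice write.
    """
    n = len(values)
    i = 0
    line_stores = 0
    # Head: ordinary stores until the destination reaches a line boundary.
    while i < n and (start + i) % LINE_ELEMS != 0:
        memory[start + i] = values[i]
        i += 1
    # Body: (i) fill a line-sized staging buffer, (ii) write it out whole.
    while n - i >= LINE_ELEMS:
        buffer = list(values[i:i + LINE_ELEMS])
        memory[start + i:start + i + LINE_ELEMS] = buffer
        line_stores += 1
        i += LINE_ELEMS
    # Tail: ordinary stores for the leftover partial line.
    while i < n:
        memory[start + i] = values[i]
        i += 1
    return line_stores
```

In this model an unaligned destination is split into a head, a run of full lines, and a tail; only the full lines go through the staging buffer.
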
Automatic compiler dataflow optimization to enable pipelining of loops with local storage requirements

Patent number: 11366647

Abstract: Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generates pipeline execution code for the local variable(s), and incorporates the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).

Type: Grant

Filed: April 30, 2020

Date of Patent: June 21, 2022

Assignee: Intel Corporation

Inventors: Rajiv Deodhar, Sergey Dmitriev, Daniel Woodworth, Rakesh Krishnaiyer, Kent Glossop, Arvind Sudarsanam
Nested loops reversal enhancements

Patent number: 11256489

Abstract: Systems, apparatuses and methods may provide for technology to identify in user code, a nested loop which would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.

Type: Grant

Filed: September 22, 2017

Date of Patent: February 22, 2022

Assignee: Intel Corporation

Inventors: Gautam Doshi, Rakesh Krishnaiyer, Rama Kishan Malladi
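
Why reversing an inner loop can raise the cache hit count may be easier to see with a toy simulation. The sketch below assumes a fully associative LRU cache and reverses the inner loop on every other outer iteration; both choices, and the names `count_hits` and `sweep_order`, are illustrative assumptions and are not taken from the patent.

```python
from collections import OrderedDict


def count_hits(access_order, capacity):
    """Count hits for a sequence of line indices in a fully associative LRU cache."""
    cache = OrderedDict()
    hits = 0
    for line in access_order:
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return hits


def sweep_order(n_lines, n_sweeps, reverse_odd):
    """Access order of a nested loop: the outer loop repeats a sweep over n_lines."""
    order = []
    for t in range(n_sweeps):
        inner = range(n_lines)
        if reverse_odd and t % 2 == 1:
            inner = reversed(inner)        # reversed inner loop on odd sweeps
        order.extend(inner)
    return order


forward_hits = count_hits(sweep_order(8, 4, reverse_odd=False), capacity=4)
reversed_hits = count_hits(sweep_order(8, 4, reverse_odd=True), capacity=4)
```

With 8 lines and room for only 4, the all-forward order always evicts a line just before it is needed again, while the alternating order starts each sweep on the lines the previous sweep touched last.
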
METHODS AND APPARATUS TO PERFORM AUTOMATIC COMPILER OPTIMIZATION TO ENABLE STREAMING-STORE GENERATION FOR UNALIGNED CONTIGUOUS WRITE ACCESS

Publication number: 20220012028

Abstract: Methods, apparatus, systems and articles of manufacture (e.g., computer readable storage media) to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access are disclosed. Example apparatus disclosed herein are to mark a store instruction in source program code as a transformation candidate when the store instruction is associated with a group of memory accesses that are unaligned with respect to a size of a cache line in a cache. Disclosed apparatus are also to transform the store instruction that is marked as the transformation candidate to form transformed program code when a non-temporal property is satisfied, the transformed program code to replace the store instruction with (i) a write to a buffer in the cache and (ii) a streaming-store instruction that is to write contents of the buffer to memory.

Type: Application

Filed: September 23, 2021

Publication date: January 13, 2022

Inventors: Charles Yount, Rakesh Krishnaiyer, Timothy Creech, Daniel Woodworth, Joshua Cranmer
Integration of automated complier dataflow optimizations

Patent number: 11106438

Abstract: Various embodiments are generally directed to optimizing dataflow in automated transformation frameworks (e.g., compiler, runtime, etc.) for spatial architectures (e.g., Configurable Spatial Accelerator) that translate high-level user code into forms that use “streams” (e.g., Latency Insensitive Channels, line buffers) to reduce overhead, eliminate or improve the efficiency of redundant memory accesses, and improve overall throughput.

Type: Grant

Filed: March 27, 2020

Date of Patent: August 31, 2021

Assignee: INTEL CORPORATION

Inventors: Dounia Khaldi, Rakesh Krishnaiyer, Rajiv Deodhar, Daniel Woodworth, Joshua Cranmer, Kent Glossop
LOOP NEST REVERSAL

Publication number: 20200371763

Abstract: Systems, apparatuses and methods may provide for technology to identify in user code, a nested loop which would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.

Type: Application

Filed: September 22, 2017

Publication date: November 26, 2020

Inventors: Gautam Doshi, Rakesh Krishnaiyer, Rama Kishan Malladi
AUTOMATIC COMPILER DATAFLOW OPTIMIZATION TO ENABLE PIPELINING OF LOOPS WITH LOCAL STORAGE REQUIREMENTS

Publication number: 20200257510

Abstract: Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generates pipeline execution code for the local variable(s), and incorporates the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).

Type: Application

Filed: April 30, 2020

Publication date: August 13, 2020

Inventors: Rajiv Deodhar, Sergey Dmitriev, Daniel Woodworth, Rakesh Krishnaiyer, Kent Glossop, Arvind Sudarsanam
Compiler transformation with loop and data partitioning

Patent number: 10628141

Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.

Type: Grant

Filed: May 7, 2018

Date of Patent: April 21, 2020

Assignee: INTEL CORPORATION

Inventors: Rakesh Krishnaiyer, Konstantin Bobrovskii, Dmitry Budanov
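
The steps named in the abstract above (iteration-space blocks, memory windows, copy in and out, transformed array accesses) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the patented method: `partition_blocks` and `scaled_add` are hypothetical helpers, the loop body `a[i] + 2 * b[i]` is invented for the example, and list slices stand in for copies to and from a constrained memory.

```python
def partition_blocks(n_elems, bytes_per_elem, arrays_per_iter, mem_limit_bytes):
    """Split range(n_elems) into blocks whose working set fits in mem_limit_bytes."""
    bytes_per_iter = bytes_per_elem * arrays_per_iter
    block = mem_limit_bytes // bytes_per_iter
    if block < 1:
        raise ValueError("memory limit is too small for a single iteration")
    return [(lo, min(lo + block, n_elems)) for lo in range(0, n_elems, block)]


def scaled_add(a, b, mem_limit_bytes, bytes_per_elem=8):
    """Compute out[i] = a[i] + 2 * b[i] one memory window at a time."""
    out = [0] * len(a)
    # Three arrays (a, b, out) are touched per iteration.
    for lo, hi in partition_blocks(len(a), bytes_per_elem, 3, mem_limit_bytes):
        win_a = a[lo:hi]                  # copy in: window of a
        win_b = b[lo:hi]                  # copy in: window of b
        # Transformed accesses: indices are relative to the window, not the array.
        win_out = [win_a[k] + 2 * win_b[k] for k in range(hi - lo)]
        out[lo:hi] = win_out              # copy out: write the window back
    return out
```

The block size is derived from the memory constraint rather than fixed in the source, which is the part of the abstract this sketch tries to reflect.
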
Methods and apparatus to eliminate partial-redundant vector loads

Patent number: 10268454

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.

Type: Grant

Filed: September 25, 2017

Date of Patent: April 23, 2019

Assignee: Intel Corporation

Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
COMPILER TRANSFORMATION WITH LOOP AND DATA PARTITIONING

Publication number: 20190042221

Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.

Type: Application

Filed: May 7, 2018

Publication date: February 7, 2019

Applicant: INTEL CORPORATION

Inventors: Rakesh Krishnaiyer, Konstantin Bobrovskii, Dmitry Budanov
Employing prefetch to reduce write overhead

Patent number: 9921966

Abstract: The present application is directed to employing prefetch to reduce write overhead. A device may comprise a processor and a cache memory. The processor may determine if data to be written to the cache memory comprises multiple cache lines wherein at least one of the cache lines will be fully written. If the data comprises at least one cache line to be fully written, then the processor may perform a “prefetch” wherein the processor may write dummy data to sections of the cache memory corresponding to the data to be written in full cache lines. The processor may then write actual data to the sections containing the dummy data without the processor first having to verify ownership of the sections. Any remaining data that will not be written in full cache lines may then be written to the cache memory utilizing a standard write transaction.

Type: Grant

Filed: May 9, 2014

Date of Patent: March 20, 2018

Assignee: INTEL CORPORATION

Inventors: Rakesh Krishnaiyer, Serge Preis, Hideki Ido, Anatoly Zvezdin
METHODS AND APPARATUS TO ELIMINATE PARTIAL-REDUNDANT VECTOR LOADS

Publication number: 20180011693

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.

Type: Application

Filed: September 25, 2017

Publication date: January 11, 2018

Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
Methods and apparatus to eliminate partial-redundant vector loads

Patent number: 9785413

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node grouper to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.

Type: Grant

Filed: June 16, 2015

Date of Patent: October 10, 2017

Assignee: INTEL CORPORATION

Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
METHODS AND APPARATUS TO ELIMINATE PARTIAL-REDUNDANT VECTOR LOADS

Publication number: 20160259628

Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node grouper to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.

Type: Application

Filed: June 16, 2015

Publication date: September 8, 2016

Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
Speculative compilation to generate advice messages

Patent number: 9009689

Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.

Type: Grant

Filed: November 9, 2010

Date of Patent: April 14, 2015

Assignee: Intel Corporation

Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
Data dependence testing for loop fusion with code replication, array contraction, and loop interchange

Patent number: 8677338

Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.

Type: Grant

Filed: June 4, 2008

Date of Patent: March 18, 2014

Assignee: Intel Corporation

Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
SPECULATIVE COMPILATION TO GENERATE ADVICE MESSAGES

Publication number: 20120117552

Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.

Type: Application

Filed: November 9, 2010

Publication date: May 10, 2012

Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
Dynamic prefetch distance calculation

Patent number: 7702856

Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.

Type: Grant

Filed: November 9, 2005

Date of Patent: April 20, 2010

Assignee: Intel Corporation

Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
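
As a rough illustration of computing a prefetch distance at run time, the sketch below uses the common textbook rule of dividing memory latency by the time of one loop iteration. The abstract does not give a formula, so the rule, the clamp to the trip count, and the name `prefetch_distance` are all assumptions made for this example.

```python
import math


def prefetch_distance(latency_cycles, cycles_per_iter, trip_count):
    """Iterations to prefetch ahead so data arrives about when it is needed.

    Assumed rule: ceil(latency / time per iteration), clamped to
    [1, trip_count] so short loops do not prefetch past their end.
    """
    if cycles_per_iter <= 0:
        raise ValueError("cycles_per_iter must be positive")
    distance = math.ceil(latency_cycles / cycles_per_iter)
    return max(1, min(distance, trip_count))
```

The trip count is typically the value that is unknown at compile time, which is why a calculation like this would have to run just before the loop rather than inside the compiler.
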
DATA DEPENDENCE TESTING FOR LOOP FUSION WITH CODE REPLICATION, ARRAY CONTRACTION, AND LOOP INTERCHANGE

Publication number: 20090307675

Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.

Type: Application

Filed: June 4, 2008

Publication date: December 10, 2009

Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
Dynamic prefetch distance calculation

Publication number: 20070106848

Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.

Type: Application

Filed: November 9, 2005

Publication date: May 10, 2007

Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
