Patents by Inventor Rakesh Krishnaiyer

Rakesh Krishnaiyer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11366647
    Abstract: Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generate pipeline execution code for the local variable(s), and incorporate the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).
    Type: Grant
    Filed: April 30, 2020
    Date of Patent: June 21, 2022
    Assignee: Intel Corporation
    Inventors: Rajiv Deodhar, Sergey Dmitriev, Daniel Woodworth, Rakesh Krishnaiyer, Kent Glossop, Arvind Sudarsanam
  • Patent number: 11256489
    Abstract: Systems, apparatuses and methods may provide for technology to identify in user code, a nested loop which would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.
    Type: Grant
    Filed: September 22, 2017
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Gautam Doshi, Rakesh Krishnaiyer, Rama Kishan Malladi
  • Publication number: 20220012028
    Abstract: Methods, apparatus, systems and articles of manufacture (e.g., computer readable storage media) to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access are disclosed. Example apparatus disclosed herein are to mark a store instruction in source program code as a transformation candidate when the store instruction is associated with a group of memory accesses that are unaligned with respect to a size of a cache line in a cache. Disclosed apparatus are also to transform the store instruction that is marked as the transformation candidate to form transformed program code when a non-temporal property is satisfied, the transformed program code to replace the store instruction with (i) a write to a buffer in the cache and (ii) a streaming-store instruction that is to write contents of the buffer to memory.
    Type: Application
    Filed: September 23, 2021
    Publication date: January 13, 2022
    Inventors: Charles Yount, Rakesh Krishnaiyer, Timothy Creech, Daniel Woodworth, Joshua Cranmer
  • Patent number: 11106438
    Abstract: Various embodiments are generally directed to optimizing dataflow in automated transformation frameworks (e.g., compiler, runtime, etc.) for spatial architectures (e.g., Configurable Spatial Accelerator) that translate high-level user code into forms that use “streams” (e.g., Latency Insensitive Channels, line buffers) to reduce overhead, eliminate or improve the efficiency of redundant memory accesses, and improve overall throughput.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: August 31, 2021
    Assignee: INTEL CORPORATION
    Inventors: Dounia Khaldi, Rakesh Krishnaiyer, Rajiv Deodhar, Daniel Woodworth, Joshua Cranmer, Kent Glossop
  • Publication number: 20200371763
    Abstract: Systems, apparatuses and methods may provide for technology to identify in user code, a nested loop which would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.
    Type: Application
    Filed: September 22, 2017
    Publication date: November 26, 2020
    Inventors: Gautam Doshi, Rakesh Krishnaiyer, Rama Kishan Malladi
  • Publication number: 20200257510
    Abstract: Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generate pipeline execution code for the local variable(s), and incorporate the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).
    Type: Application
    Filed: April 30, 2020
    Publication date: August 13, 2020
    Inventors: Rajiv Deodhar, Sergey Dmitriev, Daniel Woodworth, Rakesh Krishnaiyer, Kent Glossop, Arvind Sudarsanam
  • Publication number: 20200233649
    Abstract: Various embodiments are generally directed to optimizing dataflow in automated transformation frameworks (e.g., compiler, runtime, etc.) for spatial architectures (e.g., Configurable Spatial Accelerator) that translate high-level user code into forms that use “streams” (e.g., Latency Insensitive Channels, line buffers) to reduce overhead, eliminate or improve the efficiency of redundant memory accesses, and improve overall throughput.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 23, 2020
    Applicant: Intel Corporation
    Inventors: DOUNIA KHALDI, RAKESH KRISHNAIYER, RAJIV DEODHAR, DANIEL WOODWORTH, JOSHUA CRANMER, KENT GLOSSOP
  • Patent number: 10628141
    Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: April 21, 2020
    Assignee: INTEL CORPORATION
    Inventors: Rakesh Krishnaiyer, Konstantin Bobrovskii, Dmitry Budanov
  • Patent number: 10268454
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.
    Type: Grant
    Filed: September 25, 2017
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
  • Publication number: 20190042221
    Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.
    Type: Application
    Filed: May 7, 2018
    Publication date: February 7, 2019
    Applicant: INTEL CORPORATION
    Inventors: Rakesh Krishnaiyer, Konstantin Bobrovskii, Dmitry Budanov
  • Patent number: 9921966
    Abstract: The present application is directed to employing prefetch to reduce write overhead. A device may comprise a processor and a cache memory. The processor may determine if data to be written to the cache memory comprises multiple cache lines wherein at least one of the cache lines will be fully written. If the data comprises at least one cache line to be fully written, then the processor may perform a “prefetch” wherein the processor may write dummy data to sections of the cache memory corresponding to the data to be written in full cache lines. The processor may then write actual data to the sections containing the dummy data without the processor first having to verify ownership of the sections. Any remaining data that will not be written in full cache lines may then be written to the cache memory utilizing a standard write transaction.
    Type: Grant
    Filed: May 9, 2014
    Date of Patent: March 20, 2018
    Assignee: INTEL CORPORATION
    Inventors: Rakesh Krishnaiyer, Serge Preis, Hideki Ido, Anatoly Zvezdin
  • Publication number: 20180011693
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.
    Type: Application
    Filed: September 25, 2017
    Publication date: January 11, 2018
    Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
  • Patent number: 9785413
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node group to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
    Type: Grant
    Filed: June 16, 2015
    Date of Patent: October 10, 2017
    Assignee: INTEL CORPORATION
    Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
  • Publication number: 20160259628
    Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node group to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
    Type: Application
    Filed: June 16, 2015
    Publication date: September 8, 2016
    Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
  • Publication number: 20150324292
    Abstract: The present application is directed to employing prefetch to reduce write overhead. A device may comprise a processor and a cache memory. The processor may determine if data to be written to the cache memory comprises multiple cache lines wherein at least one of the cache lines will be fully written. If the data comprises at least one cache line to be fully written, then the processor may perform a “prefetch” wherein the processor may write dummy data to sections of the cache memory corresponding to the data to be written in full cache lines. The processor may then write actual data to the sections containing the dummy data without the processor first having to verify ownership of the sections. Any remaining data that will not be written in full cache lines may then be written to the cache memory utilizing a standard write transaction.
    Type: Application
    Filed: May 9, 2014
    Publication date: November 12, 2015
    Inventors: RAKESH KRISHNAIYER, SERGE PREIS, HIDEKI IDO, ANATOLY ZVEZDIN
  • Patent number: 9009689
    Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.
    Type: Grant
    Filed: November 9, 2010
    Date of Patent: April 14, 2015
    Assignee: Intel Corporation
    Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
  • Patent number: 8677338
    Abstract: Methods and apparatus to data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: March 18, 2014
    Assignee: Intel Corporation
    Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
  • Publication number: 20120117552
    Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.
    Type: Application
    Filed: November 9, 2010
    Publication date: May 10, 2012
    Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
  • Patent number: 7702856
    Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: April 20, 2010
    Assignee: Intel Corporation
    Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
  • Publication number: 20090307675
    Abstract: Methods and apparatus to data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
    Type: Application
    Filed: June 4, 2008
    Publication date: December 10, 2009
    Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich