Patents by Inventor Rakesh Krishnaiyer
Rakesh Krishnaiyer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11366647
Abstract: Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generates pipeline execution code for the local variable(s), and incorporates the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).
Type: Grant
Filed: April 30, 2020
Date of Patent: June 21, 2022
Assignee: Intel Corporation
Inventors: Rajiv Deodhar, Sergey Dmitriev, Daniel Woodworth, Rakesh Krishnaiyer, Kent Glossop, Arvind Sudarsanam
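To show why a loop-local variable with no cross-iteration dependence can be given a pool of buffers, here is a minimal Python sketch. It is a hypothetical two-stage software model (the names `pipelined_row_sums`, `depth` and `lag` are illustrative), not the compiler-generated code the patent describes. Stage A fills iteration t's scratch buffer while stage B reduces iteration t - lag's buffer, so the two iterations are in flight at the same time and need separate copies.

```python
from typing import List

def pipelined_row_sums(rows: List[List[int]], depth: int = 2, lag: int = 1) -> List[int]:
    """Sum of squares of each row, modeled as a two-stage pipeline.

    `scratch` stands for a loop-local array: each iteration writes it before
    reading it, so no value flows between iterations. A pool of `depth`
    buffers, allocated once before the loop, keeps in-flight iterations from
    overwriting each other; this model is only correct when depth > lag.
    """
    width = max((len(r) for r in rows), default=0)
    pool = [[0] * width for _ in range(depth)]   # pool initialized once
    results = [0] * len(rows)
    for t in range(len(rows) + lag):
        if t < len(rows):                          # stage A: produce iteration t
            scratch = pool[t % depth]
            for j, x in enumerate(rows[t]):
                scratch[j] = x * x
        c = t - lag
        if 0 <= c < len(rows):                     # stage B: consume iteration c
            scratch = pool[c % depth]
            results[c] = sum(scratch[:len(rows[c])])
    return results
```

With the default pool of two buffers, `pipelined_row_sums([[1, 2], [3], [4, 5, 6]])` yields `[5, 9, 77]`; with `depth=1` stage B reads a buffer that stage A has already overwritten, and the first two results come out wrong.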
-
Patent number: 11256489
Abstract: Systems, apparatuses and methods may provide for technology to identify, in user code, a nested loop that would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.
Type: Grant
Filed: September 22, 2017
Date of Patent: February 22, 2022
Assignee: Intel Corporation
Inventors: Gautam Doshi, Rakesh Krishnaiyer, Rama Kishan Malladi
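The effect of reversing an inner loop can be illustrated with a toy cache model. The Python sketch below assumes a fully associative LRU cache and an inner loop that walks more cache lines than the cache holds. It compares a plain forward traversal with one that runs the inner loop backwards on every other outer iteration. This reversal pattern is one illustrative choice, not necessarily the transformation claimed, and it is only legal when the inner loop's result does not depend on iteration order.

```python
from collections import OrderedDict
from typing import Iterable, List

def lru_hits(trace: Iterable[int], capacity: int) -> int:
    """Count hits for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for line in trace:
        if line in cache:
            hits += 1
            cache.move_to_end(line)          # mark as most recently used
        else:
            cache[line] = None
            if len(cache) > capacity:
                cache.popitem(last=False)    # evict the least recently used line
    return hits

def nested_loop_trace(outer: int, inner: int, reverse_odd: bool) -> List[int]:
    """Cache lines touched by: for i in range(outer): for j in range(inner): use(b[j]).
    With reverse_odd=True the inner loop runs backwards on odd i."""
    trace: List[int] = []
    for i in range(outer):
        js = range(inner - 1, -1, -1) if reverse_odd and i % 2 else range(inner)
        trace.extend(js)
    return trace
```

With 8 lines and a 4-line cache, the forward-only trace never hits, because LRU always evicts the line needed next. The alternating trace gets 8 hits, because each reversed pass starts on the lines the previous pass touched last.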
-
Publication number: 20220012028
Abstract: Methods, apparatus, systems and articles of manufacture (e.g., computer readable storage media) to perform automatic compiler optimization to enable streaming-store generation for unaligned contiguous write access are disclosed. Example apparatus disclosed herein are to mark a store instruction in source program code as a transformation candidate when the store instruction is associated with a group of memory accesses that are unaligned with respect to a size of a cache line in a cache. Disclosed apparatus are also to transform the store instruction that is marked as the transformation candidate to form transformed program code when a non-temporal property is satisfied, the transformed program code to replace the store instruction with (i) a write to a buffer in the cache and (ii) a streaming-store instruction that is to write contents of the buffer to memory.
Type: Application
Filed: September 23, 2021
Publication date: January 13, 2022
Inventors: Charles Yount, Rakesh Krishnaiyer, Timothy Creech, Daniel Woodworth, Joshua Cranmer
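The bookkeeping behind streaming stores for an unaligned contiguous write can be sketched as splitting the destination range at cache-line boundaries. The Python sketch below assumes a 64-byte line and uses hypothetical helper names. Full lines are assembled in a line-sized buffer and then written as a unit, standing in for a streaming store. The partial lines at the unaligned head and tail use ordinary stores. The non-temporal-property check from the abstract is not modeled.

```python
from typing import List, Tuple

LINE = 64  # assumed cache-line size in bytes

def plan_stores(start: int, length: int, line: int = LINE) -> List[Tuple[str, int, int]]:
    """Split the write [start, start + length) into ordinary stores for the
    partial lines at each end and one streaming store per full line."""
    if length <= 0:
        return []
    end = start + length
    first_full = -(-start // line) * line    # round start up to a line boundary
    last_full = (end // line) * line         # round end down to a line boundary
    if first_full >= last_full:              # no complete line inside the range
        return [("store", start, end)]
    ops = []
    if start < first_full:
        ops.append(("store", start, first_full))        # unaligned head
    for base in range(first_full, last_full, line):
        ops.append(("stream", base, base + line))       # full line
    if last_full < end:
        ops.append(("store", last_full, end))           # unaligned tail
    return ops

def apply_plan(mem: bytearray, start: int, data: bytes, line: int = LINE) -> None:
    """Execute the plan on a bytearray (a stand-in for memory)."""
    for kind, lo, hi in plan_stores(start, len(data), line):
        if kind == "stream":
            buf = bytearray(data[lo - start:hi - start])  # full line assembled in a buffer
            mem[lo:hi] = buf                              # stand-in for a non-temporal store
        else:
            mem[lo:hi] = data[lo - start:hi - start]      # ordinary store
```

For a 200-byte write at offset 10, the plan is a 54-byte head store, two full-line streaming stores (64 to 192), and an 18-byte tail store.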
-
Patent number: 11106438
Abstract: Various embodiments are generally directed to optimizing dataflow in automated transformation frameworks (e.g., compiler, runtime, etc.) for spatial architectures (e.g., Configurable Spatial Accelerator) that translate high-level user code into forms that use “streams” (e.g., Latency Insensitive Channels, line buffers) to reduce overhead, eliminate or improve the efficiency of redundant memory accesses, and improve overall throughput.
Type: Grant
Filed: March 27, 2020
Date of Patent: August 31, 2021
Assignee: Intel Corporation
Inventors: Dounia Khaldi, Rakesh Krishnaiyer, Rajiv Deodhar, Daniel Woodworth, Joshua Cranmer, Kent Glossop
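A line buffer is the classic way a stream removes redundant memory accesses. The Python sketch below uses a 3-point stencil as a hypothetical example, not one taken from the patent. The direct form reads each interior input three times. The streamed form consumes each element once and keeps the values it still needs in a small window. The read counters exist only to make the difference visible.

```python
from collections import deque
from typing import List, Tuple

def stencil_direct(a: List[int]) -> Tuple[List[int], int]:
    """out[i] = a[i-1] + a[i] + a[i+1]; every output issues three memory reads."""
    out, reads = [], 0
    for i in range(1, len(a) - 1):
        out.append(a[i - 1] + a[i] + a[i + 1])
        reads += 3
    return out, reads

def stencil_streamed(a: List[int]) -> Tuple[List[int], int]:
    """Same result, but the input arrives as a stream and a 3-element line
    buffer holds the values still needed, so each element is read once."""
    window: deque = deque(maxlen=3)
    out, reads = [], 0
    for x in a:                    # one read per input element
        window.append(x)           # the oldest value drops out automatically
        reads += 1
        if len(window) == 3:
            out.append(sum(window))
    return out, reads
```

For `[1, 2, 3, 4, 5]` both versions produce `[6, 9, 12]`, with 9 reads for the direct form and 5 for the streamed form.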
-
Publication number: 20200371763
Abstract: Systems, apparatuses and methods may provide for technology to identify, in user code, a nested loop that would result in cache memory misses when executed. The technology further reverses an order of iterations of a first inner loop in the nested loop to obtain a modified nested loop. Reversing the order of iterations increases a number of times that cache memory hits occur when the modified nested loop is executed.
Type: Application
Filed: September 22, 2017
Publication date: November 26, 2020
Inventors: Gautam Doshi, Rakesh Krishnaiyer, Rama Kishan Malladi
-
Publication number: 20200257510
Abstract: Systems, apparatuses and methods may provide for technology that detects one or more local variables in source code, wherein the local variable(s) lack dependencies across iterations of a loop in the source code, automatically generates pipeline execution code for the local variable(s), and incorporates the pipeline execution code into an output of a compiler. In one example, the pipeline execution code includes an initialization of a pool of buffer storage for the local variable(s).
Type: Application
Filed: April 30, 2020
Publication date: August 13, 2020
Inventors: Rajiv Deodhar, Sergey Dmitriev, Daniel Woodworth, Rakesh Krishnaiyer, Kent Glossop, Arvind Sudarsanam
-
Publication number: 20200233649
Abstract: Various embodiments are generally directed to optimizing dataflow in automated transformation frameworks (e.g., compiler, runtime, etc.) for spatial architectures (e.g., Configurable Spatial Accelerator) that translate high-level user code into forms that use “streams” (e.g., Latency Insensitive Channels, line buffers) to reduce overhead, eliminate or improve the efficiency of redundant memory accesses, and improve overall throughput.
Type: Application
Filed: March 27, 2020
Publication date: July 23, 2020
Applicant: Intel Corporation
Inventors: Dounia Khaldi, Rakesh Krishnaiyer, Rajiv Deodhar, Daniel Woodworth, Joshua Cranmer, Kent Glossop
-
Patent number: 10628141
Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.
Type: Grant
Filed: May 7, 2018
Date of Patent: April 21, 2020
Assignee: Intel Corporation
Inventors: Rakesh Krishnaiyer, Konstantin Bobrovskii, Dmitry Budanov
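The partitioning steps in the abstract (iteration-space blocks, a memory window per block, copy-in, copy-out) can be sketched for a simple parallel loop `out[i] = factor * a[i]`. The Python below is a hypothetical model. `budget` stands for the amount of constrained device memory, measured in elements, and the sketch assumes each iteration needs one input and one output element resident.

```python
from typing import List, Tuple

def partitioned_scale(a: List[int], factor: int, budget: int) -> Tuple[List[int], int]:
    """Compute [factor * x for x in a] while keeping at most `budget`
    elements resident in a simulated constrained memory.
    Returns the result and the peak number of resident elements."""
    per_iter = 2                          # one input + one output element per iteration
    block = budget // per_iter            # iterations per iteration-space block
    if block < 1:
        raise ValueError("budget too small for even one iteration")
    out = [0] * len(a)
    peak = 0
    for lo in range(0, len(a), block):    # one iteration-space block at a time
        hi = min(lo + block, len(a))
        window_in = a[lo:hi]              # copy-in: memory window for this block
        window_out = [factor * x for x in window_in]  # accesses use window-local indices
        peak = max(peak, len(window_in) + len(window_out))
        out[lo:hi] = window_out           # copy-out
    return out, peak
```

With a budget of 6 elements, a 10-element array is processed in blocks of 3 iterations, and no more than 6 elements are ever resident at once.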
-
Patent number: 10268454
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.
Type: Grant
Filed: September 25, 2017
Date of Patent: April 23, 2019
Assignee: Intel Corporation
Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
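A partially redundant vector load is one that overlaps data already held in registers. A common textbook case is the pair `a[i:i+W]` and `a[i+1:i+1+W]` in a loop computing `a[i] + a[i+1]`. The Python sketch below models vectors as lists and uses an assumed width `W = 4`. It replaces the overlapping load with a lane shift of two non-overlapping vectors, in the spirit of an align/shift permute. A hypothetical cost check then decides whether the replacement pays off. The sketch does not reproduce the node grouping or candidate verification described in the abstract.

```python
from typing import List, Tuple

W = 4  # hypothetical vector width (lanes)

def vload(a: List[int], i: int, counter: List[int]) -> List[int]:
    """Model a W-lane vector load of a[i : i + W] and count it."""
    counter[0] += 1
    return a[i:i + W]

def lane_shift(lo: List[int], hi: List[int], k: int) -> List[int]:
    """Model an align/shift permute: lanes k .. k+W-1 of lo followed by hi."""
    return (lo + hi)[k:k + W]

def pair_sums_naive(a: List[int]) -> Tuple[List[int], int]:
    loads, out = [0], []
    for i in range(0, len(a) - W, W):
        v0 = vload(a, i, loads)
        v1 = vload(a, i + 1, loads)       # overlaps v0 in W - 1 lanes
        out.extend(x + y for x, y in zip(v0, v1))
    return out, loads[0]

def pair_sums_reuse(a: List[int]) -> Tuple[List[int], int]:
    loads, out = [0], []
    v0 = vload(a, 0, loads) if len(a) > W else []
    for i in range(0, len(a) - W, W):
        v2 = vload(a, i + W, loads)       # next non-overlapping vector
        v1 = lane_shift(v0, v2, 1)        # rebuild a[i+1 : i+1+W] from registers
        out.extend(x + y for x, y in zip(v0, v1))
        v0 = v2                           # carried into the next iteration
    return out, loads[0]

def worth_replacing(load_cost: float, shift_cost: float) -> bool:
    """Per iteration: naive = 2 loads, replacement = 1 load + 1 shift."""
    return load_cost + shift_cost < 2 * load_cost
```

On a 13-element input both versions produce the same 12 sums, but the reuse version issues 4 vector loads instead of 6.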
-
Publication number: 20190042221
Abstract: Logic may transform a target code to partition data automatically and/or autonomously based on a memory constraint associated with a resource such as a target device. Logic may identify a tag in the code to identify a task, wherein the task comprises at least one loop, the loop to process data elements in one or more arrays. Logic may automatically generate instructions to determine one or more partitions for the at least one loop to partition data elements, accessed by one or more memory access instructions for the one or more arrays within the at least one loop, based on a memory constraint, the memory constraint to identify an amount of memory available for allocation to process the task. Logic may determine one or more iteration space blocks for the parallel loops, determine memory windows for each block, copy data into and out of constrained memory, and transform array accesses.
Type: Application
Filed: May 7, 2018
Publication date: February 7, 2019
Applicant: Intel Corporation
Inventors: Rakesh Krishnaiyer, Konstantin Bobrovskii, Dmitry Budanov
-
Patent number: 9921966
Abstract: The present application is directed to employing prefetch to reduce write overhead. A device may comprise a processor and a cache memory. The processor may determine if data to be written to the cache memory comprises multiple cache lines wherein at least one of the cache lines will be fully written. If the data comprises at least one cache line to be fully written, then the processor may perform a “prefetch” wherein the processor may write dummy data to sections of the cache memory corresponding to the data to be written in full cache lines. The processor may then write actual data to the sections containing the dummy data without the processor first having to verify ownership of the sections. Any remaining data that will not be written in full cache lines may then be written to the cache memory utilizing a standard write transaction.
Type: Grant
Filed: May 9, 2014
Date of Patent: March 20, 2018
Assignee: Intel Corporation
Inventors: Rakesh Krishnaiyer, Serge Preis, Hideki Ido, Anatoly Zvezdin
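The first decision described in the abstract, whether a write covers any cache line completely, is simple address arithmetic. The Python sketch below assumes a 64-byte line and uses a hypothetical function name. It classifies every line a write touches as full or partial. Full lines are candidates for the dummy-data "prefetch" step, since their old contents never need to be read. Partial lines go through a standard write transaction.

```python
from typing import List, Tuple

LINE = 64  # assumed cache-line size in bytes

def split_write(addr: int, size: int, line: int = LINE) -> Tuple[List[int], List[int]]:
    """Classify the cache lines touched by a write of `size` bytes at `addr`.

    Returns (full, partial): base addresses of lines the write covers
    completely, and of lines it covers only in part.
    """
    if size <= 0:
        return [], []
    end = addr + size
    full: List[int] = []
    partial: List[int] = []
    for base in range((addr // line) * line, end, line):
        if base >= addr and base + line <= end:
            full.append(base)       # fully overwritten: no need to fetch old data
        else:
            partial.append(base)    # mixes old and new bytes: standard write
    return full, partial
```

A 200-byte write at address 10 touches four lines. The lines at 64 and 128 are fully written, and the lines at 0 and 192 are only partly written.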
-
Publication number: 20180011693
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector load operations. An example apparatus includes a node grouper to associate a vector operation with a node group, a candidate verifier to perform a dependencies test on a subset of the node group, and identify a subset of the node group as a candidate when the subset satisfies the dependencies test, and a code optimizer to determine replacement code based on a characteristic of the candidate in the node group and compare an estimated cost associated with executing the replacement code to a threshold. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold.
Type: Application
Filed: September 25, 2017
Publication date: January 11, 2018
Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
-
Patent number: 9785413
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node group to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
Type: Grant
Filed: June 16, 2015
Date of Patent: October 10, 2017
Assignee: Intel Corporation
Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
-
Publication number: 20160259628
Abstract: Methods, apparatus, systems and articles of manufacture are disclosed to eliminate partial-redundant vector loads. An example apparatus includes a node group to associate a vector operation with a node group based on a load type of the vector operation. The example apparatus also includes a candidate identifier to identify a candidate in the node group, the candidate to include a subset of vector operations of the node group. The example apparatus also includes a code optimizer to determine replacement code based on a characteristic of the candidate, and to compare an estimated cost associated with executing the replacement code to a threshold cost relative to a cost of executing the candidate. The example apparatus also includes a code generator to generate machine code using the replacement code when the estimated cost of executing the replacement code satisfies the threshold cost.
Type: Application
Filed: June 16, 2015
Publication date: September 8, 2016
Inventors: Farhana Aleen Schuchman, David L. Kreitzer, Rakesh Krishnaiyer, Vyacheslav Pavlovich Zakharin, Sergey Preis, Leonardo Jose Borges, Philippe Thierry
-
Publication number: 20150324292
Abstract: The present application is directed to employing prefetch to reduce write overhead. A device may comprise a processor and a cache memory. The processor may determine if data to be written to the cache memory comprises multiple cache lines wherein at least one of the cache lines will be fully written. If the data comprises at least one cache line to be fully written, then the processor may perform a “prefetch” wherein the processor may write dummy data to sections of the cache memory corresponding to the data to be written in full cache lines. The processor may then write actual data to the sections containing the dummy data without the processor first having to verify ownership of the sections. Any remaining data that will not be written in full cache lines may then be written to the cache memory utilizing a standard write transaction.
Type: Application
Filed: May 9, 2014
Publication date: November 12, 2015
Inventors: Rakesh Krishnaiyer, Serge Preis, Hideki Ido, Anatoly Zvezdin
-
Patent number: 9009689
Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.
Type: Grant
Filed: November 9, 2010
Date of Patent: April 14, 2015
Assignee: Intel Corporation
Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
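The control flow in this abstract can be outlined as follows: try each speculation, estimate the result, and report advice only when the speculation pays off. The Python sketch below is a heavily simplified, hypothetical model. The `Speculation` class and `advise` function are illustrative names. Each speculative transformation of the intermediate representation is reduced to a function that maps a baseline cost estimate to a new one.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Speculation:
    assumption: str                   # what the compiler pretends is true
    advice: str                       # what the user could do to make it true
    transform: Callable[[int], int]   # stand-in for transforming a copy of the IR

def advise(baseline_cost: int, speculations: List[Speculation]) -> List[str]:
    """Return advice messages only for speculations that improve the estimate."""
    messages = []
    for s in speculations:
        new_cost = s.transform(baseline_cost)    # speculative transformation
        if new_cost < baseline_cost:             # improved compilation result?
            messages.append(
                f"{s.advice} (assumes {s.assumption}; est. cost {baseline_cost} -> {new_cost})"
            )
    return messages
```

For example, a speculation that "p and q do not alias" which cuts the estimated cost from 100 to 25 would produce a message suggesting `restrict`, while a speculation that makes the estimate worse would stay silent.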
-
Patent number: 8677338
Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
Type: Grant
Filed: June 4, 2008
Date of Patent: March 18, 2014
Assignee: Intel Corporation
Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
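Loop fusion with array contraction depends on a dependence test. The textbook version for two adjacent loops can be sketched in Python, independent of the patent's specific method. Suppose loop 1 writes `t[i + w]` and loop 2 reads `t[i + r]`. In the fused body, iteration i reads an element that loop 1 wrote at iteration `i + r - w`, so fusion is legal only if `r - w <= 0`. When the distance is 0 and `t` is not used after the loops, the temporary array can be contracted to a scalar. The functions below are hypothetical illustrations.

```python
from typing import List

def unfused(a: List[int], b: List[int]) -> List[int]:
    n = len(a)
    t = [0] * n
    for i in range(n):          # loop 1 writes t[i]
        t[i] = a[i] * 2
    c = [0] * n
    for i in range(n):          # loop 2 reads t[i]
        c[i] = t[i] + b[i]
    return c

def fusion_legal(write_offset: int, read_offset: int) -> bool:
    """Loop 1 writes t[i + write_offset]; loop 2 reads t[i + read_offset].
    In fused iteration i, the element read was written at iteration
    i + read_offset - write_offset, which must not be later than i."""
    return read_offset - write_offset <= 0

def fused_contracted(a: List[int], b: List[int]) -> List[int]:
    c = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2            # dependence distance 0: array t contracted to a scalar
        c[i] = t + b[i]
    return c
```

Here `fusion_legal(0, 0)` holds, so both versions compute the same result. If loop 2 read `t[i + 1]`, the fused loop would read a value before it was written, and the test would reject fusion.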
-
Publication number: 20120117552
Abstract: Methods to improve optimization of compilation are presented. In one embodiment, a method includes identifying one or more optimization speculations with respect to a code region and speculatively performing transformation on an intermediate representation of the code region in accordance with an optimization speculation. The method includes generating an advice message corresponding to the optimization speculation and displaying the advice message if the optimization speculation results in an improved compilation result.
Type: Application
Filed: November 9, 2010
Publication date: May 10, 2012
Inventors: Rakesh Krishnaiyer, Hideki Saito Ido, Ernesto Su, John L. Ng, Jin Lin, Xinmin Tian, Robert Y. Geva
-
Patent number: 7702856
Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.
Type: Grant
Filed: November 9, 2005
Date of Patent: April 20, 2010
Assignee: Intel Corporation
Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
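A common rule of thumb sets the prefetch distance to the memory latency divided by the time one loop iteration takes, rounded up. If the work per iteration depends on a value known only at run time, such as an inner trip count, that division has to happen at run time. The Python sketch below shows this calculation. The 300-cycle latency, the cycle costs, and the cap of 64 iterations are illustrative assumptions, and the formula is the generic heuristic, not necessarily the one the patent generates.

```python
import math

def prefetch_distance(latency_cycles: float, cycles_per_iter: float, max_distance: int = 64) -> int:
    """Iterations ahead to prefetch so data arrives roughly when it is needed."""
    if cycles_per_iter <= 0:
        raise ValueError("cycles_per_iter must be positive")
    d = math.ceil(latency_cycles / cycles_per_iter)
    return max(1, min(d, max_distance))    # keep at least 1, cap very long distances

def distance_for_inner_trip(inner_trip: int, cycles_per_inner: float = 2.0,
                            latency: float = 300.0) -> int:
    """Cost of one outer iteration is known only once inner_trip is known."""
    return prefetch_distance(latency, max(inner_trip, 1) * cycles_per_inner)
```

With the assumed numbers, an inner trip count of 5 (10 cycles per outer iteration) gives a distance of 30, while a trip count of 100 gives 2. A single compile-time constant could not be right for both cases.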
-
Publication number: 20090307675
Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
Type: Application
Filed: June 4, 2008
Publication date: December 10, 2009
Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich