Patents by Inventor Roger Gramunt
Roger Gramunt has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230195456Abstract: In one embodiment, an apparatus includes: a plurality of execution circuits to execute and instruct micro-operations (?ops), where a subset of the plurality of execution circuits are capable of execution of a fused ?op; a fusion circuit coupled to at least the subset of the plurality of execution circuits, wherein the fusion circuit is to fuse at least some pairs of producer-consumer ?ops into fused ?ops; and a fusion throttle circuit coupled to the fusion circuit, wherein the fusion throttle circuit is to prevent a first ?op from being fused with another ?op based at least in part on historical information associated with the first ?op. Other embodiments are described and claimed.Type: ApplicationFiled: December 22, 2021Publication date: June 22, 2023Inventors: Sufiyan Syed, Roger Gramunt, Jayesh Gaur, Priyank Deshpande
-
Publication number: 20220343174Abstract: Described herein is a graphics processor including a processing resource including a multiplier configured to multiply input associated with the instruction at one of a first plurality of bit widths, an adder configured to add a product output from the multiplier with an accumulator value at one of a second plurality of bit widths, and circuitry to select a first bit width of the first plurality of bit widths for the multiplier and a second bit width of the second plurality of bit widths for the adder.Type: ApplicationFiled: May 12, 2022Publication date: October 27, 2022Applicant: Intel CorporationInventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
-
Patent number: 11334796Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.Type: GrantFiled: August 3, 2020Date of Patent: May 17, 2022Assignee: Intel CorporationInventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
-
Publication number: 20210374069Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.Type: ApplicationFiled: August 17, 2021Publication date: December 2, 2021Applicant: Intel CorporationInventors: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa
-
Publication number: 20210019631Abstract: A processing cluster of a processing cluster array comprises a plurality of registers to store input values of vector input operands, the input values of at least some of the vector input operands having different bit lengths than those of other input values of other vector input operands, and a compute unit to execute a dot-product instruction with the vector input operands to perform a number of parallel multiply operations and an accumulate operation per 32-bit lane based on a bit length of the smallest-sized input value of a first vector input operand relative to the 32-bit lane.Type: ApplicationFiled: August 3, 2020Publication date: January 21, 2021Applicant: Intel CorporationInventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
-
Patent number: 10776699Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.Type: GrantFiled: January 12, 2018Date of Patent: September 15, 2020Assignee: Intel CorporationInventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
-
Publication number: 20200242046Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.Type: ApplicationFiled: September 4, 2019Publication date: July 30, 2020Inventors: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa
-
Patent number: 10719320Abstract: An apparatus is provided which comprises: a component; a voltage generator to supply load current to the component; first one or more circuitries to predict that the load current is to increase from a first time; and second one or more circuitries to, in anticipation of the increase in the load current from the first time, cause the component to execute first instructions during a time period that occurs prior to the first time.Type: GrantFiled: July 31, 2017Date of Patent: July 21, 2020Assignee: Intel CorporationInventors: Federico Ardanaz, Roger Gramunt, Jesus Corbal, Dennis R. Bradford, Jonathan M. Eastep
-
Patent number: 10445245Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.Type: GrantFiled: December 19, 2016Date of Patent: October 15, 2019Assignee: Intel CorporationInventors: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa
-
Patent number: 10445244Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.Type: GrantFiled: December 19, 2016Date of Patent: October 15, 2019Assignee: Intel CorporationInventors: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa
-
Publication number: 20190034203Abstract: An apparatus is provided which comprises: a component; a voltage generator to supply load current to the component; first one or more circuitries to predict that the load current is to increase from a first time; and second one or more circuitries to, in anticipation of the increase in the load current from the first time, cause the component to execute first instructions during a time period that occurs prior to the first time.Type: ApplicationFiled: July 31, 2017Publication date: January 31, 2019Inventors: Federico Ardanaz, Roger Gramunt, Jesus Corbal, Dennis R. Bradford, Jonathan M. Eastep
-
Patent number: 10175986Abstract: A processor includes a logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired a reorder buffer, capture output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.Type: GrantFiled: May 8, 2017Date of Patent: January 8, 2019Assignee: Intel CorporationInventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith
-
Publication number: 20180322390Abstract: One embodiment provides for a compute apparatus to perform machine learning operations, the compute apparatus comprising a fetch unit to fetch a single instruction having multiple input operands, wherein the multiple input operands have an unequal bit-length, a first input operand having a first bit-length and a second input operand having a second bit-length; a decode unit to decode the single instruction into a decoded instruction; an operand length unit to determine a smaller bit-length of the first bit-length and the second bit-length; and a compute unit to perform a matrix operation on the multiple input operands to generate an output value having a bit length of the smaller bit length.Type: ApplicationFiled: January 12, 2018Publication date: November 8, 2018Applicant: Intel CorporationInventors: Dipankar Das, Roger Gramunt, Mikhail Smelyanskiy, Jesus Corbal, Dheevatsa Mudigere, Naveen K. Mellempudi, Alexander F. Heinecke
-
Patent number: 10007620Abstract: A processor includes a set associative cache and a cache controller. The cache controller makes an initial association between first and second groups of sampled sets in the cache and first and second cache replacement policies. Follower sets in the cache are initially associated with the more conservative of the two policies. Following cache line insertions in a first epoch, the associations between the groups of sampled sets and cache replacement policies are swapped for the next epoch. If the less conservative policy outperforms the more conservative policy during two consecutive epochs, the follower sets are associated with the less conservative policy for the next epoch. Subsequently, if the more conservative policy outperforms the less conservative policy during any epoch, the follower sets are again associated with the more conservative policy. Performance may be measured based the number of cache misses associated with each policy.Type: GrantFiled: September 30, 2016Date of Patent: June 26, 2018Assignee: Intel CorporationInventors: Seth H. Pugsley, Christopher B. Wilkerson, Roger Gramunt, Jonathan C. Hall, Prabhat Jain
-
Publication number: 20180095895Abstract: A processor includes a set associative cache and a cache controller. The cache controller makes an initial association between first and second groups of sampled sets in the cache and first and second cache replacement policies. Follower sets in the cache are initially associated with the more conservative of the two policies. Following cache line insertions in a first epoch, the associations between the groups of sampled sets and cache replacement policies are swapped for the next epoch. If the less conservative policy outperforms the more conservative policy during two consecutive epochs, the follower sets are associated with the less conservative policy for the next epoch. Subsequently, if the more conservative policy outperforms the less conservative policy during any epoch, the follower sets are again associated with the more conservative policy. Performance may be measured based the number of cache misses associated with each policy.Type: ApplicationFiled: September 30, 2016Publication date: April 5, 2018Inventors: Seth H. Pugsley, Christopher B. Wilkerson, Roger Gramunt, Jonathan C. Hall, Prabhat Jain
-
Patent number: 9934155Abstract: A method, system, and apparatus may initialize a fixed plurality of page table entries for a fixed plurality of pages in memory, each page having a first size, wherein a linear address for each page table entry corresponds to a physical address and the fixed plurality of pages are aligned. A bit in each of the page table entries for the aligned pages may be set to indicate whether or not the fixed plurality of pages is to be treated as one combined page having a second page size larger than the first page size. Other embodiments are described and claimed.Type: GrantFiled: December 20, 2012Date of Patent: April 3, 2018Assignee: Intel CorporationInventors: Edward Grochowski, Julio Gago, Roger Gramunt, Roger Espasa, Rolf Kassa
-
Patent number: 9891914Abstract: An apparatus and method for performing an efficient scatter operation. For example, one embodiment of a processor comprises: an allocator unit to receive a scatter operation comprising a number of data elements and responsively allocate resources to execute the scatter operation; a memory execution cluster comprising at least a portion of the resources to execute the scatter operation, the resources including one or more store data buffers and one or more store address buffers; and a senior store pipeline to transfer store data elements from the store data buffers to system memory using addresses from the store address buffers prior to retirement of the scatter operation.Type: GrantFiled: April 10, 2015Date of Patent: February 13, 2018Assignee: Intel CorporationInventors: Ramon Matas, Alexey P. Suprun, Roger Gramunt, Chung-Lun Chan, Rammohan Padmanabhan
-
Patent number: 9886396Abstract: In one embodiment, a processor includes a frontend unit having an instruction decoder to receive and to decode instructions of a plurality of threads, an execution unit coupled to the instruction decoder to receive and execute the decoded instructions, and an instruction retirement unit having a retirement logic to receive the instructions from the execution unit and to retire the instructions associated with one or more of the threads that have an instruction or an event pending to be retired. The instruction retirement unit includes a thread arbitration logic to select one of the threads at a time and to dispatch the selected thread to the retirement logic for retirement processing.Type: GrantFiled: December 23, 2014Date of Patent: February 6, 2018Assignee: Intel CorporationInventors: Roger Gramunt, Rammohan Padmanabhan, Ramon Matas, Neal S. Moyer, Benjamin C. Chaffin, Avinash Sodani, Alexey P. Suprun, Vikram S. Sundaram, Chung-Lun Chan, Gerardo A. Fernandez, Julio Gago, Michael S. Yang, Aditya Kesiraju
-
Patent number: 9804842Abstract: An apparatus and method for efficiently managing the architectural state of a processor.Type: GrantFiled: December 23, 2014Date of Patent: October 31, 2017Assignee: INTEL CORPORATIONInventors: Jesus Corbal San Adrian, Dennis R. Bradford, Benjamin C. Chaffin, Taraneh Bahrami, Jonathan C. Hall, Thomas B. Maciukenas, Roger Gramunt, Rohan Sharma
-
Publication number: 20170242698Abstract: A processor includes a logic for stateless capture of data linear addresses (DLA) during precise event based sampling (PEBS) for an out-of-order execution engine. The engine may include a PEBS unit with logic to increment a counter each time an instance of a designated micro-op is retired a reorder buffer, capture output DLA referenced by an instance of the micro-op that executes after the counter overflows, set a captured bit associated with a reorder buffer identifier for the instance of the micro-op, and store a PEBS record in a debug storage when the instance of the micro-op is retired from the reorder buffer. The designated micro-op references a DLA of a memory accessible to the processor.Type: ApplicationFiled: May 8, 2017Publication date: August 24, 2017Inventors: Roger Gramunt, Ramon Matas, Benjamin C. Chaffin, Neal S. Moyer, Rammohan Padmanabhan, Alexey P. Suprun, Matthew G. Smith