Patents by Inventor Mani Azimi
Mani Azimi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10048966Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: GrantFiled: January 10, 2017Date of Patent: August 14, 2018Assignee: Intel CorporationInventors: Hariharan L. Thantry, Mani Azimi
-
Publication number: 20170242703Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: ApplicationFiled: January 10, 2017Publication date: August 24, 2017Inventors: Hariharan L. Thantry, Mani Azimi
-
Patent number: 9674114Abstract: Layout-aware modular decoupled crossbar and router for on-chip interconnects and associated micro-architectures and methods of operation. A crossbar and router architecture called MoDe-X (Modular Decoupled Crossbar) is disclosed that supports 5-port routing for use in 2D mesh interconnects and is implemented through use of decoupled row and column sub-crossbar modules in combination with feeder wiring and control logic that enables routing between ports on the row and column sub-crossbar modules. The corresponding MoDe-X router supports 5-port routing between various router input and output port combinations while reducing both router area and power consumption when compared with a conventional 5×5 crossbar design and implementation. The MoDe-X micro-architecture can be configured to support both single and dual local port injection configurations.Type: GrantFiled: February 9, 2012Date of Patent: June 6, 2017Assignee: Intel CorporationInventors: Dongkook Park, Aniruddha Vaidya, Akhilesh Kumar, Mani Azimi
-
Patent number: 9606797Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.Type: GrantFiled: December 21, 2012Date of Patent: March 28, 2017Assignee: Intel CorporationInventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
-
Patent number: 9542186Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: GrantFiled: October 6, 2015Date of Patent: January 10, 2017Assignee: Intel CorporationInventors: Hariharan L. Thantry, Mani Azimi
-
Patent number: 9424031Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.Type: GrantFiled: March 13, 2013Date of Patent: August 23, 2016Assignee: INTEL CORPORATIONInventors: Hariharan Thantry, Mani Azimi
-
Patent number: 9424191Abstract: An apparatus of an aspect includes a plurality of cores. The plurality of cores are logically grouped into a plurality of clusters. A cluster sharing map-based coherence directory is coupled with the plurality of cores and is to track sharing of data among the plurality of cores. The cluster sharing map-based coherence directory includes a tag array to store corresponding pairs of addresses and cluster identifiers. Each of the addresses is to identify data. Each of the cluster identifiers is to identify one of the clusters. The cluster sharing map-based coherence directory also includes a cluster sharing map array to store cluster sharing maps. Each of the cluster sharing maps corresponds to one of the pairs of addresses and cluster identifiers. Each of the cluster sharing maps is to indicate intra-cluster sharing of data identified by the corresponding address within a cluster identified by the corresponding cluster identifier.Type: GrantFiled: June 29, 2012Date of Patent: August 23, 2016Assignee: Intel CorporationInventors: Naveen Cherukuri, Mani Azimi
-
Patent number: 9418011Abstract: In one embodiment, the present invention includes a processor comprising a page tracker buffer (PTB), the PTB including a plurality of entries to store an address to a cache page and to store a signature to track an access to each cache line of the cache page, and a PTB handler, the PTB handler to load entries into the PTB and to update the signature. Other embodiments are also described and claimed.Type: GrantFiled: June 23, 2010Date of Patent: August 16, 2016Assignee: Intel CorporationInventors: Livio B. Soares, Naveen Cherukuri, Akhilesh Kumar, Mani Azimi
-
Publication number: 20160026466Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: ApplicationFiled: October 6, 2015Publication date: January 28, 2016Inventors: Hariharan L. Thantry, Mani Azimi
-
Patent number: 9152419Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: GrantFiled: December 18, 2012Date of Patent: October 6, 2015Assignee: Intel CorporationInventors: Hariharan L. Thantry, Mani Azimi
-
Patent number: 8990506Abstract: In one embodiment, the present invention includes a cache memory including cache lines that each have a tag field including a state portion to store a cache coherency state of data stored in the line and a weight portion to store a weight corresponding to a relative importance of the data. In various implementations, the weight can be based on the cache coherency state and a recency of usage of the data. Other embodiments are described and claimed.Type: GrantFiled: December 16, 2009Date of Patent: March 24, 2015Assignee: Intel CorporationInventors: Naveen Cherukuri, Dennis W. Brzezinski, Ioannis T. Schoinas, Anahita Shayesteh, Akhilesh Kumar, Mani Azimi
-
Publication number: 20140376557Abstract: Layout-aware modular decoupled crossbar and router for on-chip interconnects and associated micro-architectures and methods of operation. A crossbar and router architecture called MoDe-X (Modular Decoupled Crossbar) is disclosed that supports 5-port routing for use in 2D mesh interconnects and is implemented through use of decoupled row and column sub-crossbar modules in combination with feeder wiring and control logic that enables routing between ports on the row and column sub-crossbar modules. The corresponding MoDe-X router supports 5-port routing between various router input and output port combinations while reducing both router area and power consumption when compared with a conventional 5×5 crossbar design and implementation. The MoDe-X micro-architecture can be configured to support both single and dual local port injection configurations.Type: ApplicationFiled: February 9, 2012Publication date: December 25, 2014Inventors: Dongkonk Park, Aniruddha Vaidya, Akhilesh Kumar, Mani Azimi
-
Publication number: 20140281371Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.Type: ApplicationFiled: March 13, 2013Publication date: September 18, 2014Inventors: HARIHARAN THANTRY, MANI AZIMI
-
Publication number: 20140181477Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.Type: ApplicationFiled: December 21, 2012Publication date: June 26, 2014Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
-
Publication number: 20140173255Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.Type: ApplicationFiled: December 18, 2012Publication date: June 19, 2014Inventors: Hariharan L. Thantry, Mani Azimi
-
Patent number: 8683136Abstract: An apparatus and method are described for performing history-based prefetching. For example a method according to one embodiment comprises: determining if a previous access signature exists in memory for a memory page associated with a current stream; if the previous access signature exists, reading the previous access signature from memory; and issuing prefetch operations using the previous access signature.Type: GrantFiled: December 22, 2010Date of Patent: March 25, 2014Assignee: Intel CorporationInventors: Naveen Cherukuri, Mani Azimi
-
Publication number: 20140006714Abstract: An apparatus of an aspect includes a plurality of cores. The plurality of cores are logically grouped into a plurality of clusters. A cluster sharing map-based coherence directory is coupled with the plurality of cores and is to track sharing of data among the plurality of cores. The cluster sharing map-based coherence directory includes a tag array to store corresponding pairs of addresses and cluster identifiers. Each of the addresses is to identify data. Each of the cluster identifiers is to identify one of the clusters. The cluster sharing map-based coherence directory also includes a cluster sharing map array to store cluster sharing maps. Each of the cluster sharing maps corresponds to one of the pairs of addresses and cluster identifiers. Each of the cluster sharing maps is to indicate intra-cluster sharing of data identified by the corresponding address within a cluster identified by the corresponding cluster identifier.Type: ApplicationFiled: June 29, 2012Publication date: January 2, 2014Inventors: Naveen Cherukuri, Mani Azimi
-
Publication number: 20120166733Abstract: An apparatus and method are described for performing history-based prefetching. For example a method according to one embodiment comprises: determining if a previous access signature exists in memory for a memory page associated with a current stream; if the previous access signature exists, reading the previous access signature from memory; and issuing prefetch operations using the previous access signature.Type: ApplicationFiled: December 22, 2010Publication date: June 28, 2012Inventors: Naveen Cherukuri, Mani Azimi
-
Publication number: 20110320762Abstract: In one embodiment, the present invention includes a processor comprising a page tracker buffer (PTB), the PTB including a plurality of entries to store an address to a cache page and to store a signature to track an access to each cache line of the cache page, and a PTB handler, the PTB handler to load entries into the PTB and to update the signature. Other embodiments are also described and claimed.Type: ApplicationFiled: June 23, 2010Publication date: December 29, 2011Inventors: Livio B. Soares, Naveen Cherukuri, Akhilesh Kumar, Mani Azimi
-
Publication number: 20110145506Abstract: In one embodiment, the present invention includes a cache memory including cache lines that each have a tag field including a state portion to store a cache coherency state of data stored in the line and a weight portion to store a weight corresponding to a relative importance of the data. In various implementations, the weight can be based on the cache coherency state and a recency of usage of the data. Other embodiments are described and claimed.Type: ApplicationFiled: December 16, 2009Publication date: June 16, 2011Inventors: Naveen Cherukuri, Dennis W. Brzezinski, Ioannis T. Schoinas, Anahita Shayesteh, Akhilesh Kumar, Mani Azimi