Patents by Inventor Mani Azimi

Mani Azimi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10048966
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Grant
    Filed: January 10, 2017
    Date of Patent: August 14, 2018
    Assignee: Intel Corporation
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Publication number: 20170242703
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Application
    Filed: January 10, 2017
    Publication date: August 24, 2017
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Patent number: 9674114
    Abstract: Layout-aware modular decoupled crossbar and router for on-chip interconnects and associated micro-architectures and methods of operation. A crossbar and router architecture called MoDe-X (Modular Decoupled Crossbar) is disclosed that supports 5-port routing for use in 2D mesh interconnects and is implemented through use of decoupled row and column sub-crossbar modules in combination with feeder wiring and control logic that enables routing between ports on the row and column sub-crossbar modules. The corresponding MoDe-X router supports 5-port routing between various router input and output port combinations while reducing both router area and power consumption when compared with a conventional 5×5 crossbar design and implementation. The MoDe-X micro-architecture can be configured to support both single and dual local port injection configurations.
    Type: Grant
    Filed: February 9, 2012
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: Dongkook Park, Aniruddha Vaidya, Akhilesh Kumar, Mani Azimi
  • Patent number: 9606797
    Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 21, 2012
    Date of Patent: March 28, 2017
    Assignee: Intel Corporation
    Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
  • Patent number: 9542186
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Grant
    Filed: October 6, 2015
    Date of Patent: January 10, 2017
    Assignee: Intel Corporation
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Patent number: 9424031
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: August 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Hariharan Thantry, Mani Azimi
  • Patent number: 9424191
    Abstract: An apparatus of an aspect includes a plurality of cores. The plurality of cores are logically grouped into a plurality of clusters. A cluster sharing map-based coherence directory is coupled with the plurality of cores and is to track sharing of data among the plurality of cores. The cluster sharing map-based coherence directory includes a tag array to store corresponding pairs of addresses and cluster identifiers. Each of the addresses is to identify data. Each of the cluster identifiers is to identify one of the clusters. The cluster sharing map-based coherence directory also includes a cluster sharing map array to store cluster sharing maps. Each of the cluster sharing maps corresponds to one of the pairs of addresses and cluster identifiers. Each of the cluster sharing maps is to indicate intra-cluster sharing of data identified by the corresponding address within a cluster identified by the corresponding cluster identifier.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: August 23, 2016
    Assignee: Intel Corporation
    Inventors: Naveen Cherukuri, Mani Azimi
  • Patent number: 9418011
    Abstract: In one embodiment, the present invention includes a processor comprising a page tracker buffer (PTB), the PTB including a plurality of entries to store an address to a cache page and to store a signature to track an access to each cache line of the cache page, and a PTB handler, the PTB handler to load entries into the PTB and to update the signature. Other embodiments are also described and claimed.
    Type: Grant
    Filed: June 23, 2010
    Date of Patent: August 16, 2016
    Assignee: Intel Corporation
    Inventors: Livio B. Soares, Naveen Cherukuri, Akhilesh Kumar, Mani Azimi
  • Publication number: 20160026466
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Application
    Filed: October 6, 2015
    Publication date: January 28, 2016
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Patent number: 9152419
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Grant
    Filed: December 18, 2012
    Date of Patent: October 6, 2015
    Assignee: Intel Corporation
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Patent number: 8990506
    Abstract: In one embodiment, the present invention includes a cache memory including cache lines that each have a tag field including a state portion to store a cache coherency state of data stored in the line and a weight portion to store a weight corresponding to a relative importance of the data. In various implementations, the weight can be based on the cache coherency state and a recency of usage of the data. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 16, 2009
    Date of Patent: March 24, 2015
    Assignee: Intel Corporation
    Inventors: Naveen Cherukuri, Dennis W. Brzezinski, Ioannis T. Schoinas, Anahita Shayesteh, Akhilesh Kumar, Mani Azimi
  • Publication number: 20140376557
    Abstract: Layout-aware modular decoupled crossbar and router for on-chip interconnects and associated micro-architectures and methods of operation. A crossbar and router architecture called MoDe-X (Modular Decoupled Crossbar) is disclosed that supports 5-port routing for use in 2D mesh interconnects and is implemented through use of decoupled row and column sub-crossbar modules in combination with feeder wiring and control logic that enables routing between ports on the row and column sub-crossbar modules. The corresponding MoDe-X router supports 5-port routing between various router input and output port combinations while reducing both router area and power consumption when compared with a conventional 5×5 crossbar design and implementation. The MoDe-X micro-architecture can be configured to support both single and dual local port injection configurations.
    Type: Application
    Filed: February 9, 2012
    Publication date: December 25, 2014
    Inventors: Dongkonk Park, Aniruddha Vaidya, Akhilesh Kumar, Mani Azimi
  • Publication number: 20140281371
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Inventors: HARIHARAN THANTRY, MANI AZIMI
  • Publication number: 20140181477
    Abstract: In one embodiment, the present invention includes a processor with a vector execution unit to execute a vector instruction on a vector having a plurality of individual data elements, where the vector instruction is of a first width and the vector execution unit is of a smaller width. The processor further includes a control logic coupled to the vector execution unit to compress a number of execution cycles consumed in execution of the vector instruction when at least some of the individual data elements are not to be operated on by the vector instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: December 21, 2012
    Publication date: June 26, 2014
    Inventors: Aniruddha S. Vaidya, Anahita Shayesteh, Dong Hyuk Woo, Saikat Saharoy, Mani Azimi
  • Publication number: 20140173255
    Abstract: A processor includes an instruction decoder to receive an instruction having a first operand, a second operand, and a third operand, and an execution unit coupled to the instruction decoder to execute the instruction, the execution unit to individually perform a shift operation by at least one bit for each of a plurality of data elements stored in a storage location indicated by the second operand, for each of the data elements that has an overflow in response to the shift-left operation, to carry over the overflow into an adjacent data element based on a first bitmask obtained from the third operand, generating a final result, and to store the final result in a storage location indicated by the first operand.
    Type: Application
    Filed: December 18, 2012
    Publication date: June 19, 2014
    Inventors: Hariharan L. Thantry, Mani Azimi
  • Patent number: 8683136
    Abstract: An apparatus and method are described for performing history-based prefetching. For example a method according to one embodiment comprises: determining if a previous access signature exists in memory for a memory page associated with a current stream; if the previous access signature exists, reading the previous access signature from memory; and issuing prefetch operations using the previous access signature.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: March 25, 2014
    Assignee: Intel Corporation
    Inventors: Naveen Cherukuri, Mani Azimi
  • Publication number: 20140006714
    Abstract: An apparatus of an aspect includes a plurality of cores. The plurality of cores are logically grouped into a plurality of clusters. A cluster sharing map-based coherence directory is coupled with the plurality of cores and is to track sharing of data among the plurality of cores. The cluster sharing map-based coherence directory includes a tag array to store corresponding pairs of addresses and cluster identifiers. Each of the addresses is to identify data. Each of the cluster identifiers is to identify one of the clusters. The cluster sharing map-based coherence directory also includes a cluster sharing map array to store cluster sharing maps. Each of the cluster sharing maps corresponds to one of the pairs of addresses and cluster identifiers. Each of the cluster sharing maps is to indicate intra-cluster sharing of data identified by the corresponding address within a cluster identified by the corresponding cluster identifier.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Inventors: Naveen Cherukuri, Mani Azimi
  • Publication number: 20120166733
    Abstract: An apparatus and method are described for performing history-based prefetching. For example a method according to one embodiment comprises: determining if a previous access signature exists in memory for a memory page associated with a current stream; if the previous access signature exists, reading the previous access signature from memory; and issuing prefetch operations using the previous access signature.
    Type: Application
    Filed: December 22, 2010
    Publication date: June 28, 2012
    Inventors: Naveen Cherukuri, Mani Azimi
  • Publication number: 20110320762
    Abstract: In one embodiment, the present invention includes a processor comprising a page tracker buffer (PTB), the PTB including a plurality of entries to store an address to a cache page and to store a signature to track an access to each cache line of the cache page, and a PTB handler, the PTB handler to load entries into the PTB and to update the signature. Other embodiments are also described and claimed.
    Type: Application
    Filed: June 23, 2010
    Publication date: December 29, 2011
    Inventors: Livio B. Soares, Naveen Cherukuri, Akhilesh Kumar, Mani Azimi
  • Publication number: 20110145506
    Abstract: In one embodiment, the present invention includes a cache memory including cache lines that each have a tag field including a state portion to store a cache coherency state of data stored in the line and a weight portion to store a weight corresponding to a relative importance of the data. In various implementations, the weight can be based on the cache coherency state and a recency of usage of the data. Other embodiments are described and claimed.
    Type: Application
    Filed: December 16, 2009
    Publication date: June 16, 2011
    Inventors: Naveen Cherukuri, Dennis W. Brzezinski, Ioannis T. Schoinas, Anahita Shayesteh, Akhilesh Kumar, Mani Azimi