Patents by Inventor Omar M. Shaikh

Omar M. Shaikh has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10725755
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Grant
    Filed: June 6, 2017
    Date of Patent: July 28, 2020
    Assignee: Intel Corporation
    Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffrey J. Cook, Omar M. Shaikh, Suresh Srinivas
  • Publication number: 20180060049
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Application
    Filed: June 6, 2017
    Publication date: March 1, 2018
    Inventors: DAVID J. SAGER, RUCHIRA SASANKA, RON GABOR, SHLOMO RAIKIN, JOSEPH NUZMAN, LEEOR PELED, JASON A. DOMER, HO-SEOP KIM, YOUFENG WU, KOICHI YAMADA, TIN-FOOK NGAI, HOWARD H. CHEN, JAYARAM BOBBA, JEFFREY J. COOK, OMAR M. SHAIKH, SURESH SRINIVAS
  • Publication number: 20170212825
    Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
    Type: Application
    Filed: January 10, 2017
    Publication date: July 27, 2017
    Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
  • Patent number: 9672019
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Grant
    Filed: December 25, 2010
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffery J. Cook, Omar M. Shaikh, Suresh Srinivas
  • Patent number: 9652234
    Abstract: A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determinations returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page.
    Type: Grant
    Filed: September 30, 2011
    Date of Patent: May 16, 2017
    Assignee: Intel Corporation
    Inventors: Paul Caprioli, Martin G. Dixon, Brett L. Toll, Muawya M. Al-Otoom, Omar M. Shaikh
  • Publication number: 20170097826
    Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.
    Type: Application
    Filed: December 16, 2016
    Publication date: April 6, 2017
    Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
  • Publication number: 20170097891
    Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.
    Type: Application
    Filed: December 16, 2016
    Publication date: April 6, 2017
    Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
  • Patent number: 9558127
    Abstract: A processor includes a cache hierarchy and an execution unit. The cache hierarchy includes a lower level cache and a higher level cache. The execution unit includes logic to issue a memory operation to access the cache hierarchy. The lower level cache includes logic to determine that a requested cache line of the memory operation is unavailable in the lower level cache, determine that a line fill buffer of the lower level cache is full, and initiate prefetching of the requested cache line from the higher level cache based upon the determination that the line fill buffer of the lower level cache is full. The line fill buffer is to forward miss requests to the higher level cache.
    Type: Grant
    Filed: September 9, 2014
    Date of Patent: January 31, 2017
    Assignee: Intel Corporation
    Inventors: Stanislav Shwartsman, Robert S. Chappell, Ronak Singhal, Ryan L. Carlson, Raanan Sade, Omar M. Shaikh, Liron Zur, Yiftach Gilad
  • Patent number: 9542191
    Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: January 10, 2017
    Assignee: Intel Corporation
    Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
  • Publication number: 20160350221
    Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.
    Type: Application
    Filed: August 9, 2016
    Publication date: December 1, 2016
    Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
  • Patent number: 9411739
    Abstract: Systems, apparatuses, and methods for improving transactional memory (TM) throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit. A copy-on-write (COW) buffer may be used to maintain a mapping from checkpointed architectural registers to physical registers, wherein the COW buffer maintains a plurality of register checkpoints for a plurality of TM regions by marking separations between TM regions using pointers, a first pointer to identify a position in the COW buffer of the last committed instruction, a retirement pointer to identify a boundary between a youngest TM region and a currently retiring position.
    Type: Grant
    Filed: November 30, 2012
    Date of Patent: August 9, 2016
    Assignee: Intel Corporation
    Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
  • Publication number: 20160070651
    Abstract: A processor includes a cache hierarchy and an execution unit. The cache hierarchy includes a lower level cache and a higher level cache. The execution unit includes logic to issue a memory operation to access the cache hierarchy. The lower level cache includes logic to determine that a requested cache line of the memory operation is unavailable in the lower level cache, determine that a line fill buffer of the lower level cache is full, and initiate prefetching of the requested cache line from the higher level cache based upon the determination that the line fill buffer of the lower level cache is full. The line fill buffer is to forward miss requests to the higher level cache.
    Type: Application
    Filed: September 9, 2014
    Publication date: March 10, 2016
    Inventors: Stanislav Shwartsman, Robert S. Chappell, Ronak Singhal, Ryan L. Carlson, Raanan Sade, Omar M. Shaikh, Liron Zur, Yiftach Gilad
  • Patent number: 8826257
    Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. The reordering problem being when a first memory operation has been reordered prior to and aliases to a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations to the processor is communicated.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: September 2, 2014
    Assignee: Intel Corporation
    Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
  • Publication number: 20140156933
    Abstract: Systems, apparatuses, and methods for improving TM throughput using a TM region indicator (or color) are described. Through the use of TM region indicators younger TM regions can have their instructions retired while waiting for older TM regions to commit.
    Type: Application
    Filed: November 30, 2012
    Publication date: June 5, 2014
    Inventors: Omar M. Shaikh, Ravi Rajwar, Paul Caprioli, Muawya M. Al-Otoom
  • Publication number: 20130311758
    Abstract: A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.
    Type: Application
    Filed: March 30, 2012
    Publication date: November 21, 2013
    Inventors: Paul Caprioli, Matthew C. Merten, Muawya M. Al-Otoom, Omar M. Shaikh, Abhay S. Kanhere, Suresh Srinivas, Koichi Yamada, Vivek Thakkar, Pawel Osciak
  • Publication number: 20130305019
    Abstract: A dynamic optimization of code for a processor-specific dynamic binary translation of hot code pages (e.g., frequently executed code pages) may be provided by a run-time translation layer. A method may be provided to use an instruction look-aside buffer (iTLB) to map original code pages and translated code pages. The method may comprise fetching an instruction from an original code page, determining whether the fetched instruction is a first instruction of a new code page and whether the original code page is deprecated. If both determinations return yes, the method may further comprise fetching a next instruction from a translated code page. If either determinations returns no, the method may further comprise decoding the instruction and fetching the next instruction from the original code page.
    Type: Application
    Filed: September 30, 2011
    Publication date: November 14, 2013
    Inventors: Paul Caprioli, Martin G. Dixon, Brett L. Toll, Muawya M. Al-Otoom, Omar M. Shaikh
  • Publication number: 20130262838
    Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. The reordering problem being when a first memory operation has been reordered prior to and aliases to a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations to the processor is communicated.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 3, 2013
    Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
  • Publication number: 20110167416
    Abstract: Systems, apparatuses, and methods for a hardware and software system to automatically decompose a program into multiple parallel threads are described. In some embodiments, the systems and apparatuses execute a method of original code decomposition and/or generated thread execution.
    Type: Application
    Filed: December 25, 2010
    Publication date: July 7, 2011
    Inventors: David J. Sager, Ruchira Sasanka, Ron Gabor, Shlomo Raikin, Joseph Nuzman, Leeor Peled, Jason A. Domer, Ho-Seop Kim, Youfeng Wu, Koichi Yamada, Tin-Fook Ngai, Howard H. Chen, Jayaram Bobba, Jeffery J. Cook, Omar M. Shaikh, Suresh Srinivas