Patents by Inventor Ayal Zaks

Ayal Zaks has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11579881
    Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to, use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Gadi Haber, Robert Valentine, Ayal Zaks, Jesus Corbal San Adrian
  • Publication number: 20190004801
    Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to, use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.
    Type: Application
    Filed: June 29, 2017
    Publication date: January 3, 2019
    Inventors: Gadi Haber, Robert Valentine, Ayal Zaks, Jesus Corbal San Adrian
  • Patent number: 10140210
    Abstract: An apparatus and method for determining whether data needed for one or more operations is stored in a cache and scheduling the operations for execution based on the determination. For example, one embodiment of a processor comprises: a hierarchy of cache levels for caching data including at least a level 1 (L1) cache; cache occupancy determination logic to determine whether data associated with one or more subsequent operations is stored in one of the cache levels; and scheduling logic to schedule execution of the subsequent operations based on the determination of whether data associated with the subsequent operations is stored in the cache levels.
    Type: Grant
    Filed: September 24, 2013
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Ayal Zaks, Robert Valentine, Arie Narkis
  • Patent number: 9298460
    Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
    Type: Grant
    Filed: November 29, 2011
    Date of Patent: March 29, 2016
    Assignee: International Business Machines Corporation
    Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
  • Publication number: 20150089139
    Abstract: An apparatus and method for determining whether data needed for one or more operations is stored in a cache and scheduling the operations for execution based on the determination. For example, one embodiment of a processor comprises: a hierarchy of cache levels for caching data including at least a level 1 (L1) cache; cache occupancy determination logic to determine whether data associated with one or more subsequent operations is stored in one of the cache levels; and scheduling logic to schedule execution of the subsequent operations based on the determination of whether data associated with the subsequent operations is stored in the cache levels.
    Type: Application
    Filed: September 24, 2013
    Publication date: March 26, 2015
    Inventors: Ayal Zaks, Robert Valentine, Arie Narkis
  • Publication number: 20140189330
    Abstract: Branch instructions are provided for improved execution performance. The branch instruction includes one or more paths that are marked as a safe path for execution. If a marked path is executed based on a branch prediction, the execution continues until completion after it is determined that the other path is the correct path.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Inventors: Ayal Zaks, Robert Valentine, Lihu Rappoport
  • Patent number: 8756580
    Abstract: Dynamic determination of affinity between fields of structure may be determined based on accesses to the same instance. The affinity may be utilized in determining a data layout of a structure so as to optimize performance of a target program. The affinity determination may be an estimation based upon a trace of an execution of the target program. Access relation between proximate accesses to fields of the same instance may be utilized to estimate an optimized data layout of the structure.
    Type: Grant
    Filed: March 23, 2010
    Date of Patent: June 17, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alon Dayan, David Joel Edelsohm, Olga Golovanevsky, Ayal Zaks
  • Patent number: 8726252
    Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements. The processor continues processing next statements inline after all SIMD lanes are complete, while providing speculative and parallel processing capability for multiple data operations of the executable program.
    Type: Grant
    Filed: January 28, 2011
    Date of Patent: May 13, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
  • Patent number: 8713549
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Grant
    Filed: September 7, 2012
    Date of Patent: April 29, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
  • Patent number: 8627304
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Grant
    Filed: July 28, 2009
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
  • Publication number: 20130138922
    Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 30, 2013
    Applicant: International Business Machines Corporation
    Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
  • Patent number: 8359291
    Abstract: A data layout optimization may utilize affinity estimation between pairs of fields of a record in a computer program. The affinity estimation may be determined based on a trace of an execution and in view of actual processing entities performing each access to the fields. The disclosed subject matter may be configured to be aware of a specific architecture of a target computer having a plurality of processing entities, executing the program so as to provide an improved affinity estimation which may take into account both false sharing issues, spatial locality improvement and the like.
    Type: Grant
    Filed: June 8, 2010
    Date of Patent: January 22, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
  • Patent number: 8359435
    Abstract: A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program.
    Type: Grant
    Filed: December 16, 2009
    Date of Patent: January 22, 2013
    Assignee: International Business Machines Corporation
    Inventors: Revital Erez, Brian Flachs, Mark Richard Nutter, John Kevin Patrick O'Brien, Ulrich Weigand, Ayal Zaks
  • Publication number: 20130013666
    Abstract: A data transmission optimization method and system. The method comprises analyzing program code to identify data access calls in the program code, using one or more processor; determining whether a first data access call is for retrieving target data stored in a data structure with a plurality of fields, wherein the target data is stored in one or more target fields of the data structure; determining whether servicing the first data access call will result in transfer of non-target data stored in one or more non-target fields in the data structure; and replacing the first data access call with a second data access call, wherein servicing the second data access call will result in transfer of the target data and minimizes the transfer of non-target data.
    Type: Application
    Filed: July 7, 2011
    Publication date: January 10, 2013
    Applicant: International Business Machines Corporation
    Inventors: Muli Ben-Yehuda, Daniel Citron, Itzhack Goldberg, Nadav Har'El, Dorit Nuzman, Ayal Zaks
  • Publication number: 20120331453
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Application
    Filed: September 7, 2012
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
  • Publication number: 20120198425
    Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The compiler converts those “if-then-else” statements into “conditional branch and prepare” statements as well as “branch return” statements. The compiler compiles source code file information containing “if-then-else” statement opportunities into compiled code, namely an executable program. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements.
    Type: Application
    Filed: January 28, 2011
    Publication date: August 2, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
  • Patent number: 8136107
    Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
    Type: Grant
    Filed: October 24, 2007
    Date of Patent: March 13, 2012
    Assignee: International Business Machines Corporation
    Inventor: Ayal Zaks
  • Publication number: 20110302561
    Abstract: A data layout optimization may utilize affinity estimation between paris of fields of a record in a computer program. The affinity estimation may be determined based on a trace of an execution and in view of actual processing entities performing each access to the fields. The disclosed subject matter may be configured to be aware of a specific architecture of a target computer having a plurality of processing entities, executing the program so as to provide an improved affinity estimation which may take into account both false sharing issues, spatial locality improvement and the like.
    Type: Application
    Filed: June 8, 2010
    Publication date: December 8, 2011
    Applicant: International Business Machines Corporation
    Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
  • Publication number: 20110239197
    Abstract: Dynamic determination of affinity between fields of structure may be determined based on accesses to the same instance. The affinity may be utilized in determining a data layout of a structure so as to optimize performance of a target program. The affinity determination may be an estimation based upon a trace of an execution of the target program. Access relation between proximate accesses to fields of the same instance may be utilized to estimate an optimized data layout of the structure.
    Type: Application
    Filed: March 23, 2010
    Publication date: September 29, 2011
    Applicant: International Business Machines Corporation
    Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
  • Publication number: 20110145503
    Abstract: A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program.
    Type: Application
    Filed: December 16, 2009
    Publication date: June 16, 2011
    Applicant: International Business Machines Corporation
    Inventors: Revital Erez, Brian Flachs, Mark Richard Nutter, John Kevin Patrick O'Brien, Ulrich Weigand, Ayal Zaks