Patents by Inventor Ayal Zaks
Ayal Zaks has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11579881
Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to, use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.
Type: Grant
Filed: June 29, 2017
Date of Patent: February 14, 2023
Assignee: Intel Corporation
Inventors: Gadi Haber, Robert Valentine, Ayal Zaks, Jesus Corbal San Adrian
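The masked broadcast described in the abstract above can be illustrated in software. What follows is a minimal Python sketch, not the patented hardware: the function name `masked_broadcast`, the list-based register model, and the convention that a set mask bit means "unmasked" are all assumptions made for illustration.

```python
def masked_broadcast(dest, imm, mask):
    """Return a copy of dest with imm written to every unmasked element.

    Assumption for this sketch: bit i of `mask` set to 1 means element i
    is unmasked (written); 0 means it is masked and keeps its old value.
    """
    return [imm if (mask >> i) & 1 else old for i, old in enumerate(dest)]


if __name__ == "__main__":
    register = [10, 20, 30, 40]
    # Only elements 0 and 2 are unmasked, so only they receive the immediate.
    print(masked_broadcast(register, 7, 0b0101))
```

Masked elements keep their previous contents here; a real instruction set might instead zero them, which this sketch does not model.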
-
Publication number: 20190004801
Abstract: Disclosed embodiments relate to instructions for vector operations with immediate values. In one example, a system includes a memory and a processor that includes fetch circuitry to fetch the instruction from a code storage, the instruction including an opcode, a destination identifier to specify a destination vector register, a first immediate, and a write mask identifier to specify a write mask register, the write mask register including at least one bit corresponding to each destination vector register element, the at least one bit to specify whether the destination vector register element is masked or unmasked, decode circuitry to decode the fetched instruction, and execution circuitry to execute the decoded instruction, to, use the write mask register to determine unmasked elements of the destination vector register, and, when the opcode specifies to broadcast, broadcast the first immediate to one or more unmasked vector elements of the destination vector register.
Type: Application
Filed: June 29, 2017
Publication date: January 3, 2019
Inventors: Gadi Haber, Robert Valentine, Ayal Zaks, Jesus Corbal San Adrian
-
Patent number: 10140210
Abstract: An apparatus and method for determining whether data needed for one or more operations is stored in a cache and scheduling the operations for execution based on the determination. For example, one embodiment of a processor comprises: a hierarchy of cache levels for caching data including at least a level 1 (L1) cache; cache occupancy determination logic to determine whether data associated with one or more subsequent operations is stored in one of the cache levels; and scheduling logic to schedule execution of the subsequent operations based on the determination of whether data associated with the subsequent operations is stored in the cache levels.
Type: Grant
Filed: September 24, 2013
Date of Patent: November 27, 2018
Assignee: Intel Corporation
Inventors: Ayal Zaks, Robert Valentine, Arie Narkis
-
Patent number: 9298460
Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
Type: Grant
Filed: November 29, 2011
Date of Patent: March 29, 2016
Assignee: International Business Machines Corporation
Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
-
Publication number: 20150089139
Abstract: An apparatus and method for determining whether data needed for one or more operations is stored in a cache and scheduling the operations for execution based on the determination. For example, one embodiment of a processor comprises: a hierarchy of cache levels for caching data including at least a level 1 (L1) cache; cache occupancy determination logic to determine whether data associated with one or more subsequent operations is stored in one of the cache levels; and scheduling logic to schedule execution of the subsequent operations based on the determination of whether data associated with the subsequent operations is stored in the cache levels.
Type: Application
Filed: September 24, 2013
Publication date: March 26, 2015
Inventors: Ayal Zaks, Robert Valentine, Arie Narkis
-
Publication number: 20140189330
Abstract: Branch instructions are provided for improved execution performance. The branch instruction includes one or more paths that are marked as a safe path for execution. If a marked path is executed based on a branch prediction, the execution continues until completion after it is determined that the other path is the correct path.
Type: Application
Filed: December 27, 2012
Publication date: July 3, 2014
Inventors: Ayal Zaks, Robert Valentine, Lihu Rappoport
-
Patent number: 8756580
Abstract: Dynamic determination of affinity between fields of structure may be determined based on accesses to the same instance. The affinity may be utilized in determining a data layout of a structure so as to optimize performance of a target program. The affinity determination may be an estimation based upon a trace of an execution of the target program. Access relation between proximate accesses to fields of the same instance may be utilized to estimate an optimized data layout of the structure.
Type: Grant
Filed: March 23, 2010
Date of Patent: June 17, 2014
Assignee: International Business Machines Corporation
Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
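The idea of estimating field affinity from proximate accesses to the same instance can be sketched in a few lines. The following Python is only an illustration under stated assumptions, not the method claimed in the patent: the function name `estimate_affinity`, the trace format of `(instance_id, field_name)` pairs, and the fixed-size `window` used to define "proximate" are all hypothetical.

```python
from collections import Counter, deque


def estimate_affinity(trace, window=2):
    """Count how often two different fields of the same instance are
    accessed within `window` consecutive trace entries.

    `trace` is a sequence of (instance_id, field_name) pairs (an assumed
    format). Keys of the result are frozensets of two field names.
    """
    affinity = Counter()
    recent = deque(maxlen=window - 1)  # the previous window-1 accesses
    for instance, field in trace:
        for prev_instance, prev_field in recent:
            # Only accesses to the same instance contribute to affinity.
            if prev_instance == instance and prev_field != field:
                affinity[frozenset((prev_field, field))] += 1
        recent.append((instance, field))
    return affinity
```

Field pairs with high counts are candidates for adjacent placement in the structure; how the counts are turned into a layout is left out of this sketch.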
-
Patent number: 8726252
Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements. The processor continues processing next statements inline after all SIMD lanes are complete, while providing speculative and parallel processing capability for multiple data operations of the executable program.
Type: Grant
Filed: January 28, 2011
Date of Patent: May 13, 2014
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
-
Patent number: 8713549
Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
Type: Grant
Filed: September 7, 2012
Date of Patent: April 29, 2014
Assignee: International Business Machines Corporation
Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
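The split into two disjoint address subsets, one of which starts at an aligned address and is accessed several elements at a time, resembles conventional loop peeling for alignment. The Python sketch below shows that conventional technique only; it does not implement the "conditional leaping address incrementation" named in the abstract, and the function name `split_for_alignment` and its parameters are assumptions for illustration.

```python
def split_for_alignment(start, count, width):
    """Partition addresses start .. start+count-1 into scalar and vector parts.

    Returns (scalar, chunks): `scalar` lists addresses handled one at a time,
    `chunks` lists (base, width) groups whose base is a multiple of `width`.
    No address appears in both parts.
    """
    end = start + count
    # Round start up to the next multiple of width, but never past the end.
    first_aligned = min(-(-start // width) * width, end)
    # Last address boundary that still leaves room for a full vector.
    last_aligned = first_aligned + (end - first_aligned) // width * width
    scalar = list(range(start, first_aligned)) + list(range(last_aligned, end))
    chunks = [(base, width) for base in range(first_aligned, last_aligned, width)]
    return scalar, chunks
```

For example, with `start=3`, `count=10` and `width=4`, addresses 3 and 12 are handled individually and the aligned groups begin at addresses 4 and 8.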
-
Patent number: 8627304
Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
Type: Grant
Filed: July 28, 2009
Date of Patent: January 7, 2014
Assignee: International Business Machines Corporation
Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
-
Publication number: 20130138922
Abstract: Systems and methods are disclosed for enhancing the throughput of a processor by minimizing the number of transfers of data associated with data transfer between a register file and a memory stack. The register file used by a processor running an application is partitioned into a number of blocks. A subset of the blocks of the register file is defined in an application binary interface enabling the subset to be pre-allocated and exposed to the application binary interface. Optionally, blocks other than the subset are not exposed to the application binary interface so that the data relating to application function switch or a context switch is not transferred between the unexposed blocks and a memory stack.
Type: Application
Filed: November 29, 2011
Publication date: May 30, 2013
Applicant: International Business Machines Corporation
Inventors: Revital Eres, Amit Golander, Nadav Levison, Sagi Manole, Ayal Zaks
-
Patent number: 8359291
Abstract: A data layout optimization may utilize affinity estimation between pairs of fields of a record in a computer program. The affinity estimation may be determined based on a trace of an execution and in view of actual processing entities performing each access to the fields. The disclosed subject matter may be configured to be aware of a specific architecture of a target computer having a plurality of processing entities, executing the program so as to provide an improved affinity estimation which may take into account both false sharing issues, spatial locality improvement and the like.
Type: Grant
Filed: June 8, 2010
Date of Patent: January 22, 2013
Assignee: International Business Machines Corporation
Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
-
Patent number: 8359435
Abstract: A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program.
Type: Grant
Filed: December 16, 2009
Date of Patent: January 22, 2013
Assignee: International Business Machines Corporation
Inventors: Revital Erez, Brian Flachs, Mark Richard Nutter, John Kevin Patrick O'Brien, Ulrich Weigand, Ayal Zaks
-
Publication number: 20130013666
Abstract: A data transmission optimization method and system. The method comprises analyzing program code to identify data access calls in the program code, using one or more processor; determining whether a first data access call is for retrieving target data stored in a data structure with a plurality of fields, wherein the target data is stored in one or more target fields of the data structure; determining whether servicing the first data access call will result in transfer of non-target data stored in one or more non-target fields in the data structure; and replacing the first data access call with a second data access call, wherein servicing the second data access call will result in transfer of the target data and minimizes the transfer of non-target data.
Type: Application
Filed: July 7, 2011
Publication date: January 10, 2013
Applicant: International Business Machines Corporation
Inventors: Muli Ben-Yehuda, Daniel Citron, Itzhack Goldberg, Nadav Har'El, Dorit Nuzman, Ayal Zaks
-
Publication number: 20120331453
Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
Type: Application
Filed: September 7, 2012
Publication date: December 27, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES
Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
-
Publication number: 20120198425
Abstract: A compiler of a single instruction multiple data (SIMD) information handling system (IHS) identifies “if-then-else” statements that offer opportunity for conditional branch conversion. The compiler converts those “if-then-else” statements into “conditional branch and prepare” statements as well as “branch return” statements. The compiler compiles source code file information containing “if-then-else” statement opportunities into compiled code, namely an executable program. The SIMD IHS employs a processor or processors to execute the executable program. During execution, the processor generates and updates SIMD lane mask information to track and manage the conditional branch loops of the executing program. The processor saves branch addresses and employs SIMD lane masks to identify conditional branch loops with different branch conditions than previous conditional branch loops. The processor may reduce SIMD IHS processing time during processing of compiled code of the original “if-then-else” statements.
Type: Application
Filed: January 28, 2011
Publication date: August 2, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Alexandre E. Eichenberger, Brian Flachs, Dorit Nuzman, Ira Rosen, Ulrich Weigand, Ayal Zaks
-
Patent number: 8136107
Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
Type: Grant
Filed: October 24, 2007
Date of Patent: March 13, 2012
Assignee: International Business Machines Corporation
Inventor: Ayal Zaks
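Keeping one variable's values for several pipeline stages in adjacent slots, and rotating them on demand, can be modelled with a fixed-length queue. The Python below is a minimal sketch under stated assumptions, not the patented mechanism: the class name `RotatingSlots`, its methods, and the choice to drop the oldest value on rotation (rather than wrap it around) are all hypothetical.

```python
from collections import deque


class RotatingSlots:
    """Models vector-register slots holding one variable's values across stages."""

    def __init__(self, stages):
        # One slot per pipeline stage; empty slots start as None.
        self.slots = deque([None] * stages, maxlen=stages)

    def rotate_in(self, value):
        """Shift every value one slot toward later stages and insert `value`
        in slot 0; the value in the last slot is dropped."""
        self.slots.appendleft(value)

    def at_stage(self, stage):
        """Return the value currently held for the given stage (0 = newest)."""
        return self.slots[stage]
```

After four calls to `rotate_in` with the values 1, 2, 3 and 4 on a three-stage instance, the slots hold 4, 3 and 2, so each stage reads the value produced for it by an earlier iteration.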
-
Publication number: 20110302561
Abstract: A data layout optimization may utilize affinity estimation between pairs of fields of a record in a computer program. The affinity estimation may be determined based on a trace of an execution and in view of actual processing entities performing each access to the fields. The disclosed subject matter may be configured to be aware of a specific architecture of a target computer having a plurality of processing entities, executing the program so as to provide an improved affinity estimation which may take into account both false sharing issues, spatial locality improvement and the like.
Type: Application
Filed: June 8, 2010
Publication date: December 8, 2011
Applicant: International Business Machines Corporation
Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
-
Publication number: 20110239197
Abstract: Dynamic determination of affinity between fields of structure may be determined based on accesses to the same instance. The affinity may be utilized in determining a data layout of a structure so as to optimize performance of a target program. The affinity determination may be an estimation based upon a trace of an execution of the target program. Access relation between proximate accesses to fields of the same instance may be utilized to estimate an optimized data layout of the structure.
Type: Application
Filed: March 23, 2010
Publication date: September 29, 2011
Applicant: International Business Machines Corporation
Inventors: Alon Dayan, David Joel Edelsohn, Olga Golovanevsky, Ayal Zaks
-
Publication number: 20110145503
Abstract: A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program.
Type: Application
Filed: December 16, 2009
Publication date: June 16, 2011
Applicant: International Business Machines Corporation
Inventors: Revital Erez, Brian Flachs, Mark Richard Nutter, John Kevin Patrick O'Brien, Ulrich Weigand, Ayal Zaks