Patents by Inventor Zeev Sperber

Zeev Sperber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9716646
    Abstract: In accordance with embodiments disclosed herein, there are provided systems and methods for using thresholds to gate timing packet generation in a tracing system (TS). For example, the method may include generating and outputting a trace data (TD) packet into a packet log. The method also includes generating and outputting a timing (TM) packet corresponding to the TD packet into the packet log when a number of clock cycles elapsed since an output of a previous TM packet exceeds a clock threshold value.
    Type: Grant
    Filed: July 17, 2014
    Date of Patent: July 25, 2017
    Assignee: Intel Corporation
    Inventors: Tsvika Kurts, Beeman C. Strong, Ofer Levy, Gabi Malka, Zeev Sperber
  • Publication number: 20170199726
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Application
    Filed: March 27, 2017
    Publication date: July 13, 2017
    Applicant: Intel Corporation
    Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein
  • Publication number: 20170192934
    Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiments of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position of the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.
    Type: Application
    Filed: February 6, 2015
    Publication date: July 6, 2017
    Inventors: Zeev Sperber, Robert Valentine, Guy Patkin, Stanislav Shwartsman, Shlomo Raikin, Igor Yanover, Gal Ofir
  • Patent number: 9696997
    Abstract: A method of an aspect includes generating real time instruction trace (RTIT) packets for a first logical processor of a processor. The RTIT packets indicate a flow of software executed by the first logical processor. The RTIT packets are stored in an RTIT queue corresponding to the first logical processor. The RTIT packets are transferred from the RTIT queue to memory predominantly with firmware of the processor. Other methods, apparatus, and systems are also disclosed.
    Type: Grant
    Filed: January 11, 2016
    Date of Patent: July 4, 2017
    Assignee: Intel Corporation
    Inventors: Tsvika Kurts, Ofer Levy, Itamar Kazachinsky, Gabi Malka, Zeev Sperber, Jason W. Brandt
  • Publication number: 20170185377
    Abstract: An example processor includes a register and an ADD low functional unit. The register stores first, second, and third floating point (FP) values. The ADD low functional unit receives a request to perform an ADD low operation and, responsive to the request: adds the first FP value with the second FP value to obtain a first sum value; rounds the first sum value to generate an ADD value; adds the first FP value with the second FP value to obtain a second sum value; subtracts the ADD value from the second sum value to generate a difference value; normalizes the difference value to obtain a normalized difference value; rounds the normalized difference value to generate an ADD low value; and sends the ADD low value to an application.
    Type: Application
    Filed: December 23, 2015
    Publication date: June 29, 2017
    Inventors: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Publication number: 20170185379
    Abstract: An example processor includes a register and a fused multiply-add (FMA) low functional unit. The register stores first, second, and third floating point (FP) values. The FMA low functional unit receives a request to perform an FMA low operation and, responsive to the request: multiplies the first FP value with the second FP value to obtain a first product value; adds the first product value with the third FP value to generate a first result value; rounds the first result value to generate a first FMA value; multiplies the first FP value with the second FP value to obtain a second product value; adds the second product value with the third FP value to generate a second result value; and subtracts the first FMA value from the second result value to obtain a third result value, which can then be normalized and rounded to produce an FMA low result that is sent to an application.
    Type: Application
    Filed: December 23, 2015
    Publication date: June 29, 2017
    Inventors: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Patent number: 9678751
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: June 13, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Moustapha Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Boris Ginzburg, Ziv Aviv
  • Publication number: 20170161068
    Abstract: A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.
    Type: Application
    Filed: November 7, 2016
    Publication date: June 8, 2017
    Inventors: Rajiv Kapoor, Ronen Zohar, Mark Buxton, Zeev Sperber, Koby Gottlieb
  • Patent number: 9672034
    Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: June 6, 2017
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
  • Patent number: 9658850
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: May 23, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Publication number: 20170123799
    Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the instructions, and an execution logic to execute at least some of the instructions. The decode logic may identify a first instruction having a first immediate value, accumulate the first immediate value with a folded immediate value associated with a first operand of the first instruction, and prevent the first instruction from provision to the execution logic, such that the first instruction is not to be executed within the execution logic. Other embodiments are described and claimed.
    Type: Application
    Filed: November 3, 2015
    Publication date: May 4, 2017
    Inventors: Zeev Sperber, Tomer Weiner, Amit Gradstein, Simon Rubanovich, Alex Gerber
  • Publication number: 20170123793
    Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.
    Type: Application
    Filed: November 3, 2015
    Publication date: May 4, 2017
    Inventors: Zeev Sperber, Tomer Weiner, Amit Gradstein, Simon Rubanovich, Alex Gerber, Itai Ravid
  • Patent number: 9639354
    Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: May 2, 2017
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
  • Patent number: 9626333
    Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiments of apparatus may comprise decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value, and each such mask element is changed to a second value responsive to completion of its respective store.
    Type: Grant
    Filed: June 2, 2012
    Date of Patent: April 18, 2017
    Assignee: Intel Corporation
    Inventors: Zeev Sperber, Robert Valentine, Shlomo Raikin, Stanislav Shwartsman, Gal Ofir, Igor Yanover, Guy Patkin, Levy Ofer
  • Patent number: 9619226
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: April 11, 2017
    Assignee: Intel Corporation
    Inventors: Mostafa Hagog, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Patent number: 9619236
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non-overlapping sections has a same bit width as the second group.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: April 11, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9606770
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Grant
    Filed: December 3, 2014
    Date of Patent: March 28, 2017
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein
  • Publication number: 20170068298
    Abstract: Technologies for local power gate (LPG) interfaces for power-aware operations are described. A system on chip (SoC) includes a first functional unit, a second functional unit, and local power gate (LPG) hardware coupled to the first functional unit and the second functional unit. The LPG hardware is to power gate the first functional unit according to local power states of the LPG hardware. The second functional unit decodes a first instruction to perform a first power-aware operation of a specified length, including computing an execution code path for execution. The second functional unit monitors a current local power state of the LPG hardware, selects a code path based on the current local power state, the specified length, and a specified threshold, and issues a hint to the LPG hardware to power up the first functional unit and continues execution of the first power-aware operation without waiting for the first functional unit to be powered up.
    Type: Application
    Filed: November 17, 2016
    Publication date: March 9, 2017
    Inventors: Michael Mishaeli, Ron Gabor, Robert C. Valentine, Alex Gerber, Zeev Sperber
  • Patent number: 9588764
    Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions. The first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections has a same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non-overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of the multiple second non-overlapping sections has a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and second granularity.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: March 7, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9582464
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: February 28, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
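
For readers unfamiliar with the operation named in the last abstract above, a sum of absolute differences (SAD) adds up the element-wise absolute differences between two equally sized blocks of values. The sketch below is a minimal scalar illustration only. It does not model the patented instruction, its vector register layout, its immediate operand, or its double block behavior, and the function name `sum_abs_diff` is hypothetical.

```python
def sum_abs_diff(block_a, block_b):
    """Return the sum of absolute differences between two equal-length blocks.

    Illustrative scalar version only; the hardware instruction described in
    the abstract operates on packed vector registers, which is not modeled.
    """
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of elements")
    # Pair up elements, take each absolute difference, and accumulate.
    return sum(abs(x - y) for x, y in zip(block_a, block_b))


# Example: differences are 2, 2, 3 and 0.
reference = [10, 20, 30, 40]
candidate = [12, 18, 33, 40]
print(sum_abs_diff(reference, candidate))
```

A smaller result means the two blocks are more alike, which is why SAD is commonly used as a block-matching cost in video motion estimation.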