Patents by Inventor Zeev Sperber

Zeev Sperber has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Using thresholds to gate timing packet generation in a tracing system

Patent number: 9716646

Abstract: In accordance with embodiments disclosed herein, there is provided systems and methods for using thresholds to gate timing packet generation in a tracing system (TS). For example, the method may include generating and outputting a trace data (TD) packet into a packet log. The method also includes generating and outputting a timing packet (TM) corresponding to the TD packet into the packet log when a number of clock cycles elapsed since an output of a previous TM packet exceeds a clock threshold value.

Type: Grant

Filed: July 17, 2014

Date of Patent: July 25, 2017

Assignee: Intel Corporation

Inventors: Tsvika Kurts, Beeman C. Strong, Ofer Levy, Gabi Malka, Zeev Sperber
MULTIPLY ADD FUNCTIONAL UNIT CAPABLE OF EXECUTING SCALE, ROUND, GETEXP, ROUND, GETMANT, REDUCE, RANGE AND CLASS INSTRUCTIONS

Publication number: 20170199726

Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.

Type: Application

Filed: March 27, 2017

Publication date: July 13, 2017

Applicant: lntel Corporation

Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein
GATHER USING INDEX ARRAY AND FINITE STATE MACHINE

Publication number: 20170192934

Abstract: Methods and apparatus are disclosed for using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode a scatter/gather instruction and generate a set of micro-operations, and an index array to hold a set of indices and a corresponding set of mask elements. A finite state machine facilitates the gather operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. An address is accessed to load a corresponding data element if the mask element had the first value. The data element is written at an in-register position in a destination vector register according to a respective in-register position the index. Values of corresponding mask elements are changed from the first value to a second value responsive to completion of their respective loads.

Type: Application

Filed: February 6, 2015

Publication date: July 6, 2017

Inventors: Zeev Sperber, Robert Valentine, Guy Patkin, Stanislav Shwartsman, Shlomo Raikin, Igor Yanover, Gal Ofir
Real time instruction trace processors, methods, and systems

Patent number: 9696997

Abstract: A method of an aspect includes generating real time instruction trace (RTIT) packets for a first logical processor of a processor. The RTIT packets indicate a flow of software executed by the first logical processor. The RTIT packets are stored in an RTIT queue corresponding to the first logical processor. The RTIT packets are transferred from the RTIT queue to memory predominantly with firmware of the processor. Other methods, apparatus, and systems are also disclosed.

Type: Grant

Filed: January 11, 2016

Date of Patent: July 4, 2017

Assignee: Intel Corporation

Inventors: Tsvika Kurts, Ofer Levy, Itamar Kazachinsky, Gabi Malka, Zeev Sperber, Jason W. Brandt
FLOATING POINT (FP) ADD LOW INSTRUCTIONS FUNCTIONAL UNIT

Publication number: 20170185377

Abstract: An example processor includes a register and an ADD low functional unit. The register stores first, second, and third floating point (FP) values. The ADD low functional unit receives a request to perform an ADD low operation and, responsive to the request: adds the first FP value with the second FP value to obtain a first sum value; rounds the first sum value to generate an ADD value; adds the first FP value with the second FP value to obtain a second sum value; subtracts the ADD value from the second sum value to generate a difference value; normalizes the difference value to obtain a normalized difference value; rounds the normalized difference value to generate an ADD low value; and sends the ADD low value to an application.

Type: Application

Filed: December 23, 2015

Publication date: June 29, 2017

Inventors: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber
Fused Multiply-Add (FMA) low functional unit

Publication number: 20170185379

Abstract: An example processor includes a register and a fused multiply-add (FMA) low functional unit. The register stores first, second, and third floating point (FP) values. The FMA low functional unit receives a request to perform an FMA low operation: multiplies the first FP value with the second FP value to obtain a first product value; adds the first product with the third FP value to generate a first result value; rounds the first result to generate a first FMA value; multiplies the first FP value with the second FP value to obtain a second product value; adds the second product value with the third FP value to generate a second result value; and subtracts the FMA value from the second result value to obtain a third result value, which can then be normalized and rounded (FMA low result) and sent the FMA low result to an application.

Type: Application

Filed: December 23, 2015

Publication date: June 29, 2017

Inventors: Cristina S. Anderson, Marius A. Cornea-Hasegan, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Nikita Astafev, Mark J. Charney, Milind B. Girkar, Amit Gradstein, Simon Rubanovich, Zeev Sperber
Systems, apparatuses, and methods for performing a horizontal partial sum in response to a single instruction

Patent number: 9678751

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal partial sum of packed data elements in response to a single vector packed horizontal sum instruction that includes a destination vector register operand, a source vector register operand, and an opcode are described.

Type: Grant

Filed: December 23, 2011

Date of Patent: June 13, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Moustapha Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber, Boris Ginzburg, Ziv Aviv
METHOD AND APPARATUS FOR PERFORMING LOGICAL COMPARE OPERATIONS

Publication number: 20170161068

Abstract: A method and apparatus for including in a processor instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location.

Type: Application

Filed: November 7, 2016

Publication date: June 8, 2017

Inventors: Rajiv Kapoor, Ronen Zohar, Mark Buxton, Zeev Sperber, Koby Gottlieb
Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a same set of per-lane control bits

Patent number: 9672034

Abstract: In-lane vector shuffle operations are described. In one embodiment a shuffle instruction specifies a field of per-lane control bits, a source operand and a destination operand, these operands having corresponding lanes, each lane divided into corresponding portions of multiple data elements. Sets of data elements are selected from corresponding portions of every lane of the source operand according to per-lane control bits. Elements of these sets are copied to specified fields in corresponding portions of every lane of the destination operand. Another embodiment of the shuffle instruction also specifies a second source operand, all operands having corresponding lanes divided into multiple data elements. A set selected according to per-lane control bits contains data elements from every lane portion of a first source operand and data elements from every corresponding lane portion of the second source operand. Set elements are copied to specified fields in every lane of the destination operand.

Type: Grant

Filed: March 15, 2013

Date of Patent: June 6, 2017

Assignee: Intel Corporation

Inventors: Zeev Sperber, Robert Valentine, Benny Eitan, Doron Orenstein
Apparatus and method of improved permute instructions

Patent number: 9658850

Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector routing element circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.

Type: Grant

Filed: December 23, 2011

Date of Patent: May 23, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
PERFORMING FOLDING OF IMMEDIATE DATA IN A PROCESSOR

Publication number: 20170123799

Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the instructions, and an execution logic to execute at least some of the instructions. The decode logic may identify a first instruction having a first immediate value, accumulate the first immediate value with a folded immediate value associated with a first operand of the first instruction, and prevent the first instruction from provision to the execution logic, such that the first instruction is not to be executed within the execution logic. Other embodiments are described and claimed.

Type: Application

Filed: November 3, 2015

Publication date: May 4, 2017

Inventors: Zeev Sperber, Tomer Weiner, Amit Gradstein, Simon Rubanovich, Alex Gerber
ENABLING REMOVAL AND RECONSTRUCTION OF FLAG OPERATIONS IN A PROCESSOR

Publication number: 20170123793

Abstract: In one embodiment, a processor includes a fetch logic to fetch instructions, a decode logic to decode the fetched instructions, and an execution logic to execute at least some of the instructions. The decode logic may determine whether a flag portion of a first instruction to be folded is to be performed, and if not, accumulate a first immediate value of the first instruction with a folded immediate value obtained from an entry of an immediate buffer. Other embodiments are described and claimed.

Type: Application

Filed: November 3, 2015

Publication date: May 4, 2017

Inventors: Zeev Sperber, Tomer Weiner, Amit Gradstein, Simon Rubanovich, Alex Gerber, Itai Ravid
Packed data rearrangement control indexes precursors generation processors, methods, systems, and instructions

Patent number: 9639354

Abstract: A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes the result including a sequence of at least four non-negative integers. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.

Type: Grant

Filed: December 22, 2011

Date of Patent: May 2, 2017

Assignee: Intel Corporation

Inventors: Seth Abraham, Robert Valentine, Elmoustapha Ould-Ahmed-Vall, Zeev Sperber, Amit Gradstein
Scatter using index array and finite state machine

Patent number: 9626333

Abstract: Methods and apparatus are disclosed using an index array and finite state machine for scatter/gather operations. Embodiment of apparatus may comprise: decode logic to decode scatter/gather instructions and generate micro-operations. An index array holds a set of indices and a corresponding set of mask elements. A finite state machine facilitates the scatter operation. Address generation logic generates an address from an index of the set of indices for at least each of the corresponding mask elements having a first value. Storage is allocated in a buffer for each of the set of addresses being generated. Data elements corresponding to the set of addresses being generated are copied to the buffer. Addresses from the set are accessed to store data elements if a corresponding mask element has said first value and the mask element is changed to a second value responsive to completion of their respective stores.

Type: Grant

Filed: June 2, 2012

Date of Patent: April 18, 2017

Assignee: Intel Corporation

Inventors: Zeev Sperber, Robert Valentine, Shlomo Raikin, Stanislav Shwartsman, Gal Ofir, Igor Yanover, Guy Patkin, Levy Ofer
Systems, apparatuses, and methods for performing a horizontal add or subtract in response to a single instruction

Patent number: 9619226

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed horizontal add or subtract of packed data elements in response to a single vector packed horizontal add or subtract instruction that includes a destination vector register operand, a source vector register operand, and an opcode are describes.

Type: Grant

Filed: December 23, 2011

Date of Patent: April 11, 2017

Assignee: Intel Corporation

Inventors: Mostafa Hagog, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
Apparatus and method of improved insert instructions

Patent number: 9619236

Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instruction. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non overlapping sections have a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non overlapping sections have a same bit width as the second group.

Type: Grant

Filed: December 23, 2011

Date of Patent: April 11, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
Multiply add functional unit capable of executing SCALE, ROUND, GETEXP, ROUND, GETMANT, REDUCE, RANGE and CLASS instructions

Patent number: 9606770

Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.

Type: Grant

Filed: December 3, 2014

Date of Patent: March 28, 2017

Assignee: Intel Corporation

Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein
LOCAL POWER GATE (LPG) INTERFACES FOR POWER-AWARE OPERATIONS

Publication number: 20170068298

Abstract: Technologies for local power gate (LPG) interfaces for power-aware operations are described. A system on chip (SoC) includes a first functional unit, a second functional unit, and local power gate (LPG) hardware coupled to the first functional unit and the second functional unit. The LPG hardware is to power gate the first functional unit according to local power states of the LPG hardware. The second functional unit decodes a first instruction to perform a first power-aware operation of a specified length, including computing an execution code path for execution. The second functional unit monitors a current local power state of the LPG hardware, selects a code path based on the current local power state, the specified length, and a specified threshold, and issues a hint to the LPG hardware to power up the first functional unit and continues execution of the first power-aware operation without waiting for the first functional unit to be powered up.

Type: Application

Filed: November 17, 2016

Publication date: March 9, 2017

Inventors: Michael Mishaeli, Ron Gabor, Robert C. Valentine, Alex Gerber, Zeev Sperber
Apparatus and method of improved extract instructions

Patent number: 9588764

Abstract: An apparatus is described that includes instruction execution circuitry to execute first, second, third, and fourth instructions, the first and second instructions select a first group of input vector elements from one of multiple first non-overlapping sections of respective first and second input vectors. Each of the multiple first non-overlapping sections have a same bit width as the first group. Both the third and fourth instructions select a second group of input vector elements from one of multiple second non overlapping sections of respective third and fourth input vectors. The second group has a second bit width that is larger than the first bit width. Each of multiple second non overlapping sections have a same bit width as the second group. The apparatus includes masking layer circuitry to mask the first and second groups at a first granularity and second granularity.

Type: Grant

Filed: December 23, 2011

Date of Patent: March 7, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
Systems, apparatuses, and methods for performing a double blocked sum of absolute differences

Patent number: 9582464

Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.

Type: Grant

Filed: December 23, 2011

Date of Patent: February 28, 2017

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber

prev … 10 11 12 13 14 15 16 17 18 … next