Patents by Inventor Raanan Sade

Raanan Sade has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Apparatus and method for processing structure of arrays (SoA) and array of structures (AoS) data

Patent number: 10838734

Abstract: An apparatus and method for processing array of structures (AoS) and structure of arrays (SoA) data. For example, one embodiment of a processor comprises: a destination tile register to store data elements in a structure of arrays (SoA) format; a first source tile register to store indices associated with the data elements; instruction fetch circuitry to fetch an array of structures (AoS) gather instruction comprising operands identifying the first source tile register and the destination tile register; a decoder to decode the AoS gather instruction; and execution circuitry to determine a plurality of system memory addresses based on the indices from the first source tile register, to read data elements from the system memory addresses in an AoS format, and to load the data elements to the destination tile register in an SoA format.

Type: Grant

Filed: September 24, 2018

Date of Patent: November 17, 2020

Assignee: Intel Corporation

Inventors: Christopher J. Hughes, Bret Toll, Alexander Heinecke, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Raanan Sade, Robert Valentine, Mark Charney
SYSTEMS AND METHODS FOR PERFORMING MATRIX COMPRESS AND DECOMPRESS INSTRUCTIONS

Publication number: 20200348937

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Application

Filed: July 20, 2020

Publication date: November 5, 2020

Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
APPARATUS AND METHOD FOR CONVERTING A FLOATING-POINT VALUE FROM HALF PRECISION TO SINGLE PRECISION

Publication number: 20200341759

Abstract: An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.

Type: Application

Filed: May 12, 2020

Publication date: October 29, 2020

Applicant: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal
Hardware apparatuses and methods for memory corruption detection

Patent number: 10776190

Abstract: Methods and apparatuses relating to memory corruption detection are described. In one embodiment, a hardware processor includes an execution unit to execute an instruction to request access to a block of a memory through a pointer to the block of the memory, and a memory management unit to allow access to the block of the memory when a memory corruption detection value in the pointer is validated with a memory corruption detection value in the memory for the block, wherein a position of the memory corruption detection value in the pointer is selectable between a first location and a second, different location.

Type: Grant

Filed: December 18, 2018

Date of Patent: September 15, 2020

Assignee: Intel Corporation

Inventors: Tomer Stark, Ron Gabor, Joseph Nuzman, Raanan Sade, Bryant E. Bigbee
SYSTEMS, METHODS, AND APPARATUS FOR TILE CONFIGURATION

Publication number: 20200241877

Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile a set of 2-dimensional registers are discussed.

Type: Application

Filed: July 1, 2017

Publication date: July 30, 2020

Applicant: Intel Corporation

Inventors: Menachem ADELMAN, Robert VALENTINE, Zeev SPERBER, Mark J. CHARNEY, Bret L. TOLL, Rinat RAPPOPORT, Jesus CORBAL, Dan BAUM, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Yuri GEBIL, Raanan SADE
Apparatus and method for prioritized quality of service processing for transactional memory

Patent number: 10719442

Abstract: An apparatus and method for prioritizing transactional memory regions. For example, one embodiment of a processor comprises: a plurality of cores to execute threads comprising sequences of instructions, at least some of the instructions specifying a transactional memory region; a cache of each core to store a plurality of cache lines; transactional memory circuitry of each core to manage execution of the transactional memory (TM) regions based on priorities associated with each of the TM regions; and wherein the transactional memory circuitry, upon detecting a conflict between a first TM region having a first priority value and a second TM region having a second priority value, is to determine which of the first TM region or the second TM region is permitted to continue executing and which is to be aborted based, at least in part, on the first and second priority values.

Type: Grant

Filed: September 10, 2018

Date of Patent: July 21, 2020

Assignee: Intel Corporation

Inventors: Ren Wang, Raanan Sade, Yipeng Wang, Tsung-Yuan Tai, Sameh Gobriel
Systems and methods for performing matrix compress and decompress instructions

Patent number: 10719323

Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instructions, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.

Type: Grant

Filed: September 27, 2018

Date of Patent: July 21, 2020

Assignee: Intel Corporation

Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
Defining virtualized page attributes based on guest page attributes

Patent number: 10713177

Abstract: A processing system includes a processing core to execute a virtual machine (VM) comprising a guest operating system (OS) and a memory management unit, communicatively coupled to the processing core, comprising a storage device to store an extended page table entry (EPTE) comprising a mapping from a guest physical address (GPA) associated with the guest OS to an identifier of a memory frame, a first plurality of access right flags associated with accessing the memory frame in a first page mode referenced by an attribute of a memory page identified by the GPA, and a second plurality of access right flags associated with accessing the memory frame in a second page mode referenced by the attribute of the memory page identified by the GPA.

Type: Grant

Filed: September 9, 2016

Date of Patent: July 14, 2020

Assignee: Intel Corporation

Inventors: Gilbert Neiger, Baiju V. Patel, Gur Hildesheim, Ron Rais, Andrew V. Anderson, Jason W. Brandt, David M. Durham, Barry E. Huntley, Raanan Sade, Ravi L. Sahita, Vedvyas Shanbhogue, Arumugam Thiyagarajah
SYSTEMS AND METHODS TO ACCELERATE MULTIPLICATION OF SPARSE MATRICES

Publication number: 20200210517

Abstract: Disclosed embodiments relate to accelerating multiplication of sparse matrices. In one example, a processor is to fetch and decode an instruction having fields to specify locations of first, second, and third matrices, and an opcode indicating the processor is to multiply and accumulate matching non-zero (NZ) elements of the first and second matrices with corresponding elements of the third matrix, and executing the decoded instruction as per the opcode to generate NZ bitmasks for the first and second matrices, broadcast up to two NZ elements at a time from each row of the first matrix and each column of the second matrix to a processing engine (PE) grid, each PE to multiply and accumulate matching NZ elements of the first and second matrices with corresponding elements of the third matrix. Each PE further to store an NZ element for use in a subsequent multiplications.

Type: Application

Filed: December 27, 2018

Publication date: July 2, 2020

Inventors: Dan BAUM, Chen KOREN, Elmoustapha OULD-AHMED-VALL, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
SYSTEMS AND METHODS FOR PERFORMING NIBBLE-SIZED OPERATIONS ON MATRIX ELEMENTS

Publication number: 20200210173

Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.

Type: Application

Filed: December 26, 2018

Publication date: July 2, 2020

Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
SYSTEMS AND METHODS FOR PERFORMING MATRIX ROW- AND COLUMN-WISE PERMUTE INSTRUCTIONS

Publication number: 20200210188

Abstract: Disclosed embodiments relate to systems and methods for performing matrix row-wise and column-wise permute instructions. In one example, a processor includes fetch circuitry to fetch an instruction, decoding, using decode circuitry, the fetched instruction having fields to specify an opcode and locations of a source matrix and a destination matrix, the opcode indicating the processor is to perform a permutation by copying, into each of a plurality of equal-sized logical partitions of the destination matrix, a selected logical partition of a same size from the source matrix, the selection being indicated by a permute control, and execution circuitry to execute the decoded instruction as per the opcode.

Type: Application

Filed: December 27, 2018

Publication date: July 2, 2020

Inventors: Elmoustapha OULD-AHMED-VALL, Jonathan D. PEARCE, Dan BAUM, Guei-Yuan LUEH, Michael ESPIG, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
EFFICIENT IMPLEMENTATION OF COMPLEX VECTOR FUSED MULTIPLY ADD AND COMPLEX VECTOR MULTIPLY

Publication number: 20200201628

Abstract: Disclosed embodiments relate to efficient complex vector multiplication. In one example, an apparatus includes execution circuitry, responsive to an instruction having fields to specify multiplier, multiplicand, and summand complex vectors, to perform two operations: first, to generate a double-even multiplicand by duplicating even elements of the specified multiplicand, and to generate a temporary vector using a fused multiply-add (FMA) circuit having A, B, and C inputs set to the specified multiplier, the double-even multiplicand, and the specified summand, respectively, and second, to generate a double-odd multiplicand by duplicating odd elements of the specified multiplicand, to generate a swapped multiplier by swapping even and odd elements of the specified multiplier, and to generate a result using a second FMA circuit having its even product negated, and having A, B, and C inputs set to the swapped multiplier, the double-odd multiplicand, and the temporary vector, respectively.

Type: Application

Filed: December 2, 2019

Publication date: June 25, 2020

Inventors: Raanan SADE, Thierry PONS, Amit GRADSTEIN, Zeev SPERBER, Mark J. CHARNEY, Robert VALENTINE, Eyal Oz-Sinay
SYSTEMS AND METHODS TO TRANSPOSE VECTORS ON-THE-FLY WHILE LOADING FROM MEMORY

Publication number: 20200201640

Abstract: Disclosed embodiments relate to transposing vectors while loading from memory. In one example, a processor includes a register file, a memory interface, fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction having fields to specify an opcode, a destination vector register, and a source vector having N groups of elements, N being a positive integer, the opcode to indicate the processor is to fetch the source vector, generate write data comprising one or more N-tuples, each N-tuple comprising corresponding elements from each of the N groups of elements, and write the write data to the destination vector register, and execution circuitry to execute the decoded instruction as per the opcode, the execution circuitry has a shuffle pipeline disposed between the memory and the register file, the shuffle pipeline to fetch, decode, and execute further instances of the instruction at one instruction per clock cycle.

Type: Application

Filed: December 21, 2018

Publication date: June 25, 2020

Inventors: Alexander F. HEINECKE, Evangelos GEORGANAS, Christopher J. HUGHES, Raanan SADE, Robert VALENTINE
APPARATUS AND METHOD FOR A RANGE COMPARISON, EXCHANGE, AND ADD

Publication number: 20200201630

Abstract: An apparatus and method for executing an atomic test and update instruction. For example, one embodiment of a processor comprises: a decoder to decode an atomic test and update (ATU) instruction having a first operand specifying a first value in a first storage location, a second operand specifying a second value in a second storage location, a third operand specifying a third value in a third storage location, and an opcode specifying a condition to be tested relative to the first and second values; and execution circuitry to perform a load lock operation to load the first value from the first storage location, the load lock operation to prevent access by another instruction before a result of the ATU instruction is stored, the execution circuitry to test a condition related the first value and the second value, wherein if the condition is met then the execution circuitry is to add the first value and the third value to generate a sum and to store the sum to the first storage location.

Type: Application

Filed: December 23, 2018

Publication date: June 25, 2020

Inventors: RAANAN SADE, JOSEPH NUZMAN, HUBERT NUECKEL
APPARATUS AND METHOD FOR COMPLEX MULTIPLICATION

Publication number: 20200192663

Abstract: An embodiment of the invention is a processor including execution circuitry to calculate, in response to a decoded instruction, a result of a complex multiplication of a first complex number and a second complex number. The calculation includes a first operation to calculate a first term of a real component of the result and a first term of the imaginary component of the result. The calculation also includes a second operation to calculate a second term of the real component of the result and a second term of the imaginary component of the result. The processor also includes a decoder, a first source register, and a second source register. The decoder is to decode an instruction to generate the decoded instruction. The first source register is to provide the first complex number and the second source register is to provide the second complex number.

Type: Application

Filed: October 18, 2019

Publication date: June 18, 2020

Applicant: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal, Roman S. Dubtsov
Apparatus and method for converting a floating-point value from half precision to single precision

Patent number: 10684854

Abstract: An embodiment of the invention is a processor including execution circuitry to, in response to a decoded instruction, convert a half-precision floating-point value to a single-precision floating-point value and store the single-precision floating-point value in each of the plurality of element locations of a destination register. The processor also includes a decoder and the destination register. The decoder is to decode an instruction to generate the decoded instruction.

Type: Grant

Filed: November 28, 2017

Date of Patent: June 16, 2020

Assignee: Intel Corporation

Inventors: Robert Valentine, Mark Charney, Raanan Sade, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal
SYSTEMS FOR PERFORMING INSTRUCTIONS TO QUICKLY CONVERT AND USE TILES AS 1D VECTORS

Publication number: 20200104135

Abstract: Disclosed embodiments relate to systems for performing instructions to quickly convert and use matrices (tiles) as one-dimensional vectors. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode, locations of a two-dimensional (2D) matrix and a one-dimensional (1D) vector, and a group of elements comprising one of a row, part of a row, multiple rows, a column, part of a column, multiple columns, and a rectangular sub-tile of the specified 2D matrix, and wherein the opcode is to indicate a move of the specified group between the 2D matrix and the 1D vector, decode circuitry to decode the fetched instruction; and execution circuitry, responsive to the decoded instruction, when the opcode specifies a move from 1D, to move contents of the specified 1D vector to the specified group of elements.

Type: Application

Filed: September 27, 2018

Publication date: April 2, 2020

Inventors: Bret TOLL, Christopher J. HUGHES, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
Method and system for performing data movement operations with read snapshot and in place write update

Patent number: 10606755

Abstract: Method and system for performing data movement operations is described herein. One embodiment of a method includes: storing data for a first memory address in a cache line of a memory of a first processing unit, the cache line associated with a coherency state indicating that the memory has sole ownership of the cache line; decoding an instruction for execution by a second processing unit, the instruction comprising a source data operand specifying the first memory address and a destination operand specifying a memory location in the second processing unit; and responsive to executing the decoded instruction, copying data from the cache line of the memory of the first processing unit as identified by the first memory address, to the memory location of the second processing unit, wherein responsive to the copy, the cache line is to remain in the memory and the coherency state is to remain unchanged.

Type: Grant

Filed: June 30, 2017

Date of Patent: March 31, 2020

Assignee: Intel Corporation

Inventors: Anil Vasudevan, Venkata Krishnan, Andrew J. Herdrich, Ren Wang, Robert G. Blankenship, Vedaraman Geetha, Shrikant M. Shah, Marshall A. Millier, Raanan Sade, Binh Q. Pham, Olivier Serres, Chyi-Chang Miao, Christopher B. Wilkerson
APPARATUS AND METHOD FOR PROCESSING STRUCTURE OF ARRAYS (SOA) AND ARRAY OF STRUCTURES (AOS) DATA

Publication number: 20200097298

Abstract: An apparatus and method for processing array of structures (AoS) and structure of arrays (SoA) data. For example, one embodiment of a processor comprises: a destination tile register to store data elements in a structure of arrays (SoA) format; a first source tile register to store indices associated with the data elements; instruction fetch circuitry to fetch an array of structures (AoS) gather instruction comprising operands identifying the first source tile register and the destination tile register; a decoder to decode the AoS gather instruction; and execution circuitry to determine a plurality of system memory addresses based on the indices from the first source tile register, to read data elements from the system memory addresses in an AoS format, and to load the data elements to the destination tile register in an SoA format.

Type: Application

Filed: September 24, 2018

Publication date: March 26, 2020

Inventors: CHRISTOPHER J. HUGHES, BRET TOLL, ALEXANDER HEINECKE, DAN BAUM, ELMOUSTAPHA OULD-AHMED-VALL, RAANAN SADE, ROBERT VALENTINE, MARK CHARNEY
APPARATUS AND METHOD FOR TILE GATHER AND TILE SCATTER

Publication number: 20200097291

Abstract: An apparatus and method for tile-based gather and scatter operations. For example, one embodiment of a processor comprises: a destination tile register to store a 2-D arrangement of data elements; a first source tile register to store indices associated with the data elements; instruction fetch circuitry to fetch a tile gather instruction comprising operands identifying the first source tile register and the destination tile register; a decoder to decode the tile gather instruction; and execution circuitry to determine a plurality of system memory addresses based on the indices from the first source tile register and to load the data elements from the system memory addresses to the destination tile register.

Type: Application

Filed: September 24, 2018

Publication date: March 26, 2020

Inventors: CHRISTOPHER J. HUGHES, BRET TOLL, ALEXANDER HEINECKE, DAN BAUM, ELMOUSTAPHA OULD-AHMED-VALL, RAANAN SADE, ROBERT VALENTINE, MARK CHARNEY

prev … 2 3 4 5 6 7 8 9 10 next