Patents by Inventor Raanan Sade

Raanan Sade has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190042255
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042245
    Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
    Type: Application
    Filed: September 28, 2018
    Publication date: February 7, 2019
    Inventors: Bret TOLL, Alexander F. HEINECKE, Christopher J. HUGHES, Ronen ZOHAR, Michael ESPIG, Dan BAUM, Raanan SADE, Robert VALENTINE
  • Publication number: 20190042262
    Abstract: An apparatus and method for efficient matrix alignment in a systolic array.
    Type: Application
    Filed: September 28, 2018
    Publication date: February 7, 2019
    Inventors: Michael Espig, Bret Toll, Raanan Sade, Robert Valentine, Alexander Heinecke
  • Publication number: 20190042260
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying ternary tile operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction specifying a ternary tile operation, and locations of destination and first, second, and third source matrices, each of the matrices having M rows by N columns; and execution circuitry to respond to the decoded instruction by, for each equal-sized group of K elements of the specified first, second, and third source matrices, generate K results by performing the ternary tile operation in parallel on K corresponding elements of the specified first, second, and third source matrices, and store each of the K results to a corresponding element of the specified destination matrix, wherein corresponding elements of the specified source and destination matrices occupy a same relative position within their associated matrix.
    Type: Application
    Filed: September 14, 2018
    Publication date: February 7, 2019
    Inventors: Elmoustapha OULD-AHMED-VALL, Christopher J. HUGHES, Bret TOLL, Dan BAUM, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20190042261
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions specifying horizontal tile operations. In one example, a processor includes fetch circuitry to fetch an instruction specifying a horizontal tile operation, a location of a M by N source matrix comprising K groups of elements, and locations of K destinations, wherein each of the K groups of elements comprises the same number of elements, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction by generating K results, each result being generated by performing the specified horizontal tile operation across every element of a corresponding group of the K groups, and writing each generated result to a corresponding location of the K specified destination locations.
    Type: Application
    Filed: September 14, 2018
    Publication date: February 7, 2019
    Inventors: Christopher J. HUGHES, Bret TOLL, Dan BAUM, Elmoustapha OULD-AHMED-VALL, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Alexander F. HEINECKE
  • Publication number: 20190042235
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042256
    Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Eyal Hadas
  • Publication number: 20190042448
    Abstract: Implementations of using tiles for caching are detailed In some implementations, an instruction execution circuitry executes one or more instructions, a register state cache coupled to the instruction execution circuitry holds thread register state in a plurality of registers, and backing storage pointer storage stores a backing storage pointer, wherein the backing storage pointer is to reference a state backing storage area in external memory to store the thread register state stored in the register state cache.
    Type: Application
    Filed: December 22, 2017
    Publication date: February 7, 2019
    Inventors: Raanan SADE, Jason BRANDT, Mark J. CHARNEY, Joseph NUZMAN, Leena PUTHIYEDATH, Rinat RAPPOPORT, Vivekananthan SANJEEPAN, Robert VALENTINE
  • Publication number: 20190042540
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042202
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
    Type: Application
    Filed: September 27, 2018
    Publication date: February 7, 2019
    Inventors: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
  • Publication number: 20190042541
    Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a quadword data elements of a matrix pair. Additionally, in some instances, non-accumulating quadword data elements of the matrix pair are set to zero.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190004958
    Abstract: Method and system for performing data movement operations is described herein. One embodiment of a method includes: storing data for a first memory address in a cache line of a memory of a first processing unit, the cache line associated with a coherency state indicating that the memory has sole ownership of the cache line; decoding an instruction for execution by a second processing unit, the instruction comprising a source data operand specifying the first memory address and a destination operand specifying a memory location in the second processing unit; and responsive to executing the decoded instruction, copying data from the cache line of the memory of the first processing unit as identified by the first memory address, to the memory location of the second processing unit, wherein responsive to the copy, the cache line is to remain in the memory and the coherency state is to remain unchanged.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 3, 2019
    Inventors: Anil Vasudevan, Venkata Krishnan, Andrew J. Herdrich, Ren Wang, Robert G. Blankenship, Vedaraman Geetha, Shrikant M. Shah, Marshall A. Millier, Raanan Sade, Binh Q. Pham, Olivier Serres, Chyi-Chang Miao, Christopher B. Wilkerson
  • Patent number: 10162694
    Abstract: Methods and apparatuses relating to memory corruption detection are described. In one embodiment, a hardware processor includes an execution unit to execute an instruction to request access to a block of a memory through a pointer to the block of the memory, and a memory management unit to allow access to the block of the memory when a memory corruption detection value in the pointer is validated with a memory corruption detection value in the memory for the block, wherein a position of the memory corruption detection value in the pointer is selectable between a first location and a second, different location.
    Type: Grant
    Filed: December 21, 2015
    Date of Patent: December 25, 2018
    Assignee: Intel Corporation
    Inventors: Tomer Stark, Ron Gabor, Joseph Nuzman, Raanan Sade, Bryant E. Bigbee
  • Patent number: 10146538
    Abstract: Suspendable load address tracking inside transactions is disclosed. An example processing device of implementations of the disclosure includes a transactional memory (TM) read set tracking component circuitry to identify a suspend read tracking instruction within a transaction executed by the processing device, mark load instructions occurring in the transaction subsequent to the identified suspend read tracking instruction with a suspend attribute, wherein the addresses corresponding to the marked load instructions are excluded from a read set maintained for the transaction, identify a resume read tracking instruction within the transaction, and stop marking the load instructions occurring subsequent to the identified resume read tracking instruction with the suspend attribute.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Roman Dementiev, Ravi Rajwar, Ady Tal, Alex Gerber
  • Publication number: 20180210842
    Abstract: A processing device including a linear address transformation circuit to determine that a metadata value stored in a portion of a linear address falls within a pre-defined metadata range. The metadata value corresponds to a plurality of metadata bits. The linear address transformation circuit to replace each of the plurality of the metadata bits with a constant value.
    Type: Application
    Filed: January 26, 2017
    Publication date: July 26, 2018
    Inventors: Joseph Nuzman, Raanan Sade, Igor Yanover, Ron Gabor, Amit Gradstein
  • Publication number: 20180181393
    Abstract: A processor includes a decoder, a data return buffer, and an execution unit. The decoder is to decode an instruction for a non-posted load into a decoded instruction for loading data from memory mapped input/output. The execution unit is for executing the decoded instruction. The execution is to start a timer, determine whether the timer exceeds a timeout threshold, allocate an entry in the data return buffer for the load, and determine whether an event arrived. The timer is to measure an amount of time taken to return the non-posted load instruction. The determination whether an event arrived is made in response to at least one of the allocation of the entry for the load, or a determination that the timer exceeds the timeout threshold.
    Type: Application
    Filed: December 22, 2016
    Publication date: June 28, 2018
    Inventors: Ido Ouziel, Raanan Sade, Jacob Doweck
  • Patent number: 9990287
    Abstract: An apparatus and method are described for efficiently transferring data from a core of a central processing unit (CPU) to a graphics processing unit (GPU). For example, one embodiment of a method comprises: writing data to a buffer within the core of the CPU until a designated amount of data has been written; upon detecting that the designated amount of data has been written, responsively generating an eviction cycle, the eviction cycle causing the data to be transferred from the buffer to a cache accessible by both the core and the GPU; setting an indication to indicate to the GPU that data is available in the cache; and upon the GPU detecting the indication, providing the data to the GPU from the cache upon receipt of a read signal from the GPU.
    Type: Grant
    Filed: December 21, 2011
    Date of Patent: June 5, 2018
    Assignee: Intel Corporation
    Inventors: Shlomo Raikin, Raanan Sade, Robert Valentine, Julius Yuli Mandelblat, Ron Shalev, Larisa Novakovsky
  • Publication number: 20180095759
    Abstract: Suspendable load address tracking inside transactions is disclosed. An example processing device of implementations of the disclosure includes a transactional memory (TM) read set tracking component circuitry to identify a suspend read tracking instruction within a transaction executed by the processing device, mark load instructions occurring in the transaction subsequent to the identified suspend read tracking instruction with a suspend attribute, wherein the addresses corresponding to the marked load instructions are excluded from a read set maintained for the transaction, identify a resume read tracking instruction within the transaction, and stop marking the load instructions occurring subsequent to the identified resume read tracking instruction with the suspend attribute.
    Type: Application
    Filed: September 30, 2016
    Publication date: April 5, 2018
    Inventors: Raanan Sade, Roman Dementiev, Ravi Rajwar, Ady Tal, Alex Gerber
  • Publication number: 20180074969
    Abstract: A processing system includes a processing core to execute a virtual machine (VM) comprising a guest operating system (OS) and a memory management unit, communicatively coupled to the processing core, comprising a storage device to store an extended page table entry (EPTE) comprising a mapping from a guest physical address (GPA) associated with the guest OS to an identifier of a memory frame, a first plurality of access right flags associated with accessing the memory frame in a first page mode referenced by an attribute of a memory page identified by the GPA, and a second plurality of access right flags associated with accessing the memory frame in a second page mode referenced by the attribute of the memory page identified by the GPA.
    Type: Application
    Filed: September 9, 2016
    Publication date: March 15, 2018
    Inventors: Gilbert Neiger, Baiju V. Patel, Gur Hildesheim, Ron Rais, Andrew V. Anderson, Jason W. Brandt, David M. Durham, Barry E. Huntley, Raanan Sade, Ravi L. Sahita, Vedvyas Shanbhogue, Arumugam Thiyagarajah
  • Publication number: 20180024925
    Abstract: An example system on a chip (SoC) includes a processor, a cache, and a main memory. The SoC can include a first memory to store data in a memory line, wherein the memory line is set to an invalid state. The processor can include a processor coupled to the first memory. The processor can determine that a data size of a first data set received from an application is within a data size range. The processor can determine that an aggregate data size of the first data set and a second data set received from the application is at least a same data size as data size of the memory line. The processor can perform an invalid-to-modify (I2M) operation to change the memory line from the invalid state to a modified state. The processor can write the first data set and the second data set to the memory line.
    Type: Application
    Filed: July 20, 2016
    Publication date: January 25, 2018
    Inventors: Raanan Sade, Joseph Nuzman, Stanislav Shwartsman, Igor Yanover, Liron Zur