Patents by Inventor Simon Rubanovich

Simon Rubanovich has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190347100
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
    Type: Application
    Filed: July 1, 2017
    Publication date: November 14, 2019
    Applicant: Intel Corporation
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Menachem ADELMAN, Simon RUBANOVICH
  • Patent number: 10318244
    Abstract: A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction.
    Type: Grant
    Filed: March 27, 2017
    Date of Patent: June 11, 2019
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Zeev Sperber, Simon Rubanovich, Benny Eitan, Amit Gradstein
  • Patent number: 10303471
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor vector double block packed sum of absolute differences (SAD) in response to a single vector double block packed sum of absolute differences instruction that includes a destination vector register operand, first and second source operands, an immediate, and an opcode are described.
    Type: Grant
    Filed: February 28, 2017
    Date of Patent: May 28, 2019
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Mostafa Hagog, Robert Valentine, Amit Gradstein, Simon Rubanovich, Zeev Sperber
  • Patent number: 10275216
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: April 30, 2019
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
  • Publication number: 20190114169
    Abstract: An apparatus is described having an instruction execution pipeline that has a vector functional unit to support a vector multiply add instruction. The vector multiply add instruction to multiply respective K bit elements of two vectors and accumulate a portion of each of their respective products with another respective input operand in an X bit accumulator, where X is greater than K.
    Type: Application
    Filed: September 17, 2018
    Publication date: April 18, 2019
    Inventors: Shay Gueron, Vlad Krasnov, Robert Valentine, Zeev Sperber, Amit Gradstein, Simon Rubanovich
  • Publication number: 20190079768
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (m, n) of the specified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the specified first source matrix by a corresponding nibble of a doubleword element (K,N) of the specified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element.
    Type: Application
    Filed: November 9, 2018
    Publication date: March 14, 2019
    Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
  • Publication number: 20190079762
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to convert to 16-bit floating-point format. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of a first source vector comprising N single-precision elements, and a destination vector comprising at least N 16-bit floating-point elements, the opcode to indicate execution circuitry is to convert each of the elements of the specified source vector to 16-bit floating-point, the conversion to include truncation and rounding, as necessary, and to store each converted element into a corresponding location of the specified destination vector, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Application
    Filed: November 9, 2018
    Publication date: March 14, 2019
    Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
  • Publication number: 20190079767
    Abstract: Disclosed embodiments relate to systems and methods for performing 16-bit floating-point vector dot product instructions. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first source, second source, and destination vectors, the opcode to indicate execution circuitry is to multiply N pairs of 16-bit floating-point formatted elements of the specified first and second sources, and accumulate the resulting products with previous contents of a corresponding single-precision element of the specified destination, decode circuitry to decode the fetched instruction, and execution circuitry to respond to the decoded instruction as specified by the opcode.
    Type: Application
    Filed: November 9, 2018
    Publication date: March 14, 2019
    Inventors: Alexander F. HEINECKE, Robert VALENTINE, Mark J. CHARNEY, Raanan SADE, Menachem ADELMAN, Zeev SPERBER, Amit GRADSTEIN, Simon RUBANOVICH
  • Patent number: 10228909
    Abstract: A method of an aspect includes receiving a floating point scaling instruction. The floating point scaling instruction indicates a first source including one or more floating point data elements, a second source including one or more corresponding floating point data elements, and a destination. A result is stored in the destination in response to the floating point scaling instruction. The result includes one or more corresponding result floating point data elements each including a corresponding floating point data element of the second source multiplied by a base of the one or more floating point data elements of the first source raised to a power of an integer representative of the corresponding floating point data element of the first source. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: March 30, 2018
    Date of Patent: March 12, 2019
    Assignee: Intel Corporation
    Inventors: Cristina S. Anderson, Amit Gradstein, Robert Valentine, Simon Rubanovich, Benny Eitan
  • Patent number: 10209986
    Abstract: A method of an aspect includes receiving a floating point rounding instruction. The floating point rounding instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point that each of the one or more floating point data elements are to be rounded to, and indicates a destination storage location. A result is stored in the destination storage location in response to the floating point rounding instruction. The result includes one or more rounded result floating point data elements. Each of the one or more rounded result floating point data elements includes one of the floating point data elements of the source, in a corresponding position, which has been rounded to the indicated number of fraction bits. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Jesus Corbal San Adrian, Cristina S. Anderson, Robert Valentine, Bret Toll, Amit Gradstein, Simon Rubanovich, Benny Eitan
  • Patent number: 10203955
    Abstract: Instructions and logic provide SIMD vector packed tuple cross-comparison functionality. Some processor embodiments include first and second registers with a variable plurality of data fields, each of the data fields to store an element of a first data type. The processor executes a SIMD instruction for vector packed tuple cross-comparison in some embodiments, which for each data field of a portion of data fields in a tuple of the first register, compares its corresponding element with every element of a corresponding portion of data fields in a tuple of the second register and sets a mask bit corresponding to each element of the second register portion, in a bit-mask corresponding to each unmasked element of the corresponding first register portion, according to the corresponding comparison. In some embodiments bit-masks are shifted by corresponding elements in data fields of a third register. The comparison type is indicated by an immediate operand.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: February 12, 2019
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Christopher J. Hughes, Mark J. Charney, Zeev Sperber, Amit Gradstein, Simon Rubanovich, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil
  • Publication number: 20190042235
    Abstract: Disclosed embodiments relate to computing dot products of nibbles in tile operands. In one example, a processor includes decode circuitry to decode a tile dot product instruction having fields for an opcode, a destination identifier to identify a M by N destination matrix, a first source identifier to identify a M by K first source matrix, and a second source identifier to identify a K by N second source matrix, each of the matrices containing doubleword elements, and execution circuitry to execute the decoded instruction to perform a flow K times for each element (M,N) of the identified destination matrix to generate eight products by multiplying each nibble of a doubleword element (M,K) of the identified first source matrix by a corresponding nibble of a doubleword element (K,N) of the identified second source matrix, and to accumulate and saturate the eight products with previous contents of the doubleword element (M,N).
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042254
    Abstract: Embodiments detailed herein relate to systems and methods to load a tile register pair. In one example, a processor includes: decode circuitry to decode a load matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded load matrix pair instruction to load every element of left and right tiles of the identified destination matrix from corresponding element positions of left and right tiles of the identified source matrix, respectively, wherein the executing operates on one row of the identified destination matrix at a time, starting with the first row.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042540
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address, and execution circuitry to execute the decoded instruction to store configuration information about usage of storage for two-dimensional data structures at the memory address.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042255
    Abstract: Embodiments detailed herein relate to systems and methods to store a tile register pair to memory. In one example, a processor includes: decode circuitry to decode a store matrix pair instruction having fields for an opcode and source and destination identifiers to identify source and destination matrices, respectively, each matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded store matrix pair instruction to store every element of left and right tiles of the identified source matrix to corresponding element positions of left and right tiles of the identified destination matrix, respectively, wherein the executing stores a chunk of C elements of one row of the identified source matrix at a time.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Publication number: 20190042256
    Abstract: Embodiments detailed herein relate to systems and methods to zero a tile register pair. In one example, a processor includes decode circuitry to decode a matrix pair zeroing instruction having fields for an opcode and an identifier to identify a destination matrix having a PAIR parameter equal to TRUE; and execution circuitry to execute the decoded matrix pair zeroing instruction to zero every element of a left matrix and a right matrix of the identified destination matrix.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman, Eyal Hadas
  • Publication number: 20190042202
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions to transpose rectangular tiles. In one example, a processor includes fetch circuitry to fetch an instruction having fields to specify an opcode and locations of first destination, second destination, first source, and second source matrices, the specified opcode to cause the processor to process each of the specified source and destination matrices as a rectangular matrix, decode circuitry to decode the fetched rectangular matrix transpose instruction, and execution circuitry to respond to the decoded rectangular matrix transpose instruction by transposing each row of elements of the specified first source matrix into a corresponding column of the specified first destination matrix and transposing each row of elements of the specified second source matrix into a corresponding column of the specified second destination matrix.
    Type: Application
    Filed: September 27, 2018
    Publication date: February 7, 2019
    Inventors: Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Simon RUBANOVICH, Amit GRADSTEIN, Zeev SPERBER, Bret TOLL, Jesus CORBAL, Christopher J. HUGHES, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL
  • Publication number: 20190042541
    Abstract: Embodiments detailed herein relate to matrix operations. For example, embodiments of instruction support for matrix (tile) dot product operations are detailed. Exemplary instructions including computing a dot product of signed words and accumulating in a quadword data elements of a matrix pair. Additionally, in some instances, non-accumulating quadword data elements of the matrix pair are set to zero.
    Type: Application
    Filed: December 29, 2017
    Publication date: February 7, 2019
    Inventors: Raanan Sade, Simon Rubanovich, Amit Gradstein, Zeev Sperber, Alexander Heinecke, Robert Valentine, Mark J. Charney, Bret Toll, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Menachem Adelman
  • Patent number: 10157059
    Abstract: A processor for floating point underflow detection includes circuitry to decode a first instruction and a floating point unit. The decoded instruction, when executed by the processor, may be for performing a fused multiply-add (FMA) operation. The floating point unit includes circuitry to determine a non-normalized result of the first instruction based on a first input, a second input, and a third input. The floating point unit further includes circuitry to determine whether underflow exists in the non-normalized result based on a first exponent of the first input, a second exponent of the second input, and a third exponent of the third input.
    Type: Grant
    Filed: September 29, 2016
    Date of Patent: December 18, 2018
    Assignee: Intel Corporation
    Inventors: Simon Rubanovich, Thierry Pons, Zeev Sperber, Amit Gradstein
  • Publication number: 20180309461
    Abstract: An apparatus and method are described for performing vector compression. For example, one embodiment of a processor comprises: vector compression logic to compress a source vector comprising a plurality of valid data elements and invalid data elements to generate a destination vector in which valid data elements are stored contiguously on one side of the destination vector, the vector compression logic to utilize a bit mask associated with the source vector and comprising a plurality of bits, each bit corresponding to one of the plurality of data elements of the source vector and indicating whether the data element comprises a valid data element or an invalid data element, the vector compression logic to utilize indices of the bit mask and associated bit values of the bit mask to generate a control vector; and shuffle logic to shuffle/permute the data elements of the source vector to the destination vector in accordance with the control vector.
    Type: Application
    Filed: March 15, 2018
    Publication date: October 25, 2018
    Inventors: Simon Rubanovich, David M. Russinoff, Amit Gradstein, John W. O'Leary, Zeev Sperber