Patents by Inventor Ronen Zohar

Ronen Zohar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240045690
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix. [An illustrative code sketch follows this entry.]
    Type: Application
    Filed: September 1, 2023
    Publication date: February 8, 2024
    Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
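
The abstract above (publication 20240045690) names two compression schemes; the minimal Python sketch below models only the first, packing non-zero elements and recording their matrix positions in a header. The function names, the (row, column) header layout, and the use of Python lists are illustrative assumptions, not the instruction's actual encoding.

```python
def compress_matrix(matrix):
    """Pack non-zero elements and record each element's (row, col) in a header.

    Hypothetical software model of the first scheme in the abstract; the real
    instruction's header layout and encoding are not described here.
    """
    header, packed = [], []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value != 0:
                header.append((r, c))   # position of the non-zero element
                packed.append(value)    # non-zero values stored contiguously
    return header, packed


def decompress_matrix(header, packed, shape):
    """Rebuild the dense matrix from the header and the packed values."""
    rows, cols = shape
    dense = [[0] * cols for _ in range(rows)]
    for (r, c), value in zip(header, packed):
        dense[r][c] = value
    return dense


if __name__ == "__main__":
    src = [[0, 3, 0],
           [7, 0, 0],
           [0, 0, 5]]
    hdr, vals = compress_matrix(src)
    assert decompress_matrix(hdr, vals, (3, 3)) == src
```
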
  • Patent number: 11748103
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: February 15, 2022
    Date of Patent: September 5, 2023
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Publication number: 20230060900
    Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register. [An illustrative code sketch follows this entry.]
    Type: Application
    Filed: October 4, 2022
    Publication date: March 2, 2023
    Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
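
As a rough software model of the indexed reduction this abstract describes, the sketch below groups values whose positions carry the same index and reduces each group with addition. The helper name indexed_reduce, the choice of addition as the reduction operation, and the list-based registers are assumptions made for illustration only.

```python
def indexed_reduce(values, indices, num_outputs, op=lambda a, b: a + b):
    """Reduce together all values whose positions carry the same index.

    Hypothetical model: 'indices' associates data-element positions with one
    another, and each group of associated values is combined with a reduction
    operation (sum by default) into one output slot.
    """
    assert len(values) == len(indices)
    out = [None] * num_outputs
    for idx, val in zip(indices, values):
        out[idx] = val if out[idx] is None else op(out[idx], val)
    return out


if __name__ == "__main__":
    values  = [1.0, 2.0, 3.0, 4.0, 5.0]
    indices = [0,   1,   0,   2,   1]
    print(indexed_reduce(values, indices, 3))   # [4.0, 7.0, 4.0]
```
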
  • Patent number: 11507376
    Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices. [An illustrative code sketch follows this entry.]
    Type: Grant
    Filed: January 19, 2021
    Date of Patent: November 22, 2022
    Assignee: Intel Corporation
    Inventors: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall
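
The AOS-to-SOA unpacking described in this entry can be modelled in a few lines. The sketch below assumes the N structures of K elements are laid out contiguously in a flat list, so same-typed elements sit a stride of K apart; the function name and flat-list layout are hypothetical.

```python
def unpack_aos_to_soa(aos, k):
    """Unpack an Array-of-Structures into K Structure-of-Arrays groups.

    Hypothetical model: 'aos' holds N structures of K differently-typed
    elements back to back, so same-typed elements are separated by a stride
    of K; each output list gathers the N elements of one type.
    """
    assert len(aos) % k == 0
    n = len(aos) // k
    return [[aos[i * k + t] for i in range(n)] for t in range(k)]


if __name__ == "__main__":
    # Three structures of (x, y, z) packed as an AOS stream.
    aos = [1, 10, 100, 2, 20, 200, 3, 30, 300]
    print(unpack_aos_to_soa(aos, 3))  # [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
```
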
  • Publication number: 20220229661
    Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
    Type: Application
    Filed: April 4, 2022
    Publication date: July 21, 2022
    Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
  • Publication number: 20220171627
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Application
    Filed: February 15, 2022
    Publication date: June 2, 2022
    Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
  • Patent number: 11327754
    Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location. [An illustrative code sketch follows this entry.]
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: May 10, 2022
    Assignee: Intel Corporation
    Inventors: Jorge Parra, Dan Baum, Robert S. Chappell, Michael Espig, Varghese George, Alexander Heinecke, Christopher Hughes, Subramaniam Maiyuran, Prasoonkumar Surti, Ronen Zohar, Elmoustapha Ould-Ahmed-Vall
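
The range-dependent polynomial approximation in this patent's abstract can be sketched as a small lookup of per-range coefficients evaluated with Horner's rule. The table below uses rough quadratic fits of exp(x) purely as toy data; the real instruction's lookup location, coefficient format, and target functions are not given here.

```python
import math

def approx_with_ranges(x, ranges):
    """Approximate a complex function using range-dependent polynomial coefficients.

    Hypothetical model: 'ranges' stands in for the coefficient lookup, mapping
    an input range to the polynomial coefficients (highest degree first) used
    for data element values that fall inside it.
    """
    for (lo, hi), coeffs in ranges:
        if lo <= x < hi:
            result = 0.0
            for c in coeffs:            # Horner evaluation of the polynomial
                result = result * x + c
            return result
    raise ValueError("x outside all approximation ranges")


if __name__ == "__main__":
    # Toy lookup: rough quadratic fits of exp(x), one set of coefficients per range.
    table = [
        ((0.0, 1.0), (0.8460, 0.8547, 1.0086)),
        ((1.0, 2.0), (2.2996, -2.2758, 2.7181)),
    ]
    for x in (0.25, 1.5):
        print(x, approx_with_ranges(x, table), math.exp(x))
```
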
  • Patent number: 11294670
    Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: April 5, 2022
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, Jonathan D. Pearce, Guei-Yuan Lueh, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar
  • Patent number: 11249761
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: July 20, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Publication number: 20210216314
    Abstract: In one embodiment, the present invention includes a method for receiving a rounding instruction and an immediate value in a processor, determining if a rounding mode override indicator of the immediate value is active, and if so executing a rounding operation on a source operand in a floating point unit of the processor responsive to the rounding instruction and according to a rounding mode set forth in the immediate operand. Other embodiments are described and claimed. [An illustrative code sketch follows this entry.]
    Type: Application
    Filed: December 1, 2020
    Publication date: July 15, 2021
    Inventors: Ronen Zohar, Shane Story
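
A minimal model of the rounding-override behaviour this entry describes: if an override bit in the immediate is set, the rounding mode encoded in the immediate is used instead of the default mode. The bit layout, mode numbering, and half-up "nearest" rounding below are assumptions, not the actual instruction encoding.

```python
import math

# Hypothetical immediate encoding: bit 2 is the override flag and the low two
# bits select the rounding mode; this mirrors the behaviour in the abstract,
# not the bit layout of any real instruction.
ROUND_NEAREST, ROUND_DOWN, ROUND_UP, ROUND_TRUNCATE = 0, 1, 2, 3
OVERRIDE_BIT = 0b100

_MODES = {
    ROUND_NEAREST:  lambda x: math.floor(x + 0.5),  # simplified half-up "nearest"
    ROUND_DOWN:     math.floor,
    ROUND_UP:       math.ceil,
    ROUND_TRUNCATE: math.trunc,
}

def round_with_immediate(x, imm, default_mode=ROUND_NEAREST):
    """Round x with the mode from imm when its override bit is set, else the default."""
    mode = imm & 0b11 if imm & OVERRIDE_BIT else default_mode
    return _MODES[mode](x)


if __name__ == "__main__":
    print(round_with_immediate(2.5, OVERRIDE_BIT | ROUND_DOWN))  # 2: mode taken from the immediate
    print(round_with_immediate(2.5, ROUND_DOWN))                 # 3: override clear, default mode used
```
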
  • Publication number: 20210216315
    Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
    Type: Application
    Filed: January 19, 2021
    Publication date: July 15, 2021
    Inventors: Bret TOLL, Alexander F. HEINECKE, Christopher J. HUGHES, Ronen ZOHAR, Michael ESPIG, Dan BAUM, Raanan SADE, Robert VALENTINE, Mark J. CHARNEY, Elmoustapha OULD-AHMED-VALL
  • Patent number: 10896043
    Abstract: Disclosed embodiments relate to instructions for fast element unpacking. In one example, a processor includes fetch circuitry to fetch an instruction whose format includes fields to specify an opcode and locations of an Array-of-Structures (AOS) source matrix and one or more Structure of Arrays (SOA) destination matrices, wherein: the specified opcode calls for unpacking elements of the specified AOS source matrix into the specified Structure of Arrays (SOA) destination matrices, the AOS source matrix is to contain N structures each containing K elements of different types, with same-typed elements in consecutive structures separated by a stride, the SOA destination matrices together contain K segregated groups, each containing N same-typed elements, decode circuitry to decode the fetched instruction, and execution circuitry, responsive to the decoded instruction, to unpack each element of the specified AOS matrix into one of the K element types of the one or more SOA matrices.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: January 19, 2021
    Assignee: Intel Corporation
    Inventors: Bret Toll, Alexander F. Heinecke, Christopher J. Hughes, Ronen Zohar, Michael Espig, Dan Baum, Raanan Sade, Robert Valentine, Mark J. Charney, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20200348937
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Application
    Filed: July 20, 2020
    Publication date: November 5, 2020
    Inventors: Dan BAUM, Michael ESPIG, James GUILFORD, Wajdi K. FEGHALI, Raanan SADE, Christopher J. HUGHES, Robert VALENTINE, Bret TOLL, Elmoustapha OULD-AHMED-VALL, Mark J. CHARNEY, Vinodh GOPAL, Ronen ZOHAR, Alexander F. HEINECKE
  • Patent number: 10817297
    Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specify an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result. [An illustrative code sketch follows this entry.]
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar
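
One plausible reading of the vector-matrix comparison described above is sketched below: element i of the vector is compared against row i of the matrix, and the result for each column reports whether the entire vector matched that column. Treating the "vector match result" as a per-column all-elements-match flag is an assumption, not a statement of the instruction's defined semantics.

```python
def vector_matrix_compare(vector, matrix):
    """Per-column match flags for a vector compared against consecutive matrix rows.

    Hypothetical model: vector element i is compared with every element of
    matrix row i, and the output for column j reports whether the whole vector
    matched column j (one possible reading of a per-column vector match result).
    """
    assert len(vector) == len(matrix)
    num_cols = len(matrix[0])
    return [all(vector[i] == matrix[i][j] for i in range(len(vector)))
            for j in range(num_cols)]


if __name__ == "__main__":
    vec = [1, 2, 3]
    mat = [[1, 9, 1],
           [2, 2, 0],
           [3, 7, 3]]
    print(vector_matrix_compare(vec, mat))  # [True, False, False]
```
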
  • Publication number: 20200310804
    Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specify an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
    Type: Application
    Filed: March 30, 2019
    Publication date: October 1, 2020
    Inventors: Christopher J. HUGHES, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
  • Publication number: 20200310800
    Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location.
    Type: Application
    Filed: March 27, 2019
    Publication date: October 1, 2020
    Inventors: Jorge PARRA, Dan BAUM, Robert CHAPPELL, Michael ESPIG, Varghese GEORGE, Alexander HEINECKE, Christopher HUGHES, Subramaniam MAIYURAN, Elmoustapha OULD-AHMED-VALL, Prasoonkumar SURTI, Ronen ZOHAR
  • Publication number: 20200310809
    Abstract: Embodiments detailed herein relate to reduction operations on a plurality of data element values. In one embodiment, a processor comprises decoding circuitry to decode an instruction and execution circuitry to execute the decoded instruction. The instruction specifies a first input register containing a plurality of data element values, a first index register containing a plurality of indices, and an output register, where each index of the plurality of indices maps to one unique data element position of the first input register. The execution includes to identify data element values that are associated with one another based on the indices, perform one or more reduction operations on the associated data element values based on the identification, and store results of the one or more reduction operations in the output register.
    Type: Application
    Filed: March 27, 2019
    Publication date: October 1, 2020
    Inventors: Christopher J. HUGHES, Jonathan D. PEARCE, Guei-Yuan LUEH, ElMoustapha OULD-AHMED-VALL, Jorge E. PARRA, Prasoonkumar SURTI, Krishna N. VINOD, Ronen ZOHAR
  • Patent number: 10782971
    Abstract: Methods and apparatus for vector-matrix comparison are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry decodes an instruction, where operands of the instruction specify an output location to store output results, a vector of data element values, and a matrix of data element values. The execution circuitry executes the decoded instruction. The execution includes to map each of the data element values of the vector to one of consecutive rows of the matrix; for each data element value of the vector, to compare that data element value of the vector with data element values in a respective row of the matrix and obtain data element match results. The execution further includes to store the output results based on the data element match results, where each output result maps to a respective data element column position and indicates a vector match result.
    Type: Grant
    Filed: March 30, 2019
    Date of Patent: September 22, 2020
    Assignee: Intel Corporation
    Inventors: Christopher J. Hughes, ElMoustapha Ould-Ahmed-Vall, Jorge E. Parra, Prasoonkumar Surti, Krishna N. Vinod, Ronen Zohar
  • Patent number: 10719323
    Abstract: Disclosed embodiments relate to matrix compress/decompress instructions. In one example, a processor includes fetch circuitry to fetch a compress instruction having a format with fields to specify an opcode and locations of decompressed source and compressed destination matrices, decode circuitry to decode the fetched compress instruction, and execution circuitry, responsive to the decoded compress instruction, to: generate a compressed result according to a compress algorithm by compressing the specified decompressed source matrix by either packing non-zero-valued elements together and storing the matrix position of each non-zero-valued element in a header, or using fewer bits to represent one or more elements and using the header to identify matrix elements being represented by fewer bits; and store the compressed result to the specified compressed destination matrix.
    Type: Grant
    Filed: September 27, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Dan Baum, Michael Espig, James Guilford, Wajdi K. Feghali, Raanan Sade, Christopher J. Hughes, Robert Valentine, Bret Toll, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney, Vinodh Gopal, Ronen Zohar, Alexander F. Heinecke
  • Patent number: 10572251
    Abstract: A method and apparatus for including, in a processor, instructions for performing logical-comparison and branch support operations on packed or unpacked data. In one embodiment, instruction decode logic decodes instructions for an execution unit to operate on packed data elements including logical comparisons. A register file including 128-bit packed data registers stores packed single-precision floating point (SPFP) and packed integer data elements. The logical comparisons may include comparison of SPFP data elements and comparison of integer data elements and setting at least one bit to indicate the results. Based on these comparisons, branch support actions are taken. Such branch support actions may include setting the at least one bit, which in turn may be utilized by a branching unit in response to a branch instruction. Alternatively, the branch support actions may include branching to an indicated target code location. [An illustrative code sketch follows this entry.]
    Type: Grant
    Filed: November 8, 2018
    Date of Patent: February 25, 2020
    Assignee: Intel Corporation
    Inventors: Rajiv Kapoor, Ronen Zohar, Mark J. Buxton, Zeev Sperber, Koby Gottlieb
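
To illustrate the packed comparison with branch support that this patent describes, the sketch below compares two packed operands element-wise and folds the results into a single flag that a later branch could test. The single "all elements equal" flag and the equality comparison are simplifying assumptions; the patent covers SPFP and integer comparisons and several kinds of branch-support actions.

```python
def packed_compare_sets_flag(a, b):
    """Compare two packed operands element-wise and set a single flag bit.

    Hypothetical model: the packed comparison yields per-element results, and
    at least one bit (here an 'all elements equal' flag) is set for a
    subsequent branch to consume.
    """
    per_element = [x == y for x, y in zip(a, b)]
    zero_flag = all(per_element)            # the bit a branch instruction would test
    return per_element, zero_flag


if __name__ == "__main__":
    lanes_a = [1.0, 2.0, 3.0, 4.0]
    lanes_b = [1.0, 2.0, 3.0, 4.0]
    _, flag = packed_compare_sets_flag(lanes_a, lanes_b)
    if flag:                                # branch support: act on the flag
        print("all packed elements matched; taking the branch")
```
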