Patents by Inventor NIKITA A. SHUSTROV

NIKITA A. SHUSTROV has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230297643
    Abstract: Matrix multiplication operations can be implemented, at least in part, on one or more tensor cores of a parallel processing unit. An efficiency of the matrix multiplication operations can be improved in cases where one of the input operands or the output operand of the matrix multiplication operation is a square matrix having a triangular data pattern. In such cases, the number of computations performed by the tensor cores of the parallel processing unit can be reduced by dropping computations and/or masking out elements of the square matrix input operand on one side of the main diagonal of the square matrix. In other cases where the output operand exhibits the triangular data pattern, computations can be dropped or masked out for the invalid side of the main diagonal of the square matrix. In an embodiment, a library implementing the matrix multiplication operations is provided.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 21, 2023
    Inventors: Aniket Shivam, Andrew Kerr, Haicheng Wu, Manish Gupta, Nikita Shustrov, Qing Yang, Alan Kaatz, Aditya Avinash Atluri
  • Publication number: 20230083705
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 16, 2023
    Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
  • Patent number: 11487541
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: November 30, 2020
    Date of Patent: November 1, 2022
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Publication number: 20210406016
    Abstract: Embodiments for gathering and scattering matrix data by row are disclosed. In an embodiment, a processor includes a storage matrix, a decoder, and execution circuitry. The decoder is to decode an instruction having a format including an opcode field to specify an opcode and a first operand field to specify a set of irregularly spaced memory locations. The execution circuitry is to, in response to the decoded instruction, calculate a set of addresses corresponding to the set of irregularly spaced memory locations and transfer a set of rows of data between the storage and the set of irregularly spaced memory locations.
    Type: Application
    Filed: June 27, 2020
    Publication date: December 30, 2021
    Applicant: Intel Corporation
    Inventors: Christopher J. Hughes, Alexander F. Heinecke, Robert Valentine, Menachem Adelman, Evangelos Georganas, Mark J. Charney, Nikita A. Shustrov, Sara Baghsorkhi
  • Publication number: 20210081198
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Application
    Filed: November 30, 2020
    Publication date: March 18, 2021
    Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
  • Patent number: 10853065
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: October 24, 2018
    Date of Patent: December 1, 2020
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Publication number: 20190121637
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Application
    Filed: October 24, 2018
    Publication date: April 25, 2019
    Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
  • Patent number: 10146535
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: October 20, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporation
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Publication number: 20180113708
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Application
    Filed: October 20, 2016
    Publication date: April 26, 2018
    Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
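The triangular-operand idea in publication 20230297643 can be illustrated with a small, hedged sketch. It assumes plain nested Python lists rather than tensor cores, and it works on single elements rather than on the hardware tiles the abstract refers to. The function names `matmul_lower_triangular_a` and `syrk_lower` are hypothetical. The first function skips every product involving the known-zero upper half of a square lower-triangular input. The second computes only the valid (lower) half of a triangular output and masks the other half to zero.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matmul_lower_triangular_a(a: Sequence[Sequence[float]],
                              b: Sequence[Sequence[float]]) -> Matrix:
    """Compute C = A @ B where A is square (n x n) and lower triangular.

    Since A[i][k] == 0 whenever k > i, the inner loop stops at k == i and
    drops the products that could only ever contribute zero.
    """
    n = len(a)
    p = len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            acc = 0
            for k in range(i + 1):  # only k <= i can be nonzero in row i of A
                acc += a[i][k] * b[k][j]
            c[i][j] = acc
    return c


def syrk_lower(a: Sequence[Sequence[float]]) -> Matrix:
    """Compute only the lower triangle of the symmetric output A @ A^T.

    Entries above the main diagonal are masked to zero instead of computed.
    """
    n = len(a)
    depth = len(a[0])
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # j <= i: the valid side of the diagonal
            c[i][j] = sum(a[i][k] * a[j][k] for k in range(depth))
    return c


if __name__ == "__main__":
    a = [[1, 0, 0], [2, 3, 0], [4, 5, 6]]
    b = [[1, 2], [3, 4], [5, 6]]
    print(matmul_lower_triangular_a(a, b))
    print(syrk_lower([[1, 2], [3, 4]]))
```

For an n x n triangular operand, the inner loop runs n(n+1)/2 times per output column instead of n², so roughly half of the multiply-adds are dropped.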
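The chained fused multiply-add entries (for example, publication 20180113708 and patent 10146535) describe one instruction that combines several packed source registers with scalar values read from memory. The following is only a software model of that general idea, not the claimed instruction. It assumes a single element type, although the abstract mentions two element sizes. It also assumes one scalar per source register. The name `chained_fma` is hypothetical. Each loop iteration stands for one link in the chain: every lane of a source register is multiplied by a broadcast scalar and added to the running accumulator.

```python
from typing import List, Sequence


def chained_fma(dest: Sequence[float],
                sources: Sequence[Sequence[float]],
                scalars: Sequence[float]) -> List[float]:
    """Return dest + sum_i sources[i] * scalars[i], computed lane by lane.

    sources models a block of packed data registers, and scalars models the
    values loaded from the memory operand (one scalar per source register,
    an assumption made for this sketch).
    """
    if len(sources) != len(scalars):
        raise ValueError("need one scalar per source register")
    acc = list(dest)
    for src, s in zip(sources, scalars):  # one link of the chain per source
        if len(src) != len(acc):
            raise ValueError("source width must match destination width")
        for lane in range(len(acc)):
            # Python rounds the multiply and the add separately for floats;
            # a hardware FMA would round once per link.
            acc[lane] = acc[lane] + src[lane] * s
    return acc


if __name__ == "__main__":
    regs = [[1, 2, 3, 4], [1, 1, 1, 1], [0, 1, 0, 1], [2, 2, 2, 2]]
    print(chained_fma([0, 0, 0, 0], regs, [1, 2, 3, 4]))
```

Folding the whole chain into one instruction means the accumulator stays in a register between links, so no intermediate values need to be written back.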
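Publication 20210406016 describes an instruction that first computes addresses for irregularly spaced memory locations and then moves whole rows between a storage matrix and those locations. The sketch below models memory as a flat Python list and the storage matrix as a list of rows. It also assumes that the irregular locations are given as a base address plus a list of offsets. This addressing scheme and the names `compute_row_addresses`, `gather_rows`, and `scatter_rows` are hypothetical and do not come from the patent.

```python
from typing import List, Sequence


def compute_row_addresses(base: int, offsets: Sequence[int]) -> List[int]:
    """Hypothetical address calculation: base plus an irregular offset per row."""
    return [base + off for off in offsets]


def gather_rows(memory: Sequence[float], addresses: Sequence[int],
                row_len: int) -> List[List[float]]:
    """Load one contiguous row of row_len elements from each start address."""
    rows = []
    for addr in addresses:
        if addr < 0 or addr + row_len > len(memory):
            raise IndexError(f"row at {addr} runs outside memory")
        rows.append(list(memory[addr:addr + row_len]))
    return rows


def scatter_rows(memory: List[float], addresses: Sequence[int],
                 rows: Sequence[Sequence[float]]) -> None:
    """Store each row of the storage matrix back to its start address."""
    if len(addresses) != len(rows):
        raise ValueError("need one address per row")
    for addr, row in zip(addresses, rows):
        if addr < 0 or addr + len(row) > len(memory):
            raise IndexError(f"row at {addr} runs outside memory")
        memory[addr:addr + len(row)] = row


if __name__ == "__main__":
    mem = list(range(20))
    addrs = compute_row_addresses(0, [0, 7, 13])
    tile = gather_rows(mem, addrs, 3)
    print(tile)
```

Moving whole rows instead of single elements means that each computed address covers a contiguous run of data, which is the access pattern a row-oriented storage matrix needs.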