Patents by Inventor Milind Girkar

Milind Girkar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240134644
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
    Type: Application
    Filed: December 29, 2023
    Publication date: April 25, 2024
    Applicant: Intel Corporation
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Simon RUBANOVICH
  • Publication number: 20240126546
    Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
    Type: Application
    Filed: December 28, 2023
    Publication date: April 18, 2024
    Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20240111826
    Abstract: An apparatus to facilitate hardware enhancements for double precision systolic support is disclosed. The apparatus includes matrix acceleration hardware having double-precision (DP) matrix multiplication circuitry including a multiplier circuits to multiply pairs of input source operands in a DP floating-point format; adders to receive multiplier outputs from the multiplier circuits and accumulate the multiplier outputs in a high precision intermediate format; an accumulator circuit to accumulate adder outputs from the adders with at least one of a third global source operand on a first pass of the DP matrix multiplication circuitry or an intermediate result from the first pass on a second pass of the DP matrix multiplication circuitry, wherein the accumulator circuit to generate an accumulator output in the high precision intermediate format; and a down conversion and rounding circuit to down convert and round an output of the second pass as final result in the DP floating-point format.
    Type: Application
    Filed: September 30, 2022
    Publication date: April 4, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Kevin Hurd, Changwon Rhee, Jorge Parra, Fangwen Fu, Theo Drane, William Zorn, Peter Caday, Gregory Henry, Guei-Yuan Lueh, Farzad Chehrazi, Amit Karande, Turbo Majumder, Xinmin Tian, Milind Girkar, Hong Jiang
  • Patent number: 11838418
    Abstract: A processor core that includes a token generator circuit is to execute a first instruction in response to initialization of a software program that requests access to protected data output by a cryptographic operation. To execute the first instruction, the processor core is to: retrieve a key that is to be used by the cryptographic operation; trigger the token generator circuit to generate an authorization token; cryptographically encode the key and the authorization token within a key handle; store the key handle in memory; and embed the authorization token within a cryptographic instruction that is to perform the cryptographic operation. The cryptographic instruction may be associated with a first logical compartment of the software program that is authorized access to the protected data.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: December 5, 2023
    Assignee: Intel Corporation
    Inventors: Milind Girkar, Jason W. Brandt, Michael LeMay
  • Patent number: 11579871
    Abstract: Embodiments of systems, apparatuses, and methods for performing vector-packed controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
    Type: Grant
    Filed: June 14, 2021
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney, Carl Murray, Milind Girkar, Bret Toll
  • Publication number: 20220171623
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
    Type: Application
    Filed: December 10, 2021
    Publication date: June 2, 2022
    Applicant: Intel Corporation
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Simon RUBANOVICH
  • Publication number: 20220035630
    Abstract: Embodiments of systems, apparatuses, and methods for performing vector-packed controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
    Type: Application
    Filed: June 14, 2021
    Publication date: February 3, 2022
    Applicant: Intel Corporation
    Inventors: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL
  • Patent number: 11200055
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: December 14, 2021
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Dan Baum, Zeev Sperber, Jesus Corbal, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Mark J. Charney, Barukh Ziv, Alexander Heinecke, Milind Girkar, Simon Rubanovich
  • Publication number: 20210357217
    Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add Instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
    Type: Application
    Filed: June 1, 2021
    Publication date: November 18, 2021
    Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11036499
    Abstract: Embodiments of systems, apparatuses, and methods for performing controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: June 15, 2021
    Assignee: Intel Corporation
    Inventors: Venkateswara R. Madduri, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney, Carl Murray, Milind Girkar, Bret Toll
  • Patent number: 11023231
    Abstract: Disclosed embodiments relate to executing a vector-complex fused multiply-add Instruction. In one example, a method includes fetching an instruction, a format of the instruction including an opcode, a first source operand identifier, a second source operand identifier, and a destination operand identifier, wherein each of the identifiers identifies a location storing a packed data comprising at least one complex number, decoding the instruction, retrieving data associated with the first and second source operand identifiers, and executing the decoded instruction to, for each packed data element position of the identified first and second source operands, cross-multiply the real and imaginary components to generate four products: a product of real components, a product of imaginary components, and two mixed products, generate a complex result by using the four products according to the instruction, and store a result to the corresponding position of the identified destination operand.
    Type: Grant
    Filed: October 1, 2016
    Date of Patent: June 1, 2021
    Assignee: Intel Corporation
    Inventors: Roman S. Dubtsov, Robert Valentine, Jesus Corbal, Milind Girkar, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 10877910
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: December 29, 2020
    Assignee: Intel Corporation
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai
  • Publication number: 20200382303
    Abstract: A processor core that includes a token generator circuit is to execute a first instruction in response to initialization of a software program that requests access to protected data output by a cryptographic operation. To execute the first instruction, the processor core is to: retrieve a key that is to be used by the cryptographic operation; trigger the token generator circuit to generate an authorization token; cryptographically encode the key and the authorization token within a key handle; store the key handle in memory; and embed the authorization token within a cryptographic instruction that is to perform the cryptographic operation. The cryptographic instruction may be associated with a first logical compartment of the software program that is authorized access to the protected data.
    Type: Application
    Filed: August 20, 2020
    Publication date: December 3, 2020
    Applicant: Intel Corporation
    Inventors: Milind Girkar, Jason W. Brandt, Michael LeMay
  • Patent number: 10785028
    Abstract: A processor core that includes a token generator circuit is to execute a first instruction in response to initialization of a software program that requests access to protected data output by a cryptographic operation. To execute the first instruction, the processor core is to: retrieve a key that is to be used by the cryptographic operation; trigger the token generator circuit to generate an authorization token; cryptographically encode the key and the authorization token within a key handle; store the key handle in memory; and embed the authorization token within a cryptographic instruction that is to perform the cryptographic operation. The cryptographic instruction may be associated with a first logical compartment of the software program that is authorized access to the protected data.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: September 22, 2020
    Assignee: Intel Corporation
    Inventors: Milind Girkar, Jason W. Brandt, Michael LeMay
  • Publication number: 20200073635
    Abstract: Embodiments of systems, apparatuses, and methods for vector-packed fractional multiplication of signed words with rounding, saturation, and high-result selection in a processor are described. For example, execution circuitry executes a decoded instruction to perform a fractional multiplication operation for each of a plurality of pairs of packed data elements to yield a plurality of output values, round each of the plurality of output values, detect whether any of the plurality of output values reflect an overflow or underflow, for any of the plurality of output values that reflect an overflow or underflow, saturate the output value, and store the plurality of output values into a corresponding plurality of positions of the packed data destination operand.
    Type: Application
    Filed: June 29, 2017
    Publication date: March 5, 2020
    Inventors: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL
  • Publication number: 20200073658
    Abstract: Embodiments of systems, apparatuses, and methods for performing controllable sine and/or cosine operations in a processor are described. For example, execution circuitry executes a decoded instruction to compute at least a real output value and an imaginary output value based on at least a cosine calculation and a sine calculation, the cosine and sine calculations each based on an index value from a packed data source operand, add the index value with an index increment value from the packed data source operand to create an updated index value, and store the real output value, the imaginary output value, and the updated index value to a packed data destination operand.
    Type: Application
    Filed: June 30, 2017
    Publication date: March 5, 2020
    Inventors: Venkateswara R. MADDURI, Elmoustapha OULD-AHMED-VALL, Robert VALENTINE, Jesus CORBAL, Mark J. CHARNEY, Carl MURRAY, Milind GIRKAR, Bret TOLL
  • Publication number: 20200007332
    Abstract: A processor core that includes a token generator circuit is to execute a first instruction in response to initialization of a software program that requests access to protected data output by a cryptographic operation. To execute the first instruction, the processor core is to: retrieve a key that is to be used by the cryptographic operation; trigger the token generator circuit to generate an authorization token; cryptographically encode the key and the authorization token within a key handle; store the key handle in memory; and embed the authorization token within a cryptographic instruction that is to perform the cryptographic operation. The cryptographic instruction may be associated with a first logical compartment of the software program that is authorized access to the protected data.
    Type: Application
    Filed: June 29, 2018
    Publication date: January 2, 2020
    Inventors: Milind Girkar, Jason W. Brandt, Michael LeMay
  • Publication number: 20190347310
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for matrix (tile) addition, subtraction, and multiplication is described. For example, circuitry to support instructions for element-by-element matrix (tile) addition, subtraction, and multiplication are detailed. In some embodiments, for matrix (tile) addition, decode circuitry is to decode an instruction having fields for an opcode, a first source matrix operand identifier, a second source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry is to execute the decoded instruction to, for each data element position of the identified first source matrix operand: add a first data value at that data element position to a second data value at a corresponding data element position of the identified second source matrix operand, and store a result of the addition into a corresponding data element position of the identified destination matrix operand.
    Type: Application
    Filed: July 1, 2017
    Publication date: November 14, 2019
    Applicant: Intel Corporation
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Simon RUBANOVICH
  • Publication number: 20190347100
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, support for a matrix transpose instruction is detailed. In some embodiments, decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and a destination matrix operand identifier; and execution circuitry to execute the decoded instruction to transpose each row of elements of the identified source matrix operand into a corresponding column of the identified destination matrix operand are detailed.
    Type: Application
    Filed: July 1, 2017
    Publication date: November 14, 2019
    Applicant: Intel Corporation
    Inventors: Robert VALENTINE, Dan BAUM, Zeev SPERBER, Jesus CORBAL, Elmoustapha OULD-AHMED-VALL, Bret L. TOLL, Mark J. CHARNEY, Barukh ZIV, Alexander HEINECKE, Milind GIRKAR, Menachem ADELMAN, Simon RUBANOVICH
  • Patent number: 10459858
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Grant
    Filed: November 6, 2017
    Date of Patent: October 29, 2019
    Assignee: Intel Corporation
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai