Patents by Inventor Edward T. Grochowski

Edward T. Grochowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10282296
    Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
    Type: Grant
    Filed: December 12, 2016
    Date of Patent: May 7, 2019
    Assignee: Intel Corporation
    Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, David Papworth, James D. Allen
  • Patent number: 10275243
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Grant
    Filed: July 2, 2016
    Date of Patent: April 30, 2019
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
  • Publication number: 20190121637
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Application
    Filed: October 24, 2018
    Publication date: April 25, 2019
    Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
  • Patent number: 10261904
    Abstract: Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first operating of the operations associated with the memory.
    Type: Grant
    Filed: December 7, 2017
    Date of Patent: April 16, 2019
    Assignee: Intel Corporation
    Inventors: Chunhui Zhang, George Z. Chrysos, Edward T. Grochowski, Ramacharan Sundararaman, Chung-Lun Chan, Federico Ardanaz
  • Patent number: 10234930
    Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.
    Type: Grant
    Filed: February 13, 2015
    Date of Patent: March 19, 2019
    Assignee: Intel Corporation
    Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar
  • Publication number: 20190012266
    Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
    Type: Application
    Filed: August 28, 2018
    Publication date: January 10, 2019
    Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
  • Patent number: 10146535
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Grant
    Filed: October 20, 2016
    Date of Patent: December 4, 2018
    Assignee: Intel Corporatoin
    Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
  • Publication number: 20180173628
    Abstract: Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first operating of the operations associated with the memory.
    Type: Application
    Filed: December 7, 2017
    Publication date: June 21, 2018
    Inventors: CHUNHUI ZHANG, GEORGE Z. CHRYSOS, EDWARD T. GROCHOWSKI, RAMACHARAN SUNDARARAMAN, CHUNG-LUN CHAN, FEDERICO ARDANAZ
  • Publication number: 20180173437
    Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.
    Type: Application
    Filed: December 19, 2016
    Publication date: June 21, 2018
    Inventors: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski
  • Publication number: 20180165199
    Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.
    Type: Application
    Filed: December 12, 2016
    Publication date: June 14, 2018
    Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
  • Patent number: 9977674
    Abstract: Disclosed are an apparatus, system, and method for implementing predicated instructions using micro-operations. A micro-code engine receives an instruction, decomposes the instruction, and generates a plurality of micro-operations to implement the instruction. Each of the decomposed micro-operations indicates a single destination register. For predicated instructions, the decomposed micro-operations include “conditional move” micro-operations to select between two potential output values. Except in the case that one of the potential output values is a constant, the decomposed micro-operations for a predicated instruction also include an append instruction that saves the incoming value of a destination register in a temporary variable. For at least one embodiment, the qualifying predicate for a predicated instruction is appended to the incoming value stored in the temporary register.
    Type: Grant
    Filed: October 14, 2003
    Date of Patent: May 22, 2018
    Assignee: Intel Corporation
    Inventors: Jeffrey P. Rupley, II, Edward A. Brekelbaum, Edward T. Grochowski, Bryan P. Black
  • Publication number: 20180113708
    Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.
    Type: Application
    Filed: October 20, 2016
    Publication date: April 26, 2018
    Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
  • Patent number: 9934032
    Abstract: A method includes receiving a packed data instruction indicating a first narrower source packed data operand and a narrower destination operand. The instruction is mapped to a masked packed data operation indicating a first wider source packed data operand that is wider than and includes the first narrower source operand, and indicating a wider destination operand that is wider than and includes the narrower destination operand. A packed data operation mask is generated that includes a mask element for each corresponding result data element of a packed data result to be stored by the masked packed data operation. All mask elements that correspond to result data elements to be stored by the masked operation that would not be stored by the packed data instruction are masking out. The masked operation is performed using the packed data operation mask. The packed data result is stored in the wider destination operand.
    Type: Grant
    Filed: October 24, 2016
    Date of Patent: April 3, 2018
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Seyed Yahya Sotoudeh, Buford M. Guy
  • Patent number: 9898286
    Abstract: A processor includes a decode unit to decode a packed finite impulse response (FIR) filter instruction that indicates one or more source packed data operands, a plurality of FIR filter coefficients, and a destination storage location. The source operand(s) include a first number of data elements and a second number of additional data elements. The second number is one less than a number of FIR filter taps. An execution unit, in response to the packed FIR filter instruction being decoded, is to store a result packed data operand. The result packed data operand includes the first number of FIR filtered data elements that each is to be based on a combination of products of the plurality of FIR filter coefficients and a different corresponding set of data elements from the one or more source packed data operands, which is equal in number to the number of FIR filter taps.
    Type: Grant
    Filed: May 5, 2015
    Date of Patent: February 20, 2018
    Assignee: Intel Corporation
    Inventors: Edwin Jan Van Dalen, Martinus C. Wezelenburg, Steven Roos, Edward T. Grochowski, Moshe Maor
  • Patent number: 9875213
    Abstract: Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units, compare each element of the first data type, in the first register lane portion, with a range specified by the instruction. For any elements of the first register portion in said range, corresponding elements of the second data type, from the second register portion, are added into one of a plurality data fields of a destination register lane portion, selected according to the value of its corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: January 23, 2018
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Galina Ryvchin, Michael Behar
  • Patent number: 9875185
    Abstract: Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first operating of the operations associated with the memory.
    Type: Grant
    Filed: July 9, 2014
    Date of Patent: January 23, 2018
    Assignee: Intel Corporation
    Inventors: Chunhui Zhang, George Z. Chrysos, Edward T. Grochowski, Ramacharan Sundararaman, Chung-Lun Chan, Federico Ardanaz
  • Publication number: 20180004510
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Application
    Filed: July 2, 2016
    Publication date: January 4, 2018
    Applicant: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, JR.
  • Patent number: 9785436
    Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
    Type: Grant
    Filed: September 28, 2012
    Date of Patent: October 10, 2017
    Assignee: INTEL CORPORATION
    Inventors: Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu
  • Publication number: 20170109164
    Abstract: A method includes receiving a packed data instruction indicating a first narrower source packed data operand and a narrower destination operand. The instruction is mapped to a masked packed data operation indicating a first wider source packed data operand that is wider than and includes the first narrower source operand, and indicating a wider destination operand that is wider than and includes the narrower destination operand. A packed data operation mask is generated that includes a mask element for each corresponding result data element of a packed data result to be stored by the masked packed data operation. All mask elements that correspond to result data elements to be stored by the masked operation that would not be stored by the packed data instruction are masking out. The masked operation is performed using the packed data operation mask. The packed data result is stored in the wider destination operand.
    Type: Application
    Filed: October 24, 2016
    Publication date: April 20, 2017
    Applicant: lntel Corporation
    Inventors: Edward T. Grochowski, Seyed Yahya Sotoudeh, Buford M. Guy
  • Publication number: 20170090924
    Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.
    Type: Application
    Filed: September 26, 2015
    Publication date: March 30, 2017
    Applicant: Intel Corporation
    Inventors: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha OuId-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar