Patents by Inventor Edward T. Grochowski

Edward T. Grochowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DATA ELEMENT COMPARISON PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS

Publication number: 20200089494

Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.

Type: Application

Filed: September 23, 2019

Publication date: March 19, 2020

Inventors: Asit K. MISHRA, Edward T. GROCHOWSKI, Jonathan D. PEARCE, Deborah T. MARR, Ehud COHEN, Elmoustapha OULD-AHMED-VALL, Jesus Corbal SAN ADRIAN, Robert VALENTINE, Mark J. CHARNEY, Christopher J. HUGHES, Milind B. GIRKAR
Instructions for manipulating a multi-bit predicate register for predicating instruction sequences

Patent number: 10579378

Abstract: An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.

Type: Grant

Filed: March 27, 2014

Date of Patent: March 3, 2020

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Victor W. Lee, Sergey A. Rozhkov, Boris A. Babayan
Memory-to-memory instructions to accelerate sparse-matrix by dense-vector and sparse-vector by dense-vector multiplication

Patent number: 10489063

Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.

Type: Grant

Filed: December 19, 2016

Date of Patent: November 26, 2019

Assignee: Intel Corporation

Inventors: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski
SYSTEMS, METHODS, AND APPARATUSES FOR HETEROGENEOUS COMPUTING

Publication number: 20190347125

Abstract: Embodiments of systems, methods, and apparatuses for heterogeneous computing are described. In some embodiments, a hardware heterogeneous scheduler dispatches instructions for execution on one or more plurality of heterogeneous processing elements, the instructions corresponding to a code fragment to be processed by the one or more of the plurality of heterogeneous processing elements, wherein the instructions are native instructions to at least one of the one or more of the plurality of heterogeneous processing elements.

Type: Application

Filed: December 31, 2016

Publication date: November 14, 2019

Inventors: Rajesh M. SANKARAN, Gilbert NEIGER, Narayan RANGANATHAN, Stephen R. VAN DOREN, Joseph NUZMAN, Niall D. MCDONNELL, Michael A. O'HANLON, Lokpraveen B. MOSUR, Tracy Garrett DRYSDALE, Eriko NURVITADHI, Asit K. MISHRA, Ganesh VENKATESH, Deborah T. MARR, Nicholas P. CARTER, Jonathan D. PEARCE, Edward T. GROCHOWSKI, Richard J. GRECO, Robert VALENTINE, Jesus CORBAL, Thomas D. FLETCHER, Dennis R. BRADFORD, Dwight P. MANLEY, Mark J. CHARNEY, Jeffrey J. COOK, Paul CAPRIOLI, Koichi YAMADA, Kent D. GLOSSOP, David B. SHEFFIELD
Mechanism for instruction set based thread execution on a plurality of instruction sequencers

Patent number: 10452403

Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.

Type: Grant

Filed: September 26, 2015

Date of Patent: October 22, 2019

Assignee: Intel Corporation

Inventors: Hong Wang, John P. Shen, Edward T. Grochowski, Richard A. Hankins, Gautham N. Chinya, Bryant E. Bigbee, Shivnandan D. Kaushik, Xiang Chris Zou, Per Hammarlund, Scott Dion Rodgers, Xinmin Tian, Anil Aggawal, Prashant Sethi, Baiju V. Patel, James P Held
Data element comparison processors, methods, systems, and instructions

Patent number: 10423411

Abstract: A processor includes a decode unit to decode an instruction that is to indicate a first source packed data operand that is to include at least four data elements, to indicate a second source packed data operand that is to include at least four data elements, and to indicate one or more destination storage locations. The execution unit, in response to the instruction, is to store at least one result mask operand in the destination storage location(s). The at least one result mask operand is to include a different mask element for each corresponding data element in one of the first and second source packed data operands in a same relative position. Each mask element is to indicate whether the corresponding data element in said one of the source packed data operands equals any of the data elements in the other of the source packed data operands.

Type: Grant

Filed: September 26, 2015

Date of Patent: September 24, 2019

Assignee: Intel Corporation

Inventors: Asit K. Mishra, Edward T. Grochowski, Jonathan D. Pearce, Deborah T. Marr, Ehud Cohen, Elmoustapha Ould-Ahmed-Vall, Jesus Corbal San Adrian, Robert Valentine, Mark J. Charney, Christopher J. Hughes, Milind B. Girkar
Performing Power Management In A Multicore Processor

Publication number: 20190265777

Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.

Type: Application

Filed: February 28, 2019

Publication date: August 29, 2019

Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar
INTERRUPTIBLE AND RESTARTABLE MATRIX MULTIPLICATION INSTRUCTIONS, PROCESSORS, METHODS, AND SYSTEMS

Publication number: 20190258481

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Application

Filed: April 29, 2019

Publication date: August 22, 2019

Inventors: Edward T. GROCHOWSKI, Asit K. MISHRA, Robert VALENTINE, Mark J. CHARNEY, Simon C. STEELY, JR.
Zeroing a cache line

Patent number: 10282296

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Grant

Filed: December 12, 2016

Date of Patent: May 7, 2019

Assignee: Intel Corporation

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, David Papworth, James D. Allen
Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 10275243

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: July 2, 2016

Date of Patent: April 30, 2019

Assignee: Intel Corporation

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD

Publication number: 20190121637

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.

Type: Application

Filed: October 24, 2018

Publication date: April 25, 2019

Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON
Memory sequencing with coherent and non-coherent sub-systems

Patent number: 10261904

Abstract: Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first operating of the operations associated with the memory.

Type: Grant

Filed: December 7, 2017

Date of Patent: April 16, 2019

Assignee: Intel Corporation

Inventors: Chunhui Zhang, George Z. Chrysos, Edward T. Grochowski, Ramacharan Sundararaman, Chung-Lun Chan, Federico Ardanaz
Performing power management in a multicore processor

Patent number: 10234930

Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.

Type: Grant

Filed: February 13, 2015

Date of Patent: March 19, 2019

Assignee: Intel Corporation

Inventors: Victor W. Lee, Edward T. Grochowski, Daehyun Kim, Yuxin Bai, Sheng Li, Naveen K. Mellempudi, Dhiraj D. Kalamkar
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE

Publication number: 20190012266

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Application

Filed: August 28, 2018

Publication date: January 10, 2019

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
Systems, apparatuses, and methods for chained fused multiply add

Patent number: 10146535

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.

Type: Grant

Filed: October 20, 2016

Date of Patent: December 4, 2018

Assignee: Intel Corporatoin

Inventors: Jesus Corbal, Robert Valentine, Roman S. Dubtsov, Nikita A. Shustrov, Mark J. Charney, Dennis R. Bradford, Milind B. Girkar, Edward T. Grochowski, Thomas D. Fletcher, Warren E. Ferguson
MEMORY SEQUENCING WITH COHERENT AND NON-COHERENT SUB-SYSTEMS

Publication number: 20180173628

Abstract: Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first operating of the operations associated with the memory.

Type: Application

Filed: December 7, 2017

Publication date: June 21, 2018

Inventors: CHUNHUI ZHANG, GEORGE Z. CHRYSOS, EDWARD T. GROCHOWSKI, RAMACHARAN SUNDARARAMAN, CHUNG-LUN CHAN, FEDERICO ARDANAZ
MEMORY-TO-MEMORY INSTRUCTIONS TO ACCELERATE SPARSE-MATRIX BY DENSE-VECTOR AND SPARSE-VECTOR BY DENSE-VECTOR MULTIPLICATION

Publication number: 20180173437

Abstract: First elements of a dense vector to be multiplied with first elements of a first row of a sparse array may be determined. The determined first elements of the dense vector may be written into a memory. A dot product for the first elements of the sparse array and the first elements of the dense vector may be calculated in a plurality of increments by multiplying a subset of the first elements of the sparse array and a corresponding subset of the first elements of the dense vector. A sequence number may be updated after each increment is completed to identify a column number and/or a row number of the sparse array for which the dot product calculations have been completed.

Type: Application

Filed: December 19, 2016

Publication date: June 21, 2018

Inventors: Asit K. Mishra, Deborah T. Marr, Edward T. Grochowski
APPARATUSES AND METHODS FOR A PROCESSOR ARCHITECTURE

Publication number: 20180165199

Abstract: Embodiments of an invention a processor architecture are disclosed. In an embodiment, a processor includes a decoder, an execution unit, a coherent cache, and an interconnect. The decoder is to decode an instruction to zero a cache line. The execution unit is to issue a write command to initiate a cache line sized write of zeros. The coherent cache is to receive the write command, to determine whether there is a hit in the coherent cache and whether a cache coherency protocol state of the hit cache line is a modified state or an exclusive state, to configure a cache line to indicate all zeros, and to issue the write command toward the interconnect. The interconnect is to, responsive to receipt of the write command, issue a snoop to each of a plurality of other coherent caches for which it must be determined if there is a hit.

Type: Application

Filed: December 12, 2016

Publication date: June 14, 2018

Inventors: Jason W. Brandt, Robert S. Chappell, Jesus Corbal, Edward T. Grochowski, Stephen H. Gunther, Buford M. Guy, Thomas R. Huff, Christopher J. Hughes, Elmoustapha Ould-Ahmed-Vall, Ronak Singhal, Seyed Yahya Sotoudeh, Bret L. Toll, Lihu Rappoport, David Papworth, James D. Allen
Micro-operation generator for deriving a plurality of single-destination micro-operations from a given predicated instruction

Patent number: 9977674

Abstract: Disclosed are an apparatus, system, and method for implementing predicated instructions using micro-operations. A micro-code engine receives an instruction, decomposes the instruction, and generates a plurality of micro-operations to implement the instruction. Each of the decomposed micro-operations indicates a single destination register. For predicated instructions, the decomposed micro-operations include “conditional move” micro-operations to select between two potential output values. Except in the case that one of the potential output values is a constant, the decomposed micro-operations for a predicated instruction also include an append instruction that saves the incoming value of a destination register in a temporary variable. For at least one embodiment, the qualifying predicate for a predicated instruction is appended to the incoming value stored in the temporary register.

Type: Grant

Filed: October 14, 2003

Date of Patent: May 22, 2018

Assignee: Intel Corporation

Inventors: Jeffrey P. Rupley, II, Edward A. Brekelbaum, Edward T. Grochowski, Bryan P. Black
SYSTEMS, APPARATUSES, AND METHODS FOR CHAINED FUSED MULTIPLY ADD

Publication number: 20180113708

Abstract: Embodiments of systems, apparatuses, and methods for chained fused multiply add. In some embodiments, an apparatus includes a decoder to decode a single instruction having an opcode, a destination field representing a destination operand, a first source field representing a plurality of packed data source operands of a first type that have packed data elements of a first size, a second source field representing a plurality of packed data source operands that have packed data elements of a second size, and a field for a memory location that stores a scalar value. A register file having a plurality of packed data registers includes registers for the plurality of packed data source operands that have packed data elements of a first size, the source operands that have packed data elements of a second size, and the destination operand.

Type: Application

Filed: October 20, 2016

Publication date: April 26, 2018

Inventors: JESUS CORBAL, ROBERT VALENTINE, ROMAN S. DUBTSOV, NIKITA A. SHUSTROV, MARK J. CHARNEY, DENNIS R. BRADFORD, MILIND B. GIRKAR, EDWARD T. GROCHOWSKI, THOMAS D. FLETCHER, WARREN E. FERGUSON

prev 1 2 3 4 5 6 … next