Patents by Inventor Edward T. Grochowski

Edward T. Grochowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9477467
    Abstract: A method includes receiving a packed data instruction indicating a first narrower source packed data operand and a narrower destination operand. The instruction is mapped to a masked packed data operation indicating a first wider source packed data operand that is wider than and includes the first narrower source operand, and indicating a wider destination operand that is wider than and includes the narrower destination operand. A packed data operation mask is generated that includes a mask element for each corresponding result data element of a packed data result to be stored by the masked packed data operation. All mask elements that correspond to result data elements to be stored by the masked operation that would not be stored by the packed data instruction are masking out. The masked operation is performed using the packed data operation mask. The packed data result is stored in the wider destination operand.
    Type: Grant
    Filed: March 30, 2013
    Date of Patent: October 25, 2016
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Seyed Yahya Sotoudeh, Buford M. Guy
  • Patent number: 9465670
    Abstract: Disclosed herein is a generational thread scheduler. One embodiment may be used with processor multithreading logic to execute threads of executable instructions, and a shared resource to be allocated fairly among the threads of executable instructions contending for access to the shared resource. Generational thread scheduling logic may allocate the shared resource efficiently and fairly by granting a first requesting thread access to the shared resource allocating a reservation for the shared resource to each other requesting thread of the executing threads and then blocking the first thread from re-requesting the shared resource until every other thread that has been allocated a reservation, has been granted access to the shared resource. Generation tracking state may be cleared when each requesting thread of the generation that was allocated a reservation has had their request satisfied.
    Type: Grant
    Filed: December 16, 2011
    Date of Patent: October 11, 2016
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Michael D. Upton, George Z. Chrysos, Chunhui C. Zhang, Mohammed L. Al-Aqrabawi
  • Publication number: 20160275043
    Abstract: In one embodiment, a heterogeneous multicore processor is described that is optimized to execute multi-stage computer vision algorithms such as cascade classifier workloads. In such embodiment the heterogeneous processor includes at least one SIMD core, such as a vector processor core, coupled with one or more scalar cores. In one embodiment the heterogeneous multiprocessor executes multi-stage compute operations, where the SIMD core computes a first set of stages and the one or more scalar cores compute the second set of stages. In one embodiment, a process for designing a heterogeneous multicore processor is disclosed which optimizes the ratio of scalar to SIMD cores based on execution time of the multi-stage compute operation in relation to processor die area consumed by a processor configuration having the ratio.
    Type: Application
    Filed: March 18, 2015
    Publication date: September 22, 2016
    Inventors: Edward T. Grochowski, Michael E. Kounavis, Ron Shalev
  • Patent number: 9442721
    Abstract: A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA; and the one or more shreds are executed simultaneously with a microprocessor, wherein the microprocessor includes multiple instruction sequencers.
    Type: Grant
    Filed: December 20, 2012
    Date of Patent: September 13, 2016
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Hong Wang, John P. Shen, Perry H. Wang, Jamison D. Colins, James P. Held, Partha Kundu, Raya Leviathan, Tin-Fook Ngai
  • Publication number: 20160239074
    Abstract: In an embodiment, a processor includes: a plurality of first cores to independently execute instructions, each of the plurality of first cores including a plurality of counters to store performance information; at least one second core to perform memory operations; and a power controller to receive performance information from at least some of the plurality of counters, determine a workload type executed on the processor based at least in part on the performance information, and based on the workload type dynamically migrate one or more threads from one or more of the plurality of first cores to the at least one second core for execution during a next operation interval. Other embodiments are described and claimed.
    Type: Application
    Filed: February 13, 2015
    Publication date: August 18, 2016
    Inventors: VICTOR W. LEE, EDWARD T. GROCHOWSKI, DAEHYUN KIM, YUXIN BAI, SHENG LI, NAVEEN K. MELLEMPUDI, DHIRAJ D. KALAMKAR
  • Publication number: 20160055004
    Abstract: An apparatus and method are described for non-speculative execution of conditional instructions. For example, one embodiment of a processor comprises: a register set including a first register to store a set of one or more condition bits; non-speculative execution logic to execute a first instruction to identify a first target instruction strand in response to a first conditional value read from the set of condition bits, the first instruction to wait until the first conditional value becomes known before causing the first target instruction strand to be fetched and executed, the non-speculative execution logic to execute a second instruction to identify an end of the first target instruction strand and responsively identify a new current instruction pointer for instructions which follow the second instruction; and out-of-order execution logic to fetch and execute the instructions which follow the second instruction prior to the execution of the second instruction.
    Type: Application
    Filed: August 21, 2014
    Publication date: February 25, 2016
    Inventors: EDWARD T. GROCHOWSKI, MILIND B. GIRKAR, VICTOR W. LEE, DMITRY M. MASLENNIKOV, ROBERT VALENTINE, SERGEY A. ROZHKOV, BORIS A. BABAYAN
  • Publication number: 20160019067
    Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
    Type: Application
    Filed: September 26, 2015
    Publication date: January 21, 2016
    Inventors: Hong Wang, John P. Shen, Edward T. Grochowski, Richard A. Hankins, Gautham N. Chinya, Bryant E. Bigbee, Shivnandan D. Kaushik, Xiang Chris Zou, Per Hammarlund, Scott Dion Rodgers, Xinmin Tian, Anil Aggawal, Prashant Sethi, Baiju V. Patel, James P Held
  • Publication number: 20160011977
    Abstract: Operations associated with a memory and operations associated with one or more functional units may be received. A dependency between the operations associated with the memory and the operations associated with one or more of the functional units may be determined. A first ordering may be created for the operations associated with the memory. Furthermore, a second ordering may be created for the operations associated with one or more of the functional units based on the determined dependency and the first operating of the operations associated with the memory.
    Type: Application
    Filed: July 9, 2014
    Publication date: January 14, 2016
    Inventors: CHUNHUI ZHANG, GEORGE Z. CHRYSOS, EDWARD T. GROCHOWSKI, RAMACHARAN SUNDARARAMAN, CHUNG-LUN CHAN, FEDERICO ARDANAZ
  • Patent number: 9189230
    Abstract: A method and system to provide user-level multithreading are disclosed. The method according to the present techniques comprises receiving programming instructions to execute one or more shared resource threads (shreds) via an instruction set architecture (ISA). One or more instruction pointers are configured via the ISA; and the one or more shreds are executed simultaneously with a microprocessor, wherein the microprocessor includes multiple instruction sequencers.
    Type: Grant
    Filed: March 31, 2004
    Date of Patent: November 17, 2015
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Hong Wang, John P. Shen, Perry H. Wang, Jamison D. Collins, James P. Held, Partha Kundu, Raya Leviathan, Tin-Fook Ngai
  • Patent number: 9170955
    Abstract: In an embodiment, a processor includes a decode logic to receive and decode a first memory access instruction to store data in a cache memory with a replacement state indicator of a first level, and to send the decoded first memory access instruction to a control logic. In turn, the control logic is to store the data in a first way of a first set of the cache memory and to store the replacement state indicator of the first level in a metadata field of the first way responsive to the decoded first memory access instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 27, 2012
    Date of Patent: October 27, 2015
    Assignee: Intel Corporation
    Inventors: Andrew T. Forsyth, Ramacharan Sundararaman, Eric Sprangle, John C. Mejia, Douglas M. Carmean, Edward T. Grochowski, Robert D. Cavin
  • Publication number: 20150277910
    Abstract: An apparatus and method are described for executing instructions using a predicate register. For example, one embodiment of a processor comprises: a register set including a predicate register to store a set of predicate condition bits, the predicate condition bits specifying whether results of a particular predicated instruction sequence are to be retained or discarded; and predicate execution logic to execute a first predicate instruction to indicate a start of a new predicated instruction sequence by copying a condition value from a processor control register in the register set to the predicate register. In a further embodiment, the predicate condition bits in the predicate register are to be shifted in response to the first predicate instruction to free space within the predicate register for the new condition value associated with the new predicated instruction sequence.
    Type: Application
    Filed: March 27, 2014
    Publication date: October 1, 2015
    Inventors: EDWARD T. GROCHOWSKI, VICTOR W. LEE, SERGEY A. ROZHKOV, BORIS A. BABAYAN
  • Publication number: 20150052333
    Abstract: Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask.
    Type: Application
    Filed: July 25, 2014
    Publication date: February 19, 2015
    Inventors: Christopher J. HUGHES, Jesus Corbal SAN ADRIAN, Roger Espasa SANS, Bret TOLL, Robert C. VALENTINE, Milind B. GIRKAR, Andrew T. FORSYTH, Edward T. GROCHOWSKI, Jonathan C. HALL
  • Publication number: 20140297994
    Abstract: A method includes receiving a packed data instruction indicating a first narrower source packed data operand and a narrower destination operand. The instruction is mapped to a masked packed data operation indicating a first wider source packed data operand that is wider than and includes the first narrower source operand, and indicating a wider destination operand that is wider than and includes the narrower destination operand. A packed data operation mask is generated that includes a mask element for each corresponding result data element of a packed data result to be stored by the masked packed data operation. All mask elements that correspond to result data elements to be stored by the masked operation that would not be stored by the packed data instruction are masking out. The masked operation is performed using the packed data operation mask. The packed data result is stored in the wider destination operand.
    Type: Application
    Filed: March 30, 2013
    Publication date: October 2, 2014
    Inventors: Edward T. Grochowski, Seyed Yahya Sotoudeh, Buford M. Guy
  • Publication number: 20140149651
    Abstract: In an embodiment, a processor includes a decode logic to receive and decode a first memory access instruction to store data in a cache memory with a replacement state indicator of a first level, and to send the decoded first memory access instruction to a control logic. In turn, the control logic is to store the data in a first way of a first set of the cache memory and to store the replacement state indicator of the first level in a metadata field of the first way responsive to the decoded first memory access instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: November 27, 2012
    Publication date: May 29, 2014
    Inventors: Andrew T. Forsyth, Ramacharan Sundararaman, Eric Sprangle, John C. Mejia, Douglas M. Carmean, Mark C. Davis, Edward T. Grochowski, Robert D. Cavin
  • Publication number: 20140095831
    Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu
  • Patent number: 8533436
    Abstract: In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core, and if contention is predicted data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 26, 2009
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: Joshua B. Fryman, Edward T. Grochowski, Toni Juan, Andrew Thomas Forsyth, John Mejia, Ramacharan Sundararaman, Eric Sprangle, Roger Espasa, Ravi Rajwar
  • Publication number: 20130160020
    Abstract: Disclosed herein is a generational thread scheduler. One embodiment may be used with processor multithreading logic to execute threads of executable instructions, and a shared resource to be allocated fairly among the threads of executable instructions contending for access to the shared resource. Generational thread scheduling logic may allocate the shared resource efficiently and fairly by granting a first requesting thread access to the shared resource allocating a reservation for the shared resource to each other requesting thread of the executing threads and then blocking the first thread from re-requesting the shared resource until every other thread that has been allocated a reservation, has been granted access to the shared resource. Generation tracking state may be cleared when each requesting thread of the generation that was allocated a reservation has had their request satisfied.
    Type: Application
    Filed: December 16, 2011
    Publication date: June 20, 2013
    Inventors: Edward T. Grochowski, Michael D. Upton, George Z. Chrysos, Chunhui C. Zhang, Mohammed L. Al-Aqrabawi
  • Publication number: 20100332801
    Abstract: In one embodiment, a method includes receiving an instruction for decoding in a processor core and dynamically handling the instruction with one of multiple behaviors based on whether contention is predicted. If no contention is predicted, the instruction is executed in the core, and if contention is predicted data associated with the instruction is marshaled and sent to a selected remote agent for execution. Other embodiments are described and claimed.
    Type: Application
    Filed: June 26, 2009
    Publication date: December 30, 2010
    Inventors: Joshua B. Fryman, Edward T. Grochowski, Toni Juan, Andrew Thomas Forsyth, John Mejia, Ramacharan Sundararaman, Eric Sprangle, Roger Espasa, Ravi Rajwar
  • Patent number: 7742910
    Abstract: A system for delivering power to a device in a specified voltage range is disclosed. The system includes a power delivery network, characterized by a response function, to deliver power to the device. A current computation unit stores values representing a sequence of current amplitudes drawn by the device on successive clock cycles, and provides them to a current to voltage computation unit. The current to voltage computation unit filters the current amplitudes according to coefficients derived from the response function to provide an estimate of the voltage seen by the device. Operation of the device is adjusted if the estimated voltage falls outside the specified range.
    Type: Grant
    Filed: May 21, 2007
    Date of Patent: June 22, 2010
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, David Sager, Vivek Tiwari, Ian Young, David J. Ayers
  • Patent number: 7516312
    Abstract: An instruction prefetch apparatus includes a branch target buffer (BTB), a presbyopic target buffer (PTB) and a prefetch stream buffer (PSB). The BTB includes records that map branch addresses to branch target addresses, and the PTB includes records that map branch target addresses to subsequent branch target addresses. When a branch instruction is encountered, the BTB can predict the dynamically adjacent subsequent block entry location as the branch target address in the record that also includes the branch instruction address. The PTB can predict multiple subsequent blocks by mapping the branch target address to subsequent dynamic blocks. The PSB holds instructions prefetched from subsequent blocks predicted by the PTB.
    Type: Grant
    Filed: April 2, 2004
    Date of Patent: April 7, 2009
    Assignee: Intel Corporation
    Inventors: Hong Wang, Ralph Kling, Edward T. Grochowski, Kalpana Ramakrishnan