Patents Examined by George Giroux

Precise-restartable parallel execution of programs

Patent number: 10185569

Abstract: Interrupt handling on a multiprocessor computer executing multiple computational operations in parallel is provided by establishing a total ordering of the multiple computational operations and defining an architectural state at the time of the interrupt as if the computational operations executed in the total ordering.

Type: Grant

Filed: February 13, 2013

Date of Patent: January 22, 2019

Assignee: Wisconsin Alumni Research Foundation

Inventors: Gagan Gupta, Gurindar S. Sohi
Mechanism for using a reservation station as a scratch register

Patent number: 10175985

Abstract: A processor core includes an instruction-sequencing unit (ISU). The ISU includes a general register file (GRF) composed of multiple hardware general purpose registers (GPRs), an exception register (XER), and a reservation station (RS). The execution unit(s) load an address of data in a data GPR, and load a first portion of the data in a first data GPR and a second portion of the data in a second data GPR in the GRF, where loading the portions of the data generate intermediate data condition codes that are loaded in the XER. The execution unit(s) generate a cumulative data condition code, which is loaded into a history buffer within the ISU. The intermediate data condition codes are loaded into a reservation station (RS) within the ISU. Upon flushing the GRF and the XER, the ISU repopulates the GRF from a history buffer and the XER from the RS.

Type: Grant

Filed: March 28, 2016

Date of Patent: January 8, 2019

Assignee: International Business Machines Corporation

Inventors: Sundeep Chadha, Michael J. Genden, Dung Q. Nguyen
Virtualization in a bi-endian-mode processor architecture

Patent number: 10152324

Abstract: Embodiments of methods and computer program products disclosed herein relate to processor architecture. One such method includes the processor obtaining an instruction. The instruction specifies an operation, and also specifies one of the registers as a source register and one of the registers as a destination register. The method also includes the processor obtaining an endian mode and determining that the instruction is an element-ordering-sensitive instruction. Based on the determination that the instruction is an element-ordering-sensitive instruction, the processor executes the instruction by performing the operation on the elements of the source register in accordance with the endian mode and writing a result of the operation to the destination register.

Type: Grant

Filed: September 5, 2014

Date of Patent: December 11, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael K. Gschwind, Brett Olsson
Supporting even instruction tag (‘ITAG’) requirements in a multi-slice processor using null internal operations (IOPs)

Patent number: 10120683

Abstract: Supporting even instruction tag (‘ITAG’) requirements in a multi-slice processor with null internal operations (IOPs) includes: receiving an IOP with an even ITAG requirement; determining that the IOP is to be assigned an odd ITAG; and inserting a null IOP into an instruction lane ahead of the IOP, wherein the null IOP is assigned the odd ITAG, and the IOP is assigned an even ITAG.

Type: Grant

Filed: April 27, 2016

Date of Patent: November 6, 2018

Assignee: International Business Machines Corporation

Inventors: Steven R. Carlough, Kurt A. Feiste, Paul M. Kennedy, Phillip G. Williams
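
Illustrative sketch (not taken from the patent): the abstract above describes padding an instruction lane with a null IOP so that an IOP with an even-ITAG requirement does not receive an odd tag. The Python below is a minimal model of that step, assuming ITAGs are handed out sequentially from a counter; the `Iop` class, its field names, and `assign_itags` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Iop:
    name: str
    needs_even_itag: bool = False  # hypothetical flag for the even-ITAG requirement
    itag: int = -1                 # filled in when the IOP enters the lane


def assign_itags(iops: List[Iop], first_itag: int = 0) -> List[Iop]:
    """Assign sequential ITAGs, inserting a null IOP ahead of any IOP that
    requires an even ITAG but would otherwise receive an odd one."""
    lane: List[Iop] = []
    next_itag = first_itag
    for iop in iops:
        if iop.needs_even_itag and next_itag % 2 == 1:
            # The null IOP consumes the odd tag so the real IOP gets an even one.
            lane.append(Iop(name="null", itag=next_itag))
            next_itag += 1
        lane.append(Iop(iop.name, iop.needs_even_itag, next_itag))
        next_itag += 1
    return lane


lane = assign_itags([Iop("a"), Iop("b", True), Iop("c", True)])
for entry in lane:
    print(entry.itag, entry.name)
```

In this model, every IOP flagged `needs_even_itag` ends up with an even tag, at the cost of one lane slot per inserted null IOP.
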
Virtualization in a bi-endian-mode processor architecture

Patent number: 10120682

Abstract: Embodiments of systems disclosed herein relate to processor architecture. One such system implements a method that includes the processor obtaining an instruction. The instruction specifies an operation, and also specifies one of the registers as a source register and one of the registers as a destination register. The method also includes the processor obtaining an endian mode and determining that the instruction is an element-ordering-sensitive instruction. Based on the determination that the instruction is an element-ordering-sensitive instruction, the processor executes the instruction by performing the operation on the elements of the source register in accordance with the endian mode and writing a result of the operation to the destination register.

Type: Grant

Filed: February 28, 2014

Date of Patent: November 6, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Michael K. Gschwind, Brett Olsson
Last branch record indicators for transactional memory

Patent number: 10073719

Abstract: In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed.

Type: Grant

Filed: April 18, 2016

Date of Patent: September 11, 2018

Assignee: Intel Corporation

Inventors: Ravi Rajwar, Peter Lachner, Laura A. Knauth, Konrad K. Lai
Branch prediction with power usage prediction and control

Patent number: 10067556

Abstract: A method maintains power usage prediction information for one or more functional units in branch prediction logic for a processing unit such that the power consumption of a functional unit may be selectively reduced in association with the execution of branch instructions when it is predicted that the functional unit will be idle subsequent to the execution of such branch instructions.

Type: Grant

Filed: August 31, 2015

Date of Patent: September 4, 2018

Assignee: International Business Machines Corporation

Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
Efficient detection and response to spin waits in multi-processor virtual machines

Patent number: 10067782

Abstract: Various aspects are disclosed herein for attenuating spin waiting in a virtual machine environment comprising a plurality of virtual machines and virtual processors. Selected virtual processors can be given time slice extensions in order to prevent such virtual processors from becoming de-scheduled (and hence causing other virtual processors to have to spin wait). Selected virtual processors can also be expressly scheduled so that they can be given higher priority to resources, resulting in reduced spin waits for other virtual processors waiting on such selected virtual processors. Finally, various spin wait detection techniques can be incorporated into the time slice extension and express scheduling mechanisms, in order to identify potential and existing spin waiting scenarios.

Type: Grant

Filed: November 18, 2015

Date of Patent: September 4, 2018

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yau Ning Chin, John Te-Jui Sheu, Arun Kishan, Thomas Fahrig, Rene Antonio Vega
Branch prediction with power usage prediction and control

Patent number: 10042417

Abstract: A circuit arrangement maintains power usage prediction information for one or more functional units in branch prediction logic for a processing unit such that the power consumption of a functional unit may be selectively reduced in association with the execution of branch instructions when it is predicted that the functional unit will be idle subsequent to the execution of such branch instructions.

Type: Grant

Filed: July 5, 2016

Date of Patent: August 7, 2018

Assignee: International Business Machines Corporation

Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
Instruction and logic to provide vector blend and permute functionality

Patent number: 10037205

Abstract: Vector blend and permute functionality are provided, responsive to instructions specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, a second vector register, and a third operand. Indices are read from fields in the second register. Each index has a first selector portion and a second selector portion. Corresponding unmasked vector elements are stored to fields of the destination register, wherein each vector element, responsive to the respective first selector portion having a first value, is copied to an intermediate vector from a corresponding data field of the first register, and responsive to the respective first selector portion having a second value, is copied to the intermediate vector from a corresponding data field of the third operand. Then unmasked data fields of the destination are replaced by data fields in the intermediate vector indexed by the corresponding second selector portions.

Type: Grant

Filed: December 23, 2011

Date of Patent: July 31, 2018

Assignee: Intel Corporation

Inventors: Robert Valentine, Bret L. Toll, Jesus Corbal, Jeffrey G. Wiedemeier, Sridhar Samudrala
Multicore processor and method of use that configures core functions based on executing instructions

Patent number: 10025590

Abstract: A multiprocessor system having plural heterogeneous processing units schedules instruction sets for execution on a selected of the processing units by matching workload processing characteristics of processing units and the instruction sets. To establish an instruction set's processing characteristics, the homogeneous instruction set is executed on each of the plural processing units with one or more performance metrics tracked at each of the processing units to determine which processing unit most efficiently executes the instruction set. Instruction set workload processing characteristics are stored for reference in scheduling subsequent execution of the instruction set.

Type: Grant

Filed: November 23, 2016

Date of Patent: July 17, 2018

Assignee: International Business Machines Corporation

Inventors: Louis B. Capps, Jr., Ronald E. Newhart, Thomas E. Cook, Robert H. Bell, Jr., Michael J. Shapiro
Uniform load processing for parallel thread sub-sets

Patent number: 10007527

Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.

Type: Grant

Filed: March 5, 2012

Date of Patent: June 26, 2018

Assignee: NVIDIA CORPORATION

Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
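
Illustrative sketch (not taken from the patent): the abstract above describes issuing fewer read requests when the threads in a sub-set request the same address under some uniform pattern. The Python below models one possible reading, assuming a pattern is a partition of thread indices into sub-sets and that patterns are tried from coarsest to finest; `count_read_requests` and this representation of a pattern are hypothetical.

```python
from typing import List, Sequence


def count_read_requests(addresses: Sequence[int],
                        patterns: Sequence[Sequence[Sequence[int]]]) -> int:
    """Return how many read requests are sent for one thread group.

    Each pattern is a list of sub-sets of thread indices (an assumption of
    this sketch). A pattern matches when every sub-set's threads request a
    single common address; one read request then serves each sub-set.
    """
    for subsets in patterns:
        if all(len({addresses[t] for t in subset}) == 1 for subset in subsets):
            return len(subsets)
    # No uniform pattern matched: fall back to one request per thread.
    return len(addresses)


# Hypothetical patterns for a 4-thread group: all threads together, then pairs.
PATTERNS: List[List[List[int]]] = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
]

print(count_read_requests([0x100, 0x100, 0x200, 0x200], PATTERNS))
```
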
Dynamically enabled branch prediction

Patent number: 10001998

Abstract: Embodiments for a processor that selectively enables and disables branch prediction are disclosed. The processor may include counters to track a number of fetched instructions, a number of branches, and a number of mispredicted branches. A misprediction threshold may be calculated dependent upon the tracked number of branches and a predefined misprediction ratio. Branch prediction may then be disabled when the number of mispredictions exceed the determined threshold value and dependent upon the branch rate.

Type: Grant

Filed: April 18, 2014

Date of Patent: June 19, 2018

Assignee: Oracle International Corporation

Inventors: Haowei Zhang, Xiaoying Shen, Manish Shah
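
Illustrative sketch (not taken from the patent): the abstract above derives a misprediction threshold from the branch count and a predefined ratio, then disables prediction when mispredictions exceed it, subject to the branch rate. The Python below models that decision. The abstract does not say how the branch rate enters the decision, so the sketch assumes prediction is only disabled when the branch rate is at or below a hypothetical `max_branch_rate`; all names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class BranchCounters:
    fetched_instructions: int
    branches: int
    mispredicted_branches: int


def prediction_enabled(c: BranchCounters,
                       misprediction_ratio: float,
                       max_branch_rate: float) -> bool:
    """Decide whether branch prediction stays enabled for the next interval."""
    # Threshold scales with the number of branches actually seen.
    threshold = c.branches * misprediction_ratio
    too_many_misses = c.mispredicted_branches > threshold

    # Assumption of this sketch: the branch rate is branches per fetched
    # instruction, and prediction is only turned off when that rate is low.
    branch_rate = c.branches / c.fetched_instructions if c.fetched_instructions else 0.0
    rate_allows_disable = branch_rate <= max_branch_rate

    return not (too_many_misses and rate_allows_disable)


counters = BranchCounters(fetched_instructions=1000, branches=100, mispredicted_branches=30)
print(prediction_enabled(counters, misprediction_ratio=0.1, max_branch_rate=0.2))
```
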
Multiplication instruction for which execution completes without writing a carry flag

Patent number: 9990201

Abstract: A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.

Type: Grant

Filed: December 22, 2009

Date of Patent: June 5, 2018

Assignee: Intel Corporation

Inventors: Vinodh Gopal, James D. Guilford, Wajdi K. Feghali, Erdine Ozturk, Gilbert M. Wolrich, Martin G. Dixon, Mark C. Davis, Sean P. Mirkes, Alexandre Farcy, Bret L. Toll, Maxim Loktyukhin
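
Illustrative sketch (not taken from the patent): the abstract above describes a multiply whose product goes to one or more destination operands while the carry flag is left unwritten. The Python below models that behavior, assuming unsigned 64-bit source operands and a high/low destination pair; the function name and the `flags` dictionary are hypothetical.

```python
from typing import Dict, Tuple

MASK64 = (1 << 64) - 1


def multiply_no_flags(src1: int, src2: int,
                      flags: Dict[str, int]) -> Tuple[int, int]:
    """Return (high, low) halves of the 128-bit unsigned product.

    `flags` is accepted only to make the point explicit: it is never
    modified, so a carry chain in progress around the multiply survives.
    """
    product = (src1 & MASK64) * (src2 & MASK64)
    low = product & MASK64
    high = product >> 64
    return high, low


flags = {"CF": 1}
high, low = multiply_no_flags(1 << 63, 4, flags)
print(hex(high), hex(low), flags)
```

Leaving the flag untouched is what lets software interleave multiplies with add-with-carry sequences, for example in multi-precision arithmetic, without saving and restoring the carry.
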
Buffer store with a main store and an auxiliary store

Patent number: 9946545

Abstract: A loop buffer is provided with a main store 26 and an auxiliary store 28. The main store 26 stores micro-operation instructions. The auxiliary store 28 has fewer entries than the main store 26 and stores target addresses for predicted taken branch instructions stored within the main store 26. Read control circuitry serves to control reading from the main store and from an auxiliary store such that target addresses are read from the auxiliary store in association with the predicted taken branch instructions read from the main store.

Type: Grant

Filed: November 16, 2010

Date of Patent: April 17, 2018

Assignee: ARM Limited

Inventors: James Nolan Hardage, Glen Andrew Harris, Mark Carpenter Glass
Fetch less instruction processing (FLIP) computer architecture for central processing units (CPU)

Patent number: 9946665

Abstract: Fetch Less Instruction Processing (FLIP) Computer Architecture for Central Processing Units (CPU). This embodiment relates to computing systems, and more particularly to central processing units in computing systems. The principal object of this embodiment is to provide a Fetch Less Instruction Processing (FLIP) computer architecture using FLIP elements as building blocks for computer program processing. Another object of the embodiment is to use a protocol to interconnect FLIP elements, which makes the current operating systems, program execution models, compilers, libraries and so on to be easily transitioned to the FLIP computer architecture with minimal changes.

Type: Grant

Filed: May 14, 2012

Date of Patent: April 17, 2018

Assignee: MELANGE SYSTEMS PRIVATE LIMITED

Inventor: Narain Venkata Surendra Attili
Fast remote communication and computation between processors using store and load operations on direct core-to-core memory

Patent number: 9934079

Abstract: A system, and computer usable program product for fast remote communication and computation between processors are provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute.

Type: Grant

Filed: May 27, 2010

Date of Patent: April 3, 2018

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: John Bruce Carter, Elmootazbellah Nabil Elnozahy, Ahmed Gheith, Eric Van Hansbergen, Karthick Rajamani, William Evan Speight, Lixin Zhang
Predicated vector hazard check instruction

Patent number: 9928069

Abstract: A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, the addresses of the vector memory operations are specified using a base address for each vector memory operation and a vector that is shared by both vector memory operations. In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.

Type: Grant

Filed: December 20, 2013

Date of Patent: March 27, 2018

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
System and method to select a packet format based on a number of executed threads

Patent number: 9928159

Abstract: A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is determined from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer).

Type: Grant

Filed: February 26, 2013

Date of Patent: March 27, 2018

Assignee: QUALCOMM Incorporated

Inventors: Prasanna Kumar Balasundaram, Suresh K. Venkumahanti
Instruction and logic to perform a centrifuge operation

Patent number: 9904548

Abstract: A processing device implements a set of instructions to perform a centrifuge operation using vector or general purpose registers. In one embodiment, the centrifuge operation separates bits in a source register to opposing regions of a destination register based on a control mask, where each source register bit with a corresponding control mask value of one is written to one region in a destination register, while source register bits with a corresponding control mask value of zero are written to an opposing region of the destination register.

Type: Grant

Filed: December 22, 2014

Date of Patent: February 27, 2018

Assignee: Intel Corporation

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney
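
Illustrative sketch (not taken from the patent): the abstract above describes separating source bits into two opposing regions of the destination according to a control mask. The Python below is a bit-by-bit model of that operation. The abstract does not state which region receives which bits or whether their order is preserved, so the sketch assumes mask-one bits are packed into the low-order region, mask-zero bits into the high-order region, and relative order is kept within each group.

```python
def centrifuge(source: int, mask: int, width: int = 64) -> int:
    """Separate the bits of `source` by `mask` into two packed regions."""
    ones = zeros = 0        # packed bits selected by mask == 1 / mask == 0
    n_ones = n_zeros = 0    # how many bits each group holds so far
    for i in range(width):
        bit = (source >> i) & 1
        if (mask >> i) & 1:
            ones |= bit << n_ones
            n_ones += 1
        else:
            zeros |= bit << n_zeros
            n_zeros += 1
    # Assumed layout: mask-zero group in the high region, mask-one group low.
    return (zeros << n_ones) | ones


result = centrifuge(0b10110010, 0b11001100, width=8)
print(format(result, "08b"))
```
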
