Patents Examined by George Giroux
  • Patent number: 10185569
    Abstract: Interrupt handling on a multiprocessor computer executing multiple computational operations in parallel is provided by establishing a total ordering of the multiple computational operations and defining an architectural state at the time of the interrupt as if the computational operations executed in the total ordering.
    Type: Grant
    Filed: February 13, 2013
    Date of Patent: January 22, 2019
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Gagan Gupta, Gurindar S. Sohi
  • Patent number: 10175985
    Abstract: A processor core includes an instruction-sequencing unit (ISU). The ISU includes a general register file (GRF) composed of multiple hardware general purpose registers (GPRs), an exception register (XER), and a reservation station (RS). The execution unit(s) load an address of data in a data GPR, and load a first portion of the data in a first data GPR and a second portion of the data in a second data GPR in the GRF, where loading the portions of the data generate intermediate data condition codes that are loaded in the XER. The execution unit(s) generate a cumulative data condition code, which is loaded into a history buffer within the ISU. The intermediate data condition codes are loaded into a reservation station (RS) within the ISU. Upon flushing the GRF and the XER, the ISU repopulates the GRF from a history buffer and the XER from the RS.
    Type: Grant
    Filed: March 28, 2016
    Date of Patent: January 8, 2019
    Assignee: International Business Machines Corporation
    Inventors: Sundeep Chadha, Michael J. Genden, Dung Q. Nguyen
  • Patent number: 10152324
    Abstract: Embodiments of methods and computer program products disclosed herein relate to processor architecture. One such method includes the processor obtaining an instruction. The instruction specifies an operation, and also specifies one of the registers as a source register and one of the registers as a destination register. The method also includes the processor obtaining an endian mode and determining that the instruction is an element-ordering-sensitive instruction. Based on the determination that the instruction is an element-ordering-sensitive instruction, the processor executes the instruction by performing the operation on the elements of the source register in accordance with the endian mode and writing a result of the operation to the destination register.
    Type: Grant
    Filed: September 5, 2014
    Date of Patent: December 11, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Brett Olsson
  • Patent number: 10120683
    Abstract: Supporting even instruction tag (‘ITAG’) requirements in a multi-slice processor with null internal operations (IOPs) includes: receiving an IOP with an even ITAG requirement; determining that the IOP is to be assigned an odd ITAG; and inserting a null IOP into an instruction lane ahead of the IOP, wherein the null IOP is assigned the odd ITAG, and the IOP is assigned an even ITAG.
    Type: Grant
    Filed: April 27, 2016
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Steven R. Carlough, Kurt A. Feiste, Paul M. Kennedy, Phillip G. Williams
  • Patent number: 10120682
    Abstract: Embodiments of systems disclosed herein relate to processor architecture. One such system implements a method that includes the processor obtaining an instruction. The instruction specifies an operation, and also specifies one of the registers as a source register and one of the registers as a destination register. The method also includes the processor obtaining an endian mode and determining that the instruction is an element-ordering-sensitive instruction. Based on the determination that the instruction is an element-ordering-sensitive instruction, the processor executes the instruction by performing the operation on the elements of the source register in accordance with the endian mode and writing a result of the operation to the destination register.
    Type: Grant
    Filed: February 28, 2014
    Date of Patent: November 6, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Brett Olsson
  • Patent number: 10073719
    Abstract: In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed.
    Type: Grant
    Filed: April 18, 2016
    Date of Patent: September 11, 2018
    Assignee: Intel Corporation
    Inventors: Ravi Rajwar, Peter Lachner, Laura A. Knauth, Konrad K. Lai
  • Patent number: 10067556
    Abstract: A method maintains power usage prediction information for one or more functional units in branch prediction logic for a processing unit such that the power consumption of a functional unit may be selectively reduced in association with the execution of branch instructions when it is predicted that the functional unit will be idle subsequent to the execution of such branch instructions.
    Type: Grant
    Filed: August 31, 2015
    Date of Patent: September 4, 2018
    Assignee: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Patent number: 10067782
    Abstract: Various aspects are disclosed herein for attenuating spin waiting in a virtual machine environment comprising a plurality of virtual machines and virtual processors. Selected virtual processors can be given time slice extensions in order to prevent such virtual processors from becoming de-scheduled (and hence causing other virtual processors to have to spin wait). Selected virtual processors can also be expressly scheduled so that they can be given higher priority to resources, resulting in reduced spin waits for other virtual processors waiting on such selected virtual processors. Finally, various spin wait detection techniques can be incorporated into the time slice extension and express scheduling mechanisms, in order to identify potential and existing spin waiting scenarios.
    Type: Grant
    Filed: November 18, 2015
    Date of Patent: September 4, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yau Ning Chin, John Te-Jui Sheu, Arun Kishan, Thomas Fahrig, Rene Antonio Vega
  • Patent number: 10042417
    Abstract: A circuit arrangement maintains power usage prediction information for one or more functional units in branch prediction logic for a processing unit such that the power consumption of a functional unit may be selectively reduced in association with the execution of branch instructions when it is predicted that the functional unit will be idle subsequent to the execution of such branch instructions.
    Type: Grant
    Filed: July 5, 2016
    Date of Patent: August 7, 2018
    Assignee: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Patent number: 10037205
    Abstract: Vector blend and permute functionality are provided, responsive to instructions specifying: a destination vector register comprising fields to store vector elements, a first vector register, a vector element size, a second vector register, and a third operand. Indices are read from fields in the second register. Each index has a first selector portion and a second selector portion. Corresponding unmasked vector elements are stored to fields of the destination register, wherein each vector element, responsive to the respective first selector portion having a first value, is copied to an intermediate vector from a corresponding data field of the first register, and responsive to the respective first selector portion having a second value, is copied to the intermediate vector from a corresponding data field of the third operand. Then unmasked data fields of the destination are replaced by data fields in the intermediate vector indexed by the corresponding second selector portions.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: July 31, 2018
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Bret L. Toll, Jesus Corbal, Jeffrey G. Wiedemeier, Sridhar Samudrala
  • Patent number: 10025590
    Abstract: A multiprocessor system having plural heterogeneous processing units schedules instruction sets for execution on a selected of the processing units by matching workload processing characteristics of processing units and the instruction sets. To establish an instruction set's processing characteristics, the homogeneous instruction set is executed on each of the plural processing units with one or more performance metrics tracked at each of the processing units to determine which processing unit most efficiently executes the instruction set. Instruction set workload processing characteristics are stored for reference in scheduling subsequent execution of the instruction set.
    Type: Grant
    Filed: November 23, 2016
    Date of Patent: July 17, 2018
    Assignee: International Business Machines Corporation
    Inventors: Louis B. Capps, Jr., Ronald E. Newhart, Thomas E. Cook, Robert H. Bell, Jr., Michael J. Shapiro
  • Patent number: 10007527
    Abstract: One embodiment of the present invention sets forth a technique for processing load instructions for parallel threads of a thread group when a sub-set of the parallel threads request the same memory address. The load/store unit determines if the memory addresses for each sub-set of parallel threads match based on one or more uniform patterns. When a match is achieved for at least one of the uniform patterns, the load/store unit transmits a read request to retrieve data for the sub-set of parallel threads. The number of read requests transmitted is reduced compared with performing a separate read request for each thread in the sub-set. A variety of uniform patterns may be defined based on common access patterns present in program instructions. A variety of uniform patterns may also be defined based on interconnect constraints between the load/store unit and the memory when a full crossbar interconnect is not available.
    Type: Grant
    Filed: March 5, 2012
    Date of Patent: June 26, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Douglas J. Hahn, Rajeshwaran Selvanesan, Shirish Gadre, Steven James Heinrich
  • Patent number: 10001998
    Abstract: Embodiments for a processor that selectively enables and disables branch prediction are disclosed. The processor may include counters to track a number of fetched instructions, a number of branches, and a number of mispredicted branches. A misprediction threshold may be calculated dependent upon the tracked number of branches and a predefined misprediction ratio. Branch prediction may then be disabled when the number of mispredictions exceed the determined threshold value and dependent upon the branch rate.
    Type: Grant
    Filed: April 18, 2014
    Date of Patent: June 19, 2018
    Assignee: Oracle International Corporation
    Inventors: Haowei Zhang, Xiaoying Shen, Manish Shah
  • Patent number: 9990201
    Abstract: A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: June 5, 2018
    Assignee: Intel Corporation
    Inventors: Vinodh Gopal, James D. Guilford, Wajdi K. Feghali, Erdine Ozturk, Gilbert M. Wolrich, Martin G. Dixon, Mark C. Davis, Sean P. Mirkes, Alexandre Farcy, Bret L. Toll, Maxim Loktyukhin
  • Patent number: 9946545
    Abstract: A loop buffer is provided with a main store 26 and an auxiliary store 28. The main store 26 stores micro-operation instructions. The auxiliary store 28 has fewer entries than the main store 26 and stores target addresses for predicted taken branch instructions stored within the main store 26. Read control circuitry serves to control reading from the main store and from an auxiliary store such that target addresses are read from the auxiliary store in association with the predicted taken branch instructions read from the main store.
    Type: Grant
    Filed: November 16, 2010
    Date of Patent: April 17, 2018
    Assignee: ARM Limited
    Inventors: James Nolan Hardage, Glen Andrew Harris, Mark Carpenter Glass
  • Patent number: 9946665
    Abstract: Fetch Less Instruction Processing (FLIP) Computer Architecture for Central Processing Units (CPU). This embodiment relates to computing systems, and more particularly to central processing units in computing systems. The principal object of this embodiment is to provide a Fetch Less Instruction Processing (FLIP) computer architecture using FLIP elements as building blocks for computer program processing. Another object of the embodiment is to use a protocol to interconnect FLIP elements, which makes the current operating systems, program execution models, compilers, libraries and so on to be easily transitioned to the FLIP computer architecture with minimal changes.
    Type: Grant
    Filed: May 14, 2012
    Date of Patent: April 17, 2018
    Assignee: MELANGE SYSTEMS PRIVATE LIMITED
    Inventor: Narain Venkata Surendra Attili
  • Patent number: 9934079
    Abstract: A system, and computer usable program product for fast remote communication and computation between processors are provided in the illustrative embodiments. A direct core to core communication unit (DCC) is configured to operate with a first processor, the first processor being a remote processor. A memory associated with the DCC receives a set of bytes, the set of bytes being sent from a second processor. An operation specified in the set of bytes is executed at the remote processor such that the operation is invoked without causing a software thread to execute.
    Type: Grant
    Filed: May 27, 2010
    Date of Patent: April 3, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: John Bruce Carter, Elmootazbellah Nabil Elnozahy, Ahmed Gheith, Eric Van Hansbergen, Karthick Rajamani, William Evan Speight, Lixin Zhang
  • Patent number: 9928069
    Abstract: A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, the addresses of the vector memory operations are specified using a base address for each vector memory operation and a vector that is shared by both vector memory operations. In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements.
    Type: Grant
    Filed: December 20, 2013
    Date of Patent: March 27, 2018
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 9928159
    Abstract: A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is determined from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer).
    Type: Grant
    Filed: February 26, 2013
    Date of Patent: March 27, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Prasanna Kumar Balasundaram, Suresh K. Venkumahanti
  • Patent number: 9904548
    Abstract: A processing device implements a set of instructions to perform a centrifuge operation using vector or general purpose registers. In one embodiment, the centrifuge operation separates bits in a source register to opposing regions of a destination register based on a control mask, where each source register bit with a corresponding control mask value of one is written to one region in a destination register, while source register bits with a corresponding control mask value of zero are written to an opposing region of the destination register.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: February 27, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Mark J. Charney