Patents Examined by Kenneth S. Kim
  • Patent number: 8560811
    Abstract: The present invention provides a method and apparatus for handling lane-crossing instructions in an execution pipeline. One embodiment of the method includes conveying bits of an instruction from a register to an execution stage in a pipeline along a first data path that includes a lane crossing stage configured to change a first mapping of the register to the execution stage to a second mapping. The method also includes concurrently conveying the bits along a second data path from the register to the execution stage that bypasses the lane crossing stage. The method further includes selecting the first or second data path to provide the bits to the execution stage.
    Type: Grant
    Filed: August 5, 2010
    Date of Patent: October 15, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventor: John M. King
  • Patent number: 8024550
    Abstract: Disclosed is an SIMD-type microprocessor comprising a processor element group, plural processor elements with an operation part and a register file being arranged therein and a processor element control signal generator configured to output a processor element control signal controlling an operation of the processor element, wherein a feed part configured to feed a processor element control signal output from the processor element control signal generator to the processor element is provided at a center of the processor element group.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: September 20, 2011
    Assignee: Ricoh Company, Ltd.
    Inventor: Hidehito Kitamura
  • Patent number: 8019974
    Abstract: A bypass circuit is provided in a pipeline processor. A pipeline register is provided between an instruction execution stage and a write-back stage. The pipeline register stores a data validity flag and a WRITE control flag to control writing data into a general purpose register unit. The data retained in the pipeline register is allowed to be written back into the general purpose register unit when the WRITE control flag indicates “valid”. The pipeline register continues to retain the retained data even after the writing of the retained data into the general purpose register unit. The first pipeline register supplies the retained data to the second stage through the bypass circuit at the time of executing a subsequent instruction having data dependency on a preceding instruction.
    Type: Grant
    Filed: January 12, 2009
    Date of Patent: September 13, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Jun Tanabe
  • Patent number: 8015392
    Abstract: A method of updating execution instructions of a multi-core processor comprising receiving execution instructions at a processor including multiple programmable processing cores integrated on a single die, selecting subset of at least one of the cores, and loading at least a portion of the execution instructions to the subset of cores and replacing existing execution instructions, associated with the first subset of programmable processing cores, with the received execution instructions while at least one of the other cores continues to process received packets, wherein a sequence of threads provided by the cores sequentially retrieve packets to process from at least one queue, the sequence proceeding from a subsequence of at least one thread of one core to a subsequence of at least one thread on another core and wherein the sequence of threads is specified by data identifying, at least, the next core in the sequence.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: September 6, 2011
    Assignee: Intel Corporation
    Inventors: Uday Naik, Ching Boon Lee, Ai Bee Lim, Koji Sahara
  • Patent number: 7996661
    Abstract: A dynamic reconfigurable circuit that implements optional processing by dynamically switching a processing content of a reconfigurable processing element (PE) and a connection content between the PEs in accordance with a context, includes: a configuration register section for setting a content of loop processing on the basis of the context, the loop processing content including an output source of an output signal from each of a set of the reconfigured PEs, an output destination of the output signal, and a condition for outputting the output signal to the output destination; and at least one counter circuit including a loop control section and an output register section that implement the set loop processing, that count the number of implementations of the loop processing implemented by the loop control section, and that output the output signal to the output destination based on the counted number of implementations and the condition.
    Type: Grant
    Filed: September 17, 2008
    Date of Patent: August 9, 2011
    Assignee: Fujitsu Semiconductor Limited
    Inventors: Takashi Hanai, Shinichi Sutou, Masaki Arai, Mitsuharu Wakayoshi
  • Patent number: 7984267
    Abstract: Executing a service program for an accelerator application program in a hybrid computing environment that includes a host computer and an accelerator, the host computer and the accelerator adapted to one another for data communications by a system level message passing module; where the service program includes a host portion and an accelerator portion and executing a service program for an accelerator includes receiving, from the host portion, operating information for the accelerator portion; starting the accelerator portion on the accelerator; providing, to the accelerator portion, operating information for the accelerator application program; establishing direct data communications between the host portion and the accelerator portion; and, responsive to an instruction communicated directly from the host portion, executing the accelerator application program.
    Type: Grant
    Filed: September 4, 2008
    Date of Patent: July 19, 2011
    Assignee: International Business Machines Corporation
    Inventors: Michael E. Aho, Ricardo M. Matinata, Amir F. Sanjar, Gordon G. Stewart, Cornell G. Wright, Jr.
  • Patent number: 7979672
    Abstract: A method and system for transposing a multi-dimensional array for a multi-processor system having a main memory for storing the multi-dimensional array and a local memory is provided. One implementation involves partitioning the multi-dimensional array into a number of equally sized portions in the local memory, in each processor performing a transpose function including a logical transpose on one of said portions and then a physical transpose of said portion, and combining the transposed portions and storing back in their original place in the main memory.
    Type: Grant
    Filed: July 25, 2008
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventors: Ahmed H. M. R. El-Mahdy, Ali A. El-Moursy, Hisham ElShishiny
  • Patent number: 7979674
    Abstract: Executing MIMD programs on a SIMD machine, the SIMD machine including a plurality of compute nodes, each compute node capable of executing only a single thread of execution, the compute nodes initially configured exclusively for SIMD operations, the SIMD machine further comprising a data communications network, the network comprising synchronous data communications links among the compute nodes, including establishing one or more SIMD partitions, booting one or more SIMD partitions in MIMD mode; establishing a MIMD partition; executing by launcher programs a plurality of MIMD programs on two or more of the compute nodes of the MIMD partition; and re-executing a launcher program by an operating system on a compute node in the MIMD partition upon termination of the MIMD program executed by the launcher program.
    Type: Grant
    Filed: May 16, 2007
    Date of Patent: July 12, 2011
    Assignee: International Business Machines Corporation
    Inventors: Todd A. Inglett, Patrick J. McCarthy, Amanda Peters, Thomas A. Budnik, Michael B. Mundy, Gordon G. Stewart
  • Patent number: 7975126
    Abstract: Described is microprocessor architecture that includes at least one reconfigurable execution path (e.g., implemented via FPGAs or CPLDs). When an instruction is fetched, a mechanism determines whether the reconfigurable execution path (and/or which path) will handle that instruction. A content addressable memory may be used to determine the execution path when fed the instruction's operational code, or an arbiter and multiplexer may resolve conflicts if multiple instruction decode blocks recognize the same instruction. The execution path may be dynamically reconfigured, activated or deactivated as needed, such as to extend an instruction set, to optimize instructions for a particular application program, to implement a peripheral device, to provide parallel computing, and/or based on power consumption and/or processing power needs. Security may be provided by having the reconfigurable execution path loaded from an extension file that is associated with metadata, including security information.
    Type: Grant
    Filed: March 19, 2009
    Date of Patent: July 5, 2011
    Assignee: Microsoft Corporation
    Inventors: Richard Neil Pittman, Alessandro Forin, Nathaniel L. Lynch
  • Patent number: 7971033
    Abstract: A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to a snoop_safe register contents; ANDing all the load received data bits if the LSQN is greater in magnitude to the snoop_safe; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger to the snoop_safe was found.
    Type: Grant
    Filed: July 14, 2008
    Date of Patent: June 28, 2011
    Assignee: International Business Machines
    Inventors: Erik R. Altman, Vijayalakshmi Srinivasan
  • Patent number: 7966478
    Abstract: A method for reducing entries searched in a load reorder queue (LRQ) when snoop instructions are executed by a processor, including checking load reorder queue (LRQ) entries located between a load_peril_snoop register and a lrq_tail register for addresses matching the address of the snoop; and setting a snooped bit in the LRQ entry for any matches found.
    Type: Grant
    Filed: July 14, 2008
    Date of Patent: June 21, 2011
    Assignee: International Business Machines Corporation
    Inventors: Erik R. Altman, Vijayalakshmi Srinivasan
  • Patent number: 7966482
    Abstract: An apparatus includes an instruction decoder, first and second source registers and a circuit coupled to the decoder to receive packed data from the source registers and to pack the packed data responsive to a pack instruction received by the decoder. A first packed data element and a second packed data element are received from the first source register. A third packed data element and a fourth packed data element are received from the second source register. The circuit packs packing a portion of each of the packed data elements into a destination register resulting with the portion from second packed data element adjacent to the portion from the first packed data element, and the portion from the fourth packed data element adjacent to the portion from the third packed data element.
    Type: Grant
    Filed: June 12, 2006
    Date of Patent: June 21, 2011
    Assignee: Intel Corporation
    Inventors: Alexander Peleg, Yaakov Yaari, Millind Mittal, Larry M. Mennemeier, Benny Eitan
  • Patent number: 7966476
    Abstract: A method, apparatus and system are disclosed for decoding an instruction in a variable-length instruction set. The instruction is one of a set of new types of instructions that uses a new escape code value, which is two bytes in length, to indicate that a third opcode byte includes the instruction-specific opcode for a new instruction. The new instructions are defined such the length of each instruction in the opcode map for one of the new escape opcode values may be determined using the same set of inputs, where each of the inputs is relevant to determining the length of each instruction in the new opcode map. For at least one embodiment, the length of one of the new instructions is determined without evaluating the instruction-specific opcode.
    Type: Grant
    Filed: February 28, 2008
    Date of Patent: June 21, 2011
    Assignee: Intel Corporation
    Inventors: James S. Coke, Peter J. Ruscito, Masood Tahir, David B. Jackson, Ves A. Naydenov, Scott D. Rodgers, Bret L. Toll, Frank Binns
  • Patent number: 7962729
    Abstract: Software defects (e.g., array access out of bounds, stack overflow, infinite loops, and data corruption) occur due to integer values falling outside their expected range. Because programming languages do not include range-checking instructions as part of their language, to detect software defects and ensure that the code runs smoothly, programmers generally use 1) runtime assertions and/or 2) sub-range data types. However, these techniques cause additional conditional branches, incur additional overhead, and decrease processor performance. Processors comprising a range checking hardware feature supported by machine instructions for runtime integer range checking can eliminate the conditional branches generated during runtime integer range checks. Programming language extensions for the range checking hardware can allow dynamic range bounds to be defined during runtime without decreasing the processor's performance. This can allow for easier programming and code that is easier to maintain.
    Type: Grant
    Filed: January 5, 2009
    Date of Patent: June 14, 2011
    Assignee: International Business Machines Corporation
    Inventor: Jose G. Rivera
  • Patent number: 7962730
    Abstract: In one embodiment, a processor comprises a retire unit and a load/store unit coupled thereto. The retire unit is configured to retire a first store memory operation responsive to the first store memory operation having been processed at least to a pipeline stage at which exceptions are reported for the first store memory operation. The load/store unit comprises a queue having a first entry assigned to the first store memory operation. The load/store unit is configured to retain the first store memory operation in the first entry subsequent to retirement of the first store memory operation if the first store memory operation is not complete. The queue may have multiple entries, and more than one store may be retained in the queue after being retired by the retire unit.
    Type: Grant
    Filed: November 25, 2008
    Date of Patent: June 14, 2011
    Assignee: Apple Inc.
    Inventors: Wei-Han Lien, Po-Yung Chang
  • Patent number: 7958341
    Abstract: In some embodiments, each matrix processor in a matrix of mesh-interconnected matrix processors includes an instruction processing pipeline, and a hardware data switch capable of streaming data to/from one or more inter-processor matrix links and/or a matrix processor local memory links in response to execution of a data streaming instruction by the instruction processing pipeline. The data switch can transfer each data stream, which includes multiple words, at wire speed, one word per cycle. After initiating a data stream, the processing pipeline can execute other instructions, including streaming instructions, while a stream transfer is in progress. Different data streaming instructions may be used to transfer data streams from local memory to one or more inter-processor links, from an inter-processor link to local memory, from an inter-processor link to one or more inter-processor links, and from an inter-processor link to one or more inter-processor links and synchronously to local memory.
    Type: Grant
    Filed: July 7, 2008
    Date of Patent: June 7, 2011
    Assignee: Ovics
    Inventors: Sorin C Cismas, Ilie Garbacea
  • Patent number: 7949862
    Abstract: Address control section includes an encoding section to generate higher-order address information made by compressing a predetermined higher-order bit part from predetermined higher-order and lower-order bit parts included in an instruction address, and a restoring section to restore the higher-order bit part from the higher-order address information. Branch instruction predicting section includes a history memory section that stores the higher-order bit part and the lower-order bit part corresponding to a branch address of a processed branch instruction at either one of a plurality of storing places determined from the higher-order bit part and the lower-order bit part corresponding to a branch address of a processed branch instruction.
    Type: Grant
    Filed: August 21, 2008
    Date of Patent: May 24, 2011
    Assignee: Fujitsu Limited
    Inventors: Megumi Yokoi, Masaki Ukai, Takashi Suzuki
  • Patent number: 7949855
    Abstract: A processor buffers asynchronous threads. Instructions requiring operations provided by a plurality of execution units are divided into phases, each phase having at least one computation operation and at least one memory access operation. Instructions within each phase are qualified and prioritized. The instructions may be qualified based on the status of the execution unit needed to execute one or more of the current instructions. The instructions may also be qualified based on an age of each instruction, status of the execution units, a divergence potential, locality, thread diversity, and resource requirements. Qualified instructions may be prioritized based on execution units needed to execute instructions and the execution units in use. One or more of the prioritized instructions is issued per cycle to the plurality of execution units.
    Type: Grant
    Filed: April 28, 2008
    Date of Patent: May 24, 2011
    Assignee: NVIDIA Corporation
    Inventors: Peter C. Mills, John Erik Lindholm, Brett W. Coon, Gary M. Tarolli, John Matthew Burgess
  • Patent number: 7945766
    Abstract: A processor capable of executing conditional store instructions without being limited by the number of condition codes is provided. Condition data is stored in floating-point registers, and an operation unit executes a conditional floating-point store instruction of determining whether to store, in cache, store data.
    Type: Grant
    Filed: November 25, 2008
    Date of Patent: May 17, 2011
    Assignee: Fujitsu Limited
    Inventor: Toshio Yoshida
  • Patent number: 7941644
    Abstract: A processing unit includes multiple execution units and sequencer logic that is disposed downstream of instruction buffer logic, and that is responsive to a sequencer instruction present in an instruction stream. In response to such an instruction, the sequencer logic issues a plurality of instructions associated with a long latency operation to one execution unit, while blocking instructions from the instruction buffer logic from being issued to that execution unit. In addition, the blocking of instructions from being issued to the execution unit does not affect the issuance of instructions to any other execution unit, and as such, other instructions from the instruction buffer logic are still capable of being issued to and executed by other execution units even while the sequencer logic is issuing the plurality of instructions associated with the long latency operation.
    Type: Grant
    Filed: October 16, 2008
    Date of Patent: May 10, 2011
    Assignee: International Business Machines Corporation
    Inventors: Eric Oliver Mejdrich, Adam James Muff, Matthew Ray Tubbs