Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Publication number: 20150134934
    Abstract: An out of order processor. The processor includes a distributed load queue and a distributed store queue that maintain single program sequential semantics while allowing an out of order dispatch of loads and stores across a plurality of cores and memory fragments; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.
    Type: Application
    Filed: December 3, 2014
    Publication date: May 14, 2015
    Inventor: Mohammad ABDALLAH
  • Publication number: 20150113252
    Abstract: The present invention relates to a thread control method of a multi-thread virtual pipeline (MVP) processor, which comprises the following steps: allocating directly and sequentially threads in a central processing unit (CPU) thread operation queue to multi-path parallel hardware thread time slots of the MVP processor for operation; allowing an operating thread to generate hardware thread call instructions corresponding thereto to a hardware thread management unit; allowing the hardware thread management unit to enable the call instructions of ithread threads to form a program queue according to receiving time, and calling and preparing the hardware threads; and allowing the hardware threads to operate sequentially in idle multi-path parallel hardware thread time slots of the MVP processor according to the sequence of the hardware threads in the queue of the hardware thread management unit. The present invention also relates to a processor.
    Type: Application
    Filed: June 7, 2013
    Publication date: April 23, 2015
    Inventors: Simon Moy, Chang Liao, Qianxiang Ji, David Ng, Stanley Law
  • Patent number: 9015720
    Abstract: A system and method to optimize processor performance and minimizing average thread latency by selectively loading a cache when a program state, resources required for execution of a program or the program itself change, is described. An embodiment of the invention supports a “cache priming program” that is selectively executed for a first thread/program/sub-routine of each process. Such a program is optimized for situations when instructions and other program data are not yet resident in cache(s), and/or whenever resources required for program execution or the program itself changes. By pre-loading the cache with two resources required for two instructions for only a first thread, average thread latency is reduced because the resources are already present in the cache.
    Type: Grant
    Filed: January 6, 2009
    Date of Patent: April 21, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Andrew Brown, Brian Emberling
  • Publication number: 20150106595
    Abstract: Methods and reservation stations for selecting instructions to issue to a functional unit of an out-of-order processor. The method includes classifying each instruction into one of a number of categories based on the type of instruction. Once classified an instruction is stored in an instruction queue corresponding to the category in which it was classified. Instructions are then selected from one or more of the instruction queues to issue to the functional unit based on a relative priority of the plurality of types of instructions. This allows certain types of instructions (e.g. control transfer instructions, flag setting instructions and/or address generation instructions) to be prioritized over other types of instructions even if they are younger.
    Type: Application
    Filed: July 25, 2014
    Publication date: April 16, 2015
    Inventors: Anand Khot, Hugh Jackson
  • Patent number: 8990767
    Abstract: A method, system, and article of manufacture for solving ordinary differential equations described in a graphical model with nodes as blocks and dependencies as links using the processing of a computer with a plurality of processors. The method includes: generating segments of block with or without duplication for each block with an internal state and for each block without any output by traversing the graphical model from each block with an internal state to each block without any output; merging the segment to reduce duplication; compiling and converting each segment from the merged results in an executable code; and individually allocating the executable code for each segment to a plurality of processors for parallel execution.
    Type: Grant
    Filed: February 7, 2013
    Date of Patent: March 24, 2015
    Assignee: International Business Machines Corporation
    Inventors: Kumiko Maeda, Shuichi Shimizu, Takeo Yoshizawa
  • Publication number: 20150074380
    Abstract: A clock-less asynchronous processor comprising a plurality of parallel asynchronous processing logic circuits, each processing logic circuit configured to generate an instruction execution result. The processor comprises an asynchronous instruction dispatch unit coupled to each processing logic circuit, the instruction dispatch unit configured to receive multiple instructions from memory and dispatch individual instructions to each of the processing logic circuits. The processor comprises a crossbar coupled to an output of each processing logic circuit and to the dispatch unit, the crossbar configured to store the instruction execution results.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Tao Huang, Yiqun Ge, Qifan Zhang, Wuxian Shi, Wen Tong
  • Patent number: 8954715
    Abstract: A multithreading processor 4 interleaves program instructions from different program threads to perform fine grained multithreading. Thread performance monitoring circuitry 30 monitors performance parameters of individual program threads to generate performance values. Issue control circuitry 28 reads these performance values to determine which program thread is next selected to be active when a thread switch event occurs. The performance parameters measured may include the proportion of cycles in which a program thread is able to provide a program instruction for execution by the execution circuitry 12 within the processor 4.
    Type: Grant
    Filed: March 16, 2012
    Date of Patent: February 10, 2015
    Assignee: ARM Limited
    Inventors: Vladimir Vasekin, Andrew Christopher Rose, Allan John Skillman, Antony John Penton
  • Patent number: 8954714
    Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.
    Type: Grant
    Filed: February 1, 2010
    Date of Patent: February 10, 2015
    Assignee: Altera Corporation
    Inventor: Steven Perry
  • Patent number: 8924661
    Abstract: A data storage system includes a plurality of non-volatile memory devices arranged in one or more sets, a main controller and one or more processors. The main controller is configured to accept commands from a host and to convert the commands into recipes. Each recipe includes a list of multiple memory operations to be performed sequentially in the non-volatile memory devices belonging to one of the sets. Each of the processors is associated with a respective set of the non-volatile memory devices, and is configured to receive one or more of the recipes from the main controller and to execute the memory operations specified in the received recipes in the non-volatile memory devices belonging to the respective set.
    Type: Grant
    Filed: January 17, 2010
    Date of Patent: December 30, 2014
    Assignee: Apple Inc.
    Inventors: Michael Shachar, Barak Rotbard, Oren Golov, Uri Perlmutter, Dotan Sokolov, Julian Vlaiko, Yair Schwartz
  • Patent number: 8904152
    Abstract: Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described.
    Type: Grant
    Filed: May 26, 2011
    Date of Patent: December 2, 2014
    Assignee: Altera Corporation
    Inventors: Nikos P. Pitsianis, Gerald George Pechanek, Ricardo Rodriguez
  • Patent number: 8874879
    Abstract: A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command.
    Type: Grant
    Filed: October 24, 2011
    Date of Patent: October 28, 2014
    Assignee: Fujitsu Limited
    Inventors: Yi Ge, Yoshimasa Takebe, Hiromasa Takahashi
  • Patent number: 8843730
    Abstract: An apparatus includes a processor and a memory coupled to the processor. The memory stores an instruction packet (e.g., a VLIW instruction packet) including a first predicate independent instruction and a second predicate independent instruction. Each of the predicate independent instructions has the same destination.
    Type: Grant
    Filed: September 9, 2011
    Date of Patent: September 23, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Erich J. Plondke, Lucian Codrescu, Mao Zeng, Charles Joseph Tabony, Suresh K. Venkumahanti
  • Patent number: 8838906
    Abstract: In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
    Type: Grant
    Filed: January 4, 2011
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, Martin Ohmacht
  • Patent number: 8812822
    Abstract: A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for minimizing unscheduled D-cache miss pipeline stalls is provided. The design structure includes an integrated circuit device, which includes a cascaded delayed execution pipeline unit having two or more execution pipelines that begin execution of instructions in a common issue group in a delayed manner relative to each other, and circuitry. The circuitry is configured to receive an issue group of instructions, determine whether the issue group is a load instruction, and if so, schedule the load instruction in a first pipeline of the two or more execution pipelines, and schedule each remaining instruction in the issue group to be executed in remaining pipelines of the two or more pipelines, wherein execution of the load instruction in the first pipeline begins prior to beginning execution of the remaining instructions in the remaining pipelines.
    Type: Grant
    Filed: March 13, 2008
    Date of Patent: August 19, 2014
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Publication number: 20140215188
    Abstract: In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.
    Type: Application
    Filed: January 25, 2013
    Publication date: July 31, 2014
    Applicant: Apple Inc.
    Inventors: John H. Mylius, Gerard R. Williams, III, Shyam Sundar Balasubramanian, Conrado Blasco-Allue
  • Patent number: 8788793
    Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.
    Type: Grant
    Filed: May 18, 2010
    Date of Patent: July 22, 2014
    Assignee: Panasonic Corporation
    Inventor: Hiroyuki Morishita
  • Patent number: 8782435
    Abstract: A processor comprising: an instruction processing pipeline, configured to receive a sequence of instructions for execution, said sequence comprising at least one instruction including a flow control instruction which terminates the sequence; a hash generator, configured to generate a hash associated with execution of the sequence of instructions; a memory configured to securely receive a reference signature corresponding to a hash of a verified corresponding sequence of instructions; verification logic configured to determine a correspondence between the hash and the reference signature; and authorization logic configured to selectively produce a signal, in dependence on a degree of correspondence of the hash with the reference signature.
    Type: Grant
    Filed: July 15, 2011
    Date of Patent: July 15, 2014
    Assignee: The Research Foundation for The State University of New York
    Inventor: Kanad Ghose
  • Patent number: 8782434
    Abstract: A pipelined processor comprising a cache memory system, fetching instructions for execution from a portion of said cache memory system, an instruction commencing processing before a digital signature of the cache line that contained the instruction is verified against a reference signature of the cache line, the verification being done at the point of decoding, dispatching, or committing execution of the instruction, the reference signature being stored in an encrypted form in the processor's memory, and the key for decrypting the said reference signature being stored in a secure storage location. The instruction processing proceeds when the two signatures exactly match and, where further instruction processing is suspended or processing modified on a mismatch of the two said signatures.
    Type: Grant
    Filed: July 15, 2011
    Date of Patent: July 15, 2014
    Assignee: The Research Foundation for the State University of New York
    Inventor: Kanad Ghose
  • Patent number: 8769246
    Abstract: In one embodiment, a multithreaded processor includes a plurality of buffers, each configured to store instructions corresponding to a respective thread. The multithreaded processor also includes a pick unit coupled to the plurality of buffers. The pick unit may pick from at least one of the buffers in a given cycle, a valid instruction based upon a thread selection algorithm. The pick unit may further cancel, in the given cycle, the picking of the valid instruction in response to receiving a cancel indication.
    Type: Grant
    Filed: February 14, 2011
    Date of Patent: July 1, 2014
    Assignee: Open Computing Trust I & II
    Inventor: Robert T. Golla
  • Patent number: 8756404
    Abstract: Improved techniques for executing instructions in a pipelined manner that may reduce stalls that occur when executing dependent instructions are provided. Stalls may be reduced by utilizing a cascaded arrangement of pipelines with execution units that are delayed with respect to each other. This cascaded delayed arrangement allows dependent instructions to be issued within a common issue group by scheduling them for execution in different pipelines to execute at different times.
    Type: Grant
    Filed: December 11, 2006
    Date of Patent: June 17, 2014
    Assignee: International Business Machines Corporation
    Inventor: David A. Luick
  • Patent number: 8756406
    Abstract: In one embodiment the present invention includes a method and apparatus for enabling a main core and one or more co-processors to operate in a de-coupled mode, thereby facilitating the execution of two or more instruction threads in parallel. A co-processor, according to an embodiment of the invention, has a coupling manager including a loop buffer for storing instructions which can be independently fetched and executed by the co-processor when operating in de-coupled mode. In addition, the coupling manager includes a loop descriptor and a counter/condition descriptor. The loop descriptor and condition descriptor work in conjunction with one another to determine what, if any, action should be taken when a co-processor is in a particular processing state, for example, as indicated by a counter keeping track of loop processing.
    Type: Grant
    Filed: January 11, 2013
    Date of Patent: June 17, 2014
    Assignee: Marvell International Ltd.
    Inventors: Moinul Khan, Mark Fullerton, Arthur Miller, Anitha Kona
  • Patent number: 8732359
    Abstract: When executing a graphical model of a dynamic system that includes two or more concurrently executing sets of operations, a processor is configured to create a first buffer and a second buffer within the executable graphical model. A first set of operations is configured to write data to the first buffer during a first execution instance of the first set of operations. The first set of operations is configured to write data to the second buffer during a second execution instance of the first thread. A second set of operations is configured to read the data from the first buffer during an instance of the second thread that executes contemporaneously with the second execution instance of the first set of operations. Determinations regarding access to the first buffer and second buffer by the first thread and second thread are self-contained within the first thread and second thread, respectively.
    Type: Grant
    Filed: December 7, 2012
    Date of Patent: May 20, 2014
    Assignee: The MathWorks, Inc.
    Inventors: James E. Carrick, Biao Yu
  • Patent number: 8719551
    Abstract: The present invention provides an information processing apparatus and an integrated circuit which realize parallel execution of different processing systems, and which do not require the provision of a dedicated memory storing instructions for common processing The information processing apparatus comprises: a plurality of processor elements; an instruction memory storing a first program and a second program; and an arbiter interposed between the processor elements and the instruction memory, the arbiter receiving, from each of the processor elements, a request for an instruction, from among instructions included in the first program and the second program, and controlling access to the instruction memory by the processor elements, wherein the arbiter arbitrates requests made by the processor elements when the requests are (i) simultaneous requests for different instructions included in one of the first program and the second program or (ii) simultaneous requests for an instruction included in the first prog
    Type: Grant
    Filed: April 15, 2010
    Date of Patent: May 6, 2014
    Assignee: Panasonic Corporation
    Inventor: Hideshi Nishida
  • Patent number: 8707094
    Abstract: A circuit arrangement and method utilize existing redundant execution pipelines in a processing unit to execute multiple instances of stability critical instructions in parallel so that the results of the multiple instances of the instructions can be compared for the purpose of detecting errors. For other types of instructions for which fault tolerant or stability critical execution is not required or desired, the redundant execution pipelines are utilized in a more conventional manner, enabling multiple non-stability critical instructions to be concurrently issued to and executed by the redundant execution pipelines. As such, for non-stability critical program code, the performance benefits of having multiple redundant execution units are preserved, yet in the instances where fault tolerant or stability critical execution is desired for certain program code, the redundant execution units may be repurposed to provide greater assurances as to the fault-free execution of such instructions.
    Type: Grant
    Filed: March 11, 2013
    Date of Patent: April 22, 2014
    Assignee: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Publication number: 20140089639
    Abstract: A processor includes a plurality of execution units. At least one of the execution units is configured to determine, based on a field of a first instruction, a number of additional instructions to execute in conjunction with the first instruction and prior to execution of the first instruction.
    Type: Application
    Filed: September 27, 2012
    Publication date: March 27, 2014
    Applicant: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Horst DIEWALD, Johann ZIPPERER
  • Patent number: 8677078
    Abstract: A device for managing multiple instructions to access multiple wide registers may include logic to receive the multiple instructions to access one of the multiple wide registers, associate each received instruction with a corresponding one of multiple buffer memories, and allow simultaneous processing of the multiple instructions associated with each of the multiple buffer memories, where the multiple instructions are processed such that data is transferred between the multiple buffer memories and the multiple wide registers in one operation.
    Type: Grant
    Filed: June 28, 2007
    Date of Patent: March 18, 2014
    Assignee: Juniper Networks, Inc.
    Inventors: Karthikeyan Veerabadran, David J. Ofelt
  • Patent number: 8669990
    Abstract: A technique to share execution resources. In one embodiment, a CPU and a GPU share resources according to workload, power considerations, or available resources by scheduling or transferring instructions and information between the CPU and GPU.
    Type: Grant
    Filed: December 31, 2009
    Date of Patent: March 11, 2014
    Assignee: Intel Corporation
    Inventors: Eric Sprangle, Matthew Craighead, Chris Goodman, Belliappa Kuttanna
  • Patent number: 8661227
    Abstract: A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.
    Type: Grant
    Filed: September 17, 2010
    Date of Patent: February 25, 2014
    Assignee: International Business Machines Corporation
    Inventors: Christopher M. Abernathy, Mary D. Brown, Hung Q. Le, Dung Q. Nguyen
  • Patent number: 8612728
    Abstract: A pipelined computer processor is presented that reduces data hazards such that high processor utilization is attained. The processor restructures a set of instructions to operate concurrently on multiple pieces of data in multiple passes. One subset of instructions operates on one piece of data while different subsets of instructions operate concurrently on different pieces of data. A validity pipeline tracks the priming and draining of the pipeline processor to ensure that only valid data is written to registers or memory. Pass-dependent addressing is provided to correctly address registers and memory for different pieces of data.
    Type: Grant
    Filed: August 8, 2011
    Date of Patent: December 17, 2013
    Assignee: Micron Technology, Inc.
    Inventors: Neal Andrew Crook, Alan T. Wootton, James Peterson
  • Patent number: 8601244
    Abstract: An apparatus and method for generating a very long instruction word (VLIW) command that supports predicated execution, and a VLIW processor and method for processing a VLIW are provided herein. The VLIW command includes an instruction bundle formed of a plurality of instructions to be executed in parallel and a single value indicating predicated execution, and is generated using the apparatus and method for generating a VLIW command. The VLIW processor decodes the instruction bundle and executes the instructions, which are included in the decoded instruction bundle, in parallel, according to the value indicating predicated execution.
    Type: Grant
    Filed: February 16, 2010
    Date of Patent: December 3, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Bernhard Egger, Soo-jung Ryu, Dong-hoon Yoo, Il-hyun Park
  • Patent number: 8601177
    Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: December 3, 2013
    Assignee: Intel Corporation
    Inventor: Thomas A. Piazza
  • Patent number: 8595468
    Abstract: A multi-core processor system supporting simultaneous thread sharing across execution resources of multiple processor cores is provided. The multi-core processor system includes a first processor core with a first instruction queue and dispatch logic in communication with a first execution resource of the first processor core. The multi-core processor system also includes a second processor core with a second instruction queue and dispatch logic in communication with a second execution resource of the second processor core. A high-speed execution resource bus couples the first and second processor cores. The first instruction queue and dispatch logic is configured to issue a first instruction of a thread to the first execution resource and issue a second instruction of the thread over the high-speed execution resource bus to the second execution resource for simultaneous execution of the first and second instruction of the thread on the first and second processor cores.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: November 26, 2013
    Assignee: International Business Machines Corporation
    Inventors: Shawn M. Luke, John Sargis, Jr., Daneyand J. Singley
  • Patent number: 8589634
    Abstract: Enhancements to hardware architectures (e.g., a RISC processor or a DSP processor) to accelerate spectral band replication (SBR) processing are described. In some embodiments, instruction extensions configure a reconfigurable processor to accelerate SBR and other audio processing. In addition to the instruction extensions, execution units (e.g., multiplication and accumulation units (MACs)) may operate in parallel to reduce the number of audio processing cycles. Performance may be further enhanced through the use of source and destination units which are configured to work with the execution units and quickly fetch and store source and destination operands.
    Type: Grant
    Filed: May 25, 2012
    Date of Patent: November 19, 2013
    Assignee: SiPort, Inc.
    Inventors: Sridhar G. Sharma, Binuraj Ravindran, Jeffrey V. Hill
  • Publication number: 20130262831
    Abstract: Systems and methods for throttling GPU execution performance to avoid surges in DI/DT. A processor includes one or more execution units coupled to a scheduling unit configured to select instructions for execution by the one or more execution units. The execution units may be connected to one or more decoupling capacitors that store power for the circuits of the execution units. The scheduling unit is configured to throttle the instruction issue rate of the execution units based on a moving average issue rate over a large number of scheduling periods. The number of instructions issued during the current scheduling period is less than or equal to a throttling rate maintained by the scheduling unit that is greater than or equal to a minimum throttling issue rate. The throttling rate is set equal to the moving average plus an offset value at the end of each scheduling period.
    Type: Application
    Filed: April 2, 2012
    Publication date: October 3, 2013
    Inventors: Peter Michael NELSON, Jack Hilaire Choquette, Olivier Giroux
  • Patent number: 8533399
    Abstract: In a cache memory, energy and other efficiencies can be realized by saving a result of a cache directory lookup for sequential accesses to a same memory address. Where the cache is a point of coherence for speculative execution in a multiprocessor system, with directory lookups serving as the point of conflict detection, such saving becomes particularly advantageous.
    Type: Grant
    Filed: January 4, 2011
    Date of Patent: September 10, 2013
    Assignee: International Business Machines Corporation
    Inventor: Martin Ohmacht
  • Patent number: 8533721
    Abstract: A method and system to schedule out of order operations without the requirement to execute compare, ready and pick logic in a single cycle. A lazy out-of-order scheduler splits each scheduling loop into two consecutive cycles. The scheduling loop includes a compare stage, a ready stage and a pick stage. The compare stage and the ready stage are executed in a first of the two consecutive cycles and the pick stage is executed in a second of the two consecutive cycles. By splitting each scheduling loop into two consecutive cycles, selecting the oldest operation by default and checking the readiness of the oldest operation, it relieves the system of timing requirements and avoids the need for power hungry logic. Every execution of an operation does not appear as one extra cycle longer and the lazy out-of-order scheduler retains most of the performance of a full out-of-order scheduler.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: September 10, 2013
    Assignee: Intel Corporation
    Inventors: Stephen J. Robinson, Deepak Limaye
  • Patent number: 8527740
    Abstract: A system and method for enhancing barrier collective synchronization on a computer system comprises a computer system including a data storage device. The computer system includes a program stored in the data storage device and steps of the program being executed by a processor. The system includes providing a plurality of communicators for storing state information for a bather algorithm. Each communicator designates a master core in a multi-processor environment of the computer system. The system allocates or designates one counter for each of a plurality of threads. The system configures a table with a number of entries equal to the maximum number of threads. The system sets a table entry with an ID associated with a communicator when a process thread initiates a collective. The system determines an allocated or designated counter by searching entries in the table.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: September 3, 2013
    Assignee: International Business Machines Corporation
    Inventors: Sameer Kumar, Amith R. Mamidala, Joseph D. Ratterman, Michael Blocksome, Douglas Miller
  • Patent number: 8521991
    Abstract: A technique for selecting instructions for execution from an issue queue at multiple function units while reducing the chances of instruction collisions. In an embodiment, each function unit in a processor may include a selection logic circuit that selects a specific instruction from the issue queue for execution. In order to avoid instruction collision, a function unit may have a selection logic circuit that may select two instructions from an instruction queue: one according to a first selection technique and one according to a second selection technique. Then, by comparing the instruction selected by the first selection technique to the instruction selected by the selection logic circuit of another function unit, the instruction selected by the second technique may be used instead if there will be an instruction collision because the instruction selected by the first selection technique is the same as the instruction selected at a different function unit.
    Type: Grant
    Filed: December 4, 2009
    Date of Patent: August 27, 2013
    Assignee: STMicroelectronics (Beijing) R&D Co., Ltd.
    Inventors: Kai-feng Wang, Hong-Xia Sun, Peng-fei Zhu, Yong-qiang Wu
  • Patent number: 8516224
    Abstract: Instructions asserted in the instruction pipeline of the microprocessor are accompanied by control information, comprising a group of bits, asserted within a control information pipeline of the processor. The control information pipeline is synchronized to the instruction pipeline so that the control information for an instruction progresses in synchronism with the instruction. The control information may identify, directly or indirectly, the type of operation called for by the instruction and, if the operation is to be performed in parts, indicate the part to be performed. Means are included in the processor, such as a number of functional execution units, to interpret that control information and take appropriate action.
    Type: Grant
    Filed: January 31, 2012
    Date of Patent: August 20, 2013
    Inventors: Brett Coon, Godfrey D'Souza, Paul Serris
  • Publication number: 20130179664
    Abstract: Techniques are disclosed relating to integrated circuits that include hardware support for divide and/or square root operations. In one embodiment, an integrated circuit is disclosed that includes a division unit that, in turn, includes a normalization circuit and a plurality of divide engines. The normalization circuit is configured to normalize a set of operands. Each divide engine is configured to operate on a respective normalized set of operands received from the normalization circuit. In some embodiments, the integrated circuit includes a scheduler unit configured to select instructions for issuance to a plurality of execution units including the division unit. The scheduler unit is further configured to maintain a counter indicative of a number of instructions currently being operated on by the division unit, and to determine, based on the counter whether to schedule subsequent instructions for issuance to the division unit.
    Type: Application
    Filed: January 6, 2012
    Publication date: July 11, 2013
    Inventors: Christopher H. Olson, Jeffrey S. Brooks, Matthew B. Smittle
  • Patent number: 8484442
    Abstract: A computer processor comprises a decode unit and a processing channel. The decode unit decodes a stream of instruction packets from a memory, each instruction packet comprising a plurality of instructions. The processing channel comprises a plurality of functional units and operable to perform control processing operations. The decode unit is operable to receive and decode instruction packets of a bit length of 64 bits and to detect if the instruction packet defines three control instructions each having a length of 21 bits. The decode unit detects that the instruction packet comprises the three control instructions. The control instructions are supplied to the processing channel for execution in the order in which they appear in the instruction packet. The detection uses an identification bit in the instruction packet.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: July 9, 2013
    Assignee: Icera Inc.
    Inventor: Simon Knowles
  • Publication number: 20130173886
    Abstract: Systems and methods for tracking data hazards in a processor. The processor comprises a pipelined architecture configured to execute a first instruction and a second instruction, wherein the second instruction is older than the first instruction. At least one of the first and second instructions comprises at least one operand expressed as a range of registers. Hazard detection logic is configured to compare the first instruction and the second instruction to determine if there is a data hazard, prior to expanding the second instruction.
    Type: Application
    Filed: January 4, 2012
    Publication date: July 4, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Kenneth Alan Dockser, Yusuf Cagatay Tekmen
  • Publication number: 20130159677
    Abstract: Generating instructions, in particular for mailbox verification in a simulation environment. A sequence of instructions is received, as well as selection data representative of a plurality of commands including a special command. Repeatedly selecting one of the plurality of commands and outputting an instruction based on the selected command. The outputting of an instruction includes outputting a next instruction in the sequence of instructions if the selected command is the special command, and outputting an instruction associated with the command if the selected command is not the special command.
    Type: Application
    Filed: November 1, 2012
    Publication date: June 20, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORPORATION
  • Publication number: 20130151816
    Abstract: Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register.
    Type: Application
    Filed: December 7, 2011
    Publication date: June 13, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Venkat R. Indukuru, Alexander E. Mericas
  • Publication number: 20130145128
    Abstract: A method and circuit arrangement utilize a programmable microcode unit that is capable of being programmed via software to modify the instruction sequences output by the microcode unit in response to microcode instructions issued to the microcode unit. Among other benefits, a programmable microcode unit consistent with the invention enables customization of a processor design to handle specific applications or tasks, as well as to support specific hardware configurations such as specific execution units. In addition, a programmable microcode unit may be updatable, e.g., to correct bugs or faults found in previous instruction sequences supported by the unit.
    Type: Application
    Filed: December 6, 2011
    Publication date: June 6, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 8458716
    Abstract: Methods, systems, and computer program products for operating an enterprise resource planning system. The method includes running a placeholder job in said enterprise resource planning system in response to a request from at least one client application for notification of at least one background processing event, wherein the placeholder job is executed in response to the at least one background processing event.
    Type: Grant
    Filed: December 31, 2008
    Date of Patent: June 4, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ralf Altrichter, Gerd Kehrer, Martin Raitza
  • Patent number: 8448154
    Abstract: A method, apparatus and software for processing software code for use in a multithreaded processing environment in which lock verification mechanisms are automatically inserted in the software code and arranged to determine whether a respective shared storage element is locked prior to the use of the respective shared storage element by a given processing thread in a multithreaded processing environment.
    Type: Grant
    Filed: February 4, 2008
    Date of Patent: May 21, 2013
    Assignee: International Business Machines Corporation
    Inventor: Colin Penfold
  • Patent number: 8412980
    Abstract: A circuit arrangement and method utilize existing redundant execution pipelines in a processing unit to execute multiple instances of stability critical instructions in parallel so that the results of the multiple instances of the instructions can be compared for the purpose of detecting errors. For other types of instructions for which fault tolerant or stability critical execution is not required or desired, the redundant execution pipelines are utilized in a more conventional manner, enabling multiple non-stability critical instructions to be concurrently issued to and executed by the redundant execution pipelines. As such, for non-stability critical program code, the performance benefits of having multiple redundant execution units are preserved, yet in the instances where fault tolerant or stability critical execution is desired for certain program code, the redundant execution units may be repurposed to provide greater assurances as to the fault-free execution of such instructions.
    Type: Grant
    Filed: June 4, 2010
    Date of Patent: April 2, 2013
    Assignee: International Business Machines Corporation
    Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
  • Patent number: 8402256
    Abstract: The present invention is directed to realize efficient issue of a superscalar instruction in an instruction set including an instruction with a prefix. A circuit is employed which retrieves an instruction of each instruction code type other than a prefix on the basis of a determination result of decoders for determining an instruction code type, adds the immediately preceding instruction to the retrieved instruction, and outputs the resultant to instruction executing means. When an instruction of a target instruction code type is detected in a plurality of instruction units to be searched, the circuit outputs the detected instruction code and the immediately preceding instruction other than the target instruction code type as prefix code candidates. When an instruction of a target instruction code type cannot be detected at the rear end of the instruction units to be searched, the circuit outputs the instruction at the rear end as a prefix code candidate.
    Type: Grant
    Filed: August 25, 2009
    Date of Patent: March 19, 2013
    Assignee: Renesas Electronics Corporation
    Inventor: Fumio Arakawa
  • Patent number: 8397238
    Abstract: Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method allocates software threads to hardware threads. A number of software threads to be allocated is identified. It is determined when the number of software threads is less than a number of hardware threads. When the number of software threads is less than the number of hardware threads, at least two of the software threads are allocated to non-sequential hardware threads. A clock signal to be applied to the hardware threads is adjusted responsive to the non-sequential hardware threads allocated.
    Type: Grant
    Filed: December 8, 2009
    Date of Patent: March 12, 2013
    Assignee: QUALCOMM Incorporated
    Inventors: Suresh K. Venkumahanti, Martin Saint-Laurent, Lucian Codrescu, Baker S. Mohammad