Abstract: A method and apparatus are provided to perform efficient merging operations of two or more streams of data by using SIMD instructions. Streams of data are merged together in parallel and with mitigated or removed conditional branching. The merge operations of the streams of data include Merge AND and Merge OR operations.
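The abstract does not define Merge AND and Merge OR. As a rough illustration only, the sketch below assumes they correspond to the intersection and union of two sorted, duplicate-free streams. It is scalar Python rather than SIMD code, and the function names `merge_and` and `merge_or` are hypothetical. What it shows is the idea of replacing per-element conditional branches with arithmetic on comparison results; the loop condition itself remains a branch.

```python
def merge_and(a, b):
    """Assumed meaning of Merge AND: values present in both sorted streams."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        # (x == y) is 0 or 1, so this appends x only on a match, without an if.
        out.extend([x] * (x == y))
        # Advance whichever stream holds the smaller head (both on a tie).
        i += x <= y
        j += y <= x
    return out


def merge_or(a, b):
    """Assumed meaning of Merge OR: values present in either sorted stream."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        out.append(min(x, y))
        i += x <= y
        j += y <= x
    # One stream is exhausted; the remainder of the other is already sorted.
    out.extend(a[i:])
    out.extend(b[j:])
    return out
```

A SIMD version would compare several elements per instruction and use the resulting masks in place of the boolean arithmetic shown here.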
Abstract: A method of performing a processing task in a data processing apparatus is provided that reduces memory usage of the processing task. According to this method a Virtual Machine performs the steps of accessing platform-neutral program code in a function repository, executing the processing task on the Virtual Machine, and analysing at a current execution point, on a function-by-function basis, which functions in the function repository are inactive functions. The Virtual Machine performs software-based unloading from the function repository of at least a portion of platform-neutral program code corresponding to one or more inactive functions. A corresponding virtual machine and data processing apparatus are also provided.
Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism detects a thread running on a first processing unit within a plurality of processing units that is waiting for an event that modifies a data value associated with a target address. The wake-and-go mechanism creates a wake-and-go instance for the thread by populating a wake-and-go storage array with the target address. The operating system places the thread in a sleep state. Responsive to detecting the event that modifies the data value associated with the target address, the wake-and-go mechanism assigns the wake-and-go instance to a second processing unit within the plurality of processing units. The operating system on the second processing unit places the thread in a non-sleep state.
Type:
Grant
Filed:
April 16, 2009
Date of Patent:
July 24, 2012
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: A scoreboard memory for a processing unit has separate memory regions allocated to each of the multiple threads to be processed. For each thread, the scoreboard memory stores register identifiers of registers that have pending writes. When an instruction is added to an instruction buffer, the register identifiers of the registers specified in the instruction are compared with the register identifiers stored in the scoreboard memory for that instruction's thread, and a multi-bit value representing the comparison result is generated. The multi-bit value is stored with the instruction in the instruction buffer and may be updated as instructions belonging to the same thread complete their execution. Before the instruction is issued for execution, this multi-bit value is checked. If this multi-bit value indicates that none of the registers specified in the instruction have pending writes, the instruction is allowed to issue for execution.
Type:
Grant
Filed:
September 18, 2008
Date of Patent:
July 17, 2012
Assignee:
NVIDIA Corporation
Inventors:
Brett W. Coon, Peter C. Mills, Stuart F. Oberman, Ming Y. Siu
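The scoreboard scheme in the abstract above can be sketched in software as follows. This is a simplified model, not the patented hardware: the class names, the use of Python sets for each thread's memory region, and the bit ordering of the mask (bit k corresponds to the k-th register named by the instruction) are all assumptions made for illustration.

```python
class Scoreboard:
    def __init__(self, num_threads):
        # One separate region per thread: register IDs with pending writes.
        self.pending = [set() for _ in range(num_threads)]

    def mark_pending(self, thread, reg):
        self.pending[thread].add(reg)

    def complete_write(self, thread, reg):
        self.pending[thread].discard(reg)

    def dependency_mask(self, thread, regs):
        """Compare an instruction's registers against its own thread's region."""
        mask = 0
        for k, reg in enumerate(regs):
            if reg in self.pending[thread]:
                mask |= 1 << k
        return mask


class BufferedInstruction:
    """An instruction buffer entry that carries its multi-bit comparison result."""

    def __init__(self, thread, regs, mask):
        self.thread = thread
        self.regs = regs
        self.mask = mask

    def on_write_complete(self, thread, reg):
        # Only completions from the same thread can clear this entry's bits.
        if thread != self.thread:
            return
        for k, r in enumerate(self.regs):
            if r == reg:
                self.mask &= ~(1 << k)

    def can_issue(self):
        # Issue only when no named register has a pending write.
        return self.mask == 0
```

Storing the mask with the buffered instruction means the issue check is a single comparison against zero rather than a fresh scoreboard lookup.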
Abstract: A resource allocation method, a resource allocation program, and a resource allocation apparatus in which a request reception server subjects an inputted SQL to a syntax analysis. At least one SQL process is extracted from the input SQL, and a resource cost of a database required by a BES (Back End Server) to perform the SQL process for each of one or more process types contained in the SQL process is calculated. Further, an allocation ratio is determined for allocating the resource of a request executing server to a virtualized server in accordance with a resource cost ratio required by each of the BES to execute the SQL process. Additionally, requests are made for execution of the respective BES on the virtualized server to which the resource has been allocated so as to execute the SQL process.
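The allocation step in the abstract above amounts to splitting a resource in proportion to each back-end server's estimated cost. The sketch below shows only that arithmetic. The function name, the dictionary input, and the equal-share fallback for a zero total are assumptions; the abstract does not say how costs are computed or how edge cases are handled.

```python
def allocation_ratios(costs):
    """Map each BES name to its share of the resource, proportional to cost."""
    total = sum(costs.values())
    if total == 0:
        # Assumed fallback: share equally when no cost has been estimated.
        return {bes: 1 / len(costs) for bes in costs}
    return {bes: cost / total for bes, cost in costs.items()}
```

For example, estimated costs of 30 and 10 for two servers give them three quarters and one quarter of the resource respectively.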
Abstract: Systems and methods are provided for managing access to registers. A system may include a set of direct registers and a set of indirect registers. The indirect registers may be accessed through the direct registers, and the direct registers may provide various features to provide faster access to the indirect registers. One of the direct registers may indicate access modes for accessing the indirect registers. The access modes may include auto-increment, auto-decrement, auto-reset, and no change modes. Based on the access mode, the currently accessed address may be automatically modified after accessing the indirect register at the address.
Type:
Grant
Filed:
October 18, 2008
Date of Patent:
June 26, 2012
Assignee:
Micron Technology, Inc.
Inventors:
Harold B Noyes, Mark Jurenka, Gavin Huggins
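The four access modes named in the abstract above can be modeled with a small register file. This is a hedged sketch: the class and attribute names are hypothetical, and two behaviors are assumptions the abstract does not state, namely that the address wraps around at the ends of the indirect register set and that auto-reset returns the address to 0.

```python
from enum import Enum


class AccessMode(Enum):
    NO_CHANGE = 0
    AUTO_INCREMENT = 1
    AUTO_DECREMENT = 2
    AUTO_RESET = 3


class IndirectRegisterFile:
    def __init__(self, size):
        self.indirect = [0] * size         # registers reachable only indirectly
        self.address = 0                   # direct register: current address
        self.mode = AccessMode.NO_CHANGE   # direct register: access mode

    def _advance(self):
        # Modify the current address after an access, according to the mode.
        n = len(self.indirect)
        if self.mode is AccessMode.AUTO_INCREMENT:
            self.address = (self.address + 1) % n   # wraparound is assumed
        elif self.mode is AccessMode.AUTO_DECREMENT:
            self.address = (self.address - 1) % n
        elif self.mode is AccessMode.AUTO_RESET:
            self.address = 0                        # reset target is assumed

    def read_data(self):
        value = self.indirect[self.address]
        self._advance()
        return value

    def write_data(self, value):
        self.indirect[self.address] = value
        self._advance()
```

With auto-increment selected, a block of indirect registers can be filled by repeated writes to one data register, without rewriting the address register between accesses.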
Abstract: A technique for managing context state information enables a reduced number of save and restore operations. At least one embodiment includes a plurality of save area segments to store a plurality of machine context state information, which can be saved into the segments and restored to the machine state. One embodiment includes at least one in-use bit vector to indicate status of the plurality of machine context information stored in the segments, and another vector associated with the machine state.
Type:
Grant
Filed:
September 19, 2005
Date of Patent:
May 1, 2012
Assignee:
Intel Corporation
Inventors:
Chris J. Newburn, Dion Rodgers, Bryant E. Bigbee, Shivnandan D. Kaushik, Gautham N. Chinya, Xiang Zou, Hong Wang
Abstract: Predictive decoding is achieved by fetching an instruction, accessing a predictor containing predictor information including prior instruction execution characteristics, obtaining predictor information for the fetched instruction from the predictor, and generating a selected one of a plurality of decode operation streams corresponding to the fetched instruction. The decode operation stream is selected based on the predictor information.
Type:
Grant
Filed:
May 3, 2007
Date of Patent:
April 24, 2012
Assignee:
International Business Machines Corporation
Inventors:
Bartholomew Blaner, Michael K. Gschwind
Abstract: A dynamic predictive and/or exact caching mechanism is provided in various stages of a microprocessor pipeline so that various control signals can be stored and memorized in the course of program execution. Exact control signal vector caching may be done. Whenever an issue group is formed following instruction decode, register renaming, and dependency checking, an encoded copy of the issue group information can be cached under the tag of the leading instruction. The resulting dependency cache or control vector cache can be accessed right at the beginning of the instruction issue logic stage of the microprocessor pipeline the next time the corresponding group of instructions comes up for re-execution. Since the encoded issue group bit pattern may be accessed in a single cycle out of the cache, the resulting microprocessor pipeline with this embodiment can be seen as two parallel pipes, where the shorter pipe is followed if there is a dependency cache or control vector cache hit.
Type:
Grant
Filed:
January 12, 2005
Date of Patent:
April 3, 2012
Assignee:
International Business Machines Corporation
Inventors:
Erik Richter Altman, Michael Karl Gschwind, Jude A. Rivers, Sumedh W. Sathaye, John-David Wellman, Victor V. Zyuban
Abstract: A processor includes primary threads of execution that may simultaneously issue instructions, and one or more backup threads. When a primary thread stalls, the contents of its instruction buffer may be switched with the instruction buffer for a backup thread, thereby allowing the backup thread to begin execution. This design allows two primary threads to issue simultaneously, which allows for overlap of instruction pipeline latencies. This design further allows a fast switch to a backup thread when a primary thread stalls, thereby providing significantly improved throughput in executing instructions by the processor.
Type:
Grant
Filed:
November 20, 2003
Date of Patent:
March 20, 2012
Assignee:
International Business Machines Corporation
Inventors:
Richard James Eickemeyer, David Arnold Luick
Abstract: A method for branch prediction, the method comprising: receiving a load instruction including a first data location in a first memory area, retrieving data including a branch address and a target address from the first data location, and saving the data in a branch prediction memory; or receiving an unload instruction including the first data location in the first memory area, retrieving data including a branch address and a target address from the branch prediction memory, and saving the data in the first data location.
Type:
Grant
Filed:
June 13, 2008
Date of Patent:
March 6, 2012
Assignee:
International Business Machines Corporation
Inventors:
Philip G. Emma, Allan M. Hartstein, Keith N. Langston, Brian R. Prasky, Thomas R. Puzak, Charles F. Webb
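The load and unload operations in the abstract above move branch and target address pairs between ordinary memory and the branch prediction memory. The sketch below models both directions. The class name, the dictionary standing in for the first memory area, and the choice to unload every entry in sorted order are all assumptions for illustration, not details from the patent.

```python
class BranchPredictionMemory:
    def __init__(self):
        self.entries = {}  # branch address -> predicted target address

    def load(self, memory, location):
        """Load: copy (branch, target) pairs from a data location into the predictor."""
        for branch, target in memory[location]:
            self.entries[branch] = target

    def unload(self, memory, location):
        """Unload: save the predictor's (branch, target) pairs to a data location."""
        memory[location] = sorted(self.entries.items())

    def predict(self, branch):
        # Returns None when no prediction is stored for this branch address.
        return self.entries.get(branch)
```

One plausible use of such a pair of operations is to save prediction state at a context switch and restore it later, so a resumed program does not start with an empty predictor.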
Abstract: A method and system of presenting an interrupt request to processors executing in lock step. At least some of the illustrative embodiments are computer systems comprising a first processor configured to execute a program, a second processor configured to execute a duplicate copy of the program in lock step with the first processor, and a logic device coupled to the processors. The logic device is configured to present an interrupt request to the processors when the processors are at substantially the same computational point in the program.
Type:
Grant
Filed:
February 3, 2006
Date of Patent:
January 24, 2012
Assignee:
Hewlett-Packard Development Company, L.P.
Inventors:
James S. Klecka, William F. Bruckert, Mihai Damian, Peter A. Reynolds, Dale E. Southgate
Abstract: The reconfigurable circuit of the present invention, in which time division multiple processing is possible, has a pipeline structure whose number of stages is an integral multiple of a given number, and comprises a plurality of processor elements each having a processing unit whose configuration is variable according to first configuration data to be supplied, a network to which all inputs and outputs of the plurality of processor elements are connected and which transfers data in one clock between input and output according to second configuration data to be supplied, and a switching unit which cyclically switches every clock and supplies the first and second configuration data prepared for the given number of tasks to each of the processing units.
Abstract: The data processing apparatus has processing logic for performing data processing operations and a register bank for storing data associated with the processing logic. The register bank has at least one register group, each register group having a plurality of register sets. The processing logic has an operating state associated with each register group defining how that register group is used, a first operating state being a state in which each register set in the register group is used to support an independent execution thread of the processing logic, and a second operating state being a state in which the register sets of the register group are collectively used to support a single execution thread of the processing logic. Control logic is provided to control how the register sets of each register group are used dependent on the operating state associated with that register group.
Type:
Grant
Filed:
May 11, 2005
Date of Patent:
October 18, 2011
Assignee:
ARM Limited
Inventors:
David Hennah Mansell, Stuart David Biles, David Michael Gilday, Daniel Kershaw
Abstract: A coprocessor interface unit for interfacing a coprocessor to an out-of-order execution pipeline, and applications thereof. In an embodiment, the coprocessor interface unit includes an in-order instruction queue, a coprocessor load data queue, and a coprocessor store data queue. Instructions are written into the in-order instruction queue by an instruction dispatch unit. Instructions exit the in-order instruction queue and enter the coprocessor. In the coprocessor, the instructions operate on data read from the coprocessor load data queue. Data is written back, for example, to memory or a register file by inserting the data into the out-of-order execution pipeline, either directly or via the coprocessor store data queue, which writes back the data.