Abstract: A graphics processing unit core 26 includes a plurality of processing pipelines 38, 40, 42, 44. A program instruction of a thread of program instructions being executed by a processing pipeline includes a next-instruction-type field 36 indicating an instruction type of a next program instruction following the current program instruction within the processing thread concerned. This next-instruction-type field is used to select the processing pipeline to which the next instruction is issued before that next instruction has been fetched and decoded. The next-instruction-type field may be passed along the processing pipeline as the least significant four bits within a program counter value associated with a current program instruction 32. The next-instruction-type field may also be used to control the forwarding of thread state variables between processing pipelines when a thread migrates between processing pipelines prior to the next program instruction being fetched or decoded.
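As a rough illustration of the encoding mentioned in the abstract above, the following sketch packs a four-bit next-instruction-type value into the least significant four bits of a program counter value and uses it to pick a pipeline. It assumes instruction addresses are 16-byte aligned so that those bits are otherwise unused; the type codes, pipeline names, and function names are hypothetical and do not come from the patent.

```python
NEXT_TYPE_BITS = 4
NEXT_TYPE_MASK = (1 << NEXT_TYPE_BITS) - 1  # 0b1111

# Hypothetical mapping from type code to pipeline; the abstract does not list one.
PIPELINE_FOR_TYPE = {0: "arithmetic", 1: "load_store", 2: "texture", 3: "varying"}


def pack_pc(address, next_type):
    """Store next_type in the low four bits of an aligned instruction address."""
    if address & NEXT_TYPE_MASK:
        raise ValueError("address must be 16-byte aligned in this sketch")
    if not 0 <= next_type <= NEXT_TYPE_MASK:
        raise ValueError("next_type must fit in four bits")
    return address | next_type


def unpack_pc(pc):
    """Split a packed value back into (address, next_type)."""
    return pc & ~NEXT_TYPE_MASK, pc & NEXT_TYPE_MASK


def select_pipeline(pc):
    """Choose the pipeline for the next instruction without fetching or decoding it."""
    _, next_type = unpack_pc(pc)
    return PIPELINE_FOR_TYPE[next_type]
```

For example, `select_pipeline(pack_pc(0x1000, 2))` looks up the pipeline from the packed field alone, which is the point of carrying the field with the program counter.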
Abstract: A method, system, and computer program product are provided for enhancing the execution of independent loads in a processing unit. A processing unit detects if a long-latency miss associated with a load instruction has been encountered. Responsive to a long-latency miss, the processing unit enters a load lookahead mode. Responsive to entering the load lookahead mode, the processing unit dispatches each instruction from a first set of instructions from a first buffer with an associated vector. The processing unit determines if the first set of instructions from the first buffer have completed execution. Responsive to completed execution of the first set of instructions from the first buffer, the processing unit copies the set of vectors from a first vector array to a second vector array. Then the processing unit dispatches a second set of instructions from a second buffer with an associated vector from the second vector array.
Type:
Grant
Filed:
June 15, 2007
Date of Patent:
March 27, 2012
Assignee:
International Business Machines Corporation
Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates a wake-and-go storage array with the target address and snoops the target address on the system bus without data exclusivity. In response to the comparison resulting in a determination that the event has occurred, the wake-and-go engine issues a load command on the system bus to read the data value from the target address with data exclusivity.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
March 27, 2012
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating.
Type:
Application
Filed:
September 20, 2010
Publication date:
March 22, 2012
Applicant:
International Business Machines Corporation
Inventors:
Richard Louis Arndt, Giles Roger Frazier, Ronald P. Hall
Abstract: A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly.
Type:
Application
Filed:
September 20, 2010
Publication date:
March 22, 2012
Applicant:
International Business Machines Corporation
Abstract: Disclosed are methods and devices, among which is a method for configuring an electronic device. In one embodiment, an electronic device may include one or more memory locations having stored values representative of the capabilities of the device. According to an example configuration method, a configuring system may access the device capabilities from the one or more memory locations and configure the device based on the accessed device capabilities.
Abstract: A processor includes primary threads of execution that may simultaneously issue instructions, and one or more backup threads. When a primary thread stalls, the contents of its instruction buffer may be switched with the instruction buffer for a backup thread, thereby allowing the backup thread to begin execution. This design allows two primary threads to issue simultaneously, which allows for overlap of instruction pipeline latencies. This design further allows a fast switch to a backup thread when a primary thread stalls, thereby providing significantly improved throughput in executing instructions by the processor.
Type:
Grant
Filed:
November 20, 2003
Date of Patent:
March 20, 2012
Assignee:
International Business Machines Corporation
Inventors:
Richard James Eickemeyer, David Arnold Luick
Abstract: A computing device includes: an instruction cache storing primary execution unit instructions and auxiliary execution unit instructions in a sequential order; a primary execution unit configured to receive and execute the primary execution unit instructions from the instruction cache; an auxiliary execution unit configured to receive and execute only the auxiliary execution unit instructions from the instruction cache in a manner independent from and asynchronous to the primary execution unit; and completion circuitry configured to coordinate completion of the primary execution unit instructions by the primary execution unit and the auxiliary execution unit instructions according to the sequential order.
Type:
Application
Filed:
September 15, 2010
Publication date:
March 15, 2012
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Bechara F. Boury, Michael Bryan Mitchell, Paul Michael Steinmetz, Kenichi Tsuchiya
Abstract: A method, apparatus, system, and computer readable medium for communicating between an apparatus hosting a workflow application and a device, by generating a template including a placeholder. The instruction template is then sent to the device and a device output is received from the device, the device output including device data generated by the device and inserted into the placeholder of the template.
Abstract: A mechanism is provided for group communications using a MULTI-PIPE synthetic file system. A master application creates a multi-pipe synthetic file in the MULTI-PIPE synthetic file system, the master application indicating a multi-pipe operation to be performed. The master application then writes a header-control block of the multi-pipe synthetic file specifying at least one of a multi-pipe synthetic file system name, a message type, a message size, a specific destination, or a specification of the multi-pipe operation. Any other application participating in the group communications then opens the same multi-pipe synthetic file. A MULTI-PIPE file system module then implements the multi-pipe operation as identified by the master application. The master application and the other applications then either read or write operation messages to the multi-pipe synthetic file and the MULTI-PIPE synthetic file system module performs appropriate actions.
Type:
Application
Filed:
September 2, 2010
Publication date:
March 8, 2012
Applicant:
International Business Machines Corporation
Abstract: There is provided a processor comprising a plurality of registers, an acquisition unit, a calculation unit, a pipeline register, and a storage unit, wherein in a case in which a register indicated by source register information included in a second instruction and a register indicated by destination register information included in a first instruction match, and the second instruction or an instruction that precedes the second instruction designates the second instruction as the last instruction that uses the calculated value obtained in accordance with the first instruction, the storage unit does not store the calculated value stored in the pipeline register in a register indicated by destination register information included in the first instruction, and stores, in other cases, the calculated value stored in the pipeline register in the register indicated by the destination register information included in the first instruction.
Abstract: A wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism is configured to issue a look-ahead load command on a system bus to read a data value from a target address and perform a comparison operation to determine whether the data value at the target address indicates that an event for which a thread is waiting has occurred. In response to the comparison resulting in a determination that the event has not occurred, the wake-and-go engine populates the wake-and-go storage array with the target address and snoops the target address on the system bus.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
February 28, 2012
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: An execution environment is created or extended to include support for coroutines to facilitate reactive programming. Utilizing functionality provided by an execution environment, such as a virtual machine, additional operations are derived to enable creation, invocation, and suspension of coroutines.
Type:
Application
Filed:
August 18, 2010
Publication date:
February 23, 2012
Applicant:
MICROSOFT CORPORATION
Inventors:
Henricus Johannes Maria Meijer, Gavin Bierman
Abstract: A system and method of parallelizing programs employs runtime instructions to identify data accessed by program portions and to assign those program portions to particular processors based on potential overlap between the accessed data. Data dependence between different program portions may be identified and used to look for pending “predicate” program portions that could create data dependencies and to postpone program portions that may be dependent while permitting parallel execution of other program portions.
Type:
Application
Filed:
August 18, 2010
Publication date:
February 23, 2012
Inventors:
Gagan Gupta, Gurindar S. Sohi, Srinath Sridharan
Abstract: The Thread Data Base 1 holds a thread identifier to uniquely identify a thread in the system. When no target thread exists in the same processor, the Check means 3 causes a trap (TRAP) 10 to occur. When a target thread exists in the same processor, the Issue means 2, at the time of issuing a subsequent instruction, successively inputs the thread 9 to be executed next into a pipeline as the target thread. The Gate (G) means 11 uses data on the execution of a thread as an input for computation of a thread serving as a succeeding target. The Switch means 13 transfers data in a context of a thread to a context of a target thread without inputting the target thread as a non-executable thread into a pipeline while the thread is being executed.
Abstract: A data processor with a plurality of processor cores. Accumulated usage information of each of the plurality of processor cores is stored in a storage device within the data processor, wherein the accumulated usage information is indicative of accumulated usage of each processor core of the plurality of processor cores. The processor uses the accumulated usage information in selecting processor cores to perform processor operations.
Type:
Application
Filed:
October 13, 2011
Publication date:
February 9, 2012
Applicant:
FREESCALE SEMICONDUCTOR, INC.
Inventors:
William C. Moyer, David R. Bearden, Ravindraraj Ramaraju
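The abstract above says only that accumulated usage information is used when selecting cores. One possible policy, sketched here purely as an assumption, is to prefer the cores with the lowest accumulated usage so that wear is spread evenly. The function names and the usage units are hypothetical.

```python
def select_cores(accumulated_usage, count):
    """Return `count` core ids with the lowest accumulated usage.

    accumulated_usage maps core id -> accumulated usage (arbitrary units).
    Ties are broken by core id so the choice is deterministic.
    """
    if count > len(accumulated_usage):
        raise ValueError("not enough cores")
    ranked = sorted(accumulated_usage, key=lambda core: (accumulated_usage[core], core))
    return ranked[:count]


def record_usage(accumulated_usage, cores, amount):
    """Add `amount` to the stored usage of each core that performed the operation."""
    for core in cores:
        accumulated_usage[core] += amount
```

A caller would select cores, run the operation, and then call `record_usage` so that later selections reflect the updated totals.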
Abstract: Embodiments of the invention relate to a data processing apparatus including a processor adapted to operate under control of an executable comprising instructions, and in any of a plurality of operating modes including a non-privileged mode and a privileged mode, the apparatus comprising: means for storing a plurality of stacks; a first stack pointer register for storing a pointer to an address in a first of said stacks; a second stack pointer register for storing a pointer to an address in a second of said stacks, wherein said processing apparatus is adapted to use said second stack pointer when said processor is operating in either the non-privileged mode or the privileged mode; and means for transferring operation of said processor from the non-privileged mode to the privileged mode in response to at least one of said instructions. Embodiments of the invention also relate to a method of operating a data processing apparatus.
Type:
Application
Filed:
May 27, 2009
Publication date:
February 9, 2012
Applicant:
Cambridge Consultants Ltd.
Inventors:
Alistair G. Morfey, Karl Leighton Swepson, Peter Giles Lloyd
Abstract: A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.
Type:
Application
Filed:
August 4, 2010
Publication date:
February 9, 2012
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Abstract: A method and system for parallel computation of a linear sequential circuit (LSC) based on a state transition matrix is disclosed herein. A multistep state transition matrix and a multistep output generation matrix can be pre-computed and stored in association with the linear sequential circuit. The multiple state transitions and the multiple output bits can be computed by multiplying the current input-state vector with a multistep next state transition matrix and a multistep output generation matrix, respectively. Multiple state transitions and multiple output bits can be generated in parallel in a single clock cycle based on the pre-computed state transition matrix and the output generation matrix utilizing a dot product in order to improve computational speed. Such a simple augmentation provides a flexible and inexpensive solution for high speedup linear sequential circuit computation with respect to a processor.
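To make the multistep idea above concrete, the following sketch models a single-input, single-output linear sequential circuit over GF(2), with x' = A x + B u and y = C x + D u, and precomputes matrices that advance k clock steps with one matrix-vector product. The matrix names, the example circuit, and the way the matrices are built (by feeding basis vectors through the serial model, which is valid because the circuit is linear) are assumptions for illustration, not the patent's procedure.

```python
def mat_vec(m, v):
    """Matrix-vector product over GF(2): each entry is a dot product mod 2."""
    return [sum(a & b for a, b in zip(row, v)) & 1 for row in m]


def serial_step(A, B, C, D, x, u):
    """One clock step: next state A x + B u and output bit C x + D u over GF(2)."""
    nxt = [ax ^ (b & u) for ax, b in zip(mat_vec(A, x), B)]
    y = (sum(c & xi for c, xi in zip(C, x)) & 1) ^ (D & u)
    return nxt, y


def serial_run(A, B, C, D, x, us):
    """Reference model: apply the input bits one clock step at a time."""
    ys = []
    for u in us:
        x, y = serial_step(A, B, C, D, x, u)
        ys.append(y)
    return x, ys


def precompute(A, B, C, D, k):
    """Build k-step state and output matrices acting on the vector [x, u_0..u_{k-1}]."""
    size = len(A) + k
    state_cols, out_cols = [], []
    for i in range(size):
        basis = [1 if j == i else 0 for j in range(size)]
        x, ys = serial_run(A, B, C, D, basis[:len(A)], basis[len(A):])
        state_cols.append(x)
        out_cols.append(ys)
    # Each response is one column, so transpose to obtain row-major matrices.
    return [list(r) for r in zip(*state_cols)], [list(r) for r in zip(*out_cols)]


def multi_step(m_state, m_out, x, us):
    """Advance k steps at once: two matrix-vector products on the input-state vector."""
    v = list(x) + list(us)
    return mat_vec(m_state, v), mat_vec(m_out, v)


# Hypothetical 3-bit example circuit.
EXAMPLE_A = [[0, 0, 1], [1, 0, 1], [0, 1, 0]]
EXAMPLE_B = [1, 0, 0]
EXAMPLE_C = [0, 0, 1]
EXAMPLE_D = 0
```

In hardware the rows of the two matrices would be evaluated in parallel; this sketch only checks that the precomputed matrices reproduce the step-by-step result.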
Abstract: The present disclosure relates to methods and systems for data tag control for quantum dot cellular automata (QCA). An example method includes receiving data, associating a data tag with the data, communicating the data tag along a first wire-like element to a local tag decoder, reading instructions from the data tag using the local tag decoder, communicating the instructions to a processing element, communicating the data along a second wire-like element to the processing element, and processing the data with the processing element according to the instructions. A length of the first wire-like element and a length of the second wire-like element are approximately the same such that communication of the instructions and the data to the processing element are synchronized.
Abstract: A Parallel and Long Adaptive Instruction Set Architecture (PALADIN) is provided to optimize packet processing. The Instruction Set Architecture (ISA) includes instructions such as aggregate comparison, comparison OR, comparison AND and bitwise instructions. The ISA also includes dedicated packet processing instructions such as hash, predicate, select, checksum and time to live adjust, move header left, post, move header left/right and load/store header/status.
Type:
Application
Filed:
August 13, 2010
Publication date:
February 2, 2012
Applicant:
Broadcom Corporation
Inventors:
Fong PONG, Kwong-Tak CHUI, Chun NING, Patrick LAU
Abstract: A processor receives an instruction operation (OP) code from a verification system. The instruction OP code includes instruction bits and forced event bits. The processor identifies a forced event based upon the forced event bits, which is unrelated to an instruction that corresponds to the instruction bits. In turn, the processor executes the forced event.
Type:
Application
Filed:
July 26, 2010
Publication date:
January 26, 2012
Applicant:
International Business Machines Corporation
Inventors:
Christopher Lee Colletti, Bryan Glen Hickerson, Michael Joseph Schiffli
Abstract: There is provided a microprocessor control apparatus for controlling an operating speed of a microprocessor which executes a program including instruction codes, including: a state observing unit observing an execution state of the program at predetermined timings before execution of a deadline instruction code; prediction data of a remaining calculation amount required before execution of the deadline instruction code completes for each of predefined execution states; a predicted calculation amount acquiring unit acquiring a remaining calculation amount corresponding to an observed execution state as a remaining predicted calculation amount; a remaining time calculating unit calculating a remaining time until the deadline of the deadline instruction code; an operating speed calculating unit calculating a minimum operation speed of the microprocessor that is required to process the remaining predicted calculation amount within the remaining time; and a controlling unit controlling the microprocessor t
Abstract: A method, system and program product for executing a multi-function instruction in a computer system by specifying, via the multi-function instruction, either a capability query or execution of a selected function of one or more optional functions, wherein the selected function is an installed optional function, wherein the capability query determines which optional functions of the one or more optional functions are installed on the computer system.
Type:
Grant
Filed:
March 28, 2007
Date of Patent:
January 24, 2012
Assignee:
International Business Machines Corporation
Inventors:
Shawn D. Lundvall, Ronald M. Smith, Sr., Phil Chi-Chung Yeh
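The query-or-execute behavior described in the abstract above can be sketched as a single entry point whose function code selects either a capability query or one installed function. This is a minimal model under stated assumptions: function code 0 is reserved for the query, the query result is a bitmask with one bit per installed function code, and the class and method names are hypothetical.

```python
QUERY = 0  # hypothetical function code reserved for the capability query


class MultiFunctionUnit:
    def __init__(self, installed):
        # installed maps a function code (>= 1) to the callable implementing it.
        if QUERY in installed:
            raise ValueError("function code 0 is reserved for the query")
        self._installed = dict(installed)

    def execute(self, function_code, operand=None):
        """Run the query or the selected installed function."""
        if function_code == QUERY:
            mask = 1 << QUERY  # assumption: the query itself is always reported
            for code in self._installed:
                mask |= 1 << code
            return mask
        if function_code not in self._installed:
            raise ValueError("function %d is not installed" % function_code)
        return self._installed[function_code](operand)
```

A program would typically issue the query once, test the bit for the function it wants, and only then issue the instruction with that function code.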
Abstract: Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.
Abstract: A new signal processor technique and apparatus combining microprocessor technology with switch fabric telecommunication technology to achieve a programmable processor architecture wherein the processor and the connections among its functional blocks are configured by software for each specific application by communication through a switch fabric in a dynamic, parallel and flexible fashion to achieve a reconfigurable pipeline, wherein the length of the pipeline stages and the order of the stages varies from time to time and from application to application, admirably handling the explosion of varieties of diverse signal processing needs in single devices such as handsets, set-top boxes and the like with unprecedented performance, cost and power savings, and with full application flexibility.
Abstract: A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.
Type:
Application
Filed:
August 22, 2011
Publication date:
January 12, 2012
Applicant:
THE MATHWORKS, INC.
Inventors:
John N. LITTLE, Joseph F. Hicklin, Jocelyn Luke Martin, Nausheen B. Moulana, Halldor N. Stefansson, Loren Dean, Roy E. Lurie, Stephen C. Johnson, Penelope L. Anderson, Michael E. Karr, Jason A. Kinchen
Abstract: An electronic circuit (200) includes one or more programmable control-plane engines (410, 460) operable to process packet header information and form at least one command, one or more programmable data-plane engines (310, 320, 370) selectively operable for at least one of a plurality of cryptographic processes selectable in response to the at least one command, and a programmable host processor (100) coupled to such a data-plane engine (310) and such a control-plane engine (410). Other processors, circuits, devices and systems and processes for their operation and manufacture are disclosed.
Type:
Application
Filed:
June 21, 2011
Publication date:
January 12, 2012
Applicant:
TEXAS INSTRUMENTS INCORPORATED
Inventors:
Amritpal Singh Mundra, Denis Roland Beaudoin
Abstract: A multi-issue processor includes a register file and a plurality of issue slots, each one of the plurality of issue slots having a plurality of functional units and a plurality of holdable registers. The plurality of issue slots include a first set of issue slots and a second set of issue slots, and the register file is accessible by the plurality of issue slots. A location of at least a part of the plurality of holdable registers in the first set of issue slots is different from a location of at least a corresponding part of the plurality of holdable registers in the second set of issue slots.
Abstract: Access to a memory area by a first processor that executes a first processor program and a second processor that executes a second processor program is granted to one of the first processor and the second processor at a time. Access to the memory area by the first processor and the second processor is cyclically uniquely allocated (e.g., t?[(ad mod m)=o]) between the first and the second processor by the first and second processor programs.
Abstract: A next program counter (PC) value generator. The next PC value generator includes a discontinuity decoder that is provided to detect a discontinuity instruction among a plurality of instructions and a tight loop decoder that is provided to: a) detect a tight loop instruction, and b) provide a tight loop instruction target address. The next PC value generator further includes a next PC value logic having a plurality of inputs: a first input coupled to an output of the discontinuity decoder, and a second input coupled to an output of the tight loop decoder. The next PC value logic provides as an output, without a stall, a control signal that a next PC value is to be loaded with the tight loop instruction target address if: the discontinuity decoder detects a discontinuity instruction, and the tight loop decoder detects a tight loop instruction.
Type:
Grant
Filed:
September 4, 2008
Date of Patent:
January 10, 2012
Assignee:
Verisilicon Holdings Co., Ltd.
Inventors:
Vijayanand Angarai, Michelle Y. Che, Asheesh Kashyap, Tracy Nguyen
Abstract: The present invention relates to a method for the unification of PER branch and PER store operations within the same dataflow. The method comprises determining a PER range, the PER range comprising a storage area defined by a designated storage starting area and a designated storage ending area, wherein the storage starting area is designated by a value of the contents of a first control register and the storage ending area is designated by a value of the contents of a second control register. The method also comprises retrieving register field content values that are stored at a plurality of registers, wherein the retrieved content values comprise a length field content value, and setting the length field content value to zero for a PER branch instruction, thereby enabling a PER branch instruction to be performed similarly to a PER storage instruction.
Type:
Grant
Filed:
February 12, 2008
Date of Patent:
January 3, 2012
Assignee:
International Business Machines Corporation
Abstract: The present invention provides a method for designing three-dimensional scaffold structures that are anatomically accurate and possess the necessary internal porous micro-architecture design, wherein the porous micro-architecture is necessary for the proliferation and colonization of cultured cells that lead to tissue formation. The design method of the present invention utilizes the patient data derived from medical imaging modalities (e.g., CT or MRI) in combination with computer data manipulation techniques. The present invention further provides that the resultant scaffold design can be easily manufactured by Rapid Prototyping fabrication techniques.
Abstract: A system-on-chip including a processor, a control module, a first plurality of data registers, a second plurality of data registers, a plurality of address registers, and a first control module. The first plurality of data registers are configured to store data. The processor is configured to respectively write addresses corresponding to selected ones of the first plurality of data registers in the plurality of address registers. The second plurality of data registers are configured to receive data from the selected ones of the first plurality of data registers. In response to a request from the processor for a first address, the first control module is configured to provide data to the processor from the second plurality of data registers in response to the first address matching an address stored in the plurality of address registers, and otherwise provide data to the processor from the first plurality of data registers.
Abstract: Systems, methods, and computer program products are disclosed for intermixing different types of machine instructions. One embodiment of the invention provides a protocol for intermixing the different types of machine instructions. By adhering to the protocol, different types of machine instructions may be intermixed to concurrently update data structures without leading to unpredictable results.
Type:
Application
Filed:
June 28, 2010
Publication date:
December 29, 2011
Applicant:
International Business Machines Corporation
Abstract: A pipelined processing device includes: a device controller configured to receive a request to perform an operation; a plurality of subcontrollers configured to receive at least one instruction associated with the operation, each of the plurality of subcontrollers including a counter configured to generate an active time value indicating at least a portion of a time taken to process the at least one instruction; a pipeline processor configured to receive and process the at least one instruction, the pipeline processor configured to receive the active time value; and a shared pipeline storage area configured to store the active time value for each of the plurality of subcontrollers.
Type:
Application
Filed:
June 24, 2010
Publication date:
December 29, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Ekaterina M. Ambroladze, Deanna Postles Dunn Berger, Michael Fee, Christine C. Jones, Arthur J. O'Neill, JR., Diana L. Orf, Robert J. Sonnelitter, III
Abstract: An instruction is provided to establish various operational parameters for an adapter. These parameters include adapter interruption parameters, input/output address translation parameters, resetting error indications, setting measurement parameters, and setting an interception control, as examples. The instruction specifies a function information block, which is a program representation of a device table entry used by the adapter, to be used in certain situations in establishing the parameters. A store instruction is also provided that stores the current contents of the function information block.
Type:
Application
Filed:
June 23, 2010
Publication date:
December 29, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
David Craddock, Mark S. Farrell, Beth A. Glendening, Thomas A. Gregg, Dan F. Greiner, Gustav E. Sittmann, III, Peter K. Szwed
Abstract: Serializing instructions in a multiprocessor system includes receiving a plurality of processor requests at a central point in the multiprocessor system. Each of the plurality of processor requests includes a needs register having a requestor needs switch and a resource needs switch. The method also includes establishing a tail switch indicating the presence of the plurality of processor requests at the central point, establishing a sequential order of the plurality of processor requests, and processing the plurality of processor requests at the central point in the sequential order.
Type:
Application
Filed:
June 23, 2010
Publication date:
December 29, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Garrett M. Drapala, Michael A. Blake, Timothy C. Bronson, Lawrence D. Curley
Abstract: A selected installed function of a multi-function instruction is hidden such that even though a processor is capable of performing the hidden installed function, the availability of the hidden function is hidden such that responsive to the multi-function instruction querying the availability of functions, only functions not hidden are reported as installed.
Type:
Application
Filed:
June 24, 2010
Publication date:
December 29, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Dan F. Greiner, Damian Leo Osisek, Timothy J. Slegel
Abstract: An apparatus for controlling access to a pipeline includes a plurality of command queues including a first subset of the plurality of command queues being assigned to process commands of a first command type, a second subset of the plurality of command queues being assigned to process commands of a second command type, and a third subset of the plurality of command queues not being assigned to either the first subset or the second subset. The apparatus also includes an input controller configured to receive requests having the first command type and the second command type and assign requests having the first command type to command queues in the first subset until all command queues in the first subset are filled and then assign requests having the first command type to command queues in the third subset.
Type:
Application
Filed:
June 23, 2010
Publication date:
December 29, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES
Inventors:
Deanna Postles Dunn Berger, Garrett M. Drapala, Michael F. Fee, Robert J. Sonnelitter, III
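The assignment rule in this abstract can be sketched as follows. Several details are assumptions rather than anything the abstract states: each queue is modelled as a single slot, the class and method names are hypothetical, and requests of the second command type are placed only in the second subset because the abstract does not describe overflow for them.

```python
from typing import Dict, List, Optional, Tuple


class InputController:
    """Toy model: three subsets of single-slot command queues."""

    def __init__(self, n_first: int, n_second: int, n_shared: int) -> None:
        # None marks an empty queue (assumption: one request per queue).
        self.queues: Dict[str, List[Optional[str]]] = {
            "first": [None] * n_first,
            "second": [None] * n_second,
            "shared": [None] * n_shared,  # the unassigned third subset
        }

    def _place(self, subset: str, request: str) -> Optional[Tuple[str, int]]:
        slots = self.queues[subset]
        for i, slot in enumerate(slots):
            if slot is None:
                slots[i] = request
                return (subset, i)
        return None  # every queue in this subset is filled

    def assign(self, command_type: str, request: str) -> Optional[Tuple[str, int]]:
        if command_type == "first":
            # Fill the first subset, then spill into the third subset.
            return self._place("first", request) or self._place("shared", request)
        if command_type == "second":
            return self._place("second", request)
        raise ValueError(f"unknown command type: {command_type}")


ctrl = InputController(n_first=2, n_second=1, n_shared=1)
for name in ("a", "b", "c"):
    print(name, ctrl.assign("first", name))
```

With two queues in the first subset, the third first-type request lands in the shared subset.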
Abstract: There is provided a method of, and apparatus for, processing a computation on a computing device comprising at least one processor and a memory, the method comprising: storing, in said memory, plural copies of a set of data, each copy of said set of data having a different compression ratio and/or compression scheme; selecting a copy of said set of data; and performing, on a processor, a computation using said selected copy of said set of data. By providing such a method, different compression ratios and/or compression schemes can be selected as appropriate. For example, if high precision is required in a computation, a copy of the set of data can be chosen which has a low compression ratio at the expense of processing time and memory transfer time. In the alternative, if low precision is acceptable, then the speed benefits of a high compression ratio and/or lossy compression scheme may be utilised.
Abstract: Systems and methods for booting a programmable processor such as a DSP that is incorporated into an HDA codec. The codec and a system memory containing boot program instructions are connected to an HDA bus. In a first mode, the DSP receives boot program instructions via the HDA bus and boots using these instructions. In a second mode, the DSP boots from instructions that are contained in a memory that is connected to the DSP. In one embodiment, the memory connected to the DSP is a component of a plug-in card, and the DSP is configured to determine whether the plug-in card is present, then boot from the memory on the plug-in card if it is present or boot from the system memory via the HDA bus if the plug-in card is not present.
Type:
Grant
Filed:
September 1, 2008
Date of Patent:
December 20, 2011
Assignee:
D2Audio Corporation
Inventors:
Daniel L. Chieng, Douglas D. Gephardt, Jeffrey M. Klaas, Adam Zaharias
Abstract: Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; exiting the barrier operation if the value for the barrier counter matches the exit value.
Type:
Grant
Filed:
August 1, 2007
Date of Patent:
December 20, 2011
Assignee:
International Business Machines Corporation
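The counting scheme in this abstract can be simulated in a few lines. This is a sketch under stated assumptions: the counter starts at the number of other nodes, each received barrier control packet decrements it by one, and the exit value is zero. The abstract says only that the value depends on the number of nodes and is modified per packet. The names `BarrierCounter` and `simulate_barrier` are hypothetical, and the DMA engine is replaced by plain method calls.

```python
class BarrierCounter:
    """Per-node counter (assumed: counts down to an exit value of 0)."""

    def __init__(self, num_nodes: int, exit_value: int = 0) -> None:
        self.value = num_nodes - 1  # one packet expected from every other node
        self.exit_value = exit_value

    def on_packet(self) -> None:
        self.value -= 1  # stands in for the DMA engine modifying the counter

    def ready_to_exit(self) -> bool:
        return self.value == self.exit_value


def simulate_barrier(num_nodes: int) -> list:
    counters = [BarrierCounter(num_nodes) for _ in range(num_nodes)]
    # Each node broadcasts one barrier control packet to every other node.
    for sender in range(num_nodes):
        for receiver in range(num_nodes):
            if receiver != sender:
                counters[receiver].on_packet()
    return [c.ready_to_exit() for c in counters]


print(simulate_barrier(4))
```

Once every node has broadcast, each counter has received one packet from every other node and matches the exit value.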
Abstract: A processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores. In response to determining an abort condition for an issued one of the plurality of program instructions and in response to determining that the issued program instruction is not part of a mispredicted execution path, the processing core is configured to abort an attempt to execute the speculative region of code.
Type:
Application
Filed:
June 11, 2010
Publication date:
December 15, 2011
Inventors:
Jaewoong Chung, David S. Christie, Michael P. Hohmuth, Stephan Diestelhorst, Martin T. Pohlack, Luke Yen
Abstract: Systems and methods for controlling instruction throughput are disclosed. One embodiment of a system may comprise a comparator that determines a difference value between an actual instructions per clock cycle throughput and a target instructions per clock cycle throughput setting, and a throttle control that sums a plurality of difference values to determine an average difference value over a plurality of clock cycles and adjusts the actual instructions per clock cycle throughput based on the average difference value.
Type:
Grant
Filed:
March 8, 2005
Date of Patent:
December 6, 2011
Assignee:
Hewlett-Packard Development Company, L.P.
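The comparator and throttle control in this abstract can be sketched as two small steps. The averaging follows the abstract. The adjustment policy does not: stepping a hypothetical issue limit up or down by one, and the bounds of 1 and 8, are assumptions chosen only to make the example concrete.

```python
def difference(actual_ipc: int, target_ipc: int) -> int:
    """Comparator: actual minus target instructions per clock cycle."""
    return actual_ipc - target_ipc


def average_difference(actual_samples: list, target_ipc: int) -> float:
    """Throttle control, step 1: sum the differences and average them."""
    diffs = [difference(a, target_ipc) for a in actual_samples]
    return sum(diffs) / len(diffs)


def adjust_issue_limit(issue_limit: int, avg_diff: float,
                       min_limit: int = 1, max_limit: int = 8) -> int:
    """Throttle control, step 2 (hypothetical policy): nudge the limit by one."""
    if avg_diff > 0:
        issue_limit -= 1  # running above target: throttle down
    elif avg_diff < 0:
        issue_limit += 1  # running below target: allow more issue
    return max(min_limit, min(max_limit, issue_limit))


avg = average_difference([4, 4, 2, 2], target_ipc=2)
print(avg, adjust_issue_limit(4, avg))
```

Averaging over several cycles keeps a single burst from triggering an adjustment on its own.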
Abstract: In one embodiment, the present invention includes a method for receiving a request from a user-level agent for programming of a user-level privilege for at least one architectural resource of an application-managed sequencer (AMS) and programming the user-level privilege for the at least one architectural resource using an operating system-managed sequencer (OMS) coupled to the AMS. Other embodiments are described and claimed.
Type:
Grant
Filed:
December 29, 2006
Date of Patent:
December 6, 2011
Assignee:
Intel Corporation
Inventors:
Hong Wang, Gautham Chinya, Perry Wang, Jamison Collins, Richard A. Hankins, Per Hammarlund, John Shen
Abstract: Performing a deterministic reduction operation in a parallel computer that includes compute nodes, each of which includes computer processors and a CAU (Collectives Acceleration Unit) that couples computer processors to one another for data communications, including organizing processors and a CAU into a branched tree topology in which the CAU is a root and the processors are children; receiving, from each of the processors in any order, dummy contribution data, where each processor is restricted from sending any other data to the root CAU prior to receiving an acknowledgement of receipt from the root CAU; sending, by the root CAU to the processors in the branched tree topology, in a predefined order, acknowledgements of receipt of the dummy contribution data; receiving, by the root CAU from the processors in the predefined order, the processors' contribution data to the reduction operation; and reducing, by the root CAU, the processors' contribution data.
Type:
Application
Filed:
May 28, 2010
Publication date:
December 1, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Charles J. Archer, Michael A. Blocksome, Joseph D. Ratterman, Brian E. Smith
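A fixed contribution order matters because floating-point addition is not associative. The sketch below models only the effect of the protocol in this abstract, not the dummy-contribution and acknowledgement handshake itself: contributions are combined in a predefined order regardless of the order in which processors first report. The function name and the use of addition as the reduction operator are assumptions.

```python
def deterministic_reduce(arrivals: list, predefined_order: list) -> float:
    """Sum contributions in a fixed order, ignoring arrival order.

    arrivals: (processor_id, value) pairs in whatever order they arrived.
    predefined_order: processor ids in the order the root combines them.
    """
    contributions = dict(arrivals)
    total = 0.0
    for pid in predefined_order:
        total += contributions[pid]
    return total


arrivals = [(2, -1e16), (0, 1e16), (1, 1.0)]
order = [0, 1, 2]
print(deterministic_reduce(arrivals, order))
print(deterministic_reduce(list(reversed(arrivals)), order))
```

Both calls give the same result because the summation order is fixed by `order`, not by when each value arrived.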
Abstract: A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32 supported configurations of 128×32 bits, 64×64 bits and 32×128 bits can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports.
Type:
Grant
Filed:
May 9, 2011
Date of Patent:
November 29, 2011
Assignee:
Altera Corporation
Inventors:
Gerald George Pechanek, Edward A. Wolff
Abstract: A processor includes an instruction sequencing unit, execution unit, and multi-level register file including a first level register file having a lower access latency and a second level register file having a higher access latency. Responsive to the processor processing a second instruction in a transactional code section to obtain as an execution result a second register value of the logical register, the mapper moves a first register value of the logical register to the second level register file, places the second register value in the first level register file, marks the second register value as speculative, and replaces a first mapping for the logical register with a second mapping. Responsive to unsuccessful termination of the transactional code section, the mapper designates the second register value in the first level register file as invalid so that the first register value in the second level register file becomes the working value.
Type:
Application
Filed:
May 12, 2010
Publication date:
November 17, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Christopher M. Abernathy, Mary D. Brown, Hung Q. Le, Dung Q. Nguyen
Abstract: Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of the data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware implemented pervasive thread control logic determines if one or more already spawned idle threads are available for use as an assist thread. The hardware implemented pervasive thread control logic selects an idle thread from the one or more already spawned idle threads if it is determined that one or more already spawned idle threads are available for use as an assist thread, to thereby provide the assist thread. In addition, the hardware implemented pervasive thread control logic offloads a portion of a workload of the main thread to the assist thread.
Type:
Application
Filed:
May 12, 2010
Publication date:
November 17, 2011
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Ronald P. Hall, Hung Q. Le, Raul E. Silvera, Balaram Sinharoy