Abstract: The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events—such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events—to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads—though, in some embodiments, they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from.
Abstract: A method for running, on a processor in non-privileged mode, different computer programs P that, in a nominal mode, use privileged instructions. The method includes running a hypervisor program in the privileged mode of the processor, the hypervisor program providing the computer programs P with services substantially equivalent to those available when running in privileged mode. The source codes of the computer programs P are modified beforehand to replace the privileged instructions with calls for services supplied by the hypervisor program. The hypervisor program creates at least two privileged submodes organized into a hierarchy within the non-privileged mode, the processor itself including only two operating modes.
Abstract: Performing non-transactional escape actions within a hardware based transactional memory system. A method includes at a hardware thread on a processor beginning a hardware based transaction for the thread. Without committing or aborting the transaction, the method further includes suspending the hardware based transaction and performing one or more operations for the thread, non-transactionally and not affected by: transaction monitoring and buffering for the transaction, an abort for the transaction, or a commit for the transaction. After performing one or more operations for the thread, non-transactionally, the method further includes resuming the transaction and performing additional operations transactionally. After performing the additional operations, the method further includes either committing or aborting the transaction.
Type:
Grant
Filed:
June 26, 2009
Date of Patent:
July 16, 2013
Assignee:
Microsoft Corporation
Inventors:
Gad Sheaffer, Jan Gray, Martin Taillefer, Ali-Reza Adl-Tabatabai, Bratin Saha, Vadim Bassin, Robert Y. Geva, David Callahan
Abstract: An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store (IDS). The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update.
Type:
Grant
Filed:
April 19, 2012
Date of Patent:
July 16, 2013
Assignee:
International Business Machines Corporation
Inventors:
James Wilson Bishop, Mary Douglass Brown, Jeffrey Carl Brownscheidle, Robert Allen Cordes, Maureen Anne Delaney, Jafar Nahidi, Dung Quoc Nguyen, Joel Abraham Silberman
Abstract: An object of the invention is to reduce the electric power consumption resulting from temporarily activating a processor requiring a large electric power consumption, out of a plurality of processors. A multiprocessor system (1) includes: a first processor (141) which executes a first instruction code; a second processor (151) which executes a second instruction code, a hypervisor (130) which converts the second instruction code into an instruction code executable by the first processor (141); and a power control circuit (170) which controls the operation of at least one of the first processor (141) and the second processor (151). When the operation of the second processor (151) is suppressed by the power control circuit (170), the hypervisor (130) converts the second instruction code into the instruction code executable by the first processor (141), and the first processor (141) executes the converted instruction code.
Abstract: A command chain system includes a plurality of processing elements, a memory, and a chain engine. The chain engine is in communication with the memory and accesses instructions in the memory. The chain engine accesses a subroutine stored in the memory and sends a command to specialized hardware. The chain engine performs an action determined by one or more of the operation-code portion, the skip portion, and the loop-count portion of the instruction.
Type:
Grant
Filed:
April 15, 2010
Date of Patent:
July 16, 2013
Assignee:
Lockheed Martin Corporation
Inventors:
Joshua W. Rensch, Marlon O. Gunderson, James V. Hedin
Abstract: A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.
Type:
Application
Filed:
August 22, 2012
Publication date:
July 4, 2013
Applicant:
MicroUnity Systems Engineering, Inc.
Inventors:
Craig Hansen, John Moussouris, Alexia Massalin
Abstract: A parallel processing system for computing particle interactions includes a plurality of computation nodes arranged according to a geometric partitioning of a simulation volume. Each computation node has storage for particle data. This particle data is associated with particles in a region of the geometrically partitioned simulation volume. The parallel processing system also includes a communication system having links interconnecting the computation nodes. Each of the computation nodes includes a processor subsystem. These processor subsystems cooperate to coordinate computation of the particle interactions in a distributed manner.
Abstract: According to one embodiment, a data processing apparatus includes a processor and a memory. The processor includes core blocks. The memory stores a command queue and task management structure data. The command queue stores a series of kernel functions. The task management structure data defines an order of execution of kernel functions by associating a return value of a previous kernel function with an argument of a subsequent kernel function. Core blocks of the processor are capable of executing different kernel functions.
Abstract: A method and apparatus are described for generating flags in response to processing data during an execution pipeline cycle of a processor. The processor may include a multiplexer configured to generate valid bits for received data according to a designated data size, and a logic unit configured to control the generation of flags based on a shift or rotate operation command, the designated data size and information indicating how many bytes and bits to rotate or shift the data by. A carry flag may be used to extend the amount of bits supported by shift and rotate operations. A sign flag may be used to indicate whether a result is a positive or negative number. An overflow flag may be used to indicate that a data overflow exists, whereby there are not a sufficient number of bits to store the data.
Abstract: Techniques are described for predictively starting a processing element. Embodiments receive streaming data to be processed by a plurality of processing elements. An operator graph of the plurality of processing elements that defines at least one execution path is established. Embodiments determine a historical startup time for a first processing element in the operator graph, where, once started, the first processing element begins normal operations once the first processing element has received a requisite amount of data from one or more upstream processing elements. Additionally, embodiments determine an amount of time the first processing element takes to receive the requisite amount of data from the one or more upstream processing elements. The first processing element is then predictively started at a first startup time based on the determined historical startup time and the determined amount of time historically taken to receive the requisite amount of data.
Type:
Application
Filed:
December 10, 2012
Publication date:
June 27, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor:
International Business Machines Corporation
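The timing computation described in the abstract above can be sketched as a small helper. This is an illustrative assumption: the patent only says the start time is "based on" the historical startup time and the data-receipt time, so subtracting both from the moment the element must be operational is one plausible combination, and all names here are hypothetical:

```python
def predictive_start_time(needed_at, historical_startup, data_receive_time):
    """Return when to launch a processing element so that it is fully
    operational at time needed_at, given its historical startup latency and
    the time it historically takes to receive the requisite upstream data
    (all in seconds; the additive model is an assumption, not the patent's)."""
    return needed_at - (historical_startup + data_receive_time)
```

For example, an element that historically needs 5 seconds to start and 3 seconds to fill with upstream data would be launched 8 seconds before it must be operational.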
Abstract: In one embodiment, the present invention includes a method of assigning a location within a shared variable for each of multiple threads and writing a value to a corresponding location to indicate that the corresponding thread has reached a barrier. In such manner, when all the threads have reached the barrier, synchronization is established. In some embodiments, the shared variable may be stored in a cache accessible by the multiple threads. Other embodiments are described and claimed.
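The per-thread-slot barrier described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented cache-resident implementation: the slot layout (one list element per thread) and the spin-wait are assumptions:

```python
import threading

NUM_THREADS = 4
shared = [0] * NUM_THREADS           # one location per thread within the shared variable
passed = []

def worker(tid):
    # ... thread-local work happens here ...
    shared[tid] = 1                  # write to own slot: "I have reached the barrier"
    while not all(shared):           # wait until every thread has signalled
        pass
    passed.append(tid)               # synchronization established for all threads

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each thread writes only its own slot, no locking is needed on the signalling path; synchronization is established exactly when every slot has been set.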
Abstract: A method for providing generic formatted data to at least one digital data processor, configured to translate generic formatted data into specific formatted data. The generic formatted data includes data relative to logical blocks, at least one of the logical blocks corresponding to an object to be processed directly or indirectly according to specific formatted data by at least one processing platform with processor(s) and memory(ies), located upstream from the processor or integrated into the processor, the object being made up of elementary information of same type, all information being represented by at least one numerical value.
Abstract: Generating instructions, in particular for mailbox verification in a simulation environment. A sequence of instructions is received, as well as selection data representative of a plurality of commands including a special command. Repeatedly selecting one of the plurality of commands and outputting an instruction based on the selected command. The outputting of an instruction includes outputting a next instruction in the sequence of instructions if the selected command is the special command, and outputting an instruction associated with the command if the selected command is not the special command.
Type:
Application
Filed:
November 1, 2012
Publication date:
June 20, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Abstract: A method is provided that includes determining a number of outstanding out-of-order instructions in an instruction stream. The method includes determining a number of available hardware resources for executing out-of-order instructions and inserting fencing instructions into the instruction stream if the number of outstanding out-of-order instructions exceeds the determined number of available hardware resources. A second method is provided for compiling source code that includes determining a speculative region. The second method includes generating machine-level instructions and inserting fencing instructions into the machine-level instructions in response to determining the speculative region. A processing device is provided that includes cache memory and a processing unit to execute processing device instructions in an instruction stream.
Type:
Application
Filed:
December 15, 2011
Publication date:
June 20, 2013
Inventors:
Martin T. Pohlack, Michael Hohmuth, Stephan Diestelhorst, David Christie, Luke Yen
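The first method above — inserting fences when outstanding out-of-order instructions would exceed the available hardware resources — can be sketched as a simple rewriting pass over an instruction stream. The counting scheme and the "fence" token are illustrative assumptions:

```python
def insert_fences(instructions, available_resources):
    """Insert a fence whenever the count of outstanding out-of-order
    instructions would exceed available_resources. The fence is assumed to
    drain the outstanding window, resetting the count to zero."""
    out, outstanding = [], 0
    for instr in instructions:
        if outstanding >= available_resources:
            out.append("fence")
            outstanding = 0
        out.append(instr)
        outstanding += 1
    return out
```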
Abstract: In one embodiment, the present invention includes a method for receiving a data access instruction and obtaining an index into a data access hint register (DAHR) register file of a processor from the data access instruction, reading hint information from a register of the DAHR register file accessed using the index, and performing the data access instruction using the hint information. Other embodiments are described and claimed.
Abstract: A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified, by enclosing the code section within a transaction that employs hardware transactional memory of the computing device, and removing the memory barrier instructions from the code section. Execution of the code section as has been enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the code section as has been enclosed within the transaction, the code section is split into code sub-sections, and each code sub-section enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section into sub-sections and enclosing each code sub-section within a separate transaction can decrease occurrence of the code section aborting during execution.
Type:
Application
Filed:
December 15, 2011
Publication date:
June 20, 2013
Inventors:
Toshihiko Koju, Takuya Nakaike, Ali Ijaz Sheikh, Harold Wade Cain, III, Maged M. Michael
Abstract: A particular method includes receiving, at a processor, an instruction and an address of the instruction. The method also includes preventing execution of the instruction based at least in part on determining that the address is within a range of addresses.
Type:
Application
Filed:
December 16, 2011
Publication date:
June 20, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
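The address-range check in the abstract above reduces to a small predicate. A minimal sketch, assuming blocked ranges are given as inclusive (lo, hi) pairs — the representation is not specified by the patent:

```python
def should_execute(instruction_addr, blocked_ranges):
    """Return False (prevent execution) when the instruction's address
    falls within any blocked range; True otherwise."""
    return not any(lo <= instruction_addr <= hi for lo, hi in blocked_ranges)
```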
Abstract: Systems and methods for synchronizing thread wavefronts and associated events are disclosed. According to an embodiment, a method for synchronizing one or more thread wavefronts and associated events includes inserting a first event associated with a first data output from a first thread wavefront into an event synchronizer. The event synchronizer is configured to release the first event before releasing events inserted subsequent to the first event. The method further includes releasing the first event from the event synchronizer after the first data is stored in the memory. Corresponding system and computer readable medium embodiments are also disclosed.
Type:
Grant
Filed:
November 23, 2010
Date of Patent:
June 18, 2013
Assignees:
Advanced Micro Devices, Inc., ATI Technologies ULC
Inventors:
Laurent LeFebvre, Michael Mantor, Deborah Lynne Szasz
Abstract: A computer array (10) has a plurality of computers (12). The computers (12) communicate with each other asynchronously, and the computers (12) themselves operate in a generally asynchronous manner internally. When one computer (12) attempts to communicate with another it goes to sleep until the other computer (12) is ready to complete the transaction, thereby saving power and reducing heat production. The sleeping computer (12) can be awaiting data or instructions. In the case of instructions, the sleeping computer (12) can be waiting to store the instructions or to immediately execute the instructions. In the latter case, the instructions are placed in an instruction register (30a) when they are received and executed therefrom, without first placing the instructions into memory. The instructions can include a micro-loop (100) which is capable of performing a series of operations repeatedly.
Type:
Grant
Filed:
March 21, 2011
Date of Patent:
June 18, 2013
Assignee:
ARRAY Portfolio LLC
Inventors:
Charles H. Moore, Jeffrey Arthur Fox, John W. Rible
Abstract: In one embodiment, a processor comprises a redirect unit configured to detect a match of an instruction pointer (IP) in an IP redirect table, the IP corresponding to a guest instruction that the processor has intercepted, wherein the guest is executed under control of a virtual machine monitor (VMM), and wherein the redirect unit is configured to redirect instruction fetching by the processor to a routine identified in the IP redirect table instead of exiting to the VMM in response to the intercept of the guest instruction.
Abstract: A method, information processing system, and computer program product crack and/or shorten computer executable instructions. At least one instruction is received. The at least one instruction is analyzed. An instruction type associated with the at least one instruction is identified. At least one of a base field, an index field, one or more operands, and a mask field of the instruction are analyzed. At least one of the following is then performed: the at least one instruction is organized into a set of units of operation; and the at least one instruction is shortened. The set of units of operation is then executed.
Type:
Grant
Filed:
April 9, 2010
Date of Patent:
June 11, 2013
Assignee:
International Business Machines Corporation
Inventors:
Fadi Busaba, Brian Curran, Lee Eisen, Bruce Giamei, David Hutton
Abstract: A method utilizes information provided by performance monitoring hardware to dynamically adjust the number of levels of speculative branch predictions allowed (typically 3 or 4 per thread) for a processor core. The information includes cycles-per-instruction (CPI) for the processor core and the number of memory accesses per unit time. If the CPI is below a CPI threshold and the number of memory accesses (NMA) per unit time is above a prescribed threshold, the number of levels of speculative branch predictions is reduced per thread for the processor core. Likewise, the number of levels of speculative branch predictions can be increased, from a low level to the maximum allowed, if the CPI threshold is exceeded or the number of memory accesses per unit time is below the prescribed threshold.
Type:
Application
Filed:
December 1, 2011
Publication date:
June 6, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
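The adjustment policy described above can be sketched as a pure function. The threshold values and the cap of four levels are illustrative assumptions, the latter taken from the "typically 3 or 4 per thread" remark:

```python
def adjust_speculation_levels(current_levels, cpi, nma,
                              cpi_threshold=2.0, nma_threshold=1000,
                              min_levels=1, max_levels=4):
    """Return the new number of speculative branch-prediction levels per
    thread, given the core's cycles-per-instruction (cpi) and number of
    memory accesses per unit time (nma). Thresholds are illustrative."""
    if cpi < cpi_threshold and nma > nma_threshold:
        # Core is already efficient but memory-bound: rein in speculation.
        return max(min_levels, current_levels - 1)
    if cpi >= cpi_threshold or nma < nma_threshold:
        # Stalls dominate, or memory traffic is light: allow the maximum.
        return max_levels
    return current_levels
```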
Abstract: A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.
Abstract: A processor may include a plurality of processing units for processing instructions, where each processing unit is associated with a discrete instruction queue. Data is read from a data queue selected by each instruction, and a sequencer manages distribution of instructions to the plurality of discrete instruction queues.
Type:
Grant
Filed:
September 8, 2009
Date of Patent:
June 4, 2013
Assignee:
SMSC Holdings S.A.R.L.
Inventors:
Matthias Tramm, Manfred Stadler, Christian Hitz
Abstract: An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
Abstract: A method and circuit arrangement speculatively preprocess data stored in a register file during otherwise unused cycles in an execution unit, e.g., to prenormalize denormal floating point values stored in a floating point register file, to decompress compressed values stored in a register file, to decrypt encrypted values stored in a register file, or to otherwise preprocess data that is stored in an unprocessed form in a register file.
Type:
Application
Filed:
November 30, 2011
Publication date:
May 30, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
Abstract: A data processing apparatus has an instruction memory system arranged to output an instruction word addressed by an instruction address. An instruction execution unit processes a plurality of instructions from the instruction word in parallel. A detection unit detects in which of a plurality of ranges the instruction address lies. The detection unit is coupled to the instruction execution unit and/or the instruction memory system, to control a way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on a detected range. In an embodiment the instruction execution unit and/or the instruction memory system adjusts a width of the instruction word that determines a number of instructions from the instruction word that is processed in parallel, dependent on the detected range.
Type:
Application
Filed:
January 28, 2013
Publication date:
May 30, 2013
Applicant:
Nytell Software LLC
Inventors:
Ramanathan Sethuraman, Balakrishnan Srinivasan, Carlos Alba Pinto, Harm Peters, Rafael Peset LLopis
Abstract: A hardware wake-and-go mechanism is provided for a data processing system. The wake-and-go mechanism looks ahead in the instruction stream of a thread for programming idioms that indicate that the thread is waiting for an event. The wake-and-go mechanism updates a wake-and-go array with a target address associated with the event for each recognized programming idiom. When the thread reaches a programming idiom, the thread goes to sleep until the event occurs. The wake-and-go array may be a content addressable memory (CAM). When a transaction appears on the symmetric multiprocessing (SMP) fabric that modifies the value at a target address in the CAM, the CAM returns a list of storage addresses at which the target address is stored. The wake-and-go mechanism associates these storage addresses with the threads waiting for an event at the target addresses, and may wake the one or more threads waiting for the event.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
May 28, 2013
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: A data processor includes an instruction decoder, an execution unit, a general-purpose register file, and an index-register file. The instruction set for the data processor includes indirect-indexing instructions to facilitate table lookups. When executing such an instruction, the execution unit reads an index stored at an index-register location specified by the instruction. The index refers to a general-purpose register location, which is then read and copied to a general-purpose register location as specified by the instruction. The disclosed execution unit includes four functional units, each with two read ports and a write port so that eight table lookups can be performed in parallel.
Type:
Grant
Filed:
September 17, 2002
Date of Patent:
May 28, 2013
Assignee:
Hewlett-Packard Development Company, L.P.
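The indirect-indexing lookup described above can be modelled with the two register files as plain lists. A minimal sketch; the register-file sizes, function name, and calling convention are assumptions:

```python
def indirect_index_lookup(index_regs, gprs, idx_reg_num, dest_reg_num):
    """Model one indirect-indexing instruction: read an index from the
    index-register file, use it to select a general-purpose register, and
    copy that register's value into the destination GPR specified by the
    instruction. Returns the value copied."""
    index = index_regs[idx_reg_num]      # index held in the index-register file
    gprs[dest_reg_num] = gprs[index]     # copy the indirectly addressed GPR
    return gprs[dest_reg_num]
```

With four such functional units, each having two read ports and a write port, eight of these lookups could proceed in parallel, as the abstract notes.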
Abstract: A method and system for processing instruction information. Each instruction information character string of a sequence of instruction information character strings is sequentially extracted and processed. Each instruction information character string pertains to an associated target object wrapped in a target object storage unit by an associated operation target model. It is independently ascertained for each instruction information character string whether to generate a code line for each instruction information character string, by: determining whether a requirement is satisfied and generating the code line and storing the code line in a code buffer if the requirement has been determined to be satisfied and not generating the code line if the requirement has been determined to not be satisfied. The requirement relates to whether the instruction information character string being processed comprises a naming instruction or a generation instruction.
Type:
Application
Filed:
October 11, 2012
Publication date:
May 23, 2013
Applicant:
International Business Machines Corporation
Inventor:
International Business Machines Corporation
Abstract: A multiprocessor data processing system (MDPS) with a weakly-ordered architecture providing processing logic for substantially eliminating issuing sync instructions after every store instruction of a well-behaved application. Instructions of a well-behaved application are translated and executed by a weakly-ordered processor. The processing logic includes a lock address tracking utility (LATU), which provides an algorithm and a table of lock addresses, within which each lock address is stored when the lock is acquired by the weakly-ordered processor. When a store instruction is encountered in the instruction stream, the LATU compares the target address of the store instruction against the table of lock addresses. If the target address matches one of the lock addresses, indicating that the store instruction is the corresponding unlock instruction (or lock release instruction), a sync instruction is issued ahead of the store operation.
Type:
Grant
Filed:
October 28, 2008
Date of Patent:
May 21, 2013
Assignee:
International Business Machines Corporation
Inventors:
Andrew Dunshea, Satya Prakash Sharma, Mysore Sathyanarayana Srinivas
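The lock-address tracking behaviour above can be sketched as a small class: lock addresses are recorded on acquisition, and a sync is emitted ahead of a store only when the store targets a tracked lock address, i.e., when the store is the lock release. Class and method names are hypothetical:

```python
class LockAddressTracker:
    """Minimal sketch of the lock address tracking utility (LATU)."""

    def __init__(self):
        self.lock_addresses = set()      # table of acquired lock addresses

    def on_lock_acquire(self, addr):
        """Record the lock address when the lock is acquired."""
        self.lock_addresses.add(addr)

    def on_store(self, target_addr):
        """Return the instruction sequence to issue for this store: a sync
        precedes the store only when it targets a tracked lock address."""
        if target_addr in self.lock_addresses:
            self.lock_addresses.discard(target_addr)   # lock is being released
            return ["sync", f"store {target_addr:#x}"]
        return [f"store {target_addr:#x}"]
```

Ordinary stores thus proceed without a sync, which is the point of the scheme: the barrier cost is paid only at lock releases.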
Abstract: A technique for optimizing program instruction execution throughput in a central processing unit core (CPU). The CPU implements a simultaneous multithreading (SMT) operational mode wherein program instructions associated with at least two software threads are executed in parallel as hardware threads while sharing one or more hardware resources used by the CPU, such as cache memory, translation lookaside buffers, functional execution units, etc. As part of the SMT mode, the CPU implements an autothread (AT) operational mode. During the AT operational mode, a determination is made whether there is a resource conflict between the hardware threads that undermines instruction execution throughput. If a resource conflict is detected, the CPU adjusts the relative instruction execution rates of the hardware threads based on relative priorities of the software threads.
Type:
Application
Filed:
November 11, 2011
Publication date:
May 16, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Amit Merchant, Dipankar Sarma, Vaidyanathan Srinivasan
Abstract: A method and apparatus for processing multi-cycle instructions include picking a multi-cycle instruction and directing the picked multi-cycle instruction to a pipeline. The pipeline includes a pipeline control configured to detect a latency and a repeat rate of the picked multi-cycle instruction and to count clock cycles based on the detected latency and the detected repeat rate. The method and apparatus further include detecting the repeat rate and the latency of the picked multi-cycle instruction, and counting clock cycles based on the detected repeat rate and the latency of the picked multi-cycle instruction.
Type:
Application
Filed:
November 4, 2011
Publication date:
May 9, 2013
Applicant:
ADVANCED MICRO DEVICES, INC.
Inventors:
Ganesh Venkataramanan, Michael G. Butler
Abstract: A method, information processing system, and computer program product record an execution of a program instruction. A determination is made that a thread has entered a program unit. Another determination is made that the thread is associated with at least one attribute that matches a set of thread recording criteria. An instruction recording mechanism for the thread is dynamically activated in response to the at least one attribute of the thread matching the set of thread recording criteria.
Type:
Application
Filed:
November 4, 2011
Publication date:
May 9, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Christopher D. FILACHEK, Mei Hui WANG, Joshua B. WISNIEWSKI
Abstract: A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in a Large GPR mode access the full GPR; however, programs such as Applications operating in Small GPR mode only have access to a portion at a time. Instruction Opcodes, in Small GPR mode, may determine which portion is accessed.
Type:
Application
Filed:
December 26, 2012
Publication date:
May 9, 2013
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor:
International Business Machines Corporation
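The Large/Small GPR mode access rule can be sketched as follows; modelling the opcode's portion selection as a high_half flag and fixing the widths at 64/32 bits are assumptions, as the patent leaves both open:

```python
HALF_MASK = (1 << 32) - 1

def read_gpr(gprs, reg, large_mode, high_half=False):
    """Read GPR number reg from the register file gprs. In Large GPR mode
    the full 64-bit value is returned; in Small GPR mode only a 32-bit
    portion is visible, selected here by high_half (standing in for the
    opcode-driven portion selection the abstract describes)."""
    value = gprs[reg]
    if large_mode:
        return value
    return (value >> 32) & HALF_MASK if high_half else value & HALF_MASK
```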
Abstract: Disclosed are a method and system for optimized, dynamic data-dependent program execution. The disclosed system comprises a statistics computer which computes statistics of the incoming data at the current time instant, where the said statistics include the probability distribution of the incoming data, the probability distribution over program modules induced by the incoming data, the probability distribution induced over program outputs by the incoming data, and the time-complexity of each program module for the incoming data, wherein the said statistics are computed as a function of current and past data and previously computed statistics; a plurality of alternative execution path orders designed prior to run-time by the use of an appropriate source code; a source code selector which selects one of the execution path orders as a function of the statistics computed by the statistics computer; and a complexity measurement which measures the time-complexity of the currently selected execution path order.
Type:
Application
Filed:
December 21, 2012
Publication date:
May 9, 2013
Applicant:
International Business Machines Corporation
Inventor:
International Business Machines Corporation
Abstract: In an embodiment, if a self thread has more than one conflict, a transaction of the self thread is aborted and restarted. If the self thread has only one conflict and an enemy thread of the self thread has more than one conflict, the transaction of the self thread is committed. If the self thread only conflicts with the enemy thread and the enemy thread only conflicts with the self thread and the self thread has a key that has a higher priority than a key of the enemy thread, the transaction of the self thread is committed. If the self thread only conflicts with the enemy thread, the enemy thread only conflicts with the self thread, and the self thread has a key that has a lower priority than the key of the enemy thread, the transaction of the self thread is aborted.
Type:
Grant
Filed:
February 24, 2010
Date of Patent:
May 7, 2013
Assignee:
International Business Machines Corporation
Inventors:
Mark E. Giampapa, Thomas M. Gooding, Raul E. Silvera, Kai-Ting Amy Wang, Peng Wu, Xiaotong Zhuang
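The four commit/abort rules in the abstract above translate directly into a decision function. A sketch, assuming conflicts are given as sets of opposing threads and that a numerically larger key means higher priority — the patent only requires that keys be comparable:

```python
def resolve_conflict(self_conflicts, enemy_conflicts,
                     self_priority, enemy_priority):
    """Decide the fate of the self thread's transaction, following the
    abstract's rules in order."""
    if len(self_conflicts) > 1:
        # Rule 1: self has more than one conflict.
        return "abort-and-restart"
    if len(enemy_conflicts) > 1:
        # Rule 2: self has one conflict, but the enemy has several.
        return "commit"
    # Rules 3 and 4: a one-to-one conflict; the priority keys break the tie.
    if self_priority > enemy_priority:
        return "commit"
    return "abort"
```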
Abstract: A method and apparatus for providing fairness in a multi-processing element environment is herein described. Mask elements are utilized to associate portions of a reservation station with each processing element, while still allowing common access to another portion of reservation station entries. Additionally, bias logic biases selection of processing elements in a pipeline away from a processing element associated with a blocking stall to provide fair utilization of the pipeline.
Type:
Grant
Filed:
November 8, 2010
Date of Patent:
May 7, 2013
Assignee:
Intel Corporation
Inventors:
Morris Marden, Matthew Merten, Alexandre Farcy, Avinash Sodani, James Hadley, Ilhyun Kim
Abstract: Optimizing processing of a document sequentially processed by a plurality of image processing apparatuses that refer to an instruction document indicating the processing to be performed, and the respective security measures to be taken, by each of the plurality of image processing apparatuses.
Abstract: A multiprocessor executes a plurality of threads without decreasing execution efficiency. The multiprocessor includes a first processor allocating a different register file to each of a predetermined number of threads to be executed from among plural threads, and executing the predetermined number of threads in parallel; and a second processor performing processing according to a processing request made by the first processor. The first processor has areas allocated to the plurality of threads in one-to-one correspondence, and makes the processing request to the second processor according to an instruction included in one of the predetermined number of threads. Upon receiving, from the second processor, a request for writing a value resulting from the processing, the first processor judges whether the one thread is being executed and, when it is not, performs control such that the obtained value is written into the area allocated to the one thread.
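The write-back handling in the abstract above can be sketched as follows. This is a software analogy under stated assumptions: the class and method names are hypothetical, and direct delivery to the register file when the thread is still running is implied rather than spelled out in the abstract:

```python
# Hypothetical sketch: the first processor parks a returned value in the
# requesting thread's save area only when that thread is no longer executing.

class FirstProcessor:
    def __init__(self, n_threads):
        self.running = set()              # threads currently holding a register file
        self.areas = [None] * n_threads   # one save area per thread (one-to-one)

    def write_back(self, thread_id, value):
        if thread_id in self.running:
            return "write to register file"  # thread active: deliver directly
        self.areas[thread_id] = value        # thread swapped out: park the value
        return "write to save area"

p = FirstProcessor(4)
p.running.add(0)
print(p.write_back(0, 10))  # → write to register file
print(p.write_back(2, 20))  # → write to save area
```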
Abstract: Disclosed are machine processors and methods performed thereby. The processor has access to processing units for performing data processing and to libraries. Functions in the libraries are implementable to perform parallel processing and graphics processing. The processor may be configured to acquire (e.g., to download from a web server) a download script, possibly with extensions specifying bindings to library functions. Running the script may cause the processor to create, for each processing unit, contexts in which functions may be run, and to run, on the processing units and within a respective context, a portion of the download script. Running the script may also cause the processor to create, for a processing unit, a memory object, transfer data into that memory object, and transfer data back to the processor in such a way that a memory address of the data in the memory object is not returned to the processor.
Abstract: Methods and apparatuses are presented for graphics operations with thread count throttling, involving operating a processor to carry out multiple threads of execution, wherein the processor comprises at least one execution unit capable of supporting up to a maximum number of threads; obtaining a defined memory allocation size for allocating, in at least one memory device, a thread-specific memory space for the multiple threads; obtaining a per-thread memory requirement corresponding to the thread-specific memory space; determining a thread count limit based on the defined memory allocation size and the per-thread memory requirement; and sending a command to the processor to cause the processor to limit the number of threads carried out by the at least one execution unit to a reduced number of threads, the reduced number being less than the maximum number of threads.
Type:
Grant
Filed:
November 2, 2006
Date of Patent:
April 23, 2013
Assignee:
NVIDIA Corporation
Inventors:
Jerome F. Duluk, Jr., Bryon S. Nordquist
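The thread-count-limit determination in the entry above reduces to simple arithmetic. A minimal sketch, with assumed names and example figures not taken from the patent:

```python
# Sketch of the thread-count-limit computation: cap concurrency so that the
# per-thread memory spaces fit inside the defined allocation.

def thread_count_limit(memory_allocation_size, per_thread_requirement, max_threads):
    """Limit concurrent threads so thread-specific memory fits the allocation."""
    limit = memory_allocation_size // per_thread_requirement
    return min(limit, max_threads)

# Example: 64 KiB allocation, 3 KiB per thread, hardware maximum of 32 threads.
print(thread_count_limit(64 * 1024, 3 * 1024, 32))  # → 21
```

Here the reduced thread count (21) is below the hardware maximum (32), which is exactly the throttling case the abstract describes.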
Abstract: Various techniques for dynamically allocating instruction tags and using those tags are disclosed. These techniques may apply to processors supporting out-of-order execution and to architectures that support multiple threads. A group of instructions may be assigned a tag value from a pool of available tag values. A tag value may be usable to determine the program order of a group of instructions relative to other instructions in a thread. After the group of instructions has been (or is about to be) committed, the tag value may be freed so that it can be re-used on a second group of instructions. Tag values are dynamically allocated between threads; accordingly, a particular tag value or range of tag values is not dedicated to a particular thread.
Type:
Grant
Filed:
June 30, 2009
Date of Patent:
April 23, 2013
Assignee:
Oracle America, Inc.
Inventors:
Paul J. Jordan, Robert T. Golla, Jama I. Barreh
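The shared-pool tag allocation in the entry above can be illustrated with a short sketch. The class and method names are assumed for illustration; the key property shown is the one the abstract states, that tags are not dedicated to a thread and are freed for re-use after commit:

```python
# Hypothetical sketch of dynamic tag allocation from a pool shared by threads.

class TagPool:
    def __init__(self, n_tags):
        self.free = list(range(n_tags))  # pool shared by all threads

    def allocate(self, thread_id):
        # Any thread may take any available tag value.
        return self.free.pop(0) if self.free else None

    def release(self, tag):
        # Freed at (or near) commit so another group can re-use it.
        self.free.append(tag)

pool = TagPool(4)
t0 = pool.allocate(thread_id=0)    # → 0
t1 = pool.allocate(thread_id=1)    # → 1
pool.release(t0)                   # tag 0 returns to the shared pool
print(pool.allocate(thread_id=1))  # → 2
```

Note that tag 0, originally used by thread 0, is now available to thread 1 or any other thread, since no tag range is dedicated to a particular thread.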
Abstract: Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of the data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware implemented pervasive thread control logic determines if one or more already spawned idle threads are available for use as an assist thread. The hardware implemented pervasive thread control logic selects an idle thread from the one or more already spawned idle threads if it is determined that one or more already spawned idle threads are available for use as an assist thread, to thereby provide the assist thread. In addition, the hardware implemented pervasive thread control logic offloads a portion of a workload of the main thread to the assist thread.
Type:
Grant
Filed:
May 12, 2010
Date of Patent:
April 16, 2013
Assignee:
International Business Machines Corporation
Inventors:
Ronald P. Hall, Hung Q. Le, Raul E. Silvera, Balaram Sinharoy
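The hardware flow in the entry above (find an already spawned idle thread, then offload part of the main thread's workload to it) has a natural software analogy. This sketch uses Python threads and queues purely as an illustration; the names are assumptions and the patent's mechanism is hardware thread control logic, not a software pool:

```python
# Software analogy of assist-thread offload: idle workers are spawned ahead
# of time; the main thread selects one and hands it a portion of its work.

import queue
import threading

def assist_worker(inbox):
    while True:
        work = inbox.get()
        if work is None:       # shutdown sentinel
            break
        work()                 # run the offloaded portion of the workload

# Spawn an idle thread ahead of time, as the branch-to-assist model assumes.
idle_pool = queue.Queue()
inbox = queue.Queue()
t = threading.Thread(target=assist_worker, args=(inbox,))
t.start()
idle_pool.put(inbox)

results = []
box = idle_pool.get_nowait()          # select an available idle thread
box.put(lambda: results.append(42))   # offload a portion of the workload
box.put(None)
t.join()
print(results)  # → [42]
```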
Abstract: Provided is a parallel distributed processing method executed by a computer system comprising a parallel-distributed-processing control server, a plurality of extraction processing servers, and a plurality of aggregation processing servers. The managed data includes a plurality of data items, including at least a first and a second data item, each of which includes a value. The method includes a step of extracting data from one of the plurality of chunks according to a value in the second data item, to thereby group the data; a step of merging groups having the same value in the second data item based on an order of a value in the first data item of data contained in a group among the groups; and a step of processing data in a group obtained through the merging by focusing on the order of the value in the first data item.
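The grouping and ordered-merge steps above can be sketched in miniature. The record layout and names here are assumptions for illustration; a record is reduced to a `(first item, second item)` pair:

```python
# Illustrative sketch: group records by the second data item, then order each
# group by the value of the first data item, as the merging step requires.

records = [(3, "a"), (1, "b"), (2, "a"), (4, "b")]  # (first item, second item)

# Extract: group data according to the value in the second data item.
groups = {}
for first, second in records:
    groups.setdefault(second, []).append((first, second))

# Merge: within each group, order by the value in the first data item.
merged = {k: sorted(v) for k, v in groups.items()}
print(merged["a"])  # → [(2, 'a'), (3, 'a')]
```

In the distributed setting the extraction and aggregation would run on separate servers over chunks of the managed data; this sketch only shows the data movement on one node.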
Abstract: Circuitry for receiving transaction requests from a plurality of masters and the masters themselves are disclosed. The circuitry comprises: an input port for receiving said transaction requests, at least one of said transaction requests received comprising an indicator indicating if said transaction is a speculative transaction; an output port for outputting a response to said master said transaction request was received from; and transaction control circuitry; wherein said transaction control circuitry is responsive to a speculative transaction request to determine a state of at least a portion of a data processing apparatus said circuitry is operating within and in response to said state being a predetermined state said transaction control circuitry generates a transaction cancel indicator and outputs said transaction cancel indicator as said response, said transaction cancel indicator indicating to said master that said speculative transaction will not be performed.
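The decision the transaction control circuitry makes above is compact: cancel a speculative request when the apparatus is in a predetermined state, otherwise perform it. A minimal sketch, with the state names and function signature assumed for illustration:

```python
# Hypothetical sketch of the interconnect behavior: a speculative transaction
# is answered with a cancel indicator when the system is in a given state.

def handle_request(is_speculative, system_state, busy_states=("congested",)):
    """Return the response the transaction control circuitry outputs."""
    if is_speculative and system_state in busy_states:
        return "cancel"    # master is told the speculative work will not run
    return "perform"

print(handle_request(True, "congested"))   # → cancel
print(handle_request(True, "idle"))        # → perform
print(handle_request(False, "congested"))  # → perform
```

The point of the cancel indicator is that the master, knowing its transaction was speculative, can simply drop it rather than retry.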
Abstract: One embodiment of the present invention sets forth a technique for performing a parallel scan operation with high computational efficiency in a single-instruction multiple-data (SIMD) environment. Each participating thread initially writes an extended region of a data array to initialize the region with an identity value. For example, a value of zero is used as the identity value for addition. The initialized region of the data array includes an initialized entry for every possible out of bounds index that may be computed in the normal course of the parallel scan operation. During the parallel scan operation each thread computes data array indices according to any technically appropriate technique. When a participating thread computes an index that would conventionally be out of bounds, the thread is able to retrieve an identity value from the initialized region of the data array rather than perform a bounds check that returns the identity value.
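The identity-padding trick above can be shown with a tiny sequential analogy. The constants and names here are assumptions; in the SIMD setting each index computation below would be done by one thread in parallel:

```python
# Sketch of the identity-padded scan: out-of-bounds indices land in a region
# pre-initialized with the identity value, so no bounds check is needed.

IDENTITY = 0    # identity value for addition
HALO = 4        # extended region sized for any possible out-of-bounds index

data = [1, 2, 3, 4]
padded = [IDENTITY] * HALO + data   # threads index padded[i + HALO]

def load(i):
    # An index that would conventionally be out of bounds (i < 0) retrieves
    # the identity value from the initialized region instead.
    return padded[i + HALO]

# One step of an inclusive scan with offset 2 (Hillis–Steele style):
offset = 2
step = [load(i - offset) + load(i) for i in range(len(data))]
print(step)  # → [1, 2, 4, 6]
```

For `i < offset` the left operand is the identity value 0, so the sum is just `data[i]`, which is exactly the behavior a bounds check returning the identity would produce, without the branch.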
Abstract: Disclosed is an information processing apparatus in which various kinds of information are processed in either a real-time processing mode or a non-real-time processing mode. The apparatus includes an operation display section to accept an inputted instruction, an image processing section to apply processing to image information, and a processor provided with a plurality of identical cores. A process that does not require real-time processing, related to the operation display section, is fixed onto one of the cores, so that that core is in charge of controlling it; a process that requires real-time processing, related to the image processing section, is fixed onto another of the cores, so that that core is in charge of controlling it.
Type:
Grant
Filed:
March 18, 2010
Date of Patent:
April 9, 2013
Assignee:
Konica Minolta Business Technologies, Inc.