Processing Control Patents (Class 712/220)
  • Publication number: 20130086363
    Abstract: An instruction set architecture (ISA) includes instructions for selectively indicating last-use architected operands having values that will not be accessed again, wherein architected operands are made active or inactive after an instruction specified last-use by an instruction, wherein the architected operands are made active by performing a write operation to an inactive operand, wherein the activation/deactivation may be performed by the instruction having the last-use of the operand or another (prefix) instruction.
    Type: Application
    Filed: October 3, 2011
    Publication date: April 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Publication number: 20130086367
    Abstract: Operand liveness state information is maintained during context switches for current architected operands of executing programs, the current operand state information indicating whether corresponding current operands are any one of enabled or disabled for use by a first program module, the first program module comprising machine instructions of an instruction set architecture (ISA) for disabling current architected operands, wherein a current operand is accessed by a machine instruction of said first program module, the accessing comprising using the current operand state information to determine whether a previously stored current operand value is accessible by the first program module.
    Type: Application
    Filed: October 3, 2011
    Publication date: April 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Publication number: 20130086364
    Abstract: A multi-level register hierarchy is disclosed comprising a first level pool of registers for caching registers of a second level pool of registers in a system wherein programs can dynamically release and re-enable architected registers such that released architected registers need not be maintained by the processor, the processor accessing operands from the first level pool of registers, wherein a last-use instruction is identified as having a last use of an architected register before being released, the last-use architected register being released causes the multi-level register hierarchy to discard any correspondence of an entry to said last use architected register.
    Type: Application
    Filed: October 3, 2011
    Publication date: April 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Publication number: 20130086365
    Abstract: A pool of available physical registers is provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values, and physical registers can be deallocated to the pool, wherein deallocation of physical registers is performed after a last-use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or a prefix instruction, wherein reads to deallocated architected registers return an architected default value.
    Type: Application
    Filed: October 3, 2011
    Publication date: April 4, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
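The pooling behavior in the abstract above can be sketched as a toy model. This is only an illustration under stated assumptions, not the claimed implementation: the class name `RegisterPool`, the `last_use` flag, and a default value of 0 are all hypothetical choices made for the example.

```python
class RegisterPool:
    """Toy model: architected registers backed by a pool of physical registers."""

    DEFAULT_VALUE = 0  # assumed default returned for deallocated registers

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # pool of available physical registers
        self.values = [0] * num_physical       # contents of each physical register
        self.mapping = {}                      # architected name -> physical index

    def write(self, arch_reg, value):
        # A write activates an inactive architected register by taking a
        # physical register from the pool.
        if arch_reg not in self.mapping:
            if not self.free:
                raise RuntimeError("no free physical registers")
            self.mapping[arch_reg] = self.free.pop()
        self.values[self.mapping[arch_reg]] = value

    def read(self, arch_reg, last_use=False):
        # A read of a deallocated register returns the default value.
        if arch_reg not in self.mapping:
            return self.DEFAULT_VALUE
        phys = self.mapping[arch_reg]
        value = self.values[phys]
        if last_use:
            # The value will not be needed again, so the physical register
            # is returned to the pool.
            del self.mapping[arch_reg]
            self.free.append(phys)
        return value
```

In this sketch a read flagged as last use returns the value once and frees the backing register; any later read yields the default until the register is written again.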
  • Publication number: 20130080745
    Abstract: Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
    Type: Application
    Filed: November 20, 2012
    Publication date: March 28, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: INTERNATIONAL BUSINESS MACHINES CORP
  • Publication number: 20130080744
    Abstract: Methods and systems for executing a code stream of non-native binary code on a computing system are disclosed. One method includes parsing the code stream to detect a plurality of elements including one or more branch destinations, and traversing the code stream to detect a plurality of non-native operators. The method also includes executing a pattern matching algorithm against the plurality of non-native operators to find combinations of two or more non-native operators that do not span across a detected branch destination and that correspond to one or more target operators executable by the computing system. The method further includes generating a second code stream executable on the computing system including the one or more target operators.
    Type: Application
    Filed: September 27, 2011
    Publication date: March 28, 2013
    Inventor: Andrew Ward Beale
  • Publication number: 20130080746
    Abstract: In one embodiment, the present invention includes a method for communicating an assertion signal from a first instruction sequencer to a plurality of accelerators coupled to the first instruction sequencer, detecting the assertion signal in the accelerators and communicating a request for a lock, and registering an accelerator that achieves the lock by communication of a registration message for the accelerator to the first instruction sequencer. Other embodiments are described and claimed.
    Type: Application
    Filed: November 20, 2012
    Publication date: March 28, 2013
    Inventors: Perry Wang, Jamison Collins, Hong Wang
  • Patent number: 8407455
    Abstract: A computer-implemented method and article of manufacture is disclosed for enabling computer programs utilizing hardware transactional memory to safely interact with code utilizing traditional locks. A thread executing on a processor of a plurality of processors in a shared-memory system may initiate transactional execution of a section of code, which includes a plurality of access operations to the shared-memory, including one or more to locations protected by a lock. Before executing any operations accessing the location associated with the lock, the thread reads the value of the lock as part of the transaction, and only proceeds if the lock is not held. If the lock is acquired by another thread during transactional execution, the processor detects this acquisition, aborts the transaction, and attempts to re-execute it.
    Type: Grant
    Filed: July 28, 2009
    Date of Patent: March 26, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: David S. Christie, Michael P. Hohmuth, Stephan Diestelhorst
  • Patent number: 8407451
    Abstract: An information handling system includes a processor with multiple hardware units that generate program application load, store, and I/O interface requests to system busses within the information handling system. The processor includes a resource allocation identifier (RAID) that links the processor hardware unit initiating a system bus request with a specific resource allocation group. The resource allocation group assigns a specific bandwidth allocation rate to the initiating processor. When a load, store, or I/O interface bus request reaches the I/O bus for execution, the resource allocation manager restricts the amount of bandwidth associated with each I/O request by assigning discrete amounts of bandwidth to each successive I/O requester. Successive stages of the instruction pipeline in the hardware unit contain the resource allocation identifiers (RAID) linked to the specific load, store, or I/O instruction.
    Type: Grant
    Filed: February 6, 2007
    Date of Patent: March 26, 2013
    Assignee: International Business Machines Corporation
    Inventors: Gavin Balfour Meil, Steven Leonard Roberts, Christopher John Spandikow
  • Publication number: 20130073836
    Abstract: Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).
    Type: Application
    Filed: September 16, 2011
    Publication date: March 21, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Brett Olsson, Valentina Salapura
  • Patent number: 8402253
    Abstract: In one embodiment, the present invention includes a method for determining if an instruction of a first thread dispatched from a first queue associated with the first thread is stalled in a pipestage of a pipeline, and if so, dispatching an instruction of a second thread from a second queue associated with the second thread to the pipeline if the second thread is not stalled. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: March 19, 2013
    Assignee: Intel Corporation
    Inventors: Matthew Merten, Avinash Sodani, James Hadley, Alexandre Farcy, Iredamola Olopade
  • Publication number: 20130067202
    Abstract: A microprocessor processes conditional non-branch instructions that specify a condition and instruct the microprocessor to perform an operation if the condition is satisfied and otherwise to not perform the operation. A predictor provides a prediction about a conditional non-branch instruction. An instruction translator translates the conditional non-branch instruction into a no-operation microinstruction when the prediction predicts the condition will not be satisfied, and into a set of one or more microinstructions to unconditionally perform the operation when the prediction predicts the condition will be satisfied. An execution pipeline executes the no-operation microinstruction or the set of microinstructions. The instruction translator translates the instruction into a second set of one or more microinstructions to conditionally perform the operation when the predictor does not make a prediction.
    Type: Application
    Filed: March 6, 2012
    Publication date: March 14, 2013
    Applicant: VIA TECHNOLOGIES, INC.
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
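The three translation outcomes described in the abstract above can be sketched as a small function. The `Prediction` enum, the `translate` function, and the tuple encoding of microinstructions are hypothetical names chosen for illustration; the abstract does not specify any encoding.

```python
from enum import Enum


class Prediction(Enum):
    SATISFIED = "satisfied"          # condition predicted to hold
    NOT_SATISFIED = "not_satisfied"  # condition predicted not to hold
    NONE = "none"                    # predictor makes no prediction


def translate(op, cond, prediction):
    """Return a list of illustrative microinstructions (as tuples)."""
    if prediction is Prediction.NOT_SATISFIED:
        return [("nop",)]            # operation predicted not to execute
    if prediction is Prediction.SATISFIED:
        return [(op,)]               # perform the operation unconditionally
    return [(op, "if", cond)]        # no prediction: keep the condition check
```

For example, `translate("mov", "eq", Prediction.NONE)` keeps the condition attached to the operation, while the two predicted cases drop it.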
  • Publication number: 20130067131
    Abstract: A method of processing J1850 requests using a scan tool having multiple processor systems is provided. The scan tool includes a first processor that processes data according to scan tool functions to assist with diagnosing and repairing a vehicle. A second processor receives data transmitted to the first processor and stores the data in a buffer. The second processor determines whether the data is complete to enable the first processor to make a determination regarding the data.
    Type: Application
    Filed: August 21, 2012
    Publication date: March 14, 2013
    Applicant: Service Solutions U.S. LLC
    Inventor: David Vossen
  • Publication number: 20130061027
    Abstract: This disclosure describes techniques for handling divergent thread conditions in a multi-threaded processing system. In some examples, a control flow unit may obtain a control flow instruction identified by a program counter value stored in a program counter register. The control flow instruction may include a target value indicative of a target program counter value for the control flow instruction. The control flow unit may select one of the target program counter value and a minimum resume counter value as a value to load into the program counter register. The minimum resume counter value may be indicative of a smallest resume counter value from a set of one or more resume counter values associated with one or more inactive threads. Each of the one or more resume counter values may be indicative of a program counter value at which a respective inactive thread should be activated.
    Type: Application
    Filed: September 7, 2011
    Publication date: March 7, 2013
    Applicant: QUALCOMM Incorporated
    Inventors: Lin Chen, David Rigel Garcia Garcia, Andrew E. Gruber, Guofang Jiao
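A minimal sketch of the program counter selection described in the abstract above. The abstract says only that one of the two values is selected; choosing the smaller of the two is an assumption made here so that an inactive thread waiting at a lower address is not skipped. The function name `next_pc` is hypothetical.

```python
def next_pc(target_pc, resume_counters):
    """Pick the value to load into the program counter.

    target_pc:       target of the current control flow instruction
    resume_counters: resume PC values of the currently inactive threads
    """
    if not resume_counters:
        return target_pc                 # no inactive threads to wait for
    min_resume = min(resume_counters)    # the minimum resume counter value
    # Assumed selection rule: take whichever address comes first.
    return min(target_pc, min_resume)
```

Under this assumption, a jump to address 40 while a thread is waiting to resume at 24 would load 24 instead of 40.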
  • Publication number: 20130061026
    Abstract: A configurable mass data portioning for parallel processing is described herein. One or more operation attributes are selected to participate in parallelization criteria. The values of the selected operation attributes for a number of operations are submitted to a specified algorithm to provide parallelization values corresponding to the operations. The parallelization values are applied to group the operations in comparable portions for parallel execution without conflicts.
    Type: Application
    Filed: September 5, 2011
    Publication date: March 7, 2013
    Inventors: Artur Kaufmann, Georg Lang
  • Patent number: 8392171
    Abstract: Methods and systems for register mapping in emulation of a target system on a host system are disclosed. Statistics for use of a set of registers of a target system processor are determined. Based on the statistics a first subset of the target system registers, including one or more most commonly used registers is determined. The registers in the first subset are directly mapped to a first group of registers of a host system processor. A second subset of the set of target system registers is dynamically mapped to a second group of registers of the host system processor.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: March 5, 2013
    Assignee: Sony Computer Entertainment Inc.
    Inventors: Stewart Sargaison, Victor Suba
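The split between directly mapped and dynamically mapped registers in the abstract above can be sketched as follows. This is a rough illustration only: the register names, the use of a simple access trace as the "statistics", and the least-recently-used eviction policy are all assumptions, and a real emulator would also have to spill the evicted register's value to memory.

```python
from collections import Counter, OrderedDict


def build_static_map(trace, host_regs):
    """Directly map the most frequently used target registers to host registers."""
    counts = Counter(trace)  # usage statistics per target register
    hot = [reg for reg, _ in counts.most_common(len(host_regs))]
    return dict(zip(hot, host_regs))


class DynamicMapper:
    """Share a second group of host registers among the remaining target registers."""

    def __init__(self, host_regs):
        self.free = list(host_regs)
        self.map = OrderedDict()  # target register -> host register, oldest first

    def lookup(self, target_reg):
        if target_reg in self.map:
            self.map.move_to_end(target_reg)  # mark as most recently used
            return self.map[target_reg]
        if self.free:
            host = self.free.pop(0)
        else:
            # Evict the least recently used mapping (assumed policy).
            _, host = self.map.popitem(last=False)
        self.map[target_reg] = host
        return host
```

With the trace `["r1", "r2", "r1", "r3", "r1", "r2"]` and two host registers, `r1` and `r2` get fixed mappings and `r3` falls through to the dynamic mapper.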
  • Patent number: 8392932
    Abstract: An information processing device causes a processor to execute a plurality of threads by switching between them. Each thread performs a process in response to obtaining an event. When causing a second thread to transition from a non-execution state to an execution state to replace a first thread, the device (1) detects whether, in the first thread having transitioned to the non-execution state, the next start position of a process belongs to an already processed part; (2) detects whether a start position of a process in the second thread in the execution state belongs to the processed part; and (3) determines, in accordance with the two detection results, whether to set a context for execution of the second thread into the processor, and performs processing in accordance with the determination.
    Type: Grant
    Filed: May 13, 2009
    Date of Patent: March 5, 2013
    Assignee: Panasonic Corporation
    Inventor: Takuji Kawamoto
  • Patent number: 8392693
    Abstract: A microprocessor includes a cache memory and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory. The grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. The microprocessor forgoes initiating the transaction on the bus when executing the grabline instruction if the microprocessor determines that a store to the cache line would cause an exception.
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: March 5, 2013
    Assignee: VIA Technologies, Inc.
    Inventors: G. Glenn Henry, Colin Eddy, Rodney E. Hooker
  • Publication number: 20130054942
    Abstract: A method for a hybrid code signature including executing, via a processor, an application, the executing comprising executing a root instruction of the application; profiling, via the processor, the executing of the application, the profiling comprising storing a reference signature; determining, via the processor, a working signature of instructions executed subsequent to the executing of the root instruction, the determining comprising implementing a hashing function of the instructions in response to storing the reference signature; tracking the updating of the working signature by storing a value in a counter; and updating continuously, via the processor, the working signature with the hashing function while at least the working signature does not match the reference signature.
    Type: Application
    Filed: August 22, 2011
    Publication date: February 28, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Mauricio J. Serrano
  • Patent number: 8387053
    Abstract: A method of performing operations in a computer system, computer system, and related method of compilation, are disclosed. In one embodiment, the method of performing includes providing compiled code having at least one thread, where each of the at least one thread includes a respective plurality of blocks and each respective block includes a respective pre-fetch component and a respective execute component. The method also includes performing a first pre-fetch component from a first block of a first thread of the at least one thread, performing a first additional component after the first pre-fetch component has been performed, and performing a first execute component from the first block of the first thread. The first execute component is performed after the first additional component has been performed, and the first additional component is from either a second thread or another block of the first thread that is not the first block.
    Type: Grant
    Filed: January 25, 2007
    Date of Patent: February 26, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Blaine D. Gaither, Verna Knapp, Jerome Huck, Benjamin D. Osecky
  • Publication number: 20130046954
    Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread which is state transitioning. A scheduler in each executer arbitrates instructions for the thread into at least one pipeline where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor which sorts and filters the tokens into dispatch order.
    Type: Application
    Filed: January 18, 2012
    Publication date: February 21, 2013
    Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen
  • Patent number: 8380964
    Abstract: An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store IDS. The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update.
    Type: Grant
    Filed: April 3, 2009
    Date of Patent: February 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: James Wilson Bishop, Mary Douglass Brown, Jeffrey Carl Brownscheidle, Robert Allen Cordes, Maureen Anne Delaney, Jafar Nahidi, Dung Quoc Nguyen, Joel Abraham Silberman
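The age tracking idea in the abstract above can be sketched in software. In this toy model one bit per pair of queue positions records which instruction is older, and dispatch touches only the row and column of the position being filled. The class and method names are hypothetical, and the latch-level row, column, and clock controls of the abstract are not modeled.

```python
class AgeMatrix:
    """Toy age matrix for an issue queue addressed by queue position (QPOS)."""

    def __init__(self, size):
        self.size = size
        self.valid = [False] * size
        # older[i][j] is True when the instruction at position i is older
        # than the instruction at position j.
        self.older = [[False] * size for _ in range(size)]

    def dispatch(self, qpos):
        # The new instruction is younger than every instruction already queued.
        for other in range(self.size):
            if other != qpos and self.valid[other]:
                self.older[other][qpos] = True
                self.older[qpos][other] = False
        self.valid[qpos] = True

    def issue(self, qpos):
        # Issuing frees the position; stale bits are overwritten on reuse.
        self.valid[qpos] = False

    def oldest(self, ready):
        """Return the oldest valid position among `ready`, or None."""
        candidates = [q for q in ready if self.valid[q]]
        for q in candidates:
            if not any(self.older[o][q] for o in candidates if o != q):
                return q
        return None
```

Because relative age lives in the matrix rather than in queue order, a freed position can be refilled at once and the new instruction is still treated as the youngest.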
  • Patent number: 8380965
    Abstract: An apparatus to facilitate design of a stream processing flow that satisfies an objective, wherein the flow includes at least three processing groups, wherein a first processing group includes a data source and an operator, a second processing group includes a data source and an operator and a third processing group includes a join operator at its input and another operator, wherein data inside each group is organized by channels and each channel is a sequence of data, wherein an operator producing a data channel does not generate new data for the channel until old data of the channel is received by all other operators in the same group, and wherein data that flows from the first and second groups to the third group is done asynchronously and is stored in a queue if not ready for processing by an operator of the third group.
    Type: Grant
    Filed: June 16, 2009
    Date of Patent: February 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: Eric Bouillet, Hanhua Feng, Zhen Liu, Anton V. Riabov
  • Publication number: 20130042091
    Abstract: An instruction specifies a source value and an offset value. Upon execution of the instruction, a first result of the instruction and a second result of the instruction are generated. The first result is a first portion of the source value and the second result is a second portion of the source value.
    Type: Application
    Filed: August 12, 2011
    Publication date: February 14, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Mao Zeng, Lucian Codrescu, Erich James Plondke
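One possible reading of the abstract above is an instruction that splits a register value at a bit offset and returns both pieces. The sketch below follows that reading; the 32-bit width, the low/high split, and the name `split_at_offset` are assumptions, since the abstract does not say how the two portions are defined.

```python
def split_at_offset(source, offset, width=32):
    """Split `source` into (low, high) portions at bit position `offset`."""
    if not 0 <= offset <= width:
        raise ValueError("offset must be between 0 and width")
    source &= (1 << width) - 1            # keep only `width` bits
    low = source & ((1 << offset) - 1)    # first portion: bits below the offset
    high = source >> offset               # second portion: the remaining bits
    return low, high
```

Producing both portions from one instruction would save a separate mask and a separate shift in code that unpacks packed fields.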
  • Publication number: 20130036295
    Abstract: A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory. The SPU records a list of changes it performs to remove dead data objects and compact live data objects. This list is subsequently sent to the CPU, which performs the included operations.
    Type: Application
    Filed: September 24, 2012
    Publication date: February 7, 2013
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Advanced Micro Devices, Inc.
  • Patent number: 8370606
    Abstract: Apparatus and methods for quickly switching active context between data pointer registers are disclosed. The apparatus can include a first register operable for storing a first data pointer and a second register operable for storing a second data pointer. A configuration register can provide a first signal specifying either the first or the second data pointer as an active data pointer. An instruction decoder can receive a data pointer instruction and output a second signal. The first and second signals can be independent from one another. Decoding logic coupled to the logic devices can output one of the first or second data pointers as the active data pointer in response to the first and second signals.
    Type: Grant
    Filed: March 16, 2007
    Date of Patent: February 5, 2013
    Assignee: Atmel Corporation
    Inventors: Benjamin Francis Froemming, Emil Lambrache
  • Patent number: 8370608
    Abstract: The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied or propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: February 5, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8370576
    Abstract: An embodiment of the present invention includes a circuit for tracking memory operations with trace-based execution. Each trace includes a sequence of operations that includes zero or more of the memory operations. At least some of the active memory operations access the memory in an execution order that is different from the program order. The circuit includes a first memory that caches data accessed by the memory operations. This memory is partitioned into N banks. Checkpoint entries, which are stored in a second memory also partitioned into N banks, are associated with each trace. Each entry refers to a checkpoint location in the first memory. A sub-circuit receives rollback requests and responds by overwriting checkpoint locations. Each of the N memory units consisting of a bank in the first memory and the corresponding bank in the second memory may be rolled back independently and concurrently with other memory units.
    Type: Grant
    Filed: February 13, 2008
    Date of Patent: February 5, 2013
    Assignee: Oracle America, Inc.
    Inventors: John Gregory Favor, Paul G. Chan, Graham Ricketson Murphy, Joseph Byron Rowlands
  • Patent number: 8370607
    Abstract: A system for recovering an architecture register mapping table (ARMT). The system includes a first number of collection circuits and decode circuits, a second number of selection circuits, and an enable circuit. Information related to the mapping between each physical register and an appropriate architecture register is obtained from a physical register mapping table (PRMT) by one and only one collection circuit during only one of a fourth number of instruction cycles. Each decode circuit has its input coupled to the output of one different collection circuit and is capable of converting its input into a third number bit wide binary string selection code at its output. Each selection circuit is configured to receive from each selection code a bit from a bit position associated with that selection circuit. The enable circuit is configured to appropriately enable mapping of information from the PRMT to the ARMT.
    Type: Grant
    Filed: December 23, 2009
    Date of Patent: February 5, 2013
    Assignee: STMicroelectronics (Beijing) R&D Co. Ltd.
    Inventors: Hong-Xia Sun, Kai-Feng Wang, Peng-Fei Zhu, Yong-Qiang Chris Wu
  • Patent number: 8365177
    Abstract: Measuring processes are started at a plurality of priority levels. A different one of the measuring processes is started for each of the priority levels. Subsequently, for each of the measuring processes, it is determined whether each measuring process is scheduled for executing at a respective target rate. In response to determining that a particular measuring process of the measuring processes is not scheduled for executing at a particular target rate, resource allocation to at least one monitored process running at a particular level of the priority levels is adjusted. The at least one monitored process is not any of the measuring processes.
    Type: Grant
    Filed: January 20, 2009
    Date of Patent: January 29, 2013
    Assignee: Oracle International Corporation
    Inventors: Wilson Chan, Angelo Pruscino, Ahmed S. Abbas, Tak Fung Wang
  • Publication number: 20130024662
    Abstract: Systems and methods are disclosed that allow atomic updates to global data to be at least partially eliminated to reduce synchronization overhead in parallel computing. A compiler analyzes the data to be processed to selectively permit unsynchronized data transfer for at least one type of data. A programmer may provide a hint to expressly identify the type of data that are candidates for unsynchronized data transfer. In one embodiment, the synchronization overhead is reducible by generating an application program that selectively substitutes codes for unsynchronized data transfer for a subset of codes for synchronized data transfer. In another embodiment, the synchronization overhead is reducible by employing a combination of software and hardware by using relaxation data registers and decoders that collectively convert a subset of commands for synchronized data transfer into commands for unsynchronized data transfer.
    Type: Application
    Filed: July 18, 2011
    Publication date: January 24, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lakshminarayanan Renganarayana, Vijayalakshmi Srinivasan
  • Patent number: 8359457
    Abstract: The semiconductor device includes a controller and a plurality of dynamically reconfigurable circuits connected to one another in series below the controller to perform operations in the manner of a pipeline. The controller inputs data and reconfiguration information to the first one of the dynamically reconfigurable circuits. Each of the dynamically reconfigurable circuits includes a processing unit that performs a data computation, an updating unit that updates the reconfiguration information, and a repetition controlling unit that determines whether to repeat the computation and controls the data and the reconfiguration information.
    Type: Grant
    Filed: February 17, 2009
    Date of Patent: January 22, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takashi Yoshikawa, Shigehiro Asano
  • Patent number: 8359459
    Abstract: A processor configured to synchronize threads in multithreaded applications. The processor includes first and second registers. The processor stores a first bitmask in the first register and a second bitmask in the second register. For each bitmask, each bit corresponds with one of multiple threads. A given bit in the first bitmask indicates the corresponding thread has been assigned to execute a portion of a unit of work. A corresponding bit in the second bitmask indicates the corresponding thread has completed execution of its assigned portion of the unit of work. The processor receives updates to the second bitmask in the second register and provides an indication that the unit of work has been completed in response to detecting that for each bit in the first bitmask that corresponds to a thread that is assigned work, a corresponding bit in the second bitmask indicates its corresponding thread has completed its assigned work.
    Type: Grant
    Filed: May 27, 2008
    Date of Patent: January 22, 2013
    Assignee: Oracle America, Inc.
    Inventor: Darryl J. Gove
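The completion check described in the abstract above reduces to a mask comparison. A minimal software sketch follows, with Python integers standing in for the two registers; the function names are hypothetical.

```python
def mark_done(completed_mask, thread_id):
    """Set the bit for a thread that has finished its portion of the work."""
    return completed_mask | (1 << thread_id)


def work_complete(assigned_mask, completed_mask):
    """True when every thread with an assigned bit also has its completed bit set."""
    return (assigned_mask & completed_mask) == assigned_mask
```

Bits set in the completed mask for threads that were never assigned work do not affect the result, because the comparison looks only at assigned bits.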
  • Patent number: 8356166
    Abstract: Minimizing code duplication in an unbounded transactional memory system. A computing apparatus including one or more processors in which it is possible to use a set of common mode-agnostic TM barrier sequences that runs on legacy ISA and extended ISA processors, and that employs hardware filter indicators (when available) to filter redundant applications of TM barriers, and that enables a compiled binary representation of the subject code to run correctly in any of the currently implemented set of transactional memory execution modes, including running the code outside of a transaction, and that enables the same compiled binary to continue to work with future TM implementations which may introduce as yet unknown future TM execution modes.
    Type: Grant
    Filed: June 26, 2009
    Date of Patent: January 15, 2013
    Assignee: Microsoft Corporation
    Inventors: Ali-Reza Adl-Tabatabai, Bratin Saha, Gad Sheaffer, Vadim Bassin, Robert Y. Geva, Martin Taillefer, Darek Mihocka, Burton Jordan Smith, Jan Gray
  • Patent number: 8356162
    Abstract: An execution unit supports data dependent conditional write instructions that write data to a target only when a particular condition is met. In one implementation, a data dependent conditional write instruction identifies a condition as well as data to be tested against that condition. The data is tested against that condition, and the result of the test is used to selectively enable or disable a write to a target associated with the data dependent conditional write instruction. Then, a write is attempted while the write to the target is enabled or disabled such that the write will update the contents of the target only when the write is selectively enabled as a result of the test. By doing so, dependencies are typically avoided, as is use of an architected condition register that might otherwise introduce branch prediction mispredict penalties, enabling improved performance with z-buffer test and similar types of algorithms.
    Type: Grant
    Filed: March 18, 2008
    Date of Patent: January 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Adam James Muff, Matthew Ray Tubbs
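    Illustration (not part of the patent record): the semantics of a data-dependent conditional write can be sketched in software as a test followed by a select between the new and the existing contents. Python cannot express the branch-free hardware behaviour, so this models only the visible effect. The function name `conditional_write` and the two-argument form of `condition` (new value, current contents) are assumptions made for the z-buffer example.

```python
def conditional_write(target, index, value, condition):
    """Write value to target[index] only when condition(value, target[index]) holds.

    Returns True when the write was enabled.
    """
    enabled = condition(value, target[index])
    # The store is always "attempted"; the old contents are kept when disabled.
    target[index] = value if enabled else target[index]
    return enabled


if __name__ == "__main__":
    zbuf = [10.0, 5.0]
    closer = lambda new, old: new < old   # z-buffer test: keep the nearer depth
    conditional_write(zbuf, 0, 7.0, closer)
    conditional_write(zbuf, 1, 9.0, closer)
    print(zbuf)
```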
  • Patent number: 8356163
    Abstract: A disclosed SIMD microprocessor includes plural processor elements each having n arithmetic circuits and n registers configured to temporarily store data pieces to be input to the arithmetic circuits, n being a natural number equal to or greater than 2; and a control circuit configured to determine an arrangement order of the processor elements and an arrangement order of the arithmetic circuits in the processor elements and determine whether to use the n arithmetic circuits as a single arithmetic circuit or as n arithmetic circuits. Each processor element further includes n shifter pairs each including a PE shifter and a bit shifter; and n shift data selection circuits configured to select arbitrary data pieces from the data pieces in the shifter pairs, perform bit extension on the data pieces, and transfer the data pieces to the arithmetic circuits.
    Type: Grant
    Filed: June 24, 2008
    Date of Patent: January 15, 2013
    Assignee: Ricoh Company, Ltd.
    Inventor: Toshiki Yamanaka
  • Patent number: 8356164
    Abstract: The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: January 15, 2013
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Publication number: 20130013900
    Abstract: A multi-thread processor includes a plurality of hardware threads each of which generates an independent instruction flow, a first thread scheduler that outputs a first thread selection signal, the first thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads according to a priority rank, the priority rank being established in advance for each of the plurality of hardware threads, a first selector that selects one of the plurality of hardware threads according to the first thread selection signal and outputs an instruction generated by the selected hardware thread, and an execution pipeline that executes an instruction output from the first selector. Whenever the hardware thread is executed in the execution pipeline, the first scheduler updates the priority rank for the executed hardware thread and outputs the first thread selection signal in accordance with the updated priority rank.
    Type: Application
    Filed: September 14, 2012
    Publication date: January 10, 2013
    Applicant: Renesas Electronics Corporation
    Inventors: Koji Adachi, Teppei Oomoto
  • Publication number: 20130013899
    Abstract: Mechanisms are provided for performing escape actions within transactions. These mechanisms execute a transaction comprising a transactional section and an escape action. The transactional section is comprised of one or more instructions that are to be executed in an atomic manner as part of the transaction. The escape action is comprised of one or more instructions to be executed in a non-transactional manner. These mechanisms further populate at least one actions list data structure, associated with a thread of the data processing system that is executing the transaction, with one or more actions associated with the escape action. Moreover, these mechanisms execute one or more actions in the actions list data structure based upon whether the transaction commits successfully or is aborted.
    Type: Application
    Filed: July 6, 2011
    Publication date: January 10, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher M. Barton, Harold W. Cain, III, Bradly G. Frey, Hung Q. Le, Maged M. Michael, Raul E. Silvera, Derek E. Williams, Michael Wong, Peng Wu
  • Publication number: 20130013898
    Abstract: In one embodiment, the present invention includes a method for determining if an instruction of a first thread dispatched from a first queue associated with the first thread is stalled in a pipestage of a pipeline, and if so, dispatching an instruction of a second thread from a second queue associated with the second thread to the pipeline if the second thread is not stalled. Other embodiments are described and claimed.
    Type: Application
    Filed: September 13, 2012
    Publication date: January 10, 2013
    Inventors: Matthew Merten, Avinash Sodani, James Hadley, Alexander Farcy, Iredamola Olopade
  • Publication number: 20130013839
    Abstract: A portable handheld device including a CPU for processing a script; a multi-core processor for processing an image; an input buffer for receiving data for processing by the multi-core processor, the input buffer being provided under the control of the multi-core processor to send data thereto; and an output buffer for receiving data processed by the multi-core processor, the output buffer being provided under the control of the multi-core processor to receive data therefrom. The multi-core processor comprises a plurality of micro-coded processing units. The CPU is configured with authority to clear and query the input and output buffers.
    Type: Application
    Filed: September 15, 2012
    Publication date: January 10, 2013
    Inventor: Kia Silverbrook
  • Patent number: 8352711
    Abstract: The coordination and execution of chores in a multiprocessing environment. The coordination of chores is accomplished utilizing a compiler generated correlation that relates blocks of code that execute chores and blocks of code in which the chore can be realized. By tracking the execution of the program and using the compiler-generated correlation, chores can be identified for the currently executing code.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: January 8, 2013
    Assignee: Microsoft Corporation
    Inventors: Eric Dean Tribble, Mark Ronald Plesko, Christopher Wellington Brumme
  • Publication number: 20130007420
    Abstract: A method for checking the integrity of a program executed by an electronic circuit and including at least one conditional jump, wherein: a first value is updated for any instruction which does not correspond to a jump instruction; a second value is updated with the first value for each conditional jump instruction; and the second value is compared with a third value, calculated according to the performed conditional jumps.
    Type: Application
    Filed: June 15, 2012
    Publication date: January 3, 2013
    Applicant: Proton World International N.V.
    Inventors: Gilles Van Assche, Ronny Vankeer
  • Publication number: 20130007419
    Abstract: A computer implemented method selects K extreme elements of a list of N elements by partitioning each of the N elements into a plurality of sections. For each section from a most significant section to a least significant section the method selects a threshold selection determining at least K extreme entries from the list. This iteratively compares a corresponding section to a section threshold, counts a number of sections which are more extreme than the section threshold, increasing (or decreasing) the section threshold if the count is greater than K and decreasing the section threshold if the count is less than K. After finding the section threshold for the corresponding section the method forms a combined threshold by concatenation of said section thresholds in order from a most significant section to a least significant section, compares each of the N elements to the combined threshold, and selects at least K elements from the set of N elements more extreme than the combined threshold.
    Type: Application
    Filed: April 12, 2012
    Publication date: January 3, 2013
    Applicant: Texas Instruments Incorporated
    Inventors: Constantin Bajenaru, Michael Livshitz, Mingjian Yan, Jing Jiang
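    Illustration (not part of the patent record): the idea of building a combined threshold section by section, from the most significant section down, can be sketched as follows for the "K largest" case. This is a simplified variant, not the claimed method: it compares whole values against the partially built threshold and scans each section's digits from high to low, rather than adjusting a per-section threshold up and down. The name `select_k_largest` and the parameters `width` and `section_bits` are hypothetical.

```python
def select_k_largest(values, k, width=16, section_bits=4):
    """Return (T, selected), where T is the largest threshold such that at
    least k elements are >= T, and selected lists those elements.

    Assumes non-negative integers below 2**width, width divisible by
    section_bits, and 1 <= k <= len(values).
    """
    assert width % section_bits == 0 and 1 <= k <= len(values)
    assert all(0 <= v < (1 << width) for v in values)
    threshold = 0
    for shift in range(width - section_bits, -1, -section_bits):
        # Try this section's digits from high to low and keep the first one
        # that still leaves at least k elements at or above the threshold.
        for digit in range((1 << section_bits) - 1, -1, -1):
            candidate = threshold | (digit << shift)
            if sum(1 for v in values if v >= candidate) >= k:
                threshold = candidate
                break
    return threshold, [v for v in values if v >= threshold]


if __name__ == "__main__":
    t, chosen = select_k_largest([5, 200, 37, 200, 1024, 9], k=3)
    print(t, chosen)   # 200 [200, 200, 1024]
```

    With ties at the threshold the result can contain more than K elements, consistent with the abstract's "at least K elements".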
  • Publication number: 20120331319
    Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
    Type: Application
    Filed: September 5, 2012
    Publication date: December 27, 2012
    Inventors: John George Mathieson, Phil Carmack, Brian Smith
  • Publication number: 20120331275
    Abstract: A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and a second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.
    Type: Application
    Filed: September 5, 2012
    Publication date: December 27, 2012
    Inventors: John George Mathieson, Phil Carmack, Brian Smith
  • Patent number: 8335911
    Abstract: Systems and methods for efficient dynamic utilization of shared resources in a processor. A processor comprises a front end pipeline, an execution pipeline, and a commit pipeline, wherein each pipeline comprises a shared resource with entries configured to be allocated for use in each clock cycle by each of a plurality of threads supported by the processor. To avoid starvation of any active thread, the processor further comprises circuitry configured to ensure each active thread is able to allocate at least a predetermined quota of entries of each shared resource. Each pipe stage of a total pipeline for the processor may include at least one dynamically allocated shared resource configured not to starve any active thread. Dynamic allocation of shared resources between a plurality of threads may yield higher performance over static allocation. In addition, dynamic allocation may require relatively little overhead for activation/deactivation of threads.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: December 18, 2012
    Assignee: Oracle America, Inc.
    Inventors: Robert T. Golla, Gregory F. Grohoski
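    Illustration (not part of the patent record): one way to guarantee each thread a minimum quota of a dynamically shared resource is to let a thread that is already at or above its quota allocate only from entries that are not needed to satisfy other threads' quotas. The sketch below is a software model under that assumption, with a fixed set of threads; the class name `SharedResource` and its methods are hypothetical and do not describe the patented circuitry.

```python
class SharedResource:
    """Pool of entries shared by threads, with a guaranteed per-thread quota."""

    def __init__(self, entries, threads, quota):
        threads = list(threads)
        assert quota * len(threads) <= entries
        self.entries = entries
        self.quota = quota
        self.used = {t: 0 for t in threads}

    def can_allocate(self, thread):
        free = self.entries - sum(self.used.values())
        # Entries that must stay free so the other threads can reach their quota.
        reserved = sum(max(0, self.quota - n)
                       for t, n in self.used.items() if t != thread)
        # Under quota: draw on the thread's own reservation.
        # At or over quota: only unreserved entries may be taken.
        return self.used[thread] < self.quota or free > reserved

    def allocate(self, thread):
        if not self.can_allocate(thread):
            return False
        self.used[thread] += 1
        return True

    def release(self, thread):
        assert self.used[thread] > 0
        self.used[thread] -= 1


if __name__ == "__main__":
    pool = SharedResource(entries=4, threads=["t0", "t1"], quota=1)
    print([pool.allocate("t0") for _ in range(4)])  # [True, True, True, False]
    print(pool.allocate("t1"))                      # True: its quota was kept free
```

    Because the number of free entries never drops below the total outstanding reservation, a thread below its quota always finds an entry available.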
  • Patent number: 8332619
    Abstract: A processor may include an address monitor table and an atomic update table to support speculative threading. The processor may also include one or more registers to maintain state associated with execution of speculative threads. The processor may support one or more of the following primitives: an instruction to write to a register of the state, an instruction to trigger the committing of buffered memory updates, an instruction to read a status register of the state, and/or an instruction to clear one of the state bits associated with trap/exception/interrupt handling. Other embodiments are also described and claimed.
    Type: Grant
    Filed: December 8, 2011
    Date of Patent: December 11, 2012
    Assignee: Intel Corporation
    Inventors: Quinn A. Jacobson, Hong Wang, John P. Shen, Gautham N. Chinya, Per Hammarlund, Xiang Zou, Bryant Bigbee, Shivnandan D. Kaushik
  • Patent number: 8332618
    Abstract: An out-of-order execution microprocessor includes a register alias table configured to generate a first indicator that indicates whether an instruction is dependent upon a condition code result of a shift instruction. The microprocessor also includes a first execution unit configured to execute the shift instruction and to generate a second indicator that indicates whether a shift amount of the shift instruction is zero. The microprocessor also includes a second execution unit configured to receive the first and second indicators and to generate a replay signal to cause the instruction to be replayed if the first indicator indicates the instruction is dependent upon the condition code result of the shift instruction and a second indicator indicates the shift amount of the shift instruction is zero.
    Type: Grant
    Filed: December 9, 2009
    Date of Patent: December 11, 2012
    Assignee: VIA Technologies, Inc.
    Inventors: Gerard M. Col, Matthew Daniel Day, Terry Parks, Bryan Wayne Pogor
  • Patent number: 8327379
    Abstract: A task processor includes a CPU, a save circuit, and a task control circuit. The CPU is provided with a processing register and an execution control circuit operative to load data from a memory into a processing register and execute a task in accordance with the data in the processing register. The task control circuit is provided with a task selecting circuit and state storage units respectively associated with tasks. In executing a predetermined system call, the execution control circuit notifies the task control circuit as such. Upon being notified of the execution of the system call instruction, the task control circuit switches a task to be executed next in accordance with an output from the task selecting circuit. The task selecting circuit selects a task in accordance with an output from the state registers.
    Type: Grant
    Filed: August 24, 2006
    Date of Patent: December 4, 2012
    Assignee: Kernelon Silicon Inc.
    Inventor: Naotaka Maruyama