Instruction Issuing Patents (Class 712/214)
-
Patent number: 8505015
Abstract: A “group work sorting” technique is used in a parallel computing system that executes multiple items of work across multiple parallel processing units, where each parallel processing unit processes one or more of the work items according to their positions in a prioritized work queue that corresponds to the parallel processing unit. When implementing the technique, one or more of the parallel processing units receives a new work item to be placed into a first work queue that corresponds to the parallel processing unit and receives data that indicates where one or more other parallel processing units would prefer to place the new work item in the prioritized work queues that correspond to the other parallel processing units. The parallel processing unit uses the received data as a guide in placing the new work item into the first work queue.
Type: Grant
Filed: October 29, 2009
Date of Patent: August 6, 2013
Assignee: Teradata US, Inc.
Inventor: Curtis Stehley
-
Publication number: 20130191616
Abstract: In a vector processing device, a data dependence detecting unit detects a data dependence relation between a preceding instruction and a succeeding instruction which are inputted from an instruction buffer, and an instruction issuance control unit controls issuance of an instruction based on a detection result thereof. When there is a data dependence relation between the preceding instruction and the succeeding instruction, the instruction issuance control unit generates a new instruction equivalent to processing related to a vector register including the data dependence relation with the succeeding instruction in processing executed by the preceding instruction and issues the new instruction between the preceding instruction and the succeeding instruction, and thereby a data hazard can be avoided between the preceding instruction and the succeeding instruction without making a stall occur.
Type: Application
Filed: December 13, 2012
Publication date: July 25, 2013
Applicant: FUJITSU SEMICONDUCTOR LIMITED
Inventor: FUJITSU SEMICONDUCTOR LIMITED
-
Publication number: 20130185542
Abstract: An external Auxiliary Execution Unit (AXU) interface is provided between a processing core disposed in a first programmable chip and an off-chip AXU disposed in a second programmable chip to integrate the AXU with an issue unit, a fixed point execution unit, and optionally other functional units in the processing core. The external AXU interface enables the issue unit to issue instructions to the AXU in much the same manner as the issue unit would be able to issue instructions to an AXU that was disposed on the same chip. By doing so, the AXU on the second programmable chip can be designed, tested and verified independent of the processing core on the first programmable chip, thereby enabling a common processing core, which has been designed, tested, and verified, to be used in connection with multiple different AXU designs.
Type: Application
Filed: January 18, 2012
Publication date: July 18, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer, Corey V. Swenson
-
Patent number: 8489863
Abstract: An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store (IDS). The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update.
Type: Grant
Filed: April 19, 2012
Date of Patent: July 16, 2013
Assignee: International Business Machines Corporation
Inventors: James Wilson Bishop, Mary Douglass Brown, Jeffrey Carl Brownscheidle, Robert Allen Cordes, Maureen Anne Delaney, Jafar Nahidi, Dung Quoc Nguyen, Joel Abraham Silberman
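Illustrative sketch (not from the patent): the age-tracking idea in this abstract can be modeled in software as a square grid of bits, one per pair of queue positions. The class name `AgeMatrix`, its methods, and the `bits_written` counter are hypothetical names chosen for this example; the real design uses latches with row and column clock controls, which this toy model does not attempt to reproduce.

```python
class AgeMatrix:
    """Toy age matrix: older[i][j] is True when the entry at queue
    position i was dispatched before the entry at position j."""

    def __init__(self, size):
        self.size = size
        self.valid = [False] * size
        self.older = [[False] * size for _ in range(size)]
        self.bits_written = 0  # counts bits whose value actually changed

    def _set(self, i, j, value):
        # Write a bit only when its value changes, loosely mirroring the
        # idea of updating only those latches that need new data.
        if self.older[i][j] != value:
            self.older[i][j] = value
            self.bits_written += 1

    def dispatch(self, qpos):
        """Place a new (youngest) instruction at queue position qpos."""
        for other in range(self.size):
            if other == qpos:
                continue
            # Every currently valid entry is older than the newcomer.
            self._set(other, qpos, self.valid[other])
            self._set(qpos, other, False)
        self.valid[qpos] = True

    def issue_oldest(self, ready):
        """Return the oldest valid position among `ready` and free it."""
        candidates = [p for p in ready if self.valid[p]]
        for p in candidates:
            if all(self.older[p][q] for q in candidates if q != p):
                self.valid[p] = False
                return p
        return None
```

Because relative age lives in the matrix rather than in the queue position, a freed slot can be reused by a younger instruction without shifting the other entries.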
-
Publication number: 20130166885
Abstract: When an instruction is executed on an integrated circuit (IC), an activity level and temperature are measured. A relationship between the activity level and temperature is determined, to estimate the temperature from the activity level. The activity level is monitored and is input to a scheduler, which estimates the IC temperature based on the activity level. The scheduler distributes work taking into account the temperature of various IC regions and may include distributing work to the IC region that has a lowest estimated temperature or relatively lower estimated temperature (e.g., lower than the average IC or IC region temperature). When the utilization level of one or more IC regions is high, the scheduler is configured to reduce the clock speed or the voltage of the one or more IC regions, or flag the regions as being unavailable for additional workload.
Type: Application
Filed: June 22, 2012
Publication date: June 27, 2013
Applicant: ADVANCED MICRO DEVICES, INC.
Inventors: Karthik Ramani, Stephen Presant, John Brothers
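Illustrative sketch (not from the application): one simple way to picture "estimate temperature from activity, then send work to the coolest region" is a least-squares line fitted to measured samples. The linear relationship, the function names `fit_line` and `pick_region`, and the region labels are all assumptions made for this example; the abstract does not say what form the relationship takes.

```python
def fit_line(activity, temperature):
    """Least-squares fit of temperature = slope * activity + intercept."""
    n = len(activity)
    mean_a = sum(activity) / n
    mean_t = sum(temperature) / n
    var = sum((a - mean_a) ** 2 for a in activity)
    cov = sum((a - mean_a) * (t - mean_t)
              for a, t in zip(activity, temperature))
    slope = cov / var
    return slope, mean_t - slope * mean_a


def pick_region(region_activity, slope, intercept):
    """Return the region whose estimated temperature is lowest."""
    estimates = {r: slope * a + intercept
                 for r, a in region_activity.items()}
    return min(estimates, key=estimates.get)
```

The scheduler in this sketch only needs the monitored activity levels at run time; the temperature sensor is consulted once, when the line is fitted.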
-
Publication number: 20130145127
Abstract: A data processing system is provided in which destination operands to be stored within architectural registers are constrained to have zero values added as prefixes in order that the architectural register value has a fixed bit width irrespective of the bit width of the destination operand being written thereto. Instead of adding these zero values everywhere in the data path, they are instead represented by zero flags in at least the physical registers utilised for register renaming operations and in the result queue prior to results being written to the architectural register file. This saves circuitry resources and reduces energy consumption.
Type: Application
Filed: December 6, 2011
Publication date: June 6, 2013
Applicant: ARM LIMITED
Inventors: James Nolan Hardage, Glen Andrew Harris, Mark Carpenter Glass
-
Patent number: 8458716
Abstract: Methods, systems, and computer program products for operating an enterprise resource planning system. The method includes running a placeholder job in said enterprise resource planning system in response to a request from at least one client application for notification of at least one background processing event, wherein the placeholder job is executed in response to the at least one background processing event.
Type: Grant
Filed: December 31, 2008
Date of Patent: June 4, 2013
Assignee: International Business Machines Corporation
Inventors: Ralf Altrichter, Gerd Kehrer, Martin Raitza
-
Patent number: 8458443
Abstract: A processor may include a plurality of processing units for processing instructions, where each processing unit is associated with a discrete instruction queue. Data is read from a data queue selected by each instruction, and a sequencer manages distribution of instructions to the plurality of discrete instruction queues.
Type: Grant
Filed: September 8, 2009
Date of Patent: June 4, 2013
Assignee: SMSC Holdings S.A.R.L.
Inventors: Matthias Tramm, Manfred Stadler, Christian Hitz
-
Publication number: 20130138925
Abstract: A method and circuit arrangement speculatively preprocess data stored in a register file during otherwise unused cycles in an execution unit, e.g., to prenormalize denormal floating point values stored in a floating point register file, to decompress compressed values stored in a register file, to decrypt encrypted values stored in a register file, or to otherwise preprocess data that is stored in an unprocessed form in a register file.
Type: Application
Filed: November 30, 2011
Publication date: May 30, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Mark J. Hickey, Adam J. Muff, Matthew R. Tubbs, Charles D. Wait
-
Publication number: 20130117541
Abstract: One embodiment of the present invention sets forth a technique for speculatively issuing instructions to allow a processing pipeline to continue to process some instructions during rollback of other instructions. A scheduler circuit issues instructions for execution assuming that, several cycles later, when the instructions reach multithreaded execution units, dependencies between the instructions will be resolved, resources will be available, operand data will be available, and other conditions will not prevent execution of the instructions. When a rollback condition exists at the point of execution for an instruction for a particular thread group, the instruction is not dispatched to the multithreaded execution units. However, other instructions issued by the scheduler circuit for execution by different thread groups, and for which a rollback condition does not exist, are executed by the multithreaded execution units.
Type: Application
Filed: November 4, 2011
Publication date: May 9, 2013
Inventors: Jack Hilaire CHOQUETTE, Olivier Giroux, Robert J. Stoll, Xiaogang Qiu
-
Publication number: 20130111191
Abstract: A system and method for reducing power consumption through issue throttling of selected problematic instructions. A power throttle unit within a processor maintains instruction issue counts for associated instruction types. The instruction types may be a subset of supported instruction types executed by an execution core within the processor. The instruction types may be chosen based on high power consumption estimates for processing instructions of these types. The power throttle unit may determine a given instruction issue count exceeds a given threshold. In response, the power throttle unit may select given instruction types to limit a respective issue rate. The power throttle unit may choose an issue rate for each one of the selected given instruction types and limit an associated issue rate to a chosen issue rate. The selection of given instruction types and associated issue rate limits is programmable.
Type: Application
Filed: October 31, 2011
Publication date: May 2, 2013
Inventors: Daniel C. Murray, Andrew J. Beaumont-Smith, John H. Mylius, Peter J. Bannon, Toshi Takayanagi, Jung Wook Cho
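Illustrative sketch (not from the application): the counting-and-limiting behavior described in this abstract might look roughly like the following in software. The class name `PowerThrottle`, the fixed counting window, and the "one issue every N cycles" rate limit are assumptions made for this example; the abstract only says that the tracked types and their rate limits are programmable.

```python
class PowerThrottle:
    """Toy issue throttle for a tracked subset of instruction types."""

    def __init__(self, thresholds, limited_rates):
        # thresholds: type -> issue count allowed before throttling
        # limited_rates: type -> N, meaning one issue every N cycles
        self.thresholds = thresholds
        self.limited_rates = limited_rates
        self.counts = {t: 0 for t in thresholds}

    def record_issue(self, itype):
        # Only the tracked (high-power) types are counted.
        if itype in self.counts:
            self.counts[itype] += 1

    def may_issue(self, itype, cycle):
        if itype not in self.counts:
            return True  # untracked types are never throttled
        if self.counts[itype] <= self.thresholds[itype]:
            return True
        # Over threshold: permit only one issue slot every N cycles.
        return cycle % self.limited_rates[itype] == 0

    def end_window(self):
        # Hypothetical: counts restart at the end of each window.
        for t in self.counts:
            self.counts[t] = 0
```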
-
Patent number: 8433855
Abstract: Embodiments of the invention include a method of synchronizing translation changes in a processor including a translation lookaside buffer, the method including setting a control bit to enable blocking of all fetch requests that miss the translation lookaside buffer without changing a translation state of the current process; if there is at least one pending translation, then waiting for completion of the at least one pending translation; and resetting the control bit. A processor and a computer program product are provided.
Type: Grant
Filed: February 15, 2008
Date of Patent: April 30, 2013
Assignee: International Business Machines Corporation
Inventors: Gregory W. Alexander, Lisa C. Heller, Chung-Lung Kevin Shum
-
Patent number: 8397052
Abstract: Mechanisms are provided for controlling version pressure on a speculative versioning cache. Raw version pressure data is collected based on one or more threads accessing cache lines of the speculative versioning cache. One or more statistical measures of version pressure are generated based on the collected raw version pressure data. A determination is made as to whether one or more modifications to an operation of a data processing system are to be performed based on the one or more statistical measures of version pressure, the one or more modifications affecting version pressure exerted on the speculative versioning cache. An operation of the data processing system is modified based on the one or more determined modifications, in response to a determination that one or more modifications to the operation of the data processing system are to be performed, to affect the version pressure exerted on the speculative versioning cache.
Type: Grant
Filed: August 19, 2009
Date of Patent: March 12, 2013
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Alan Gara, Kathryn M. O'Brien, Martin Ohmacht, Xiaotong Zhuang
-
Patent number: 8397238
Abstract: Methods, apparatuses, and computer-readable storage media are disclosed for reducing power by reducing hardware-thread toggling in a multi-threaded processor. In a particular embodiment, a method allocates software threads to hardware threads. A number of software threads to be allocated is identified. It is determined when the number of software threads is less than a number of hardware threads. When the number of software threads is less than the number of hardware threads, at least two of the software threads are allocated to non-sequential hardware threads. A clock signal to be applied to the hardware threads is adjusted responsive to the non-sequential hardware threads allocated.
Type: Grant
Filed: December 8, 2009
Date of Patent: March 12, 2013
Assignee: QUALCOMM Incorporated
Inventors: Suresh K. Venkumahanti, Martin Saint-Laurent, Lucian Codrescu, Baker S. Mohammad
-
Patent number: 8384956
Abstract: An image processing method includes: selecting an image processing module for each attribute associated with a block of image data in accordance with the content of image processing made to correspond to the attribute; generating an image processing flow for each block by use of the selected image processing module; and determining whether the image processing flow can be constructed in an image processing area. When it is determined that a processing area of the image processing flow cannot be constructed in the image processing area, the method selects an image processing flow having an image processing module which is not contained in the other image processing flows from among a plurality of the image processing flows of the blocks, the selected image processing flow being constructed in the image processing area such that the processing area of the image processing flow is included in the image processing area.
Type: Grant
Filed: August 20, 2009
Date of Patent: February 26, 2013
Assignee: Canon Kabushiki Kaisha
Inventor: Toshimitsu Nakano
-
Publication number: 20130046954
Abstract: Disclosed is an architecture, system and method for performing multi-thread DFA descents on a single input stream. An executer performs DFA transitions from a plurality of threads each starting at a different point in an input stream. A plurality of executers may operate in parallel to each other and a plurality of thread contexts operate concurrently within each executer to maintain the context of each thread which is state transitioning. A scheduler in each executer arbitrates instructions for the thread into at least one pipeline where the instructions are executed. Tokens may be output from each of the plurality of executers to a token processor which sorts and filters the tokens into dispatch order.
Type: Application
Filed: January 18, 2012
Publication date: February 21, 2013
Inventors: Michael Ruehle, Umesh Ramkrishnarao Kasture, Vinay Janardan Naik, Nayan Amrutlal Suthar, Robert J. McMillen
-
Patent number: 8380964
Abstract: An information handling system includes a processor with an instruction issue queue (IQ) that may perform age tracking operations. The issue queue IQ maintains or stores instructions that may issue out-of-order in an internal data store (IDS). The IDS organizes instructions in a queue position (QPOS) addressing arrangement. An age matrix of the IQ maintains a record of relative instruction aging for those instructions within the IDS. The age matrix updates latches or other memory cell data to reflect the changes in IDS instruction ages during a dispatch operation into the IQ. During dispatch of one or more instructions, the age matrix may update only those latches that require data change to reflect changing IDS instruction ages. The age matrix employs row and column data and clock controls to individually update those latches requiring update.
Type: Grant
Filed: April 3, 2009
Date of Patent: February 19, 2013
Assignee: International Business Machines Corporation
Inventors: James Wilson Bishop, Mary Douglass Brown, Jeffrey Carl Brownscheidle, Robert Allen Cordes, Maureen Anne Delaney, Jafar Nahidi, Dung Quoc Nguyen, Joel Abraham Silberman
-
Publication number: 20130042090
Abstract: One embodiment of the present invention sets forth a technique for optimizing parallel thread execution in a temporal single-instruction multiple thread (SIMT) architecture. When the threads in a parallel thread group execute temporally on a common processing pipeline rather than spatially on parallel processing pipelines, execution cycles may be reduced when some threads in the parallel thread group are inactive due to divergence. Similarly, an instruction can be dispatched for execution by only one thread in the parallel thread group when the threads in the parallel thread group are executing a scalar instruction. Reducing the number of threads that execute an instruction removes unnecessary or redundant operations for execution by the processing pipelines. Information about scalar operands and operations and divergence of the threads is used in the instruction dispatch logic to eliminate unnecessary or redundant activity in the processing pipelines.
Type: Application
Filed: August 12, 2011
Publication date: February 14, 2013
Inventor: Ronny M. KRASHINSKY
-
Publication number: 20130024666
Abstract: A method of scheduling a plurality of instructions for a processor comprises the steps of: establishing a functional unit resource table comprising a plurality of columns, each of which corresponds to one of a plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a functional unit of the processor; establishing a ping-pong resource table comprising a plurality of columns, each of which corresponds to one of the plurality of operation cycles of the processor and comprises a plurality of fields, each of which indicates a read port or a write port of a register bank of the processor; and allotting the plurality of instructions to the plurality of operation cycles of the processor and registering the functional units and the ports of the register banks corresponding to the allotted instructions on the functional unit resource table and the ping-pong resource table.
Type: Application
Filed: July 18, 2011
Publication date: January 24, 2013
Applicant: NATIONAL TSING HUA UNIVERSITY
Inventors: JENQ KUEN LEE, YU TE LIN, CHUNG JU WU
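Illustrative sketch (not from the application): the two per-cycle resource tables in this abstract can be pictured as two lists of sets, one recording which functional units are taken in each cycle and one recording which register-bank ports are taken. The function name `schedule`, the first-fit placement policy, and the unit and port labels are assumptions made for this example, and data dependencies between instructions are deliberately ignored.

```python
def schedule(instructions, num_cycles):
    """First-fit placement of (name, unit, ports) tuples into cycles.

    Returns a dict mapping each instruction name to its cycle.
    """
    # One column per operation cycle in each table.
    fu_table = [set() for _ in range(num_cycles)]    # functional units in use
    port_table = [set() for _ in range(num_cycles)]  # register-bank ports in use
    placement = {}
    for name, unit, ports in instructions:
        for cycle in range(num_cycles):
            unit_free = unit not in fu_table[cycle]
            ports_free = port_table[cycle].isdisjoint(ports)
            if unit_free and ports_free:
                # Register the resources this instruction occupies.
                fu_table[cycle].add(unit)
                port_table[cycle].update(ports)
                placement[name] = cycle
                break
        else:
            raise ValueError(f"no free cycle for {name}")
    return placement
```

Keeping the port table separate from the functional unit table lets the scheduler reject a placement that would oversubscribe a register bank even when the functional unit itself is idle.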
-
Patent number: 8359589
Abstract: A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. If executing a portion of the set of helper thread binaries results in the retrieval of data needed by the set of main thread binaries, then that retrieved data is utilized by the set of main thread binaries.
Type: Grant
Filed: February 1, 2008
Date of Patent: January 22, 2013
Assignee: International Business Machines Corporation
Inventors: Ravi K. Arimilli, Juan C. Rubio, Balaram Sinharoy
-
Publication number: 20130013897
Abstract: A method provides efficient dispatch/completion of an N Dimensional (ND) Range command in a data processing system (DPS). The method comprises: a compiler generating one or more commands from received program instructions; ND Range work processing (WP) logic determining when a command generated by the compiler will be implemented over an ND configuration of operands, where N is greater than one (1); automatically decomposing the ND configuration of operands into a one (1) dimension (1D) work element comprising P sequentially ordered work items that each represent one of the operands; placing the 1D work element within a command queue of the DPS; enabling sequential dispatching of 1D work items in ordered sequence from the command queue to one or more processing units; and generating an ND Range output by mapping the 1D work output result to an ND position corresponding to an original location of the operand represented by the 1D work item.
Type: Application
Filed: September 15, 2012
Publication date: January 10, 2013
Applicant: IBM CORPORATION
Inventors: Gregory H. Bellows, Brian H. Horton, Joaquin Madruga, Barry L. Minor
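Illustrative sketch (not from the application): the decompose-then-map-back flow in this abstract resembles flattening an N-dimensional index space into an ordered list and remembering where each item came from. The function names `decompose` and `run`, the row-major ordering, and the dictionary used for operands are assumptions made for this example.

```python
from itertools import product


def decompose(shape):
    """Flatten an N-D index space into ordered 1-D work items.

    Each work item is (sequence_number, nd_position), in row-major order.
    """
    return list(enumerate(product(*(range(n) for n in shape))))


def run(shape, operands, kernel):
    """Dispatch work items in sequence and map every result back to the
    N-D position of the operand it was computed from."""
    output = {}
    for _, pos in decompose(shape):
        output[pos] = kernel(operands[pos])
    return output
```

For a 2 by 3 range, `decompose((2, 3))` yields six work items, and `run` returns a result keyed by the same `(row, column)` positions as the input operands.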
-
Patent number: 8347309
Abstract: Systems and methods for efficient thread arbitration in a processor. A processor comprises a multi-threaded resource. The resource may include an array of entries which may be allocated by threads. A thread arbitration table corresponding to a given thread stores a high and a low threshold value in each table entry. A thread history shift register (HSR) indexes the table, wherein each bit of the HSR indicates whether the given thread is a thread hog. When the given thread has more allocated entries in the array than the high threshold of the table entry, the given thread is stalled from further allocating array entries. Similarly, when the given thread has fewer allocated entries in the array than the low threshold of the selected table entry, the given thread is permitted to allocate entries. In this manner, threads that hog dynamic resources can be mitigated such that more resources are available to other threads that are not thread hogs.
Type: Grant
Filed: July 29, 2009
Date of Patent: January 1, 2013
Assignee: Oracle America, Inc.
Inventors: Jared C. Smolens, Robert T. Golla, Matthew B. Smittle
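Illustrative sketch (not from the patent): the high and low thresholds in this abstract act like a hysteresis band whose limits are selected by the thread's recent history. The class name `ThreadArbiter`, the two-bit history width, and the threshold values in the usage example are assumptions made for this example.

```python
class ThreadArbiter:
    """Toy per-thread arbiter with history-indexed thresholds."""

    def __init__(self, table, hsr_bits=2):
        # table: history value -> (low, high) thresholds (hypothetical)
        self.table = table
        self.hsr_bits = hsr_bits
        self.hsr = 0          # history shift register
        self.stalled = False

    def record_interval(self, was_hog):
        """Shift one 'was a thread hog' bit into the history register."""
        mask = (1 << self.hsr_bits) - 1
        self.hsr = ((self.hsr << 1) | int(was_hog)) & mask

    def may_allocate(self, allocated):
        """Decide whether the thread may allocate another array entry."""
        low, high = self.table[self.hsr]
        if allocated > high:
            self.stalled = True
        elif allocated < low:
            self.stalled = False
        # Between the two thresholds the previous decision is kept.
        return not self.stalled
```

With a table such as `{0: (8, 12), 1: (4, 8), 2: (4, 8), 3: (2, 4)}`, a thread that has been flagged as a hog in both recent intervals is stalled once it holds more than 4 entries and stays stalled until it drops below 2.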
-
Publication number: 20120331273
Abstract: A method of reducing a set of instructions for execution on a processor, the method comprising: extracting information from a first instruction of the set of instructions; identifying unencoded space in one or more further instructions of the set of instructions; replacing the unencoded space of the one or more further instructions with the extracted information of the first instruction so as to form one or more amalgamated instructions; and removing the first instruction from the set of instructions.
Type: Application
Filed: December 29, 2011
Publication date: December 27, 2012
Applicant: CAMBRIDGE SILICON RADIO LIMITED
Inventors: Peter Smith, David Richard Hargreaves
-
Patent number: 8327118
Abstract: A processor 2 is responsive to a stream of program instructions to issue program instructions under control of scheduling circuitry 6 to respective execution units 24 for execution. The execution units 24 can include error detecting circuitry 32 for detecting a change in an output signal which occurs after the output signal has latched and during an error detecting period following the latching of the output signal. The scheduling circuitry 6 is arranged so as to suppress issue of program instructions to an execution unit 24 having such error detecting circuitry 32 on consecutive processing cycles.
Type: Grant
Filed: July 21, 2009
Date of Patent: December 4, 2012
Assignee: ARM Limited
Inventors: David Michael Bull, Emre Ozer, Shidhartha Das
-
Publication number: 20120303937
Abstract: A computer system used to execute an application includes a motion sensing unit, a processor and an instruction transfer unit. The motion sensing unit senses a gesture of a human body and generates an input instruction based on the gesture. The processor executes the application (or a game). The instruction transfer unit is connected with the motion sensing unit and the processor and serves as a communication interface between the motion sensing unit and the application. The instruction transfer unit translates the input instruction into a control command, and the processor controls and executes the application in accordance with the control command.
Type: Application
Filed: May 24, 2012
Publication date: November 29, 2012
Inventors: Chia-I CHU, Cheng-Hsein Yang
-
Publication number: 20120272045
Abstract: A control method and a system for dispatching the execution sequence of processes in a multiprocessor system, so that a monitoring processor and a plurality of target processors dispatch an operation sequence for executing different operation programs. The monitoring processor obtains the operation status of the other processors from a buffer; the monitoring processor selects at least one target processor according to the operation status; the monitoring processor assigns the target processor to execute a corresponding slave operation program, and modifies the operation status of the target processors in the buffer module; and the monitoring processor repeats setting the operation status and assigning other target processors to execute corresponding operation programs until a master operation program is completed.
Type: Application
Filed: September 1, 2011
Publication date: October 25, 2012
Applicant: FEATURE INTEGRATION TECHNOLOGY INC.
Inventor: Shih-Jen Chuang
-
Patent number: 8296549
Abstract: The invention discloses an overlapping command committing method for a dynamic cycle pipeline, for a chip having a pipeline structure. The method comprises the following steps: reading a command from a command buffer; decoding the command; judging whether the operator is reasonable; if the command is illegal, deleting it; otherwise preprocessing the operator of the command, preparing the initial operator of each pipeline, observing the status of the pipeline, waiting for a pipeline command exiting signal, and judging whether there is command relevance; if not, committing a new command to the pipeline when the command exits the last cycle of the pipeline. The overlapping command committing method of the invention can avoid bubbles, improve the parallelism of the pipeline performing unit, and thus shorten the processing period of a command in the chip, letting the chip process more commands per unit time.
Type: Grant
Filed: June 22, 2004
Date of Patent: October 23, 2012
Assignee: ZTE Corporation
Inventors: Zong Zhao, Min Ren, Hu Chen
-
Patent number: 8291421
Abstract: A system and method are provided for determining processor usable idle time in a system employing a software instruction processor. The method establishes an idle task with a lowest processor priority for a processor executing application software instructions, and uses the processor to execute an idle task. The method ceases to execute the idle task in response to the processor executing application software instructions. The amount of periodic idle task execution is determined and stored in a tangible memory medium. For example, idle time amounts can be determined per a unit of time, i.e. a percentage per second. In one aspect, the method generates an idle task report. The report can be a periodic report expressing the duration of idle task execution per time period, or a course of execution report expressing idle task start times, idle task stop times, and durations between the corresponding start and stop times.
Type: Grant
Filed: November 19, 2008
Date of Patent: October 16, 2012
Assignee: Sharp Laboratories of America, Inc.
Inventors: Tommy Lee Oswald, John C. Thomas, James E. Owen
-
Patent number: 8285974
Abstract: An apparatus for queue allocation. An embodiment of the apparatus includes a dispatch order data structure, a bit vector, and a queue controller. The dispatch order data structure corresponds to a queue. The dispatch order data structure stores a plurality of dispatch indicators associated with a plurality of pairs of entries of the queue to indicate a write order of the entries in the queue. The bit vector stores a plurality of mask values corresponding to the dispatch indicators of the dispatch order data structure. The queue controller interfaces with the queue and the dispatch order data structure. The queue controller excludes at least some of the entries from a queue operation based on the mask values of the bit vector.
Type: Grant
Filed: July 30, 2007
Date of Patent: October 9, 2012
Assignee: NetLogic Microsystems, Inc.
Inventors: Gaurav Singh, Srivatsan Srinivasan, Lintsung Wong
-
Publication number: 20120246448
Abstract: A system for executing instructions using a plurality of memory fragments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality of memory fragments are coupled to the partitionable engines for providing data storage.
Type: Application
Filed: March 23, 2012
Publication date: September 27, 2012
Applicant: SOFT MACHINES, INC.
Inventor: Mohammad Abdallah
-
Publication number: 20120246447
Abstract: According to one embodiment of the present disclosure, an approach is provided in which a thread is selected from multiple active threads, along with a corresponding weighting value. Computational logic determines whether one of the multiple threads is dispatching an instruction and, if so, computes a dispatch weighting value using the selected weighting value and a dispatch factor that indicates a weighting adjustment of the selected weighting value. In turn, a resource utilization value of the selected thread is computed using the dispatch weighting value.
Type: Application
Filed: March 27, 2011
Publication date: September 27, 2012
Applicant: International Business Machines Corporation
Inventors: James Wilson Bishop, Michael J. Genden, Steven Bradford Herndon, Philip Lee Vitale
-
Patent number: 8275975
Abstract: The invention proposes a simple method for controlling distributed functional units (FU) in a system. It offloads the main system processor from intermediate status monitoring. The sequencer controlled system comprises a plurality of functional units, a processor operatively coupled to the plurality of functional units through a bus, a sequencer having a set of registers, and an interrupt source register configured for interrupt polling. The registers are configured to control the timing of at least one operation of the functional units with stored instructions for each of the functional units. The processor sets up at least some of the registers through the bus for the initial configuration and the sequencer is activated by the processor.
Type: Grant
Filed: January 25, 2008
Date of Patent: September 25, 2012
Assignee: Mtekvision Co., Ltd.
Inventors: Ali Osman Ors, Daniel Laroche, Jean-François Deschênes
-
Patent number: 8271765
Abstract: The illustrative embodiments described herein provide a computer-implemented method, apparatus, and a system for managing instructions. A load/store unit receives a first instruction at a port. The load/store unit rejects the first instruction in response to determining that the first instruction has a first reject condition. Then, the instruction sequencing unit activates a first bit in response to the load/store unit rejecting the first instruction. The instruction sequencing unit blocks the first instruction from reissue while the first bit is activated. The processor unit determines a class of rejection of the first instruction. The instruction sequencing unit starts a timer. The length of the timer is based on the class of rejection of the first instruction. The instruction sequencing unit resets the first bit in response to the timer expiring. The instruction sequencing unit allows the first instruction to become eligible for reissue in response to resetting the first bit.
Type: Grant
Filed: April 8, 2009
Date of Patent: September 18, 2012
Assignee: International Business Machines Corporation
Inventors: Pradip Bose, Alper Buyuktosunoglu, Michael Stephen Floyd, Dung Quoc Nguyen, Bruce Joseph Ronchetti
-
Patent number: 8255670
Abstract: In one embodiment, a processor comprises a scheduler configured to issue a first instruction operation to be executed and an execution core coupled to the scheduler. Configured to execute the first instruction operation, the execution core comprises a plurality of replay sources configured to cause a replay of the first instruction operation responsive to detecting at least one of a plurality of replay cases. The scheduler is configured to inhibit issuance of the first instruction operation subsequent to the replay for a subset of the plurality of replay cases. The scheduler is coupled to receive an acknowledgement indication corresponding to each of the plurality of replay cases in the subset, and is configured to inhibit issuance of the first instruction operation until the acknowledgement indication is asserted that corresponds to an identified replay case of the subset.
Type: Grant
Filed: November 17, 2009
Date of Patent: August 28, 2012
Assignee: Apple Inc.
Inventors: Po-Yung Chang, Wei-Han Lien, Jesse Pan, Ramesh Gunna, Tse-Yu Yeh, James B. Keller
-
Patent number: 8250226
Abstract: In one embodiment, a method for generating one or more synthetic transactions with one or more web service operations includes accessing a Web Services Description Language (WSDL) file describing a web service and, according to the WSDL file, generating a symbol table for describing a client for generating one or more synthetic transactions with the web service. The method also includes receiving input from a user specifying one or more operations of the web service for invocation, an order for invoking the operations of the web service, and one or more values of one or more parameters of the operations. The method also includes incorporating the input from the user into the symbol table and generating the client according to the symbol table.
Type: Grant
Filed: July 21, 2005
Date of Patent: August 21, 2012
Assignee: CA, Inc.
Inventors: Roger C. Saunders, William S. R. Thain
-
Patent number: 8245065
Abstract: A method of power gating a microprocessor having an instruction scheduling unit for receiving issued instructions from an instruction decode unit; an execution unit coupled to receive and send signals from and to the instruction scheduling unit; and a state machine located within the execution unit, the method comprises: obtaining a number of instructions per cycle being issued to the instruction scheduling unit; determining, subsequent to obtaining the number of instructions per cycle, if the number of instructions per cycle being issued to the instruction scheduling unit is less than a threshold level, and then determining if at least two of the instructions being issued to the instruction scheduling unit are independent of each other only when the instructions per cycle is less than the threshold level; determining when at least two of the instructions being issued to the instruction scheduling unit are independent of each other; and power gating the microprocessor to gate off power to idle macros with a si
Type: Grant
Filed: March 4, 2009
Date of Patent: August 14, 2012
Assignee: International Business Machines Corporation
Inventors: Tim Niggemeier, Harry Barowski, Maarten Boersma, Gunnar Spiess
-
Patent number: 8245015
Abstract: A processor includes a plurality of executing sections configured to simultaneously execute instructions for a plurality of threads, an instruction issuing section configured to issue instructions to the plurality of executing sections, and an instruction sync monitoring section configured to, when an instruction-synchronizing instruction is issued to one or more of the plurality of executing sections from the instruction issuing section, monitor completion of execution of the instruction-synchronizing instruction for each of the executing sections, to which the instruction-synchronizing instruction has been issued, thus detecting completion of execution of preceding instructions for the thread to which the instruction-synchronizing instruction belongs.
Type: Grant
Filed: July 7, 2009
Date of Patent: August 14, 2012
Assignee: Sony Corporation
Inventor: Masaaki Ishii
-
Patent number: 8245232
Abstract: Systems and methodologies for stall-time fair memory access scheduling for shared memory systems are provided herein. A stall-time fairness policy can be applied in accordance with various aspects described herein to schedule memory requests from threads sharing a memory system. To this end, a Stall-Time Fair Memory scheduler (STFM) algorithm can be utilized, wherein memory-related slowdown experienced by a group of threads due to interference from other threads is equalized. Additionally and/or alternatively, a traditional scheduling policy such as first-ready first-come-first-serve (FR-FCFS) can be utilized in combination with a cap on column-over-row reordering of memory requests, thereby reducing the amount of stall-time unfairness imposed by such traditional scheduling policies. Further, various aspects described herein can perform memory scheduling based on thread weights and/or other parameters, which can be configured in hardware and/or software.
Type: Grant
Filed: March 5, 2008
Date of Patent: August 14, 2012
Assignee: Microsoft Corporation
Inventors: Onur Mutlu, Thomas Moscibroda
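The "FR-FCFS with a cap on column-over-row reordering" alternative mentioned in the abstract above can be sketched for a single memory bank. This is a deliberately simplified model and not the STFM algorithm itself: the function name `schedule`, the `(thread, row)` request format, and the way the cap is counted (younger row-hit requests served ahead of the current oldest request) are assumptions made for illustration.

```python
def schedule(requests, open_row, cap):
    """Return the service order for one bank.

    requests: list of (thread, row) tuples in arrival order (oldest first).
    open_row: the row currently held in the bank's row buffer.
    cap:      assumed limit on how many younger row-hit requests may be
              served ahead of the oldest pending request.
    """
    pending = list(requests)
    served = []
    bypasses = 0  # row hits served ahead of the current oldest request
    while pending:
        # "First-ready": find the oldest request that hits the open row.
        hit_idx = next(
            (i for i, (_, row) in enumerate(pending) if row == open_row), None
        )
        if hit_idx is not None and (hit_idx == 0 or bypasses < cap):
            idx = hit_idx  # serve the row hit
        else:
            idx = 0        # fall back to plain first-come-first-serve
        # Reset the counter whenever the oldest request is finally served.
        bypasses = 0 if idx == 0 else bypasses + 1
        thread, row = pending.pop(idx)
        served.append(thread)
        open_row = row     # serving a request leaves its row open
    return served


# Thread T0 arrived first but needs a different row than the one open.
reqs = [("T0", 1), ("T1", 0), ("T2", 0), ("T3", 0)]
uncapped = schedule(reqs, open_row=0, cap=10)
capped = schedule(reqs, open_row=0, cap=2)
```

With a large cap the model behaves like plain FR-FCFS and T0 waits behind every row hit; with `cap=2` T0 is served after at most two younger requests have bypassed it, which limits how long a thread can be stalled by other threads' row-buffer locality.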
-
Patent number: 8245014
Abstract: The present invention provides a network multithreaded processor, such as a network processor, including a thread interleaver that implements fine-grained thread decisions to avoid underutilization of instruction execution resources in spite of large communication latencies. In an upper pipeline, an instruction unit determines an instruction fetch sequence responsive to an instruction queue depth on a per thread basis. In a lower pipeline, a thread interleaver determines a thread interleave sequence responsive to thread conditions including thread latency conditions. The thread interleaver selects threads using a two-level round robin arbitration. Thread latency signals are active responsive to thread latencies such as thread stalls, cache misses, and interlocks. During the subsequent one or more clock cycles, the thread is ineligible for arbitration. In one embodiment, other thread conditions affect selection decisions such as local priority, global stalls, and late stalls.
Type: Grant
Filed: April 14, 2008
Date of Patent: August 14, 2012
Assignee: Cisco Technology, Inc.
Inventors: Donald E Steiss, Earl T Cohen, John J Williams
-
Publication number: 20120204008
Abstract: Methods and apparatus for processing instructions by elaboration of instructions prior to issuing the instructions for execution are described. An instruction is received at a hybrid instruction queue comprised of a first queue and a second queue. When the second queue has available space, the instruction is elaborated to expand one or more bit fields to reduce decoding complexity when the elaborated instruction is issued, wherein the elaborated instruction is stored in the second queue. When the second queue does not have available space, the instruction is stored in an unelaborated form in a first queue. The first queue is configured as an exemplary in-order queue and the second queue is configured as an exemplary out-of-order queue.
Type: Application
Filed: February 1, 2012
Publication date: August 9, 2012
Applicant: QUALCOMM INCORPORATED
Inventors: Kenneth Alan Dockser, Yusuf Cagatay Tekmen
-
Publication number: 20120204009
Abstract: A processor includes an instruction fetch unit, an issue queue coupled to the instruction fetch unit, an execution unit coupled to the issue queue, and a multi-level register file including a first level register file having lower access latency and a second level register file having higher access latency. Each of the first and second level register files includes a plurality of physical registers for holding operands that is concurrently shared by a plurality of threads. The processor further includes a mapper that, at dispatch of an instruction specifying a source logical register from the instruction fetch unit to the issue queue, initiates a swap of a first operand associated with the source logical register that is in the second level register file with a second operand held in the first level register file. The issue queue, following the swap, issues the instruction to the execution unit for execution.
Type: Application
Filed: April 16, 2012
Publication date: August 9, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: CHRISTOPHER M. ABERNATHY, MARY D. BROWN, HUNG Q. LE, DUNG Q. NGUYEN
-
Patent number: 8239661
Abstract: A method for double-issue complex instructions receives a complex instruction comprising a first portion and a second portion. The method sets a single issue queue slot and allocates an execution unit for the complex instruction, and identifies dependencies in the first and second portions. The method sets a dependency matrix slot and a consumers table slot for the first and second portions. In the event the first portion dependencies have been satisfied, the method issues the first portion and then issues the second portion from the single issue queue slot. In the event the second portion dependencies have not been satisfied, the method cancels the second portion issue.
Type: Grant
Filed: August 28, 2008
Date of Patent: August 7, 2012
Assignee: International Business Machines Corporation
Inventors: Christopher M. Abernathy, Mary D. Brown, Todd A. Venton
-
Publication number: 20120191950
Abstract: The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector.
Type: Application
Filed: May 12, 2011
Publication date: July 26, 2012
Applicant: APPLE INC.
Inventor: Jeffry E. Gonion
-
Publication number: 20120191951
Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
Type: Application
Filed: April 5, 2012
Publication date: July 26, 2012
Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Oawami
-
Patent number: 8230410
Abstract: An enhanced mechanism for parallel execution of computer programs utilizes a bidding model to allocate additional registers and execution units for stretches of code identified as opportunities for microparallelization. A microparallel processor architecture apparatus permits software (e.g. compiler) to implement short-term parallel execution of stretches of code identified as such before execution. In one embodiment, an additional paired unit, if available, is allocated for execution of an identified stretch of code. Each additional paired unit includes an execution unit and a half set of registers. This apparatus is available for compilers or assembler language coders to use and allows software to unlock parallel execution capabilities that are present in existing computer programs but heretofore were executed sequentially for lack of a suitable apparatus.
Type: Grant
Filed: October 26, 2009
Date of Patent: July 24, 2012
Assignee: International Business Machines Corporation
Inventor: Larry W. Loen
-
Patent number: 8225012
Abstract: A method may include distributing ranges of addresses in a memory among a first set of functions in a first pipeline. The first set of the functions in the first pipeline may operate on data using the ranges of addresses. Different ranges of addresses in the memory may be redistributed among a second set of functions in a second pipeline without waiting for the first set of functions to be flushed of data.
Type: Grant
Filed: September 3, 2009
Date of Patent: July 17, 2012
Assignee: Intel Corporation
Inventor: Thomas A. Piazza
-
Patent number: 8219784
Abstract: A computer-implemented method and apparatus for managing an out of order dispatched instruction queue in a microprocessor. In one embodiment, the method and apparatus include assigning a group identification number and a target identification number to an instruction in an instruction stream. The group identification number and the target identification number are labeled inside an instruction fetcher unit. The group identification number and the target identification number are pre-decoded. The instruction is sent to an instruction queue. The instruction is re-ordered in the instruction stream after executing the instruction utilizing information from the pre-decoding of the group identification number and the target identification number.
Type: Grant
Filed: December 9, 2008
Date of Patent: July 10, 2012
Assignee: International Business Machines Corporation
Inventors: Oliver Keren Ban, Xiangang Cheng, Liang Huang Lee, Katherine June Pearsall
-
Patent number: 8219996
Abstract: A computer processor includes a fairness monitor for monitoring allocations of a processor resource to requestors. If unfairness is determined, a resource allocator is biased to offset said unfairness.
Type: Grant
Filed: May 9, 2007
Date of Patent: July 10, 2012
Assignee: Hewlett-Packard Development Company, L.P.
Inventor: Dale C. Morris
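One way to picture the monitor-and-bias idea in the abstract above is a toy allocator that normally uses fixed priority but switches to favoring the least-served requestor once grant counts drift too far apart. Everything specific here is an assumption for illustration: the class name `FairAllocator`, the fixed-priority default, the grant-count fairness metric, and the `tolerance` threshold do not come from the patent.

```python
class FairAllocator:
    """Toy resource allocator with a grant-count fairness monitor."""

    def __init__(self, requestors, tolerance=2):
        self.order = list(requestors)            # fixed priority order
        self.grants = {r: 0 for r in requestors}  # monitor state
        self.tolerance = tolerance                # assumed unfairness threshold

    def unfair(self):
        """Monitor: unfair when grant counts differ by more than tolerance."""
        counts = self.grants.values()
        return max(counts) - min(counts) > self.tolerance

    def allocate(self, requesting):
        """Grant the resource to one of the requestors asking this cycle."""
        if not requesting:
            return None
        if self.unfair():
            # Biased mode: favor whoever has received the fewest grants.
            winner = min(requesting, key=lambda r: self.grants[r])
        else:
            # Default mode: highest fixed priority among those requesting.
            winner = next(r for r in self.order if r in requesting)
        self.grants[winner] += 1
        return winner


alloc = FairAllocator(["A", "B"], tolerance=2)
winners = [alloc.allocate(["A", "B"]) for _ in range(6)]
```

In this run requestor A wins by priority until it is three grants ahead, after which the bias lets B catch up to within the tolerance before A is favored again.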
-
Patent number: 8200913
Abstract: An information processing system includes a plurality of PMM and data transmission paths for connection between the PMM and transmitting a value of a PMM to another PMM. A memory of each PMM holds a list of values of first items arranged in the ascending order or descending order without overlap and/or a list of values of the second item to be shared. A memory module of each PMM transmits a value contained in the value list to another PMM, receives a value contained in the value list from the other PMM, references the value list of the first item and the value list of the second item of the other PMM, and generates a list of common values considering the values contained in the value lists of the first item and the second item of all the other PMM.
Type: Grant
Filed: January 25, 2005
Date of Patent: June 12, 2012
Assignee: Turbo Data Laboratories, Inc.
Inventor: Shinji Furusho
-
Patent number: 8185722
Abstract: The invention provides a processor comprising an execution unit and a thread scheduler configured to schedule a plurality of threads for execution by the execution unit in dependence on a respective status for each thread. The execution unit is configured to execute thread scheduling instructions which manage said statuses, the thread scheduling instructions including at least: a thread event enable instruction which sets a status to event-enabled to allow a thread to accept events, a wait instruction which sets the status to suspended pending at least one event upon which continued execution of the thread depends, and a thread event disable instruction which sets the status to event-disabled to stop the thread from accepting events. The continued execution comprises retrieval of a continuation point vector for the thread.
Type: Grant
Filed: March 14, 2007
Date of Patent: May 22, 2012
Assignee: XMOS Ltd.
Inventor: Michael David May