Of Multiple Instructions Simultaneously Patents (Class 712/206)
-
Publication number: 20140281389Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.Type: ApplicationFiled: March 15, 2013Publication date: September 18, 2014Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
-
Publication number: 20140281368Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.Type: ApplicationFiled: March 14, 2013Publication date: September 18, 2014Applicant: QUALCOMM INCORPORATEDInventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
-
Patent number: 8832415Abstract: A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memory access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to a different physical addresses for value disambiguation for different threads.Type: GrantFiled: January 4, 2011Date of Patent: September 9, 2014Assignee: International Business Machines CorporationInventors: Alan Gala, Martin Ohmacht
-
Publication number: 20140215187Abstract: A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler generates code wherein when executed determines a size of a next very large instruction world (VLIW) to process and determine multiple pointer values to store in multiple corresponding PC registers in a target processor. The updated PC registers point to instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on branch direction found at runtime for the given lane at the given divergent point. The processor includes a vector register for mapping PC registers to execution lanes.Type: ApplicationFiled: January 29, 2013Publication date: July 31, 2014Applicant: ADVANCED MICRO DEVICES, INC.Inventor: Reza Yazdani
-
Publication number: 20140208074Abstract: In one embodiment, a multi-strand system with a pipeline includes a front-end unit, an instruction scheduling unit (ISU), and a back-end unit. The front-end unit performs an out-of-order fetch of interdependent instructions queued using a front-end buffer. The ISU dedicates two hardware entries per strand for checking operand-readiness of an instruction and for determining an execution port to which the instruction is dispatched. The back-end unit receives instructions dispatched from the hardware device and stores the instructions until they are executed. Other embodiments are described and claimed.Type: ApplicationFiled: March 30, 2012Publication date: July 24, 2014Inventors: Boris A. Babayan, Vladimir Pentkovski, Jayesh Iyer, Nikolay Kosarev, Sergey Y. Shishlov, Alexander V. Butuzov, Alexey Y. Sivtsov
-
Publication number: 20140181474Abstract: Methods and apparatus for performing an atomic hardware operation (HWOP) instruction. According to a method in a computer processor coupled to a memory, the method includes fetching, decoding, and executing the atomic HWOP instruction. The instruction includes a source operand indicating a source location and a destination operand indicating a destination location, wherein each of the source location and the destination location is either a register of the computer processor or an address of the memory. Executing the atomic HWOP instruction includes sending a message to an external agent to cause the external agent to atomically access a set of one or more memory locations of the memory based upon a value stored at the source location, and return a result obtained from said atomic access of the set of memory locations to the destination location. The external agent is external to the computer processor.Type: ApplicationFiled: December 26, 2012Publication date: June 26, 2014Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)Inventors: EVAN GEWIRTZ, ROBERT HATHAWAY, EDWARD HO, STEPHAN MEIER
-
Patent number: 8745359Abstract: A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams.Type: GrantFiled: February 3, 2009Date of Patent: June 3, 2014Assignee: NEC CorporationInventor: Shohei Nomoto
-
Publication number: 20140075157Abstract: Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes.Type: ApplicationFiled: February 28, 2013Publication date: March 13, 2014Applicant: Altera CorporationInventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand
-
Patent number: 8671400Abstract: A technique includes providing first objects that are associated with an application session and in a processor-based system, identifying second objects in another application session corresponding to the first objects based at least in part on a comparison of the second objects to matching rules associated with the first objects.Type: GrantFiled: December 23, 2009Date of Patent: March 11, 2014Assignee: Intel CorporationInventors: Christopher J. Cormack, Nathaniel Duca, Joseph D. Matarazzo
-
Publication number: 20140019720Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.Type: ApplicationFiled: February 7, 2013Publication date: January 16, 2014Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
-
Patent number: 8624906Abstract: A method and system for graphics instruction fetching. The method includes executing a plurality of threads in a multithreaded execution environment. A respective plurality of instructions are fetched to support the execution of the threads. During runtime, at least one instruction is prefetched for one of the threads to a prefetch buffer. The at least one instruction is accessed from the prefetch buffer if required by the one thread and discarded if not required by the one thread.Type: GrantFiled: September 29, 2004Date of Patent: January 7, 2014Assignee: Nvidia CorporationInventor: Andrew D. Bowen
-
Publication number: 20130290677Abstract: An apparatus having a buffer and a circuit is disclosed. The buffer may be configured to store a plurality of fetch sets. Each fetch set generally includes a prefix word and a plurality of instruction words. Each prefix word may include a plurality of symbols. Each symbol generally corresponds to a respective one of the instruction words. The circuit may be configured to (i) identify each of the symbols in each of the fetch sets having a predetermined value and (ii) parse the fetch sets into a plurality of execution sets in response to the symbols having the predetermined value.Type: ApplicationFiled: April 26, 2012Publication date: October 31, 2013Inventors: Alexander Rabinovitch, Leonid Dubrovin
-
Patent number: 8564604Abstract: Systems and methods for improving throughput of a graphics processing unit are disclosed. In one embodiment, a system includes a multithreaded execution unit capable of processing requests to access a constant cache, a vertex attribute cache, at least one common register file, and an execution unit data path substantially simultaneously.Type: GrantFiled: April 21, 2010Date of Patent: October 22, 2013Assignee: VIA Technologies, Inc.Inventor: Yang (Jeff) Jiao
-
Patent number: 8566566Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing a arithmetic operation.Type: GrantFiled: August 2, 2010Date of Patent: October 22, 2013Assignee: Electronics and Telecommunications Research InstituteInventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
-
Publication number: 20130262825Abstract: Obfuscating a multi-threaded computer program is carried out using an instruction pipeline in a computer processor by streaming first instructions of a first thread of a multi-threaded computer application program into the pipeline, the first instructions entering the pipeline at the fetch stage, detecting a stall signal indicative of a stall condition in the pipeline, and responsively to the stall signal injecting second instructions of a second thread of the multi-threaded computer application program into the pipeline. The injected second instructions enter the pipeline at an injection stage that is disposed downstream from the fetch stage up to and including the register stage for processing therein. The stall condition exists at one of the stages that is located upstream from the in injection stage.Type: ApplicationFiled: November 14, 2011Publication date: October 3, 2013Applicant: Cisco Technology Inc.Inventors: David Darmon, Uri Kaluzhny
-
Publication number: 20130232359Abstract: Embodiments of a processing architecture are described. The architecture includes a fetch unit for fetching instructions from a data bus. A scheduler receives data from the fetch unit and creates a schedule allocates the data and schedule to a plurality of computational units. The scheduler also modifies voltage and frequency settings of the processing architecture to optimize power consumption and throughput of the system. The computational units include control units and execute units. The control units receive and decode the instructions and send the decoded instructions to execute units. The execute units then execute the instructions according to relevant software.Type: ApplicationFiled: March 1, 2012Publication date: September 5, 2013Applicant: NXP B.V.Inventors: Hamed Fatemi, Ajay Kapoor, J. Pineda de Gyvez
-
Publication number: 20130205118Abstract: A computer system for instruction execution includes a processor having a pipeline. The system is configured to perform a method including fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions, for each of the plurality of branch instructions, assigning a branch uncertainty to each of the plurality of branch instructions, for each of the plurality of instructions, assigning an instruction uncertainty that is a summation of branch uncertainties of older unresolved branches and balancing the instructions, based on a current summation of instruction uncertainty, in the pipeline.Type: ApplicationFiled: February 6, 2012Publication date: August 8, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Alper Buyuktosunoglu, Brian R. Prasky, Vijayalakshmi Srinivasan
-
Publication number: 20130179662Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.Type: ApplicationFiled: January 11, 2012Publication date: July 11, 2013Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
-
Publication number: 20130166881Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.Type: ApplicationFiled: December 21, 2011Publication date: June 27, 2013Inventors: Jack Hilaire CHOQUETTE, Robert J. Stoll, Olivier Giroux
-
Publication number: 20130166882Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.Type: ApplicationFiled: December 22, 2011Publication date: June 27, 2013Inventors: Jack Hilaire CHOQUETTE, Robert J. STOLL, Olivier GIROUX, Michael FETTERMAN, Shirish GADRE, Robert Steven GLANVILLE, Alexandre JOLY
-
Publication number: 20130151816Abstract: Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register.Type: ApplicationFiled: December 7, 2011Publication date: June 13, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Venkat R. Indukuru, Alexander E. Mericas
-
Patent number: 8458414Abstract: A memory accessing method including the following steps is provided. Firstly, two instructions are fetched. Next, the two instructions are respectively decoded to obtain two operation fields and two address fields. The two operation fields indicate the type of operation in accessing the memory. One of the address fields includes a first upper address corresponding to the first memory block and a first lower address corresponding to a first memory unit of the first memory block. The other one of the two address fields includes a second upper address corresponding to the second memory block and a second lower address corresponding to a second memory unit of the second memory block. Then, whether two instructions are performing the same type of operation on the same memory block is determined. If yes, the type of operation indicated by the two operation fields is performed on the corresponding memory block parallelly.Type: GrantFiled: April 8, 2010Date of Patent: June 4, 2013Assignee: Realtek Semiconductor CorporationInventors: Sheng-Yuan Jan, Yen-Ju Lu
-
Patent number: 8453161Abstract: This disclosure describes a method and system that may enable fast, hardware-assisted, producer-consumer style communication of values between threads. The method, in one aspect, uses a dedicated hardware buffer as an intermediary storage for transferring values from registers in one thread to registers in another thread. The method may provide a generic, programmable solution that can transfer any subset of register values between threads in any given order, where the source and target registers may or may not be correlated. The method also may allow for determinate access times, since it completely bypasses the memory hierarchy. Also, the method is designed to be lightweight, focusing on communication, and keeping synchronization facilities orthogonal to the communication mechanism. It may be used by a helper thread that performs data prefetching for an application thread, for example, to initialize the upward-exposed reads in the address computation slice of the helper thread code.Type: GrantFiled: May 25, 2010Date of Patent: May 28, 2013Assignee: International Business Machines CorporationInventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
-
Patent number: 8447911Abstract: A method and processor for providing full load/store queue functionality to an unordered load/store queue for a processor with out-of-order execution. Load and store instructions are inserted in a load/store queue in execution order. Each entry in the load/store queue includes an identification corresponding to a program order. Conflict detection in such an unordered load/store queue may be performed by searching a first CAM for all addresses that are the same or overlap with the address of the load or store instruction to be executed. A further search may be performed in a second CAM to identify those entries that are associated with younger or older instructions with respect to the sequence number of the load or store instruction to be executed. The output results of the Address CAM and Age CAM are logically ANDed.Type: GrantFiled: July 2, 2008Date of Patent: May 21, 2013Assignee: Board of Regents, University of Texas SystemInventors: Douglas C. Burger, Stephen W. Keckler, Robert McDonald, Lakshminarasimhan Sethumadhavan, Franziska Roesner
-
Publication number: 20130117536Abstract: A reconfigurable instruction encoding method includes the followings. An instruction distribution of an application is counted, and multiple instruction pairs with higher utilization rates are accordingly found. Multiple instructions of the instruction pairs are duplicately encoded according to multiple reserved sections of an original instruction table, so that the instructions have corresponding reconfigured codes and a reconfigured instruction table extended from the original instruction table and including the reconfigured codes is obtained. A compiler is utilized to generate multiple machine codes according to the reconfigured instruction table and consecutive execution instructions. Hamming distance of the machine codes corresponding to the reconfigured instruction table and the execution instructions are not longer than Hamming distance of the machine codes generated according to the original instruction table and the execution instructions.Type: ApplicationFiled: April 17, 2012Publication date: May 9, 2013Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTEInventors: Huang-Lun Lin, Ching-Hsiang Chuang, Shui-An Wen
-
Publication number: 20130117535Abstract: A method includes executing a branch instruction and determining if a branch is taken. The method further includes evaluating a number of instructions associated with the branch instruction. Upon determining that the branch is taken, the method includes selectively writing an entry into a branch target buffer that corresponds to the taken branch responsive to determining that the number of instructions is less than a threshold.Type: ApplicationFiled: November 4, 2011Publication date: May 9, 2013Applicant: QUALCOMM INCORPORATEDInventors: Suresh K. Venkumahanti, Lucian Codrescu, Suman Mamidi
-
Patent number: 8405670Abstract: A multithreaded rendering software pipeline architecture utilizes a rolling texture context data structure to store multiple texture contexts that are associated with different textures that are being processed in the software pipeline. Each texture context stores state data for a particular texture, and facilitates the access to texture data by multiple, parallel stages in a software pipeline. In addition, texture contexts are capable of being “rolled”, or copied to enable different stages of a rendering pipeline that require different state data for a particular texture to separately access the texture data independently from one another, and without the necessity for stalling the pipeline to ensure synchronization of shared texture data among the stages of the pipeline.Type: GrantFiled: May 25, 2010Date of Patent: March 26, 2013Assignee: International Business Machines CorporationInventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
-
Patent number: 8387053Abstract: A method of performing operations in a computer system, computer system, and related method of compilation, are disclosed. In one embodiment, the method of performing includes providing compiled code having at least one thread, where each of the at least one thread includes a respective plurality of blocks and each respective block includes a respective pre-fetch component and a respective execute component. The method also includes performing a first pre-fetch component from a first block of a first thread of the at least one thread, performing a first additional component after the first pre-fetch component has been performed, and performing a first execute component from the first block of the first thread. The first execute component is performed after the first additional component has been performed, and the first additional component is from either a second thread or another block of the first thread that is not the first block.Type: GrantFiled: January 25, 2007Date of Patent: February 26, 2013Assignee: Hewlett-Packard Development Company, L.P.Inventors: Blaine D. Gaither, Verna Knapp, Jerome Huck, Benjamin D. Osecky
-
Patent number: 8362880Abstract: A method and apparatus for loading and executing program code at a micro-processor are disclosed. In this method, a monitoring procedure is performed to monitor whether the micro-processor receives a loading request corresponding to a target program code. If the loading request is received, the target program code is loaded from an external memory into an internal memory of the micro-processor. The micro-processor is then rebooted to enter a first mode in which the target program code in the internal memory is to be executed.Type: GrantFiled: April 27, 2009Date of Patent: January 29, 2013Assignee: MStar Semiconductor, Inc.Inventors: Chih-Hua Huang, Chih Yen Chang
-
Patent number: 8346760Abstract: In one embodiment, the invention provides a method comprising determining metadata encoded in instructions of a stored program; and executing the stored program based on the metadata.Type: GrantFiled: May 20, 2009Date of Patent: January 1, 2013Assignee: Intel CorporationInventors: Hong Wang, John Shen, Ali-Reza Adl-Tabatabai, Anwar Ghuloum
-
Publication number: 20120297168Abstract: Processing instruction grouping information is provided that includes: reading addresses of machine instructions grouped by a processor at runtime from a buffer to form an address file; analyzing the address file to obtain grouping information of the machine instructions; converting the machine instructions in the address file into readable instructions; and obtaining grouping information of the readable instructions based on the grouping information of the machine instructions and the readable instructions resulted from conversion. Status of grouping and processing performed on instructions by a processor at runtime can be acquired dynamically, such that processing capability of the processor can be better utilized.Type: ApplicationFiled: April 26, 2012Publication date: November 22, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Qin Yue Chen, Qi Liang, Hong Chang Lin, Feng Liu
-
Publication number: 20120233441Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.Type: ApplicationFiled: March 7, 2011Publication date: September 13, 2012Inventors: Jama I. Barreh, Robert T. Golla, Manish K. Shah
-
Patent number: 8261021Abstract: In order to control an access request to the cache shared between a plurality of threads, a storage unit for storing a flag provided in association with each of the threads is included. If the threads enter the execution of an atomic instruction, a defined value is written to the flags stored in the storage unit. Furthermore, if the atomic instruction is completed, a defined value different from the above defined value is written, thereby displaying whether or not the threads are executing the atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread different from the certain thread is executing the atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing the atomic instruction, the access request is kept standby. This makes it possible to realize the exclusive control processing necessary for processing the atomic instruction according to simple configuration.Type: GrantFiled: December 17, 2009Date of Patent: September 4, 2012Assignee: Fujitsu LimitedInventor: Naohiro Kiyota
-
Publication number: 20120198209Abstract: A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions comprising at least one guest far branch, and building an instruction sequence from the plurality of guest instructions by using branch prediction on the at least one guest far branch. The method further includes assembling a guest instruction block from the instruction sequence. The guest instruction block is translated to a corresponding native conversion block, wherein an at least one native far branch that corresponds to the at least one guest far branch and wherein the at least one native far branch includes an opposite guest address for an opposing branch path of the at least one guest far branch. Upon encountering a missprediction, a correct instruction sequence is obtained by accessing the opposite guest address.Type: ApplicationFiled: January 27, 2012Publication date: August 2, 2012Applicant: SOFT MACHINES, INC.Inventor: Mohammad Abdallah
-
Patent number: 8205204Abstract: An multi-threading processor is provided. The multi-threading processor includes a first instruction fetch unit to receive a first thread and a second instruction fetch unit to receive a second thread. A multi-thread scheduler coupled to the instruction fetch units and a execution unit. The multi-thread scheduler determines the width of the execution unit and the execution unit executes the threads accordingly.Type: GrantFiled: January 23, 2009Date of Patent: June 19, 2012Assignee: Intel CorporationInventors: Ken Shoemaker, Sailesh Kottapalli, Kin-Kee Sit
-
Publication number: 20120144164Abstract: An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation that an exception, such as a branch misprediction causes. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation.Type: ApplicationFiled: February 14, 2012Publication date: June 7, 2012Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventor: Dung Quoc Nguyen
-
Patent number: 8179540Abstract: An image forming apparatus is provided that holds counter information obtained by integrating a consumption of a consumable that depends on usage of service provided by the image forming apparatus. A log corresponding to the usage of the service is set in job log information with a synchronization flag set off. The log in the job log information, for which the synchronization flag is set off, is set on. The counter information and the job log information are output after the synchronization flag for the log having the synchronization flag set off has been set on.Type: GrantFiled: October 29, 2008Date of Patent: May 15, 2012Assignee: Canon Kabushiki KaishaInventors: Junichi Hiruma, Nobuyuki Tonegawa
-
Patent number: 8179896Abstract: A network processor of an embodiment includes a packet classification engine, a processing pipeline, and a controller. The packet classification engine allows for classifying each of a plurality of packets according to packet type. The processing pipeline has a plurality of stages for processing each of the plurality of packets in a pipelined manner, where each stage includes one or more processors. The controller allows for providing the plurality of packets to the processing pipeline in an order that is based at least partially on: (i) packet types of the plurality of packets as classified by the packet classification engine and (ii) estimates of processing times for processing packets of the packet types at each stage of the plurality of stages of the processing pipeline. A method in a network processor allows for prefetching instructions into a cache for processing a packet based on a packet type of the packet.Type: GrantFiled: November 7, 2007Date of Patent: May 15, 2012Inventor: Justin Mark Sobaje
-
Publication number: 20120110306Abstract: A method of responding to an attempt to write a memory address including a target instruction which has been translated to a host instruction for execution by a host processor including the steps of marking a memory address including a target instruction which has been translated to a host instruction, detecting a memory address which has been marked when an attempt is made to write to the memory address, and responding to the detection of a memory address which has been marked by protecting a target instruction at the memory address until it has been assured that translations associated with the memory address will not be utilized before being updated.Type: ApplicationFiled: September 23, 2011Publication date: May 3, 2012Inventors: Edmund J. Kelly, Robert F. Cmelik, Malcolm J. Wing
-
Patent number: 8171260Abstract: The invention provides a method and apparatus for branch prediction in a processor. A fetch-block branch target buffer is used in an early stage of pipeline processing before the instruction is decoded, which stores information about a control transfer instruction for a “block” of instruction memory. The block of instruction memory is represented by a block entry in the fetch-block branch target buffer. The block entry represents one recorded control-transfer instruction (such as a branch instruction) and a set of sequentially preceding instructions, up to a fixed maximum length N. Indexing into the fetch-block branch target buffer yields an answer whether the block entry represents memory that contains a previously executed a control-transfer instruction, a length value representing the amount of memory that contains the instructions represented by the block, and an indicator for the type of control-transfer instruction that terminates the block, its target and outcome.Type: GrantFiled: June 23, 2009Date of Patent: May 1, 2012Assignee: STMicroelectronics, Inc.Inventors: Anatoly Gelman, Russell Lawrence Schnapp
-
Patent number: 8166259Abstract: A memory control apparatus, a memory control method and an information processing system are disclosed. Fetch response data retrieved from a main storage unit is received, while bypassing a storage unit, by a first port in which the received fetch response data can be set. The fetch response data retrieved from the main storage unit, if unable to be set in the first port, is set in a second port through the storage unit. A transmission control unit performs priority control operation to send out, in accordance with a predetermined priority, the fetch response data set in the first port or the second port to the processor. As a result, the latency is shortened from the time when the fetch response data arrives to the time when the fetch response data is sent out toward the processor in response to a fetch request from the processor.Type: GrantFiled: March 26, 2009Date of Patent: April 24, 2012Assignee: Fujitsu LimitedInventor: Souta Kusachi
-
Patent number: 8145885Abstract: A processor interleaves instructions according to a priority rule which determines the frequency with which instructions from each respective thread are selected and added to an interleaved stream of instructions to be processed in the data processor. The frequency with which each thread is selected according to the rule may be based on the priorities assigned to the instruction threads. A randomization is inserted into the interleaving process so that the selection of an instruction thread during any particular clock cycle is not based solely by the priority rule, but is also based in part on a random or pseudo random element. This randomization is inserted into the instruction thread selection process so as to vary the order in which instructions are selected from the various instruction threads while preserving the overall frequency of thread selection (i.e. how often threads are selected) set by the priority rule.Type: GrantFiled: April 30, 2008Date of Patent: March 27, 2012Assignee: International Business Machines CorporationInventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
-
Publication number: 20120066480Abstract: A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.Type: ApplicationFiled: July 22, 2011Publication date: March 15, 2012Applicant: Sony CorporationInventors: Hirokazu Hanaki, Satoshi Takashima
-
Publication number: 20120060017Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.Type: ApplicationFiled: May 18, 2010Publication date: March 8, 2012Inventor: Hiroyuki Morishita
-
Patent number: 8095775Abstract: During operation of a VLIW processor, a very long instruction word is fetched. A portion of the very long instruction word that includes a pointer to an instruction is identified, and the instruction pointed to by the pointer is retrieved from a location of an instruction window. The retrieved instruction is input into a functional unit for execution.Type: GrantFiled: November 19, 2008Date of Patent: January 10, 2012Assignee: Marvell International Ltd.Inventors: Moinul H. Khan, Anitha Kona, Mark N. Fullerton
-
Patent number: 8037283Abstract: In a multi-core stream processing system and scheduling method of the same, a scheduler is coupled to a number (N) of stream processing units and a number (N+1) of stream fetching units, where N?2. When the scheduler receives a stream element from a Pth stream fetching unit, the scheduler assigns a Pth stream processing unit as a target stream processing unit when the Pth stream processing unit does not encounter a bottleneck condition, assigns a Qth stream processing unit, which does not encounter the bottleneck condition, as the target stream processing unit when the Pth stream processing unit encounters the bottleneck condition, where 1?P?N, 1?Q?N, and P?Q, and dispatches the received stream element to the target stream processing unit such that the target stream processing unit processes the stream element dispatched from the scheduler.Type: GrantFiled: May 5, 2009Date of Patent: October 11, 2011Assignee: National Taiwan UniversityInventors: You-Ming Tsao, Liang-Gee Chen, Shao-Yi Chien
-
Publication number: 20110225396Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.Type: ApplicationFiled: May 18, 2011Publication date: September 15, 2011Applicant: ALTERA CORPORATIONInventors: Gerald George Pechanek, Stamatis Vassiliadis
-
Patent number: 8015391Abstract: A processor simultaneously issues instructions to multiple threads in a same instruction execution cycle. An instruction issuer controls issuance of an instruction for each of the multiple threads. A detector detects, for each of the multiple threads, whether a loop processing is currently being executed. A unit causes the instruction issuer to increase a number of instructions to be issued when the detector detects that the loop processing is currently being executed.Type: GrantFiled: October 8, 2010Date of Patent: September 6, 2011Assignee: Panasonic CorporationInventor: Takenobu Tani
-
Patent number: 8006041Abstract: A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed.Type: GrantFiled: March 5, 2008Date of Patent: August 23, 2011Assignee: Fujitsu LimitedInventors: Shuji Yamamura, Takashi Aoki
-
Patent number: 7996203Abstract: A method, system, and computer program product are provided for verifying out of order instruction address (IA) stride prefetch performance in a processor design having more than one level of cache hierarchies. Multiple instruction streams are generated and the instructions loop back to corresponding instruction addresses. The multiple instruction streams are dispatched to a processor and simulation application to process. When a particular instruction is being dispatched, the particular instruction's instruction address and operand address are recorded in the queue. The processor is monitored to determine if the processor executes fetch and prefetch commands in accordance with the simulation application. It is checked to determine if prefetch commands are issued for instructions having three or more strides.Type: GrantFiled: January 31, 2008Date of Patent: August 9, 2011Assignee: International Business Machines CorporationInventors: Wei-Yi Xiao, Dean G. Bair, Christopher A. Krygowski, Chung-Lung K. Shum