Of Multiple Instructions Simultaneously Patents (Class 712/206)
  • Publication number: 20140281389
    Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.
    Type: Application
    Filed: March 15, 2013
    Publication date: September 18, 2014
    Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
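    A minimal C++ sketch of the fusion condition described in the 20140281389 abstract above; the Insn structure, register numbering, and mnemonics are illustrative assumptions, not the patent's micro-op format:

        #include <cstdio>
        #include <string>

        struct Insn {
            std::string mnemonic;   // e.g. "or", "test", "jz"
            int dst;                // destination register number, -1 if none
            int src1, src2;         // source register numbers, -1 if unused
            bool reads_flags;       // true for a flag-consuming conditional branch
        };

        // True when the three decoded instructions may be fused into one micro-op:
        // the logical op's destination is a source of the test, and the branch
        // condition depends on the test's flag result.
        bool can_fuse(const Insn& logic, const Insn& test, const Insn& branch) {
            bool dst_feeds_test = logic.dst != -1 &&
                                  (logic.dst == test.src1 || logic.dst == test.src2);
            return dst_feeds_test && branch.reads_flags;
        }

        int main() {
            Insn or_insn   = {"or",   3,  1,  2, false};   // r3 = r1 | r2
            Insn test_insn = {"test", -1, 3,  3, false};   // flags = r3 & r3
            Insn jz_insn   = {"jz",   -1, -1, -1, true};   // branch on zero flag
            std::printf("fuse into one micro-op? %s\n",
                        can_fuse(or_insn, test_insn, jz_insn) ? "yes" : "no");
            return 0;
        }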
  • Publication number: 20140281368
    Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.
    Type: Application
    Filed: March 14, 2013
    Publication date: September 18, 2014
    Applicant: QUALCOMM INCORPORATED
    Inventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
  • Patent number: 8832415
    Abstract: A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memory access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to different physical addresses for value disambiguation for different threads.
    Type: Grant
    Filed: January 4, 2011
    Date of Patent: September 9, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alan Gala, Martin Ohmacht
  • Publication number: 20140215187
    Abstract: A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler generates code which, when executed, determines a size of a next very large instruction word (VLIW) to process and determines multiple pointer values to store in multiple corresponding PC registers in a target processor. The updated PC registers point to instructions intermingled from different basic blocks between the given divergent point and a corresponding convergence point. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on the branch direction found at runtime for the given lane at the given divergent point. The processor includes a vector register for mapping PC registers to execution lanes.
    Type: Application
    Filed: January 29, 2013
    Publication date: July 31, 2014
    Applicant: ADVANCED MICRO DEVICES, INC.
    Inventor: Reza Yazdani
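    To make the 20140215187 lane/PC mapping concrete, here is a toy C++ model under assumed addresses: each lane is assigned one of two PC values according to the branch direction that lane took at the divergent point. This is a sketch of the idea only, not the claimed VLIW machinery:

        #include <cstddef>
        #include <cstdio>
        #include <vector>

        int main() {
            // Assumed addresses for the two paths out of the divergent point.
            const unsigned pc_taken     = 0x120;  // first instruction of the taken path
            const unsigned pc_not_taken = 0x180;  // first instruction of the fall-through path

            // Branch direction observed at runtime for each of 8 SIMD lanes.
            std::vector<bool> taken = {true, false, false, true, true, false, true, false};

            // Vector register mapping each execution lane to a PC value.
            std::vector<unsigned> lane_pc(taken.size());
            for (std::size_t lane = 0; lane < taken.size(); ++lane)
                lane_pc[lane] = taken[lane] ? pc_taken : pc_not_taken;

            for (std::size_t lane = 0; lane < lane_pc.size(); ++lane)
                std::printf("lane %zu -> PC 0x%x\n", lane, lane_pc[lane]);
            return 0;
        }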
  • Publication number: 20140208074
    Abstract: In one embodiment, a multi-strand system with a pipeline includes a front-end unit, an instruction scheduling unit (ISU), and a back-end unit. The front-end unit performs an out-of-order fetch of interdependent instructions queued using a front-end buffer. The ISU dedicates two hardware entries per strand for checking operand-readiness of an instruction and for determining an execution port to which the instruction is dispatched. The back-end unit receives instructions dispatched from the hardware device and stores the instructions until they are executed. Other embodiments are described and claimed.
    Type: Application
    Filed: March 30, 2012
    Publication date: July 24, 2014
    Inventors: Boris A. Babayan, Vladimir Pentkovski, Jayesh Iyer, Nikolay Kosarev, Sergey Y. Shishlov, Alexander V. Butuzov, Alexey Y. Sivtsov
  • Publication number: 20140181474
    Abstract: Methods and apparatus for performing an atomic hardware operation (HWOP) instruction. According to a method in a computer processor coupled to a memory, the method includes fetching, decoding, and executing the atomic HWOP instruction. The instruction includes a source operand indicating a source location and a destination operand indicating a destination location, wherein each of the source location and the destination location is either a register of the computer processor or an address of the memory. Executing the atomic HWOP instruction includes sending a message to an external agent to cause the external agent to atomically access a set of one or more memory locations of the memory based upon a value stored at the source location, and return a result obtained from said atomic access of the set of memory locations to the destination location. The external agent is external to the computer processor.
    Type: Application
    Filed: December 26, 2012
    Publication date: June 26, 2014
    Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
    Inventors: EVAN GEWIRTZ, ROBERT HATHAWAY, EDWARD HO, STEPHAN MEIER
  • Patent number: 8745359
    Abstract: A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams.
    Type: Grant
    Filed: February 3, 2009
    Date of Patent: June 3, 2014
    Assignee: NEC Corporation
    Inventor: Shohei Nomoto
  • Publication number: 20140075157
    Abstract: Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes.
    Type: Application
    Filed: February 28, 2013
    Publication date: March 13, 2014
    Applicant: Altera Corporation
    Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand
  • Patent number: 8671400
    Abstract: A technique includes providing first objects that are associated with an application session and in a processor-based system, identifying second objects in another application session corresponding to the first objects based at least in part on a comparison of the second objects to matching rules associated with the first objects.
    Type: Grant
    Filed: December 23, 2009
    Date of Patent: March 11, 2014
    Assignee: Intel Corporation
    Inventors: Christopher J. Cormack, Nathaniel Duca, Joseph D. Matarazzo
  • Publication number: 20140019720
    Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.
    Type: Application
    Filed: February 7, 2013
    Publication date: January 16, 2014
    Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
  • Patent number: 8624906
    Abstract: A method and system for graphics instruction fetching. The method includes executing a plurality of threads in a multithreaded execution environment. A respective plurality of instructions are fetched to support the execution of the threads. During runtime, at least one instruction is prefetched for one of the threads to a prefetch buffer. The at least one instruction is accessed from the prefetch buffer if required by the one thread and discarded if not required by the one thread.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: January 7, 2014
    Assignee: Nvidia Corporation
    Inventor: Andrew D. Bowen
  • Publication number: 20130290677
    Abstract: An apparatus having a buffer and a circuit is disclosed. The buffer may be configured to store a plurality of fetch sets. Each fetch set generally includes a prefix word and a plurality of instruction words. Each prefix word may include a plurality of symbols. Each symbol generally corresponds to a respective one of the instruction words. The circuit may be configured to (i) identify each of the symbols in each of the fetch sets having a predetermined value and (ii) parse the fetch sets into a plurality of execution sets in response to the symbols having the predetermined value.
    Type: Application
    Filed: April 26, 2012
    Publication date: October 31, 2013
    Inventors: Alexander Rabinovitch, Leonid Dubrovin
  • Patent number: 8564604
    Abstract: Systems and methods for improving throughput of a graphics processing unit are disclosed. In one embodiment, a system includes a multithreaded execution unit capable of processing requests to access a constant cache, a vertex attribute cache, at least one common register file, and an execution unit data path substantially simultaneously.
    Type: Grant
    Filed: April 21, 2010
    Date of Patent: October 22, 2013
    Assignee: VIA Technologies, Inc.
    Inventor: Yang (Jeff) Jiao
  • Patent number: 8566566
    Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.
    Type: Grant
    Filed: August 2, 2010
    Date of Patent: October 22, 2013
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
  • Publication number: 20130262825
    Abstract: Obfuscating a multi-threaded computer program is carried out using an instruction pipeline in a computer processor by streaming first instructions of a first thread of a multi-threaded computer application program into the pipeline, the first instructions entering the pipeline at the fetch stage, detecting a stall signal indicative of a stall condition in the pipeline, and responsively to the stall signal injecting second instructions of a second thread of the multi-threaded computer application program into the pipeline. The injected second instructions enter the pipeline at an injection stage that is disposed downstream from the fetch stage up to and including the register stage for processing therein. The stall condition exists at one of the stages that is located upstream from the injection stage.
    Type: Application
    Filed: November 14, 2011
    Publication date: October 3, 2013
    Applicant: Cisco Technology Inc.
    Inventors: David Darmon, Uri Kaluzhny
  • Publication number: 20130232359
    Abstract: Embodiments of a processing architecture are described. The architecture includes a fetch unit for fetching instructions from a data bus. A scheduler receives data from the fetch unit, creates a schedule, and allocates the data and schedule to a plurality of computational units. The scheduler also modifies voltage and frequency settings of the processing architecture to optimize power consumption and throughput of the system. The computational units include control units and execute units. The control units receive and decode the instructions and send the decoded instructions to execute units. The execute units then execute the instructions according to relevant software.
    Type: Application
    Filed: March 1, 2012
    Publication date: September 5, 2013
    Applicant: NXP B.V.
    Inventors: Hamed Fatemi, Ajay Kapoor, J. Pineda de Gyvez
  • Publication number: 20130205118
    Abstract: A computer system for instruction execution includes a processor having a pipeline. The system is configured to perform a method including fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions; assigning a branch uncertainty to each of the plurality of branch instructions; assigning, to each of the plurality of instructions, an instruction uncertainty that is a summation of the branch uncertainties of older unresolved branches; and balancing the instructions in the pipeline based on a current summation of instruction uncertainty.
    Type: Application
    Filed: February 6, 2012
    Publication date: August 8, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Alper Buyuktosunoglu, Brian R. Prasky, Vijayalakshmi Srinivasan
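    The uncertainty bookkeeping in the 20130205118 abstract can be sketched in a few lines of C++; the uncertainty values are invented and branches are assumed to stay unresolved for the whole window:

        #include <cstddef>
        #include <cstdio>
        #include <vector>

        struct Insn {
            bool   is_branch;
            double branch_uncertainty;  // e.g. 1 - predictor confidence; 0 for non-branches
        };

        int main() {
            std::vector<Insn> stream = {
                {false, 0.0}, {true, 0.3}, {false, 0.0}, {true, 0.1}, {false, 0.0}
            };

            // Running summation of uncertainties of older, still-unresolved branches.
            double older_branch_uncertainty = 0.0;
            for (std::size_t i = 0; i < stream.size(); ++i) {
                double insn_uncertainty = older_branch_uncertainty;
                std::printf("insn %zu: instruction uncertainty = %.2f\n", i, insn_uncertainty);
                if (stream[i].is_branch)
                    older_branch_uncertainty += stream[i].branch_uncertainty;
            }
            return 0;
        }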
  • Publication number: 20130179662
    Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.
    Type: Application
    Filed: January 11, 2012
    Publication date: July 11, 2013
    Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
  • Publication number: 20130166881
    Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.
    Type: Application
    Filed: December 21, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Robert J. Stoll, Olivier Giroux
  • Publication number: 20130166882
    Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Inventors: Jack Hilaire CHOQUETTE, Robert J. STOLL, Olivier GIROUX, Michael FETTERMAN, Shirish GADRE, Robert Steven GLANVILLE, Alexandre JOLY
  • Publication number: 20130151816
    Abstract: Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register.
    Type: Application
    Filed: December 7, 2011
    Publication date: June 13, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Venkat R. Indukuru, Alexander E. Mericas
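    A simplified C++ model of the delay-identification flow from 20130151816; the threshold value, the single captured address, and the start/tick/stop interface are assumptions for illustration:

        #include <cstdint>
        #include <cstdio>

        struct DelayIdentificationUnit {
            std::uint64_t counter;     // delay counter
            std::uint64_t threshold;   // threshold register
            std::uint64_t delay_ea;    // delay register: captured effective address
            bool          recorded;

            void start() { counter = 0; }
            void tick()  { ++counter; }
            void stop(std::uint64_t effective_address) {
                if (counter > threshold) {        // compare against the threshold register
                    delay_ea = effective_address; // store an EA from the delayed group
                    recorded = true;
                }
            }
        };

        int main() {
            DelayIdentificationUnit unit = {0, 100, 0, false};
            unit.start();
            for (int cycle = 0; cycle < 150; ++cycle)  // the group stays delayed for 150 cycles
                unit.tick();
            unit.stop(0x7fffd000ULL);
            if (unit.recorded)
                std::printf("delay above threshold, recorded EA 0x%llx\n",
                            (unsigned long long)unit.delay_ea);
            return 0;
        }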
  • Patent number: 8458414
    Abstract: A memory accessing method including the following steps is provided. Firstly, two instructions are fetched. Next, the two instructions are respectively decoded to obtain two operation fields and two address fields. The two operation fields indicate the type of operation in accessing the memory. One of the address fields includes a first upper address corresponding to the first memory block and a first lower address corresponding to a first memory unit of the first memory block. The other one of the two address fields includes a second upper address corresponding to the second memory block and a second lower address corresponding to a second memory unit of the second memory block. Then, whether the two instructions are performing the same type of operation on the same memory block is determined. If so, the type of operation indicated by the two operation fields is performed on the corresponding memory block in parallel.
    Type: Grant
    Filed: April 8, 2010
    Date of Patent: June 4, 2013
    Assignee: Realtek Semiconductor Corporation
    Inventors: Sheng-Yuan Jan, Yen-Ju Lu
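    The eligibility test in the 8458414 abstract reduces to two comparisons. A hedged C++ sketch, with an assumed block size and made-up address encodings:

        #include <cstdint>
        #include <cstdio>

        enum class MemOp { Load, Store };

        struct DecodedInsn {
            MemOp         op;       // operation field
            std::uint32_t address;  // upper bits select the memory block, lower bits the unit
        };

        const unsigned kBlockShift = 10;  // assumed 1 KiB memory blocks

        // Parallel access is allowed only for the same type of operation on the same block.
        bool parallel_eligible(const DecodedInsn& a, const DecodedInsn& b) {
            bool same_op    = a.op == b.op;
            bool same_block = (a.address >> kBlockShift) == (b.address >> kBlockShift);
            return same_op && same_block;
        }

        int main() {
            DecodedInsn i0 = {MemOp::Load, 0x00001204};
            DecodedInsn i1 = {MemOp::Load, 0x000013F8};
            std::printf("execute in parallel? %s\n", parallel_eligible(i0, i1) ? "yes" : "no");
            return 0;
        }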
  • Patent number: 8453161
    Abstract: This disclosure describes a method and system that may enable fast, hardware-assisted, producer-consumer style communication of values between threads. The method, in one aspect, uses a dedicated hardware buffer as an intermediary storage for transferring values from registers in one thread to registers in another thread. The method may provide a generic, programmable solution that can transfer any subset of register values between threads in any given order, where the source and target registers may or may not be correlated. The method also may allow for determinate access times, since it completely bypasses the memory hierarchy. Also, the method is designed to be lightweight, focusing on communication, and keeping synchronization facilities orthogonal to the communication mechanism. It may be used by a helper thread that performs data prefetching for an application thread, for example, to initialize the upward-exposed reads in the address computation slice of the helper thread code.
    Type: Grant
    Filed: May 25, 2010
    Date of Patent: May 28, 2013
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
  • Patent number: 8447911
    Abstract: A method and processor for providing full load/store queue functionality to an unordered load/store queue for a processor with out-of-order execution. Load and store instructions are inserted in a load/store queue in execution order. Each entry in the load/store queue includes an identification corresponding to a program order. Conflict detection in such an unordered load/store queue may be performed by searching a first CAM for all addresses that are the same or overlap with the address of the load or store instruction to be executed. A further search may be performed in a second CAM to identify those entries that are associated with younger or older instructions with respect to the sequence number of the load or store instruction to be executed. The output results of the Address CAM and Age CAM are logically ANDed.
    Type: Grant
    Filed: July 2, 2008
    Date of Patent: May 21, 2013
    Assignee: Board of Regents, University of Texas System
    Inventors: Douglas C. Burger, Stephen W. Keckler, Robert McDonald, Lakshminarasimhan Sethumadhavan, Franziska Roesner
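    The 8447911 abstract describes ANDing an address match with an age match. Below is a linear-scan software stand-in for the two CAMs (the hardware searches associatively; the older-stores-only age filter shown here is an assumption for the load case):

        #include <cstdint>
        #include <cstdio>
        #include <vector>

        struct LsqEntry {
            std::uint64_t addr;
            unsigned      size;
            std::uint64_t seq;       // program-order sequence number
            bool          is_store;
        };

        // Byte ranges [a, a+asz) and [b, b+bsz) are the same or overlapping.
        bool overlaps(std::uint64_t a, unsigned asz, std::uint64_t b, unsigned bsz) {
            return a < b + bsz && b < a + asz;
        }

        // Conflicts for a load at (addr, size) with sequence number seq:
        // Address-CAM hit AND Age-CAM hit (here: older stores only).
        std::vector<const LsqEntry*> conflicts(const std::vector<LsqEntry>& lsq,
                                               std::uint64_t addr, unsigned size,
                                               std::uint64_t seq) {
            std::vector<const LsqEntry*> out;
            for (const LsqEntry& e : lsq) {
                bool addr_hit = overlaps(e.addr, e.size, addr, size);
                bool age_hit  = e.is_store && e.seq < seq;
                if (addr_hit && age_hit)   // the two CAM outputs are logically ANDed
                    out.push_back(&e);
            }
            return out;
        }

        int main() {
            std::vector<LsqEntry> lsq = {
                {0x1000, 8, 10, true}, {0x1008, 8, 12, true}, {0x1000, 8, 20, false}
            };
            std::vector<const LsqEntry*> hits = conflicts(lsq, 0x1004, 4, 15);
            std::printf("conflicting older stores: %zu\n", hits.size());
            return 0;
        }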
  • Publication number: 20130117536
    Abstract: A reconfigurable instruction encoding method includes the following. An instruction distribution of an application is counted, and multiple instruction pairs with higher utilization rates are accordingly found. Multiple instructions of the instruction pairs are duplicately encoded according to multiple reserved sections of an original instruction table, so that the instructions have corresponding reconfigured codes and a reconfigured instruction table, extended from the original instruction table and including the reconfigured codes, is obtained. A compiler is utilized to generate multiple machine codes according to the reconfigured instruction table and consecutive execution instructions. The Hamming distance of the machine codes corresponding to the reconfigured instruction table and the execution instructions is no greater than the Hamming distance of the machine codes generated according to the original instruction table and the execution instructions.
    Type: Application
    Filed: April 17, 2012
    Publication date: May 9, 2013
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Huang-Lun Lin, Ching-Hsiang Chuang, Shui-An Wen
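    The comparison at the heart of 20130117536 is a Hamming-distance sum over consecutive machine codes. A toy C++ illustration with invented 32-bit codes (the real method derives the reconfigured codes from instruction-pair statistics):

        #include <bitset>
        #include <cstddef>
        #include <cstdint>
        #include <cstdio>
        #include <vector>

        int hamming(std::uint32_t a, std::uint32_t b) {
            return (int)std::bitset<32>(a ^ b).count();
        }

        // Total bit flips between consecutive machine codes on the instruction bus.
        int total_transitions(const std::vector<std::uint32_t>& codes) {
            int sum = 0;
            for (std::size_t i = 1; i < codes.size(); ++i)
                sum += hamming(codes[i - 1], codes[i]);
            return sum;
        }

        int main() {
            std::vector<std::uint32_t> original     = {0x8A21F003, 0x1D407355, 0x8A21F007};
            std::vector<std::uint32_t> reconfigured = {0x8A21F003, 0x8A217355, 0x8A21F007};
            std::printf("original encoding: %d flips, reconfigured encoding: %d flips\n",
                        total_transitions(original), total_transitions(reconfigured));
            return 0;
        }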
  • Publication number: 20130117535
    Abstract: A method includes executing a branch instruction and determining if a branch is taken. The method further includes evaluating a number of instructions associated with the branch instruction. Upon determining that the branch is taken, the method includes selectively writing an entry into a branch target buffer that corresponds to the taken branch responsive to determining that the number of instructions is less than a threshold.
    Type: Application
    Filed: November 4, 2011
    Publication date: May 9, 2013
    Applicant: QUALCOMM INCORPORATED
    Inventors: Suresh K. Venkumahanti, Lucian Codrescu, Suman Mamidi
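    A minimal C++ rendering of the selective-write rule in 20130117535; the threshold value and the meaning of "instructions associated with the branch" (here simply a count passed in) are assumptions:

        #include <cstdint>
        #include <cstdio>
        #include <unordered_map>

        struct BtbEntry { std::uint64_t target; };

        const unsigned kInsnThreshold = 8;   // assumed threshold value

        // Write a BTB entry for a resolved branch only when it is taken and the
        // number of instructions associated with it is below the threshold.
        void on_branch_resolve(std::unordered_map<std::uint64_t, BtbEntry>& btb,
                               std::uint64_t branch_pc, std::uint64_t target,
                               bool taken, unsigned associated_insns) {
            if (taken && associated_insns < kInsnThreshold)
                btb[branch_pc] = BtbEntry{target};
        }

        int main() {
            std::unordered_map<std::uint64_t, BtbEntry> btb;
            on_branch_resolve(btb, 0x400100, 0x400040, true, 5);   // written
            on_branch_resolve(btb, 0x400200, 0x400400, true, 24);  // skipped by the threshold
            on_branch_resolve(btb, 0x400300, 0x400500, false, 3);  // not taken, never written
            std::printf("BTB entries written: %zu\n", btb.size());
            return 0;
        }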
  • Patent number: 8405670
    Abstract: A multithreaded rendering software pipeline architecture utilizes a rolling texture context data structure to store multiple texture contexts that are associated with different textures that are being processed in the software pipeline. Each texture context stores state data for a particular texture, and facilitates the access to texture data by multiple, parallel stages in a software pipeline. In addition, texture contexts are capable of being “rolled”, or copied to enable different stages of a rendering pipeline that require different state data for a particular texture to separately access the texture data independently from one another, and without the necessity for stalling the pipeline to ensure synchronization of shared texture data among the stages of the pipeline.
    Type: Grant
    Filed: May 25, 2010
    Date of Patent: March 26, 2013
    Assignee: International Business Machines Corporation
    Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
  • Patent number: 8387053
    Abstract: A method of performing operations in a computer system, computer system, and related method of compilation, are disclosed. In one embodiment, the method of performing includes providing compiled code having at least one thread, where each of the at least one thread includes a respective plurality of blocks and each respective block includes a respective pre-fetch component and a respective execute component. The method also includes performing a first pre-fetch component from a first block of a first thread of the at least one thread, performing a first additional component after the first pre-fetch component has been performed, and performing a first execute component from the first block of the first thread. The first execute component is performed after the first additional component has been performed, and the first additional component is from either a second thread or another block of the first thread that is not the first block.
    Type: Grant
    Filed: January 25, 2007
    Date of Patent: February 26, 2013
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Blaine D. Gaither, Verna Knapp, Jerome Huck, Benjamin D. Osecky
  • Patent number: 8362880
    Abstract: A method and apparatus for loading and executing program code at a micro-processor are disclosed. In this method, a monitoring procedure is performed to monitor whether the micro-processor receives a loading request corresponding to a target program code. If the loading request is received, the target program code is loaded from an external memory into an internal memory of the micro-processor. The micro-processor is then rebooted to enter a first mode in which the target program code in the internal memory is to be executed.
    Type: Grant
    Filed: April 27, 2009
    Date of Patent: January 29, 2013
    Assignee: MStar Semiconductor, Inc.
    Inventors: Chih-Hua Huang, Chih Yen Chang
  • Patent number: 8346760
    Abstract: In one embodiment, the invention provides a method comprising determining metadata encoded in instructions of a stored program; and executing the stored program based on the metadata.
    Type: Grant
    Filed: May 20, 2009
    Date of Patent: January 1, 2013
    Assignee: Intel Corporation
    Inventors: Hong Wang, John Shen, Ali-Reza Adl-Tabatabai, Anwar Ghuloum
  • Publication number: 20120297168
    Abstract: Processing instruction grouping information is provided that includes: reading addresses of machine instructions grouped by a processor at runtime from a buffer to form an address file; analyzing the address file to obtain grouping information of the machine instructions; converting the machine instructions in the address file into readable instructions; and obtaining grouping information of the readable instructions based on the grouping information of the machine instructions and the readable instructions resulted from conversion. Status of grouping and processing performed on instructions by a processor at runtime can be acquired dynamically, such that processing capability of the processor can be better utilized.
    Type: Application
    Filed: April 26, 2012
    Publication date: November 22, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Qin Yue Chen, Qi Liang, Hong Chang Lin, Feng Liu
  • Publication number: 20120233441
    Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality of entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.
    Type: Application
    Filed: March 7, 2011
    Publication date: September 13, 2012
    Inventors: Jama I. Barreh, Robert T. Golla, Manish K. Shah
  • Patent number: 8261021
    Abstract: In order to control an access request to the cache shared between a plurality of threads, a storage unit for storing a flag provided in association with each of the threads is included. If a thread enters the execution of an atomic instruction, a defined value is written to its flag stored in the storage unit. Furthermore, if the atomic instruction is completed, a different defined value is written, thereby indicating whether or not each thread is executing the atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread different from the certain thread is executing the atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing the atomic instruction, the access request is kept on standby. This makes it possible to realize the exclusive control processing necessary for processing the atomic instruction with a simple configuration.
    Type: Grant
    Filed: December 17, 2009
    Date of Patent: September 4, 2012
    Assignee: Fujitsu Limited
    Inventor: Naohiro Kiyota
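    The flag protocol in 8261021 is easy to state as a software analogue (the patent describes per-thread hardware flags in a storage unit; plain bools stand in for them here, with no real concurrency):

        #include <array>
        #include <cstdio>

        const int kThreads = 2;
        std::array<bool, kThreads> atomic_in_progress = {};  // one flag per thread

        void begin_atomic(int tid) { atomic_in_progress[tid] = true; }
        void end_atomic(int tid)   { atomic_in_progress[tid] = false; }

        // An access request from `tid` is kept on standby while any other thread's
        // flag shows an atomic instruction in progress.
        bool must_stall(int tid) {
            for (int t = 0; t < kThreads; ++t)
                if (t != tid && atomic_in_progress[t])
                    return true;
            return false;
        }

        int main() {
            begin_atomic(0);
            std::printf("thread 1 request stalled? %s\n", must_stall(1) ? "yes" : "no");
            end_atomic(0);
            std::printf("thread 1 request stalled? %s\n", must_stall(1) ? "yes" : "no");
            return 0;
        }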
  • Publication number: 20120198209
    Abstract: A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions including at least one guest far branch, and building an instruction sequence from the plurality of guest instructions by using branch prediction on the at least one guest far branch. The method further includes assembling a guest instruction block from the instruction sequence. The guest instruction block is translated to a corresponding native conversion block that includes at least one native far branch corresponding to the at least one guest far branch, wherein the at least one native far branch includes an opposite guest address for an opposing branch path of the at least one guest far branch. Upon encountering a misprediction, a correct instruction sequence is obtained by accessing the opposite guest address.
    Type: Application
    Filed: January 27, 2012
    Publication date: August 2, 2012
    Applicant: SOFT MACHINES, INC.
    Inventor: Mohammad Abdallah
  • Patent number: 8205204
    Abstract: A multi-threading processor is provided. The multi-threading processor includes a first instruction fetch unit to receive a first thread and a second instruction fetch unit to receive a second thread. A multi-thread scheduler is coupled to the instruction fetch units and an execution unit. The multi-thread scheduler determines the width of the execution unit, and the execution unit executes the threads accordingly.
    Type: Grant
    Filed: January 23, 2009
    Date of Patent: June 19, 2012
    Assignee: Intel Corporation
    Inventors: Ken Shoemaker, Sailesh Kottapalli, Kin-Kee Sit
  • Publication number: 20120144164
    Abstract: An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation caused by an exception, such as a branch misprediction. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation.
    Type: Application
    Filed: February 14, 2012
    Publication date: June 7, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Dung Quoc Nguyen
  • Patent number: 8179540
    Abstract: An image forming apparatus is provided that holds counter information obtained by integrating a consumption of a consumable that depends on usage of a service provided by the image forming apparatus. A log corresponding to the usage of the service is set in job log information with a synchronization flag set off. The synchronization flag of that log in the job log information, originally set off, is then set on. The counter information and the job log information are output after the synchronization flag for the log having the synchronization flag set off has been set on.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: May 15, 2012
    Assignee: Canon Kabushiki Kaisha
    Inventors: Junichi Hiruma, Nobuyuki Tonegawa
  • Patent number: 8179896
    Abstract: A network processor of an embodiment includes a packet classification engine, a processing pipeline, and a controller. The packet classification engine allows for classifying each of a plurality of packets according to packet type. The processing pipeline has a plurality of stages for processing each of the plurality of packets in a pipelined manner, where each stage includes one or more processors. The controller allows for providing the plurality of packets to the processing pipeline in an order that is based at least partially on: (i) packet types of the plurality of packets as classified by the packet classification engine and (ii) estimates of processing times for processing packets of the packet types at each stage of the plurality of stages of the processing pipeline. A method in a network processor allows for prefetching instructions into a cache for processing a packet based on a packet type of the packet.
    Type: Grant
    Filed: November 7, 2007
    Date of Patent: May 15, 2012
    Inventor: Justin Mark Sobaje
  • Publication number: 20120110306
    Abstract: A method of responding to an attempt to write a memory address including a target instruction which has been translated to a host instruction for execution by a host processor including the steps of marking a memory address including a target instruction which has been translated to a host instruction, detecting a memory address which has been marked when an attempt is made to write to the memory address, and responding to the detection of a memory address which has been marked by protecting a target instruction at the memory address until it has been assured that translations associated with the memory address will not be utilized before being updated.
    Type: Application
    Filed: September 23, 2011
    Publication date: May 3, 2012
    Inventors: Edmund J. Kelly, Robert F. Cmelik, Malcolm J. Wing
  • Patent number: 8171260
    Abstract: The invention provides a method and apparatus for branch prediction in a processor. A fetch-block branch target buffer is used in an early stage of pipeline processing, before the instruction is decoded, which stores information about a control transfer instruction for a "block" of instruction memory. The block of instruction memory is represented by a block entry in the fetch-block branch target buffer. The block entry represents one recorded control-transfer instruction (such as a branch instruction) and a set of sequentially preceding instructions, up to a fixed maximum length N. Indexing into the fetch-block branch target buffer yields an answer whether the block entry represents memory that contains a previously executed control-transfer instruction, a length value representing the amount of memory that contains the instructions represented by the block, and an indicator for the type of control-transfer instruction that terminates the block, its target and outcome.
    Type: Grant
    Filed: June 23, 2009
    Date of Patent: May 1, 2012
    Assignee: STMicroelectronics, Inc.
    Inventors: Anatoly Gelman, Russell Lawrence Schnapp
  • Patent number: 8166259
    Abstract: A memory control apparatus, a memory control method and an information processing system are disclosed. Fetch response data retrieved from a main storage unit is received, while bypassing a storage unit, by a first port in which the received fetch response data can be set. The fetch response data retrieved from the main storage unit, if unable to be set in the first port, is set in a second port through the storage unit. A transmission control unit performs priority control operation to send out, in accordance with a predetermined priority, the fetch response data set in the first port or the second port to the processor. As a result, the latency is shortened from the time when the fetch response data arrives to the time when the fetch response data is sent out toward the processor in response to a fetch request from the processor.
    Type: Grant
    Filed: March 26, 2009
    Date of Patent: April 24, 2012
    Assignee: Fujitsu Limited
    Inventor: Souta Kusachi
  • Patent number: 8145885
    Abstract: A processor interleaves instructions according to a priority rule which determines the frequency with which instructions from each respective thread are selected and added to an interleaved stream of instructions to be processed in the data processor. The frequency with which each thread is selected according to the rule may be based on the priorities assigned to the instruction threads. A randomization is inserted into the interleaving process so that the selection of an instruction thread during any particular clock cycle is not based solely by the priority rule, but is also based in part on a random or pseudo random element. This randomization is inserted into the instruction thread selection process so as to vary the order in which instructions are selected from the various instruction threads while preserving the overall frequency of thread selection (i.e. how often threads are selected) set by the priority rule.
    Type: Grant
    Filed: April 30, 2008
    Date of Patent: March 27, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
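    A compact C++ sketch of the 8145885 idea of preserving the priority rule's selection frequencies while randomizing the per-cycle choice; the weights, the RNG, and the cycle count are all invented for illustration:

        #include <cstddef>
        #include <cstdio>
        #include <random>
        #include <vector>

        int main() {
            // Priority rule: thread 0 should be selected 4x as often as thread 2, etc.
            std::vector<double> priority_weight = {4.0, 2.0, 1.0};

            std::mt19937 rng(42);  // pseudo-random element injected into selection
            std::discrete_distribution<int> pick(priority_weight.begin(), priority_weight.end());

            std::vector<int> issued(priority_weight.size(), 0);
            for (int cycle = 0; cycle < 7000; ++cycle)
                ++issued[pick(rng)];   // per-cycle choice varies, long-run frequency follows the rule

            for (std::size_t t = 0; t < issued.size(); ++t)
                std::printf("thread %zu selected %d times\n", t, issued[t]);
            return 0;
        }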
  • Publication number: 20120066480
    Abstract: A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.
    Type: Application
    Filed: July 22, 2011
    Publication date: March 15, 2012
    Applicant: Sony Corporation
    Inventors: Hirokazu Hanaki, Satoshi Takashima
  • Publication number: 20120060017
    Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.
    Type: Application
    Filed: May 18, 2010
    Publication date: March 8, 2012
    Inventor: Hiroyuki Morishita
  • Patent number: 8095775
    Abstract: During operation of a VLIW processor, a very long instruction word is fetched. A portion of the very long instruction word that includes a pointer to an instruction is identified, and the instruction pointed to by the pointer is retrieved from a location of an instruction window. The retrieved instruction is input into a functional unit for execution.
    Type: Grant
    Filed: November 19, 2008
    Date of Patent: January 10, 2012
    Assignee: Marvell International Ltd.
    Inventors: Moinul H. Khan, Anitha Kona, Mark N. Fullerton
  • Patent number: 8037283
    Abstract: In a multi-core stream processing system and scheduling method of the same, a scheduler is coupled to a number (N) of stream processing units and a number (N+1) of stream fetching units, where N≥2. When the scheduler receives a stream element from a Pth stream fetching unit, the scheduler assigns a Pth stream processing unit as a target stream processing unit when the Pth stream processing unit does not encounter a bottleneck condition, assigns a Qth stream processing unit, which does not encounter the bottleneck condition, as the target stream processing unit when the Pth stream processing unit encounters the bottleneck condition, where 1≤P≤N, 1≤Q≤N, and P≠Q, and dispatches the received stream element to the target stream processing unit such that the target stream processing unit processes the stream element dispatched from the scheduler.
    Type: Grant
    Filed: May 5, 2009
    Date of Patent: October 11, 2011
    Assignee: National Taiwan University
    Inventors: You-Ming Tsao, Liang-Gee Chen, Shao-Yi Chien
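    The dispatch rule quoted in the 8037283 abstract, reduced to a small C++ function; bottleneck detection is faked with a vector of flags, and the choice of fallback unit Q (the first free one) is an assumption:

        #include <cstdio>
        #include <vector>

        // Element from fetching unit P goes to processing unit P unless P is
        // bottlenecked; then any non-bottlenecked unit Q != P is used instead.
        // Returns -1 if every unit is bottlenecked.
        int pick_target(const std::vector<bool>& bottlenecked, int p) {
            if (!bottlenecked[p])
                return p;
            for (int q = 0; q < (int)bottlenecked.size(); ++q)
                if (q != p && !bottlenecked[q])
                    return q;
            return -1;
        }

        int main() {
            std::vector<bool> bottlenecked = {false, true, false};  // unit 1 is backed up
            std::printf("element from fetcher 1 -> unit %d\n", pick_target(bottlenecked, 1));
            std::printf("element from fetcher 2 -> unit %d\n", pick_target(bottlenecked, 2));
            return 0;
        }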
  • Publication number: 20110225396
    Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.
    Type: Application
    Filed: May 18, 2011
    Publication date: September 15, 2011
    Applicant: ALTERA CORPORATION
    Inventors: Gerald George Pechanek, Stamatis Vassiliadis
  • Patent number: 8015391
    Abstract: A processor simultaneously issues instructions to multiple threads in a same instruction execution cycle. An instruction issuer controls issuance of an instruction for each of the multiple threads. A detector detects, for each of the multiple threads, whether a loop processing is currently being executed. A unit causes the instruction issuer to increase a number of instructions to be issued when the detector detects that the loop processing is currently being executed.
    Type: Grant
    Filed: October 8, 2010
    Date of Patent: September 6, 2011
    Assignee: Panasonic Corporation
    Inventor: Takenobu Tani
  • Patent number: 8006041
    Abstract: A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed.
    Type: Grant
    Filed: March 5, 2008
    Date of Patent: August 23, 2011
    Assignee: Fujitsu Limited
    Inventors: Shuji Yamamura, Takashi Aoki
  • Patent number: 7996203
    Abstract: A method, system, and computer program product are provided for verifying out of order instruction address (IA) stride prefetch performance in a processor design having more than one level of cache hierarchies. Multiple instruction streams are generated and the instructions loop back to corresponding instruction addresses. The multiple instruction streams are dispatched to a processor and simulation application to process. When a particular instruction is being dispatched, the particular instruction's instruction address and operand address are recorded in the queue. The processor is monitored to determine if the processor executes fetch and prefetch commands in accordance with the simulation application. It is checked to determine if prefetch commands are issued for instructions having three or more strides.
    Type: Grant
    Filed: January 31, 2008
    Date of Patent: August 9, 2011
    Assignee: International Business Machines Corporation
    Inventors: Wei-Yi Xiao, Dean G. Bair, Christopher A. Krygowski, Chung-Lung K. Shum