Of Multiple Instructions Simultaneously Patents (Class 712/206)

METHODS AND APPARATUS FOR FUSING INSTRUCTIONS TO PROVIDE OR-TEST AND AND-TEST FUNCTIONALITY ON MULTIPLE TEST SOURCES

Publication number: 20140281389

Abstract: Methods and apparatus are disclosed for fusing instructions to provide OR-test and AND-test functionality on multiple test sources. Some embodiments include fetching instructions, said instructions including a first instruction specifying a first operand destination, a second instruction specifying a second operand source, and a third instruction specifying a branch condition. A portion of the plurality of instructions are fused into a single micro-operation, the portion including both the first and second instructions if said first operand destination and said second operand source are the same, and said branch condition is dependent upon the second instruction. Some embodiments generate a novel test instruction dynamically by fusing one logical instruction with a prior-art test instruction. Other embodiments generate the novel test instruction through a just-in-time compiler.

Type: Application

Filed: March 15, 2013

Publication date: September 18, 2014

Inventors: MAXIM LOKTYUKHIN, ROBERT VALENTINE, JULIAN C. HORN, MARK J. CHARNEY
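
The fusion rule stated in the abstract of publication 20140281389 above can be illustrated with a minimal Python sketch. The `Instr` record, its field names, and the `try_fuse` helper are hypothetical and only model the stated condition: the first two instructions fuse when the first instruction's destination is a source of the second and the branch depends on the second instruction. This is not the decoder logic of the application.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    """Hypothetical instruction record; field names are illustrative only."""
    op: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()
    flags_from: Optional[int] = None  # for a branch: index of the instruction it depends on


def try_fuse(instrs: List[Instr], i: int) -> Optional[str]:
    """Return a fused micro-op name for instrs[i] and instrs[i + 1], or None."""
    if i + 2 >= len(instrs):
        return None
    first, second, branch = instrs[i], instrs[i + 1], instrs[i + 2]
    # Condition 1: the first destination is also a source of the second instruction.
    same_operand = first.dest is not None and first.dest in second.srcs
    # Condition 2: the branch condition depends on the second instruction.
    branch_depends_on_second = branch.flags_from == i + 1
    if same_operand and branch_depends_on_second:
        return f"{first.op}_{second.op}"
    return None


program = [
    Instr("or", dest="r1", srcs=("r1", "r2")),
    Instr("test", srcs=("r1", "r3")),
    Instr("jz", flags_from=1),
]
print(try_fuse(program, 0))  # prints "or_test"
```
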
CYCLE SLICED VECTORS AND SLOT EXECUTION ON A SHARED DATAPATH

Publication number: 20140281368

Abstract: An example method for executing multiple instructions in one or more slots includes receiving a packet including multiple instructions and executing the multiple instructions in one or more slots in a time shared manner. Each slot is associated with an execution data path or a memory data path. An example method for executing at least one instruction in a plurality of phases includes receiving a packet including an instruction, splitting the instruction into a plurality of phases, and executing the instruction in the plurality of phases.

Type: Application

Filed: March 14, 2013

Publication date: September 18, 2014

Applicant: QUALCOMM INCORPORATED

Inventors: Ajay Anant Ingle, Lucian Codrescu, David J. Hoyle, Jose Fridman, Marc M. Hoffman, Deepak Mathew
Mapping virtual addresses to different physical addresses for value disambiguation for thread memory access requests

Patent number: 8832415

Abstract: A multiprocessor system includes nodes. Each node includes a data path that includes a core, a TLB, and a first level cache implementing disambiguation. The system also includes at least one second level cache and a main memory. For thread memory access requests, the core uses an address associated with an instruction format of the core. The first level cache uses an address format related to the size of the main memory plus an offset corresponding to hardware thread meta data. The second level cache uses a physical main memory address plus software thread meta data to store the memory access request. The second level cache accesses the main memory using the physical address with neither the offset nor the thread meta data after resolving speculation. In short, this system includes mapping of a virtual address to different physical addresses for value disambiguation for different threads.

Type: Grant

Filed: January 4, 2011

Date of Patent: September 9, 2014

Assignee: International Business Machines Corporation

Inventors: Alan Gala, Martin Ohmacht
SOLUTION TO DIVERGENT BRANCHES IN A SIMD CORE USING HARDWARE POINTERS

Publication number: 20140215187

Abstract: A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler generates code that, when executed, determines a size of a next very long instruction word (VLIW) to process and determines multiple pointer values to store in multiple corresponding PC registers in a target processor. The updated PC registers point to instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on branch direction found at runtime for the given lane at the given divergent point. The processor includes a vector register for mapping PC registers to execution lanes.

Type: Application

Filed: January 29, 2013

Publication date: July 31, 2014

Applicant: ADVANCED MICRO DEVICES, INC.

Inventor: Reza Yazdani
INSTRUCTION SCHEDULING FOR A MULTI-STRAND OUT-OF-ORDER PROCESSOR

Publication number: 20140208074

Abstract: In one embodiment, a multi-strand system with a pipeline includes a front-end unit, an instruction scheduling unit (ISU), and a back-end unit. The front-end unit performs an out-of-order fetch of interdependent instructions queued using a front-end buffer. The ISU dedicates two hardware entries per strand for checking operand-readiness of an instruction and for determining an execution port to which the instruction is dispatched. The back-end unit receives instructions dispatched from the hardware device and stores the instructions until they are executed. Other embodiments are described and claimed.

Type: Application

Filed: March 30, 2012

Publication date: July 24, 2014

Inventors: Boris A. Babayan, Vladimir Pentkovski, Jayesh Iyer, Nikolay Kosarev, Sergey Y. Shishlov, Alexander V. Butuzov, Alexey Y. Sivtsov
ATOMIC WRITE AND READ MICROPROCESSOR INSTRUCTIONS

Publication number: 20140181474

Abstract: Methods and apparatus for performing an atomic hardware operation (HWOP) instruction. According to a method in a computer processor coupled to a memory, the method includes fetching, decoding, and executing the atomic HWOP instruction. The instruction includes a source operand indicating a source location and a destination operand indicating a destination location, wherein each of the source location and the destination location is either a register of the computer processor or an address of the memory. Executing the atomic HWOP instruction includes sending a message to an external agent to cause the external agent to atomically access a set of one or more memory locations of the memory based upon a value stored at the source location, and return a result obtained from said atomic access of the set of memory locations to the destination location. The external agent is external to the computer processor.

Type: Application

Filed: December 26, 2012

Publication date: June 26, 2014

Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)

Inventors: EVAN GEWIRTZ, ROBERT HATHAWAY, EDWARD HO, STEPHAN MEIER
Processor for concurrently executing plural instruction streams

Patent number: 8745359

Abstract: A VLIW processor executes a very long instruction word containing a plurality of instructions, and executes a plurality of instruction streams at low cost. A processor executing a very long instruction word containing a plurality of instructions fetches concurrently the very long instruction words of up to M instruction streams, from N instruction caches including a plurality of memory banks to store the very long instruction words of the M instruction streams.

Type: Grant

Filed: February 3, 2009

Date of Patent: June 3, 2014

Assignee: NEC Corporation

Inventor: Shohei Nomoto
Methods and Apparatus for Adapting Pipeline Stage Latency Based on Instruction Type

Publication number: 20140075157

Abstract: Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes.

Type: Application

Filed: February 28, 2013

Publication date: March 13, 2014

Applicant: Altera Corporation

Inventors: Edwin Franklin Barry, Gerald George Pechanek, Patrick R. Marchand
Performance analysis of software executing in different sessions

Patent number: 8671400

Abstract: A technique includes providing first objects that are associated with an application session and in a processor-based system, identifying second objects in another application session corresponding to the first objects based at least in part on a comparison of the second objects to matching rules associated with the first objects.

Type: Grant

Filed: December 23, 2009

Date of Patent: March 11, 2014

Assignee: Intel Corporation

Inventors: Christopher J. Cormack, Nathaniel Duca, Joseph D. Matarazzo
METHODS, APPARATUS, AND INSTRUCTIONS FOR CONVERTING VECTOR DATA

Publication number: 20140019720

Abstract: A computer processor includes a decoder for decoding machine instructions and an execution unit for executing those instructions. The decoder and the execution unit are capable of decoding and executing vector instructions that include one or more format conversion indicators. For instance, the processor may be capable of executing a vector-load-convert-and-write (VLoadConWr) instruction that provides for loading data from memory to a vector register. The VLoadConWr instruction may include a format conversion indicator to indicate that the data from memory should be converted from a first format to a second format before the data is loaded into the vector register. Other embodiments are described and claimed.

Type: Application

Filed: February 7, 2013

Publication date: January 16, 2014

Inventors: Eric Sprangle, Robert D. Cavin, Anwar Rohillah, Douglas M. Carmean
Method and system for non stalling pipeline instruction fetching from memory

Patent number: 8624906

Abstract: A method and system for graphics instruction fetching. The method includes executing a plurality of threads in a multithreaded execution environment. A respective plurality of instructions are fetched to support the execution of the threads. During runtime, at least one instruction is prefetched for one of the threads to a prefetch buffer. The at least one instruction is accessed from the prefetch buffer if required by the one thread and discarded if not required by the one thread.

Type: Grant

Filed: September 29, 2004

Date of Patent: January 7, 2014

Assignee: Nvidia Corporation

Inventor: Andrew D. Bowen
EFFICIENT EXTRACTION OF EXECUTION SETS FROM FETCH SETS

Publication number: 20130290677

Abstract: An apparatus having a buffer and a circuit is disclosed. The buffer may be configured to store a plurality of fetch sets. Each fetch set generally includes a prefix word and a plurality of instruction words. Each prefix word may include a plurality of symbols. Each symbol generally corresponds to a respective one of the instruction words. The circuit may be configured to (i) identify each of the symbols in each of the fetch sets having a predetermined value and (ii) parse the fetch sets into a plurality of execution sets in response to the symbols having the predetermined value.

Type: Application

Filed: April 26, 2012

Publication date: October 31, 2013

Inventors: Alexander Rabinovitch, Leonid Dubrovin
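
The parsing step described in the abstract of publication 20130290677 above can be sketched in Python as follows. The sketch assumes, purely for illustration, that the predetermined symbol value marks the last instruction word of an execution set and that execution sets may span fetch sets; the abstract does not state either detail, and `parse_execution_sets` and `END_OF_SET` are hypothetical names.

```python
from typing import List, Sequence, Tuple

END_OF_SET = 1  # assumed "predetermined value"; not taken from the application

FetchSet = Tuple[Sequence[int], Sequence[str]]  # (prefix symbols, instruction words)


def parse_execution_sets(fetch_sets: List[FetchSet]) -> Tuple[List[List[str]], List[str]]:
    """Split buffered fetch sets into execution sets.

    Returns the completed execution sets and any trailing words that
    have not yet been closed by an END_OF_SET symbol.
    """
    execution_sets: List[List[str]] = []
    current: List[str] = []
    for symbols, words in fetch_sets:
        # Each prefix symbol corresponds to one instruction word.
        for symbol, word in zip(symbols, words):
            current.append(word)
            if symbol == END_OF_SET:
                execution_sets.append(current)
                current = []
    return execution_sets, current


buffer = [
    ([0, 1, 0, 0], ["a", "b", "c", "d"]),
    ([1, 0, 1, 1], ["e", "f", "g", "h"]),
]
sets, leftover = parse_execution_sets(buffer)
print(sets)      # prints [['a', 'b'], ['c', 'd', 'e'], ['f', 'g'], ['h']]
print(leftover)  # prints []
```
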
Systems and methods for improving throughput of a graphics processing unit

Patent number: 8564604

Abstract: Systems and methods for improving throughput of a graphics processing unit are disclosed. In one embodiment, a system includes a multithreaded execution unit capable of processing requests to access a constant cache, a vertex attribute cache, at least one common register file, and an execution unit data path substantially simultaneously.

Type: Grant

Filed: April 21, 2010

Date of Patent: October 22, 2013

Assignee: VIA Technologies, Inc.

Inventor: Yang (Jeff) Jiao
Vector processing of different instructions selected by each unit from multiple instruction group based on instruction predicate and previous result comparison

Patent number: 8566566

Abstract: There is provided a vector processing apparatus and method allowing for the parallel processing of a plurality of different instructions while maintaining vector processing architecture. The vector processing apparatus includes an instruction memory storing a multiple instruction group including one or more instructions; an instruction fetch unit reading the multiple instruction group from the instruction memory; and a plurality of instruction processing units each receiving the multiple instruction group through the instruction fetch unit, selecting a single instruction from the multiple instruction group according to a previous arithmetic result, and performing an arithmetic operation.

Type: Grant

Filed: August 2, 2010

Date of Patent: October 22, 2013

Assignee: Electronics and Telecommunications Research Institute

Inventors: Moo Kyoung Chung, Young Su Kwon, Kyung Su Kim
Obfuscated Hardware Multi-Threading

Publication number: 20130262825

Abstract: Obfuscating a multi-threaded computer program is carried out using an instruction pipeline in a computer processor by streaming first instructions of a first thread of a multi-threaded computer application program into the pipeline, the first instructions entering the pipeline at the fetch stage, detecting a stall signal indicative of a stall condition in the pipeline, and responsively to the stall signal injecting second instructions of a second thread of the multi-threaded computer application program into the pipeline. The injected second instructions enter the pipeline at an injection stage that is disposed downstream from the fetch stage up to and including the register stage for processing therein. The stall condition exists at one of the stages that is located upstream from the injection stage.

Type: Application

Filed: November 14, 2011

Publication date: October 3, 2013

Applicant: Cisco Technology Inc.

Inventors: David Darmon, Uri Kaluzhny
ENERGY EFFICIENT MICROPROCESSOR PLATFORM BASED ON INSTRUCTIONAL LEVEL PARALLELISM

Publication number: 20130232359

Abstract: Embodiments of a processing architecture are described. The architecture includes a fetch unit for fetching instructions from a data bus. A scheduler receives data from the fetch unit, creates a schedule, and allocates the data and schedule to a plurality of computational units. The scheduler also modifies voltage and frequency settings of the processing architecture to optimize power consumption and throughput of the system. The computational units include control units and execute units. The control units receive and decode the instructions and send the decoded instructions to execute units. The execute units then execute the instructions according to relevant software.

Type: Application

Filed: March 1, 2012

Publication date: September 5, 2013

Applicant: NXP B.V.

Inventors: Hamed Fatemi, Ajay Kapoor, J. Pineda de Gyvez
MULTI-THREADED PROCESSOR INSTRUCTION BALANCING THROUGH INSTRUCTION UNCERTAINTY

Publication number: 20130205118

Abstract: A computer system for instruction execution includes a processor having a pipeline. The system is configured to perform a method including: fetching, in the pipeline, a plurality of instructions, wherein the plurality of instructions includes a plurality of branch instructions; assigning a branch uncertainty to each of the plurality of branch instructions; assigning, to each of the plurality of instructions, an instruction uncertainty that is a summation of branch uncertainties of older unresolved branches; and balancing the instructions in the pipeline based on a current summation of instruction uncertainty.

Type: Application

Filed: February 6, 2012

Publication date: August 8, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Alper Buyuktosunoglu, Brian R. Prasky, Vijayalakshmi Srinivasan
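
The uncertainty bookkeeping in the abstract of publication 20130205118 above can be sketched in Python as follows. The tuple layout and the function names are hypothetical, and the balancing policy shown (prefer the thread with the lowest summed uncertainty) is an assumption made for illustration; the abstract only says that balancing is based on the current summation.

```python
from typing import Dict, List, Tuple

# (is_branch, branch_uncertainty, resolved), listed in program order.
InstrInfo = Tuple[bool, int, bool]


def instruction_uncertainties(instrs: List[InstrInfo]) -> List[int]:
    """Per-instruction uncertainty: sum over older unresolved branches."""
    running = 0
    result = []
    for is_branch, uncertainty, resolved in instrs:
        result.append(running)  # only branches older than this instruction count
        if is_branch and not resolved:
            running += uncertainty
    return result


def pick_thread(threads: Dict[str, List[InstrInfo]]) -> str:
    """Assumed policy: favor the thread whose in-flight uncertainty is lowest."""
    return min(threads, key=lambda name: sum(instruction_uncertainties(threads[name])))


thread_a = [
    (False, 0, False),
    (True, 3, False),
    (False, 0, False),
    (True, 2, True),   # already resolved, so it adds nothing
    (False, 0, False),
    (True, 1, False),
    (False, 0, False),
]
thread_b = [(True, 1, False), (False, 0, False)]

print(instruction_uncertainties(thread_a))  # prints [0, 0, 3, 3, 3, 3, 4]
print(pick_thread({"a": thread_a, "b": thread_b}))  # prints "b"
```
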
Method and System for Resolving Thread Divergences

Publication number: 20130179662

Abstract: An address divergence unit detects divergence between threads in a thread group and then separates those threads into a subset of non-divergent threads and a subset of divergent threads. In one embodiment, the address divergence unit causes instructions associated with the subset of non-divergent threads to be issued for execution on a parallel processing unit, while causing the instructions associated with the subset of divergent threads to be re-fetched and re-issued for execution.

Type: Application

Filed: January 11, 2012

Publication date: July 11, 2013

Inventors: Jack CHOQUETTE, Xiaogang Qiu, Jeff Tuckey, Michael (Ming Yiu) Siu, Robert J. Stoll, Olivier Giroux
METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS USING PRE-DECODE DATA

Publication number: 20130166881

Abstract: Systems and methods for scheduling instructions using pre-decode data corresponding to each instruction. In one embodiment, a multi-core processor includes a scheduling unit in each core for selecting instructions from two or more threads each scheduling cycle for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The pre-decode data is determined by a compiler and is extracted by the scheduling unit during runtime and used to control selection of threads for execution. The pre-decode data may specify a number of scheduling cycles to wait before scheduling the instruction. The pre-decode data may also specify a scheduling priority for the instruction. Once the scheduling unit selects an instruction to issue for execution, a decode unit fully decodes the instruction.

Type: Application

Filed: December 21, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Robert J. Stoll, Olivier Giroux
METHODS AND APPARATUS FOR SCHEDULING INSTRUCTIONS WITHOUT INSTRUCTION DECODE

Publication number: 20130166882

Abstract: Systems and methods for scheduling instructions without instruction decode. In one embodiment, a multi-core processor includes a scheduling unit in each core for scheduling instructions from two or more threads scheduled for execution on that particular core. As threads are scheduled for execution on the core, instructions from the threads are fetched into a buffer without being decoded. The scheduling unit includes a macro-scheduler unit for performing a priority sort of the two or more threads and a micro-scheduler arbiter for determining the highest order thread that is ready to execute. The macro-scheduler unit and the micro-scheduler arbiter use pre-decode data to implement the scheduling algorithm. The pre-decode data may be generated by decoding only a small portion of the instruction or received along with the instruction. Once the micro-scheduler arbiter has selected an instruction to dispatch to the execution unit, a decode unit fully decodes the instruction.

Type: Application

Filed: December 22, 2011

Publication date: June 27, 2013

Inventors: Jack Hilaire CHOQUETTE, Robert J. STOLL, Olivier GIROUX, Michael FETTERMAN, Shirish GADRE, Robert Steven GLANVILLE, Alexandre JOLY
DELAY IDENTIFICATION IN DATA PROCESSING SYSTEMS

Publication number: 20130151816

Abstract: Methods, systems, and computer program products may provide delay-identification in data processing systems. An apparatus may include a delay-identification unit having a delay counter, a threshold register, a delay register, and a delay detector. The delay detector may be configured to start the delay counter in response to detecting that one group of instructions is delayed, and stop the delay counter in response to detecting that the one group of instructions is no longer delayed. The delay detector may additionally be configured to compare the number of cycles counted by the delay counter with a threshold number of cycles in the threshold register, and store at least one effective address of one of the instructions of the one group of instructions when the number of cycles counted by the delay counter is greater than the threshold number of cycles stored in the threshold register.

Type: Application

Filed: December 7, 2011

Publication date: June 13, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Venkat R. Indukuru, Alexander E. Mericas
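
The delay-identification unit described in the abstract of publication 20130151816 above can be modeled with a small Python sketch. The class and method names are hypothetical. Two details are assumptions, not statements from the abstract: the comparison against the threshold is performed when the delay ends, and the address recorded is the one seen when the delay began.

```python
from typing import Optional


class DelayIdentificationUnit:
    """Toy software model of a delay counter, threshold register, and delay register."""

    def __init__(self, threshold_cycles: int) -> None:
        self.threshold_register = threshold_cycles
        self.delay_counter = 0
        self.delay_register: Optional[int] = None  # effective address of a slow group
        self._counting = False
        self._pending_address: Optional[int] = None

    def cycle(self, group_delayed: bool, effective_address: int) -> None:
        """Advance the model by one clock cycle."""
        if group_delayed:
            if not self._counting:
                # Delay detected: start the counter and remember the address.
                self._counting = True
                self.delay_counter = 0
                self._pending_address = effective_address
            self.delay_counter += 1
        elif self._counting:
            # Group no longer delayed: stop counting and compare with the threshold.
            self._counting = False
            if self.delay_counter > self.threshold_register:
                self.delay_register = self._pending_address


unit = DelayIdentificationUnit(threshold_cycles=3)
for _ in range(5):
    unit.cycle(group_delayed=True, effective_address=0x1000)
unit.cycle(group_delayed=False, effective_address=0x1000)
print(hex(unit.delay_register))  # prints 0x1000
```
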
Accessing memory with identical instruction types and central processing unit thereof

Patent number: 8458414

Abstract: A memory accessing method including the following steps is provided. Firstly, two instructions are fetched. Next, the two instructions are respectively decoded to obtain two operation fields and two address fields. The two operation fields indicate the type of operation in accessing the memory. One of the address fields includes a first upper address corresponding to the first memory block and a first lower address corresponding to a first memory unit of the first memory block. The other one of the two address fields includes a second upper address corresponding to the second memory block and a second lower address corresponding to a second memory unit of the second memory block. Then, whether the two instructions are performing the same type of operation on the same memory block is determined. If yes, the type of operation indicated by the two operation fields is performed on the corresponding memory block in parallel.

Type: Grant

Filed: April 8, 2010

Date of Patent: June 4, 2013

Assignee: Realtek Semiconductor Corporation

Inventors: Sheng-Yuan Jan, Yen-Ju Lu
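
The decode-and-compare step in the abstract of patent 8458414 above can be sketched in Python as follows. The instruction encoding (an operation field above an 8-bit address whose upper 4 bits select the memory block and lower 4 bits select the memory unit) is invented for illustration, as are the names `decode` and `can_access_in_parallel`; the patent abstract does not give field widths.

```python
from typing import Tuple

ADDR_BITS = 8   # assumed width of the address field
LOWER_BITS = 4  # assumed width of the lower (memory unit) address

LOAD, STORE = 1, 2  # hypothetical operation codes


def decode(instr: int) -> Tuple[int, int, int]:
    """Split an instruction into (operation, upper address, lower address)."""
    address = instr & ((1 << ADDR_BITS) - 1)
    operation = instr >> ADDR_BITS
    upper = address >> LOWER_BITS              # selects the memory block
    lower = address & ((1 << LOWER_BITS) - 1)  # selects the unit within the block
    return operation, upper, lower


def can_access_in_parallel(instr_a: int, instr_b: int) -> bool:
    """True when both instructions do the same operation on the same block."""
    op_a, block_a, _ = decode(instr_a)
    op_b, block_b, _ = decode(instr_b)
    return op_a == op_b and block_a == block_b


load_23 = (LOAD << ADDR_BITS) | 0x23
load_27 = (LOAD << ADDR_BITS) | 0x27
store_27 = (STORE << ADDR_BITS) | 0x27
print(can_access_in_parallel(load_23, load_27))   # prints True
print(can_access_in_parallel(load_23, store_27))  # prints False
```
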
Method and apparatus for efficient helper thread state initialization using inter-thread register copy

Patent number: 8453161

Abstract: This disclosure describes a method and system that may enable fast, hardware-assisted, producer-consumer style communication of values between threads. The method, in one aspect, uses a dedicated hardware buffer as an intermediary storage for transferring values from registers in one thread to registers in another thread. The method may provide a generic, programmable solution that can transfer any subset of register values between threads in any given order, where the source and target registers may or may not be correlated. The method also may allow for determinate access times, since it completely bypasses the memory hierarchy. Also, the method is designed to be lightweight, focusing on communication, and keeping synchronization facilities orthogonal to the communication mechanism. It may be used by a helper thread that performs data prefetching for an application thread, for example, to initialize the upward-exposed reads in the address computation slice of the helper thread code.

Type: Grant

Filed: May 25, 2010

Date of Patent: May 28, 2013

Assignee: International Business Machines Corporation

Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
Unordered load/store queue

Patent number: 8447911

Abstract: A method and processor for providing full load/store queue functionality to an unordered load/store queue for a processor with out-of-order execution. Load and store instructions are inserted in a load/store queue in execution order. Each entry in the load/store queue includes an identification corresponding to a program order. Conflict detection in such an unordered load/store queue may be performed by searching a first CAM for all addresses that are the same or overlap with the address of the load or store instruction to be executed. A further search may be performed in a second CAM to identify those entries that are associated with younger or older instructions with respect to the sequence number of the load or store instruction to be executed. The output results of the Address CAM and Age CAM are logically ANDed.

Type: Grant

Filed: July 2, 2008

Date of Patent: May 21, 2013

Assignee: Board of Regents, University of Texas System

Inventors: Douglas C. Burger, Stephen W. Keckler, Robert McDonald, Lakshminarasimhan Sethumadhavan, Franziska Roesner
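
The two-CAM conflict check in the abstract of patent 8447911 above can be sketched in Python as follows. Each content-addressable memory is modeled as a list of match bits, one per queue entry, and the two match vectors are ANDed. The `Entry` fields and function names are hypothetical, and a software list comprehension stands in for what hardware would do in parallel.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    seq: int   # identification corresponding to program order
    addr: int  # start address of the access
    size: int  # access size in bytes


def address_cam(queue: List[Entry], addr: int, size: int) -> List[bool]:
    """Match bits for entries whose address range is the same as or overlaps the probe."""
    return [e.addr < addr + size and addr < e.addr + e.size for e in queue]


def age_cam(queue: List[Entry], seq: int, older: bool) -> List[bool]:
    """Match bits for entries older (or younger) than the probing instruction."""
    return [(e.seq < seq) if older else (e.seq > seq) for e in queue]


def find_conflicts(queue: List[Entry], addr: int, size: int, seq: int, older: bool) -> List[Entry]:
    """AND the Address CAM and Age CAM results."""
    addr_hits = address_cam(queue, addr, size)
    age_hits = age_cam(queue, seq, older)
    return [e for e, a, g in zip(queue, addr_hits, age_hits) if a and g]


# Entries sit in execution order, so sequence numbers are not sorted.
lsq = [Entry(5, 0x100, 4), Entry(2, 0x102, 4), Entry(9, 0x200, 4), Entry(1, 0x300, 8)]
print([e.seq for e in find_conflicts(lsq, addr=0x100, size=4, seq=4, older=True)])   # prints [2]
print([e.seq for e in find_conflicts(lsq, addr=0x100, size=4, seq=4, older=False)])  # prints [5]
```
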
RECONFIGURABLE INSTRUCTION ENCODING METHOD AND PROCESSOR ARCHITECTURE

Publication number: 20130117536

Abstract: A reconfigurable instruction encoding method includes the following steps. An instruction distribution of an application is counted, and multiple instruction pairs with higher utilization rates are found accordingly. Multiple instructions of the instruction pairs are duplicately encoded according to multiple reserved sections of an original instruction table, so that the instructions have corresponding reconfigured codes, and a reconfigured instruction table that extends the original instruction table and includes the reconfigured codes is obtained. A compiler is utilized to generate multiple machine codes according to the reconfigured instruction table and consecutive execution instructions. The Hamming distance of the machine codes generated according to the reconfigured instruction table and the execution instructions is no greater than the Hamming distance of the machine codes generated according to the original instruction table and the execution instructions.

Type: Application

Filed: April 17, 2012

Publication date: May 9, 2013

Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE

Inventors: Huang-Lun Lin, Ching-Hsiang Chuang, Shui-An Wen
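
The Hamming-distance criterion in the abstract of publication 20130117536 above can be illustrated with a short Python sketch. The 4-bit opcodes are made up, and `pick_code` is a hypothetical stand-in for the compiler's choice between an instruction's original code and its duplicate reconfigured code.

```python
from typing import List, Sequence


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def total_switching(codes: Sequence[int]) -> int:
    """Sum of Hamming distances between consecutive machine codes."""
    return sum(hamming(x, y) for x, y in zip(codes, codes[1:]))


def pick_code(previous: int, candidates: List[int]) -> int:
    """Choose the candidate encoding closest to the previous machine code."""
    return min(candidates, key=lambda code: hamming(previous, code))


CODE_A = 0b0001
CODE_B_ORIGINAL = 0b1110
CODE_B_RECONFIGURED = 0b0011  # duplicate encoding placed in a reserved section (assumed)

chosen = pick_code(CODE_A, [CODE_B_ORIGINAL, CODE_B_RECONFIGURED])
print(total_switching([CODE_A, CODE_B_ORIGINAL, CODE_A]))  # prints 8
print(total_switching([CODE_A, chosen, CODE_A]))           # prints 2
```
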
Selective Writing of Branch Target Buffer

Publication number: 20130117535

Abstract: A method includes executing a branch instruction and determining if a branch is taken. The method further includes evaluating a number of instructions associated with the branch instruction. Upon determining that the branch is taken, the method includes selectively writing an entry into a branch target buffer that corresponds to the taken branch responsive to determining that the number of instructions is less than a threshold.

Type: Application

Filed: November 4, 2011

Publication date: May 9, 2013

Applicant: QUALCOMM INCORPORATED

Inventors: Suresh K. Venkumahanti, Lucian Codrescu, Suman Mamidi
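
The write condition in the abstract of publication 20130117535 above can be expressed as a short Python sketch. The branch target buffer is modeled as a plain dictionary, and `maybe_write_btb` and its parameters are hypothetical. The abstract does not say how the number of instructions associated with the branch is obtained, so the sketch takes it as an input.

```python
from typing import Dict


def maybe_write_btb(
    btb: Dict[int, int],
    branch_pc: int,
    target: int,
    taken: bool,
    num_instructions: int,
    threshold: int,
) -> bool:
    """Write a BTB entry only for a taken branch whose instruction count is below the threshold."""
    if taken and num_instructions < threshold:
        btb[branch_pc] = target
        return True
    return False


btb: Dict[int, int] = {}
maybe_write_btb(btb, branch_pc=0x40, target=0x80, taken=True, num_instructions=2, threshold=4)
maybe_write_btb(btb, branch_pc=0x50, target=0x90, taken=True, num_instructions=6, threshold=4)
maybe_write_btb(btb, branch_pc=0x60, target=0xA0, taken=False, num_instructions=1, threshold=4)
print(btb)  # prints {64: 128}
```
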
Rolling texture context data structure for maintaining texture data in a multithreaded image processing pipeline

Patent number: 8405670

Abstract: A multithreaded rendering software pipeline architecture utilizes a rolling texture context data structure to store multiple texture contexts that are associated with different textures that are being processed in the software pipeline. Each texture context stores state data for a particular texture, and facilitates the access to texture data by multiple, parallel stages in a software pipeline. In addition, texture contexts are capable of being “rolled”, or copied to enable different stages of a rendering pipeline that require different state data for a particular texture to separately access the texture data independently from one another, and without the necessity for stalling the pipeline to ensure synchronization of shared texture data among the stages of the pipeline.

Type: Grant

Filed: May 25, 2010

Date of Patent: March 26, 2013

Assignee: International Business Machines Corporation

Inventors: Eric O. Mejdrich, Paul E. Schardt, Robert A. Shearer
Method and system for enhancing computer processing performance

Patent number: 8387053

Abstract: A method of performing operations in a computer system, computer system, and related method of compilation, are disclosed. In one embodiment, the method of performing includes providing compiled code having at least one thread, where each of the at least one thread includes a respective plurality of blocks and each respective block includes a respective pre-fetch component and a respective execute component. The method also includes performing a first pre-fetch component from a first block of a first thread of the at least one thread, performing a first additional component after the first pre-fetch component has been performed, and performing a first execute component from the first block of the first thread. The first execute component is performed after the first additional component has been performed, and the first additional component is from either a second thread or another block of the first thread that is not the first block.

Type: Grant

Filed: January 25, 2007

Date of Patent: February 26, 2013

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Blaine D. Gaither, Verna Knapp, Jerome Huck, Benjamin D. Osecky
Method and computer program product for loading and executing program code at micro-processor

Patent number: 8362880

Abstract: A method and apparatus for loading and executing program code at a micro-processor are disclosed. In this method, a monitoring procedure is performed to monitor whether the micro-processor receives a loading request corresponding to a target program code. If the loading request is received, the target program code is loaded from an external memory into an internal memory of the micro-processor. The micro-processor is then rebooted to enter a first mode in which the target program code in the internal memory is to be executed.

Type: Grant

Filed: April 27, 2009

Date of Patent: January 29, 2013

Assignee: MStar Semiconductor, Inc.

Inventors: Chih-Hua Huang, Chih Yen Chang
Method and apparatus to improve execution of a stored program

Patent number: 8346760

Abstract: In one embodiment, the invention provides a method comprising determining metadata encoded in instructions of a stored program; and executing the stored program based on the metadata.

Type: Grant

Filed: May 20, 2009

Date of Patent: January 1, 2013

Assignee: Intel Corporation

Inventors: Hong Wang, John Shen, Ali-Reza Adl-Tabatabai, Anwar Ghuloum
PROCESSING INSTRUCTION GROUPING INFORMATION

Publication number: 20120297168

Abstract: Processing instruction grouping information is provided that includes: reading addresses of machine instructions grouped by a processor at runtime from a buffer to form an address file; analyzing the address file to obtain grouping information of the machine instructions; converting the machine instructions in the address file into readable instructions; and obtaining grouping information of the readable instructions based on the grouping information of the machine instructions and the readable instructions resulted from conversion. Status of grouping and processing performed on instructions by a processor at runtime can be acquired dynamically, such that processing capability of the processor can be better utilized.

Type: Application

Filed: April 26, 2012

Publication date: November 22, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Qin Yue Chen, Qi Liang, Hong Chang Lin, Feng Liu
MULTI-THREADED INSTRUCTION BUFFER DESIGN

Publication number: 20120233441

Abstract: An instruction buffer for a processor configured to execute multiple threads is disclosed. The instruction buffer is configured to receive instructions from a fetch unit and provide instructions to a selection unit. The instruction buffer includes one or more memory arrays comprising a plurality of entries configured to store instructions and/or other information (e.g., program counter addresses). One or more indicators are maintained by the processor and correspond to the plurality of threads. The one or more indicators are usable such that for instructions received by the instruction buffer, one or more of the plurality entries of a memory array can be determined as a write destination for the received instructions, and for instructions to be read from the instruction buffer (and sent to a selection unit), one or more entries can be determined as the correct source location from which to read.

Type: Application

Filed: March 7, 2011

Publication date: September 13, 2012

Inventors: Jama I. Barreh, Robert T. Golla, Manish K. Shah
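The per-thread indicators described in the abstract for publication 20120233441 can be pictured as read and write pointers into a shared memory array. The Python sketch below is only an illustration under assumptions: the class name, the equal static partitioning of the array among threads, and the circular pointer arithmetic are all hypothetical and are not taken from the application.

```python
class ThreadedInstructionBuffer:
    """Toy model: one shared array, with per-thread read/write indicators."""

    def __init__(self, num_threads, entries_per_thread):
        self.entries_per_thread = entries_per_thread
        # Single memory array shared by all threads (assumed equal partitions).
        self.array = [None] * (num_threads * entries_per_thread)
        # Indicators maintained per thread.
        self.write_ptr = [0] * num_threads
        self.read_ptr = [0] * num_threads
        self.count = [0] * num_threads

    def _slot(self, tid, ptr):
        # Map a thread-local pointer to an entry of the shared array.
        return tid * self.entries_per_thread + ptr

    def write(self, tid, instruction):
        """Store an instruction from the fetch unit; False if the thread's region is full."""
        if self.count[tid] == self.entries_per_thread:
            return False
        self.array[self._slot(tid, self.write_ptr[tid])] = instruction
        self.write_ptr[tid] = (self.write_ptr[tid] + 1) % self.entries_per_thread
        self.count[tid] += 1
        return True

    def read(self, tid):
        """Return the oldest instruction for the selection unit, or None if empty."""
        if self.count[tid] == 0:
            return None
        instruction = self.array[self._slot(tid, self.read_ptr[tid])]
        self.read_ptr[tid] = (self.read_ptr[tid] + 1) % self.entries_per_thread
        self.count[tid] -= 1
        return instruction
```

In this model the write pointer picks the write destination and the read pointer picks the source location, so instructions from different threads never overwrite one another even though they share one array.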
Cache control device and control method

Patent number: 8261021

Abstract: In order to control an access request to the cache shared between a plurality of threads, a storage unit for storing a flag provided in association with each of the threads is included. When a thread enters the execution of an atomic instruction, a defined value is written to that thread's flag in the storage unit. When the atomic instruction is completed, a defined value different from the above defined value is written, thereby indicating whether or not each thread is executing the atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread different from the certain thread is executing the atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing the atomic instruction, the access request is kept on standby. This makes it possible to realize the exclusive control processing necessary for processing the atomic instruction with a simple configuration.

Type: Grant

Filed: December 17, 2009

Date of Patent: September 4, 2012

Assignee: Fujitsu Limited

Inventor: Naohiro Kiyota
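The flag check described in the abstract for patent 8261021 can be sketched in a few lines of Python. This is a minimal model, not the patented hardware: the names `AtomicState` and `SharedCacheGate` are hypothetical, and the two enum values simply stand in for the two "defined values" the abstract mentions.

```python
from enum import Enum


class AtomicState(Enum):
    IDLE = 0       # value written when an atomic instruction completes
    EXECUTING = 1  # value written when a thread enters an atomic instruction


class SharedCacheGate:
    """One flag per thread; a request waits while another thread is in an atomic."""

    def __init__(self, num_threads):
        self.flags = [AtomicState.IDLE] * num_threads

    def enter_atomic(self, tid):
        self.flags[tid] = AtomicState.EXECUTING

    def complete_atomic(self, tid):
        self.flags[tid] = AtomicState.IDLE

    def may_access(self, tid):
        """True if the request can proceed; False means it is kept on standby."""
        return not any(
            state is AtomicState.EXECUTING
            for other, state in enumerate(self.flags)
            if other != tid
        )
```

A thread's own flag is ignored in `may_access`, so the thread executing the atomic instruction is never blocked by itself; only requests from other threads are held back.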
GUEST INSTRUCTION BLOCK WITH NEAR BRANCHING AND FAR BRANCHING SEQUENCE CONSTRUCTION TO NATIVE INSTRUCTION BLOCK

Publication number: 20120198209

Abstract: A method for translating instructions for a processor. The method includes accessing a plurality of guest instructions that comprise multiple guest branch instructions comprising at least one guest far branch, and building an instruction sequence from the plurality of guest instructions by using branch prediction on the at least one guest far branch. The method further includes assembling a guest instruction block from the instruction sequence. The guest instruction block is translated to a corresponding native conversion block, wherein at least one native far branch corresponds to the at least one guest far branch and includes an opposite guest address for an opposing branch path of the at least one guest far branch. Upon encountering a misprediction, a correct instruction sequence is obtained by accessing the opposite guest address.

Type: Application

Filed: January 27, 2012

Publication date: August 2, 2012

Applicant: SOFT MACHINES, INC.

Inventor: Mohammad Abdallah
Apparatus and method for scheduling threads in multi-threading processors

Patent number: 8205204

Abstract: A multi-threading processor is provided. The multi-threading processor includes a first instruction fetch unit to receive a first thread and a second instruction fetch unit to receive a second thread. A multi-thread scheduler is coupled to the instruction fetch units and an execution unit. The multi-thread scheduler determines the width of the execution unit and the execution unit executes the threads accordingly.

Type: Grant

Filed: January 23, 2009

Date of Patent: June 19, 2012

Assignee: Intel Corporation

Inventors: Ken Shoemaker, Sailesh Kottapalli, Kin-Kee Sit
PROCESSOR REGISTER RECOVERY AFTER FLUSH OPERATION

Publication number: 20120144164

Abstract: An information handling system includes a processor that may perform general purpose register recovery operations after an instruction flush operation that an exception, such as a branch misprediction, causes. The processor receives an instruction stream that may include multiple instructions that operate on a particular target register that stores instruction result information. The general purpose register may temporarily store instruction opcode and register bits information for use during dispatch, execution and other operations. The processor includes a recovery buffer unit for use during flush recovery operations. The processor may use recovery valid and recovery pending bits that correspond with each instruction during the register recovery from flush operation.

Type: Application

Filed: February 14, 2012

Publication date: June 7, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: Dung Quoc Nguyen
Image forming apparatus and management system utilizing counter and job log information for usage tracking

Patent number: 8179540

Abstract: An image forming apparatus is provided that holds counter information obtained by integrating a consumption of a consumable that depends on usage of service provided by the image forming apparatus. A log corresponding to the usage of the service is set in job log information with a synchronization flag set off. The synchronization flag of each log in the job log information for which the flag is set off is then set on. The counter information and the job log information are output after the synchronization flag for the log having the synchronization flag set off has been set on.

Type: Grant

Filed: October 29, 2008

Date of Patent: May 15, 2012

Assignee: Canon Kabushiki Kaisha

Inventors: Junichi Hiruma, Nobuyuki Tonegawa
Network processors and pipeline optimization methods

Patent number: 8179896

Abstract: A network processor of an embodiment includes a packet classification engine, a processing pipeline, and a controller. The packet classification engine allows for classifying each of a plurality of packets according to packet type. The processing pipeline has a plurality of stages for processing each of the plurality of packets in a pipelined manner, where each stage includes one or more processors. The controller allows for providing the plurality of packets to the processing pipeline in an order that is based at least partially on: (i) packet types of the plurality of packets as classified by the packet classification engine and (ii) estimates of processing times for processing packets of the packet types at each stage of the plurality of stages of the processing pipeline. A method in a network processor allows for prefetching instructions into a cache for processing a packet based on a packet type of the packet.

Type: Grant

Filed: November 7, 2007

Date of Patent: May 15, 2012

Inventor: Justin Mark Sobaje
TRANSLATED MEMORY PROTECTION APPARATUS FOR AN ADVANCED MICROPROCESSOR

Publication number: 20120110306

Abstract: A method of responding to an attempt to write a memory address including a target instruction which has been translated to a host instruction for execution by a host processor including the steps of marking a memory address including a target instruction which has been translated to a host instruction, detecting a memory address which has been marked when an attempt is made to write to the memory address, and responding to the detection of a memory address which has been marked by protecting a target instruction at the memory address until it has been assured that translations associated with the memory address will not be utilized before being updated.

Type: Application

Filed: September 23, 2011

Publication date: May 3, 2012

Inventors: Edmund J. Kelly, Robert F. Cmelik, Malcolm J. Wing
Fetching all or portion of instructions in memory line up to branch instruction based on branch prediction and size indicator stored in branch target buffer indexed by fetch address

Patent number: 8171260

Abstract: The invention provides a method and apparatus for branch prediction in a processor. A fetch-block branch target buffer is used in an early stage of pipeline processing before the instruction is decoded, which stores information about a control transfer instruction for a “block” of instruction memory. The block of instruction memory is represented by a block entry in the fetch-block branch target buffer. The block entry represents one recorded control-transfer instruction (such as a branch instruction) and a set of sequentially preceding instructions, up to a fixed maximum length N. Indexing into the fetch-block branch target buffer yields an answer whether the block entry represents memory that contains a previously executed control-transfer instruction, a length value representing the amount of memory that contains the instructions represented by the block, and an indicator for the type of control-transfer instruction that terminates the block, its target and outcome.

Type: Grant

Filed: June 23, 2009

Date of Patent: May 1, 2012

Assignee: STMicroelectronics, Inc.

Inventors: Anatoly Gelman, Russell Lawrence Schnapp
Memory control apparatus, memory control method and information processing system

Patent number: 8166259

Abstract: A memory control apparatus, a memory control method and an information processing system are disclosed. Fetch response data retrieved from a main storage unit is received, while bypassing a storage unit, by a first port in which the received fetch response data can be set. The fetch response data retrieved from the main storage unit, if unable to be set in the first port, is set in a second port through the storage unit. A transmission control unit performs priority control operation to send out, in accordance with a predetermined priority, the fetch response data set in the first port or the second port to the processor. As a result, the latency is shortened from the time when the fetch response data arrives to the time when the fetch response data is sent out toward the processor in response to a fetch request from the processor.

Type: Grant

Filed: March 26, 2009

Date of Patent: April 24, 2012

Assignee: Fujitsu Limited

Inventor: Souta Kusachi
Apparatus for randomizing instruction thread interleaving in a multi-thread processor

Patent number: 8145885

Abstract: A processor interleaves instructions according to a priority rule which determines the frequency with which instructions from each respective thread are selected and added to an interleaved stream of instructions to be processed in the data processor. The frequency with which each thread is selected according to the rule may be based on the priorities assigned to the instruction threads. A randomization is inserted into the interleaving process so that the selection of an instruction thread during any particular clock cycle is not based solely on the priority rule, but is also based in part on a random or pseudo random element. This randomization is inserted into the instruction thread selection process so as to vary the order in which instructions are selected from the various instruction threads while preserving the overall frequency of thread selection (i.e. how often threads are selected) set by the priority rule.

Type: Grant

Filed: April 30, 2008

Date of Patent: March 27, 2012

Assignee: International Business Machines Corporation

Inventors: Ronald Nick Kalla, Minh Michelle Quy Pham, Balaram Sinharoy, John Wesley Ward, III
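One way to picture the idea in the abstract for patent 8145885 (vary the selection order while keeping each thread's selection frequency fixed) is to shuffle a schedule window in which each thread appears in proportion to its priority. The Python sketch below is an assumption-laden illustration: the function name `interleave`, the integer weights, the fixed window, and the use of a seeded software PRNG are all hypothetical, and the abstract does not say that the patented apparatus works this way.

```python
import random


def interleave(weights, windows, seed=0):
    """Return a thread-selection order, one entry per clock cycle.

    weights: dict mapping thread id -> selections per window (priority rule).
    windows: number of windows to schedule.
    seed:    seed for the pseudo random element (assumed software PRNG).
    """
    rng = random.Random(seed)
    order = []
    for _ in range(windows):
        # The priority rule fixes how often each thread appears in a window.
        window = [tid for tid, weight in weights.items() for _ in range(weight)]
        # The random element only changes the order inside the window.
        rng.shuffle(window)
        order.extend(window)
    return order
```

With `weights={"T0": 3, "T1": 1}`, thread T0 is selected in exactly three of every four cycles regardless of the seed, but the cycle in which T1 is selected changes from window to window.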
Processor

Publication number: 20120066480

Abstract: A processor includes: an instruction fetch portion configured to fetch simultaneously a plurality of fixed-length instructions in accordance with a program counter; an instruction predecoder configured to predecode specific fields in a part of the plurality of fixed-length instructions; and a program counter management portion configured to control an increment of the program counter in accordance with a result of the predecoding.

Type: Application

Filed: July 22, 2011

Publication date: March 15, 2012

Applicant: Sony Corporation

Inventors: Hirokazu Hanaki, Satoshi Takashima
PROCESSOR

Publication number: 20120060017

Abstract: A processor including L computing units, L being an integer of 2 or greater, the processor comprising: an instruction buffer including M×Z instruction storage areas each storing one instruction, M instruction streams being input in a state of being distinguished from each other, each of the M instruction streams including Z instructions, M and Z each being an integer of 2 or greater, M×Z being equal to or greater than L; an order information holding unit holding order information that indicates an order of the M×Z instruction storage areas; an extraction unit operable to extract instructions from the M×Z instruction storage areas; and a control unit operable to cause the extraction unit to extract L instructions in executable state from the M×Z instruction storage areas in accordance with the order indicated by the order information, and input the instructions into different ones of the L computing units.

Type: Application

Filed: May 18, 2010

Publication date: March 8, 2012

Inventor: Hiroyuki Morishita
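The extraction step in the abstract for publication 20120060017 can be sketched as a scan over the M×Z instruction storage areas in the order given by the order information, taking the first L instructions that are in an executable state. In the Python sketch below, `Slot`, `extract`, the `ready` flag, and the rule that the k-th picked instruction goes to computing unit k are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One instruction storage area."""
    instruction: Optional[str] = None
    ready: bool = False  # assumed meaning: instruction is in an executable state


def extract(slots, order, num_units):
    """Pick up to num_units ready instructions, scanning slots in `order`.

    Returns a list of (unit_index, instruction) pairs, each pair using a
    different computing unit. Picked slots are emptied.
    """
    picked = []
    for index in order:
        slot = slots[index]
        if slot.instruction is not None and slot.ready:
            picked.append((len(picked), slot.instruction))
            slot.instruction = None
            slot.ready = False
            if len(picked) == num_units:
                break
    return picked
```

For example, with M = 2 streams of Z = 2 instructions and L = 2 computing units, a slot that is not yet executable is skipped and the scan continues with the next slot in the order, so the units are kept busy whenever enough ready instructions exist.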
Instruction pointers in very long instruction words

Patent number: 8095775

Abstract: During operation of a VLIW processor, a very long instruction word is fetched. A portion of the very long instruction word that includes a pointer to an instruction is identified, and the instruction pointed to by the pointer is retrieved from a location of an instruction window. The retrieved instruction is input into a functional unit for execution.

Type: Grant

Filed: November 19, 2008

Date of Patent: January 10, 2012

Assignee: Marvell International Ltd.

Inventors: Moinul H. Khan, Anitha Kona, Mark N. Fullerton
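The indirection in the abstract for patent 8095775 (a field of the very long instruction word holds a pointer, and the instruction itself is fetched from an instruction window) can be modeled as a table lookup. The Python sketch below assumes, hypothetically, that the word has one pointer field per functional unit and that `None` marks a unit with nothing to issue; neither detail comes from the abstract.

```python
def dispatch(vliw_word, instruction_window):
    """Resolve the pointer fields of one VLIW word.

    vliw_word:          list of pointer fields, one per functional unit
                        (None = no instruction for that unit, an assumption).
    instruction_window: list of instructions indexed by pointer value.

    Returns a dict mapping functional unit index -> instruction to execute.
    """
    return {
        unit: instruction_window[pointer]
        for unit, pointer in enumerate(vliw_word)
        if pointer is not None
    }


window = ["add r1, r2", "mul r3, r4", "ld r5, [r6]"]
issued = dispatch([2, None, 0], window)
# issued == {0: "ld r5, [r6]", 2: "add r1, r2"}
```

Because the word stores short pointers rather than full instructions, the same window entry can be referenced by several words without being repeated in program memory.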
Multi-core stream processor having (N) processing units and (N+1) fetching units

Patent number: 8037283

Abstract: In a multi-core stream processing system and scheduling method of the same, a scheduler is coupled to a number (N) of stream processing units and a number (N+1) of stream fetching units, where N≥2. When the scheduler receives a stream element from a Pth stream fetching unit, the scheduler assigns a Pth stream processing unit as a target stream processing unit when the Pth stream processing unit does not encounter a bottleneck condition, assigns a Qth stream processing unit, which does not encounter the bottleneck condition, as the target stream processing unit when the Pth stream processing unit encounters the bottleneck condition, where 1≤P≤N, 1≤Q≤N, and P≠Q, and dispatches the received stream element to the target stream processing unit such that the target stream processing unit processes the stream element dispatched from the scheduler.

Type: Grant

Filed: May 5, 2009

Date of Patent: October 11, 2011

Assignee: National Taiwan University

Inventors: You-Ming Tsao, Liang-Gee Chen, Shao-Yi Chien
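The assignment rule in the abstract for patent 8037283 is: prefer the processing unit with the same index as the fetching unit that produced the element, and fall back to a different unit that is not bottlenecked. A minimal Python sketch follows. The function name `choose_target`, the choice of the lowest-numbered free unit as the fallback, and the `None` result when every unit is bottlenecked are assumptions; the abstract does not specify how the fallback unit is picked or what happens when none is free.

```python
def choose_target(p, bottlenecked):
    """Pick the target stream processing unit for an element from fetching unit p.

    p:            1-based index of the stream fetching unit.
    bottlenecked: dict mapping processing unit index (1..N) -> True if that
                  unit currently encounters the bottleneck condition.

    Returns the index of the target unit, or None if every unit is
    bottlenecked (assumed behavior: the caller retries later).
    """
    if not bottlenecked[p]:
        return p
    for q in sorted(bottlenecked):
        if q != p and not bottlenecked[q]:
            return q
    return None
```

For instance, `choose_target(3, {1: True, 2: False, 3: True})` falls back to unit 2, while `choose_target(2, {1: False, 2: False, 3: True})` keeps the element on unit 2.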
Methods and Apparatus for Storing Expanded Width Instructions in a VLIW Memory for Deferred Execution

Publication number: 20110225396

Abstract: Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.

Type: Application

Filed: May 18, 2011

Publication date: September 15, 2011

Applicant: ALTERA CORPORATION

Inventors: Gerald George Pechanek, Stamatis Vassiliadis
Simultaneous multiple thread processor increasing number of instructions issued for thread detected to be processing loop

Patent number: 8015391

Abstract: A processor simultaneously issues instructions to multiple threads in a same instruction execution cycle. An instruction issuer controls issuance of an instruction for each of the multiple threads. A detector detects, for each of the multiple threads, whether a loop processing is currently being executed. A unit causes the instruction issuer to increase a number of instructions to be issued when the detector detects that the loop processing is currently being executed.

Type: Grant

Filed: October 8, 2010

Date of Patent: September 6, 2011

Assignee: Panasonic Corporation

Inventor: Takenobu Tani
Prefetch processing apparatus, prefetch processing method, storage medium storing prefetch processing program

Patent number: 8006041

Abstract: A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed.

Type: Grant

Filed: March 5, 2008

Date of Patent: August 23, 2011

Assignee: Fujitsu Limited

Inventors: Shuji Yamamura, Takashi Aoki
Method, system, and computer program product for out of order instruction address stride prefetch performance verification

Patent number: 7996203

Abstract: A method, system, and computer program product are provided for verifying out of order instruction address (IA) stride prefetch performance in a processor design having more than one level of cache hierarchies. Multiple instruction streams are generated and the instructions loop back to corresponding instruction addresses. The multiple instruction streams are dispatched to a processor and simulation application to process. When a particular instruction is being dispatched, the particular instruction's instruction address and operand address are recorded in the queue. The processor is monitored to determine if the processor executes fetch and prefetch commands in accordance with the simulation application. It is checked to determine if prefetch commands are issued for instructions having three or more strides.

Type: Grant

Filed: January 31, 2008

Date of Patent: August 9, 2011

Assignee: International Business Machines Corporation

Inventors: Wei-Yi Xiao, Dean G. Bair, Christopher A. Krygowski, Chung-Lung K. Shum
