Loop Execution Patents (Class 712/241)

Command supply device that supplies a command read out from a main memory to a central processing unit

Patent number: 7822949

Abstract: A command supply device supplies a command sequence that forms a loop. A loop command buffer accumulates a first partial command sequence. The first partial command sequence is a head part of a first command sequence repeatedly supplied to a CPU from among command sequences stored in a main memory, and is accumulated before the first command sequence is supplied to the CPU again. A linking command buffer accumulates a second partial command sequence. The second partial command sequence follows the first partial command sequence in the first command sequence, and is accumulated while the accumulated first partial command sequence in the loop command buffer is supplied to the CPU. A selection circuit supplies, to the CPU, a command from the accumulated second partial command sequence in the linking command buffer when the entirety of the first partial command sequence has been supplied to the CPU.

Type: Grant

Filed: May 9, 2005

Date of Patent: October 26, 2010

Assignee: Panasonic Corporation

Inventor: Satoshi Ogura
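The buffering scheme in patent 7822949 can be sketched as a trace of where each command comes from. This is a minimal Python model of the general idea, not the patented circuit. `supply_commands`, the `head_len` parameter and the source labels are hypothetical. Memory latency is not modeled: the tail of the loop is simply copied into the linking buffer on each pass, standing in for the fetch that overlaps the head's replay.

```python
def supply_commands(body, head_len, iterations):
    """Return (source, command) pairs for `iterations` passes over a loop body.

    Pass 1 is read from main memory, and its first `head_len` commands are
    captured in the loop command buffer. On later passes the head is replayed
    from that buffer while the rest of the body is (conceptually) fetched into
    the linking command buffer; the selector then switches to that buffer.
    """
    loop_buffer = []
    trace = []
    for cmd in body:  # first pass: main memory, while filling the loop buffer
        trace.append(("memory", cmd))
        if len(loop_buffer) < head_len:
            loop_buffer.append(cmd)
    for _ in range(iterations - 1):
        # Stand-in for the main-memory fetch that overlaps the head's replay.
        linking_buffer = list(body[head_len:])
        for cmd in loop_buffer:  # head supplied from the loop command buffer
            trace.append(("loop_buffer", cmd))
        for cmd in linking_buffer:  # selector switches once the head is used up
            trace.append(("linking_buffer", cmd))
    return trace
```

For example, `supply_commands(["a", "b", "c", "d"], 2, 2)` supplies all four commands from memory, then `a` and `b` from the loop buffer and `c` and `d` from the linking buffer.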
METHOD AND SYSTEM FOR DATA PREFETCHING FOR LOOPS BASED ON LINEAR INDUCTION EXPRESSIONS

Publication number: 20100250854

Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented as linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.

Type: Application

Filed: March 16, 2010

Publication date: September 30, 2010

Inventor: Dz-ching Ju
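The core observation in publication 20100250854, that an address written as a linear induction expression can be computed a few iterations ahead, fits in a few lines. This is a hedged sketch: `prefetch_schedule`, the prefetch `distance` and the 8-byte element size are assumptions for illustration, and the function only plans addresses rather than issuing real prefetch instructions.

```python
def prefetch_schedule(n, base_a, stride, distance, a_values=None, base_b=None, elem_size=8):
    """Plan prefetches for `for i in range(n): use(a[i]); use(b[a[i]])`.

    a[i] sits at base_a + stride*i, a linear induction expression in i, so the
    address needed `distance` iterations ahead is base_a + stride*(i + distance).
    If a_values and base_b are given, the indirect access b[a[i]] is planned
    too, indexed by the value of a[i + distance].
    """
    plan = []
    for i in range(n):
        j = i + distance
        addrs = []
        if j < n:  # no prefetch past the last iteration
            addrs.append(base_a + stride * j)
            if a_values is not None and base_b is not None:
                addrs.append(base_b + elem_size * a_values[j])
        plan.append((i, addrs))
    return plan
```

Indexing `b` with `a[i + distance]` assumes that element of `a` is already available; the sketch ignores the extra lead a real compiler would need for the indirect prefetch.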
MACROSCALAR PROCESSOR ARCHITECTURE

Publication number: 20100235612

Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.

Type: Application

Filed: May 26, 2010

Publication date: September 16, 2010

Inventor: Jeffry E. Gonion
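The iteration-unit idea in the macroscalar abstract can be modeled as expanding one primary instruction into one secondary instruction per loop iteration. This is an illustrative sketch only: the `Primary` encoding, `expand`, `execute` and the `OPS` table are hypothetical and say nothing about how the actual hardware encodes or issues instructions.

```python
from dataclasses import dataclass

# Hypothetical element-wise operations a primary instruction may name.
OPS = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}

@dataclass(frozen=True)
class Primary:
    op: str      # operation name, a key of OPS
    dst: str     # destination array
    srcs: tuple  # source arrays

def expand(primary, n_iterations):
    """Iteration unit: one secondary instruction per loop iteration."""
    return [(primary.op, (primary.dst, i), tuple((s, i) for s in primary.srcs))
            for i in range(n_iterations)]

def execute(secondaries, memory):
    """Execution unit: run each secondary instruction against `memory`."""
    for op, (dst, i), srcs in secondaries:
        memory[dst][i] = OPS[op](*(memory[s][j] for s, j in srcs))
```

Because the secondaries produced for one primary are independent of each other here, they could be issued in parallel, which is the point of generating them in hardware.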
Estimating a dominant resource used by a computer program

Patent number: 7797692

Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.

Type: Grant

Filed: May 12, 2006

Date of Patent: September 14, 2010

Assignee: Google Inc.

Inventor: Grzegorz J. Czajkowski
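A rough Python rendering of the filter-then-aggregate flow in patent 7797692 follows. The opcode-to-resource table, the fallback to `"alu"` and the simple majority vote across blocks are all assumptions; the abstract does not say how per-block estimates are combined.

```python
from collections import Counter

# Assumed opcode-to-resource table; unknown opcodes count as "alu".
RESOURCE_OF = {"load": "memory", "store": "memory", "fadd": "fpu",
               "fmul": "fpu", "add": "alu", "cmp": "alu", "io_call": "io"}

def dominant_resource(blocks, threshold):
    """blocks: list of (nesting_level, [opcode, ...]) for each basic block."""
    votes = Counter()
    for level, ops in blocks:
        if level <= threshold or not ops:  # keep only deeply nested blocks
            continue
        usage = Counter(RESOURCE_OF.get(op, "alu") for op in ops)
        votes[usage.most_common(1)[0][0]] += 1  # this block's dominant resource
    return votes.most_common(1)[0][0] if votes else None
```

Filtering on nesting level reflects the usual heuristic that deeply nested blocks, being inside loops, account for most of the run time.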
Mechanism for Efficient Implementation of Software Pipelined Loops in VLIW Processors

Publication number: 20100211762

Abstract: A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having an N number of execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count.

Type: Application

Filed: February 18, 2010

Publication date: August 19, 2010

Applicant: SAANKHYA LABS PVT LTD

Inventors: Anindya Saha, Manish Kumar, Hemant Mallapur, Santhosh Billava, Viji Rajangam
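How a block counter and an iteration counter can replace an explicit branch is easiest to see in a small simulation. This sketch of the general zero-overhead loop idea assumes the loop body is the `block_count` consecutive instructions starting at `loop_start`. `run_hw_loop` is a hypothetical name, and the instruction-buffer and program-memory details of publication 20100211762 are not modeled.

```python
def run_hw_loop(loop_start, block_count, iteration_count):
    """Return the PCs executed by a counter-driven hardware loop."""
    if block_count <= 0:
        raise ValueError("block_count must be positive")
    trace = []
    pc = loop_start
    block_ctr, iter_ctr = block_count, iteration_count
    while iter_ctr > 0:
        trace.append(pc)              # execute the instruction at pc
        pc += 1
        block_ctr -= 1                # one more instruction of the block done
        if block_ctr == 0:            # end of the block: one iteration finished
            iter_ctr -= 1
            block_ctr = block_count   # reload from the control register
            pc = loop_start           # redirect without a branch instruction
    return trace
```

For example, `run_hw_loop(4, 3, 2)` executes PCs 4, 5, 6, 4, 5, 6 with no branch instruction in the body.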
Method and system for performing reassociation in software loops

Patent number: 7774766

Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.

Type: Grant

Filed: September 29, 2005

Date of Patent: August 10, 2010

Assignee: Intel Corporation

Inventors: Kalyan Muthukumar, Daniel M Lavery
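Reassociation shortens a recurrence by regrouping operations so that fewer of them sit on the loop-carried path. The sketch below uses a toy expression tree (a hypothetical representation, not a real compiler IR) to count the operations between `s` and its next value: `s = (s + x) + y` puts two additions on the cycle, while `s = s + (x + y)` puts only one, because `x + y` no longer depends on `s`.

```python
def recurrence_length(expr, var="s"):
    """Operations on the path from loop-carried `var` to the expression's value.

    expr is either a leaf name or a tuple ("+", lhs, rhs). Returns None when
    the expression does not depend on `var` at all.
    """
    if isinstance(expr, str):
        return 0 if expr == var else None
    _, lhs, rhs = expr
    depths = [d for d in (recurrence_length(lhs, var), recurrence_length(rhs, var))
              if d is not None]
    return max(depths) + 1 if depths else None

def evaluate(expr, env):
    """Evaluate a "+" expression tree with leaf values taken from env."""
    if isinstance(expr, str):
        return env[expr]
    _, lhs, rhs = expr
    return evaluate(lhs, env) + evaluate(rhs, env)

original = ("+", ("+", "s", "x"), "y")      # s = (s + x) + y
reassociated = ("+", "s", ("+", "x", "y"))  # s = s + (x + y)
```

Regrouping is exact for integer addition, which is associative; for floating-point addition it can change rounding, which is one reason a compiler checks criteria before reassociating.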
COMPUTING APPARATUS AND METHOD OF HANDLING INTERRUPT

Publication number: 20100199076

Abstract: A computing apparatus and method of handling an interrupt are provided. The computing apparatus includes a coarse-grained array, a host processor, and an interrupt supervisor. When an interrupt occurs in the coarse-grained array while performing a loop operation, the host processor processes the interrupt, and the interrupt supervisor may perform mode switching between the coarse-grained array and the host processor.

Type: Application

Filed: December 16, 2009

Publication date: August 5, 2010

Inventors: Dong-hoon YOO, Soo-jung Ryu, Yeon-gon Cho, Bernhard Egger, Il-hyun Park
APPARATUS AND METHOD FOR SCHEDULING INSTRUCTION

Publication number: 20100185839

Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based on the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.

Type: Application

Filed: November 2, 2009

Publication date: July 22, 2010

Inventors: Tae-wook OH, Won-sub Kim, Bernhard Egger
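Scheduling recurrence loops in dependency order, with loops whose heads depend on no other loop going first, is essentially a topological sort. A minimal sketch using Python's standard `graphlib` module; the dependency map and loop names are hypothetical, and the abstract does not say that this particular algorithm is used.

```python
from graphlib import TopologicalSorter

def schedule_order(deps):
    """deps maps each recurrence loop to the loops its head depends on.

    Loops with no dependencies (the dominant loops) are emitted first; the
    rest follow only after everything they depend on has been scheduled.
    """
    return list(TopologicalSorter(deps).static_order())
```

`TopologicalSorter` raises `graphlib.CycleError` if the loop dependencies form a cycle, a case this simple sketch does not handle.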
Enhancing processing efficiency in large instruction width processors

Publication number: 20100180102

Abstract: A processor includes one or more processing units, an execution pipeline and control circuitry. The execution pipeline includes at least first and second pipeline stages that are cascaded so that program instructions, specifying operations to be performed by the processing units in successive cycles of the pipeline, are fetched from a memory by the first pipeline stage and conveyed to the second pipeline stage, which causes the processing units to perform the specified operations. The control circuitry is coupled, upon determining that a program instruction that is present in the second pipeline stage in a first cycle of the pipeline is to be executed again in a subsequent cycle of the pipeline, to cause the execution pipeline to reuse the program instruction in one of the pipeline stages without re-fetching the program instruction from the memory.

Type: Application

Filed: January 15, 2009

Publication date: July 15, 2010

Applicant: ALTAIR SEMICONDUCTORS

Inventors: Edan Almog, Nohik Semel, Yigal Bitran, Nadav Cohen, Yoel Livne, Eli Zyss
Pre-decoding bytecode prefixes selectively incrementing stack machine program counter

Patent number: 7757067

Abstract: A processor (e.g., a co-processor) comprising a decoder coupled to a pre-decoder, in which the decoder decodes a current instruction in parallel with the pre-decoder pre-decoding a subsequent instruction. In particular, the pre-decoder examines at least five Bytecodes in parallel with the decoder decoding a current instruction. The pre-decoder determines if a subsequent instruction contains a prefix. If a prefix is detected in at least one of the five Bytecodes, a program counter skips the prefix and changes the behavior of the decoder during the decoding of the subsequent instruction.

Type: Grant

Filed: July 31, 2003

Date of Patent: July 13, 2010

Assignee: Texas Instruments Incorporated

Inventors: Gerard Chauvel, Serge Lasserre, Maija Kuusela
COMPILER APPARATUS WITH FLEXIBLE OPTIMIZATION

Publication number: 20100175056

Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, and an optimization unit made up of processing units (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that perform the individual optimizations designated by the user's options and pragmas, following the directives detected by the analysis unit. The global region allocation unit performs optimization following options and pragmas that designate the maximum data size of variables to be allocated to a global region, variables to be allocated to the global region, and variables not to be allocated to the global region.

Type: Application

Filed: February 16, 2010

Publication date: July 8, 2010

Inventors: Hajime OGAWA, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
Data-Processing Unit for Nested-Loop Instructions

Publication number: 20100169612

Abstract: A data-processing unit has a fetching circuitry (20) and execution circuitry (30a, 30b). The data-processing unit has an instruction set comprising a nested-loop instruction. The fetching circuitry is arranged to fetch the nested-loop instruction, and the execution circuitry is arranged to execute the nested-loop instruction. The nested-loop instruction comprises at least one instruction field that is adapted to indicate a number of iterations of an outer loop of the nested loop and one or more operations to be performed by the outer loop. Moreover, the at least one instruction field is further adapted to indicate a number of iterations of an inner loop of the nested loop and one or more operations to be performed by the inner loop. A method for fetching, decoding, and executing the nested-loop instruction is also described as well as the structure of the nested-loop instruction.

Type: Application

Filed: June 25, 2008

Publication date: July 1, 2010

Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)

Inventors: Per Persson, Harald Gustafsson
APPARATUS AND METHOD FOR DATA PROCESS

Publication number: 20100153688

Abstract: An exemplary aspect of the present invention is a data processing apparatus for processing a loop in a pipeline that includes an instruction memory and a fetch circuit that fetches an instruction stored in the instruction memory. The fetch circuit includes an instruction queue that stores an instruction to be output from the fetch circuit, an evacuation queue that stores an instruction fetched from the instruction memory, a selector that selects one of the instruction output from the instruction queue and the instruction output from the evacuation queue, and a loop queue that stores the instruction selected by the selector and outputs to the instruction queue.

Type: Application

Filed: December 11, 2009

Publication date: June 17, 2010

Applicant: NEC Electronics Corporation

Inventor: Satoshi CHIBA
Macroscalar processor architecture

Patent number: 7739442

Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.

Type: Grant

Filed: May 23, 2008

Date of Patent: June 15, 2010

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Method and apparatus for modulo scheduled loop execution in a processor architecture

Patent number: 7725696

Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code and automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware-enhanced scheme achieves the same performance as the fully-specified standard method. In addition, the hardware reduces the power requirement, as the entire fetch unit can be deactivated for a portion of the loop's execution.

Type: Grant

Filed: October 4, 2007

Date of Patent: May 25, 2010

Inventors: Wen-mei W. Hwu, Matthew C. Merten
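The hardware-managed prologue and epilogue in patent 7725696 can be pictured by listing which pipeline stage works on which iteration in each kernel cycle. This is a sketch of standard modulo-scheduling bookkeeping, assuming one kernel cycle per stage; `modulo_schedule` is a hypothetical helper, not the patented mechanism.

```python
def modulo_schedule(n_iterations, n_stages):
    """Return, for each kernel cycle, the active (stage, iteration) pairs.

    Only one copy of the kernel exists: in cycle c, stage s works on
    iteration c - s, and a stage is simply disabled when that iteration does
    not exist. The ramp-up and ramp-down cycles act as the prologue and
    epilogue, and n_iterations < n_stages needs no special case.
    """
    cycles = []
    for c in range(n_iterations + n_stages - 1):
        active = [(s, c - s) for s in range(n_stages) if 0 <= c - s < n_iterations]
        cycles.append(active)
    return cycles
```

With 2 iterations and 3 stages, fewer iterations than concurrent stages, the schedule still completes in 4 cycles and every stage of every iteration runs exactly once.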
Faceplate including wireless LAN communications

Patent number: 7719825

Abstract: A faceplate for a housing of a computing device including a processor includes a bezel with interior and exterior surfaces. The bezel removeably covers at least a portion of an opening defined by the housing. A wireless local area network (LAN) unit is mechanically coupled to the interior surface of the bezel and comprises a data interface that enables data transfer between the wireless LAN unit and the processor.

Type: Grant

Filed: August 28, 2008

Date of Patent: May 18, 2010

Assignee: Marvell International Ltd.

Inventors: Joseph Knapp, George Chien
INSTRUCTION METHOD FOR FACILITATING EFFICIENT CODING AND INSTRUCTION FETCH OF LOOP CONSTRUCT

Publication number: 20100122066

Abstract: Instruction set techniques have been developed to identify explicitly the beginning of a loop body and to code a conditional loop-end in ways that allow a processor implementation to efficiently manage an instruction fetch buffer and/or entries in an instruction cache. In particular, for some computations and processor implementations, a machine instruction is defined that identifies a loop start, stores a corresponding loop start address on a return stack (or in other suitable storage) and directs fetch logic to take advantage of the identification by retaining in a fetch buffer or instruction cache the instruction(s) beginning at the loop start address, thereby avoiding usual branch delays on subsequent iterations of the loop. A conditional loop-end instruction can be used in conjunction with the loop start instruction to discard (or simply mark as no longer needed) the loop start address and the loop body instructions retained in the fetch buffer or instruction cache.

Type: Application

Filed: November 12, 2008

Publication date: May 13, 2010

Applicant: FREESCALE SEMICONDUCTOR, INC.

Inventor: Michael A. Fischer
Macroscalar Processor Architecture

Publication number: 20100122069

Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Type: Application

Filed: November 6, 2009

Publication date: May 13, 2010

Inventor: Jeffry E. Gonion
Method for predicate promotion in a software loop

Patent number: 7712091

Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.

Type: Grant

Filed: September 30, 2005

Date of Patent: May 4, 2010

Assignee: Intel Corporation

Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
Method and apparatus for configuring a processor embedded in an integrated circuit for use as a logic element

Patent number: 7698449

Abstract: Method and apparatus for configuring a processor embedded in an integrated circuit for use as a logic element is described. In one example, a processing apparatus in an integrated circuit includes a point-to-point data streaming interface and arithmetic logic unit (ALU) circuitry. The ALU circuitry includes at least one input port in communication with the point-to-point data streaming interface. The processor may also include a register file and multiplexer logic. The multiplexer logic is configured to selectively couple the register file and the point-to-point streaming interface to the at least one input port of the ALU circuitry.

Type: Grant

Filed: February 23, 2005

Date of Patent: April 13, 2010

Assignee: XILINX, Inc.

Inventors: Eric R. Keller, Philip B. James-Roxby
Compiler apparatus with flexible optimization

Patent number: 7698696

Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, and an optimization unit made up of processing units (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that perform the individual optimizations designated by the user's options and pragmas, following the directives detected by the analysis unit. The global region allocation unit performs optimization following options and pragmas that designate the maximum data size of variables to be allocated to a global region, variables to be allocated to the global region, and variables not to be allocated to the global region.

Type: Grant

Filed: June 30, 2003

Date of Patent: April 13, 2010

Assignee: Panasonic Corporation

Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
Multi-mode instruction memory unit

Patent number: 7685411

Abstract: An instruction memory unit comprises a first memory structure operable to store program instructions, and a second memory structure operable to store program instructions fetched from the first memory structure, and to issue stored program instructions for execution. The second memory structure is operable to identify a repeated issuance of a forward program redirect construct, and issue a next program instruction already stored in the second memory structure if a resolution of the forward branching instruction is identical to a last resolution of the same. The second memory structure is further operable to issue a backward program redirect construct, determine whether a target instruction is stored in the second memory structure, issue the target instruction if the target instruction is stored in the second memory structure, and fetch the target instruction from the first memory structure if the target instruction is not stored in the second memory structure.

Type: Grant

Filed: April 11, 2005

Date of Patent: March 23, 2010

Assignee: QUALCOMM Incorporated

Inventors: Muhammad Ahmed, Lucian Codrescu, Erich Plondke, William C. Anderson, Robert Allan Lester, Phillip M. Jones
DATA PROCESSOR AND DATA PROCESSING SYSTEM

Publication number: 20100064106

Abstract: The present invention provides a data processor capable of automatically identifying a loop program and reducing power through size-variable lock control of an instruction buffer. The instruction buffer of the data processor includes a buffer controller that controls a memory unit storing each fetched instruction. When the execution history of a fetched conditional branch instruction suggests that the condition will be satisfied, the branch is directed backward (opposite to the order of instruction execution), and the address distance from the branch source to the branch target fits within the storage capacity of the instruction buffer, the buffer controller retains the instruction sequence between the branch source and the branch target in the instruction buffer.

Type: Application

Filed: August 24, 2009

Publication date: March 11, 2010

Inventors: Tetsuya YAMADA, Naoki KATO
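The three conditions the buffer controller checks in publication 20100064106 can be written as a single predicate. This sketch assumes fixed 2-byte instructions and a byte-addressed buffer; `should_lock_loop` and its parameters are hypothetical, and the returned size stands in for the size-variable lock region.

```python
def should_lock_loop(branch_addr, target_addr, predicted_taken, buffer_bytes, insn_bytes=2):
    """Return how many bytes of loop body to lock in the buffer (0 = don't lock)."""
    if not predicted_taken:          # history does not suggest the branch is taken
        return 0
    if target_addr > branch_addr:    # forward branch: not a loop
        return 0
    body_bytes = branch_addr - target_addr + insn_bytes  # target .. branch, inclusive
    return body_bytes if body_bytes <= buffer_bytes else 0
```

Locking only `body_bytes` rather than the whole buffer is what "size-variable" suggests: the rest of the buffer remains usable for ordinary fetches.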
Apparatus for controlling instruction fetch reusing fetched instruction

Patent number: 7676650

Abstract: When an instruction stored in a specific instruction buffer is the same as another instruction stored in another instruction buffer and logically subsequent to the instruction in the specific instruction buffer, a connection is made from the instruction buffer storing a logically and immediately preceding instruction, not the instruction in the other instruction buffer, to the specific instruction buffer without the instruction in the other instruction buffer, and a loop is generated by instruction buffers, thereby performing a short loop in an instruction buffer system capable of arbitrarily connecting a plurality of instruction buffers.

Type: Grant

Filed: January 21, 2003

Date of Patent: March 9, 2010

Assignee: Fujitsu Limited

Inventor: Masaki Ukai
Method, system and program product for pipelined processor having a branch target buffer (BTB) table with a recent entry queue in parallel with the BTB table

Patent number: 7676663

Abstract: A method and apparatus supplement a Branch Target Buffer (BTB) table with a recent entry queue that prevents valuable data in multiple BTB table entries from being removed unnecessarily to make room for another entry. The recent entry queue detects when the startup latency of the BTB table prevents it from asynchronously aiding the microprocessor pipeline as designed, and can then delay the pipeline where required so that the BTB table's startup latency is overcome. The recent entry queue provides quick access to BTB table entries that are accessed in a tight loop pattern, where the throughput of the standalone BTB table cannot keep up with the throughput of the microprocessor execution pipeline. With the recent entry queue, the modified BTB table processes information at the rate of the execution pipeline, thereby accelerating it.

Type: Grant

Filed: March 9, 2004

Date of Patent: March 9, 2010

Assignee: International Business Machines Corporation

Inventors: Brian Robert Prasky, Thomas Roberts Puzak, Allan Mark Hartstein
INSTRUCTION FETCH PIPELINE FOR SUPERSCALAR DIGITAL SIGNAL PROCESSORS AND METHOD OF OPERATION THEREOF

Publication number: 20100058039

Abstract: A next program counter (PC) value generator. The next PC value generator includes a discontinuity decoder that is provided to detect a discontinuity instruction among a plurality of instructions and a tight loop decoder that is provided to: a) detect a tight loop instruction, and b) provide a tight loop instruction target address. The next PC value generator further includes a next PC value logic having a plurality of inputs: a first input coupled to an output of the discontinuity decoder, and a second input coupled to an output of the tight loop decoder. The next PC value logic provides as an output, without a stall, a control signal that a next PC value is to be loaded with the tight loop instruction target address if: the discontinuity decoder detects a discontinuity instruction, and the tight loop decoder detects a tight loop instruction.

Type: Application

Filed: September 4, 2008

Publication date: March 4, 2010

Applicant: VeriSilicon Holdings Company, Limited

Inventors: Vijayanand Angarai, Michelle Y. Che, Asheesh Kashyap, Tracy Nguyen
METHOD FOR EXECUTING AN INSTRUCTION LOOPS AND A DEVICE HAVING INSTRUCTION LOOP EXECUTION CAPABILITIES

Publication number: 20100049958

Abstract: A method for managing a hardware instruction loop, the method includes: (i) detecting, by a branch prediction unit, an instruction loop; wherein a size of the instruction loop exceeds a size of a storage space allocated in a fetch unit for storing fetched instructions; (ii) requesting from the fetch unit to fetch instructions of the instruction loop that follow the first instructions of the instruction loop; and (iii) selecting, during iterations of the instruction loop, whether to provide to a dispatch unit one of the first instructions of the instruction loop or another instruction that is fetched by the fetch unit; wherein the first instructions of the instruction loop are stored at the dispatch unit.

Type: Application

Filed: August 19, 2008

Publication date: February 25, 2010

Inventors: Lev Vaskevich, Itzhak Barak, Amir Paran, Yuval Peled, Idan Rozenberg, Doron Schupper
Pipeline controller for context-based operation reconfigurable instruction set processor

Patent number: 7669042

Abstract: An instruction execution pipeline for use in a data processor. The instruction execution pipeline comprises: 1) an instruction fetch stage; 2) a decode stage; 3) an execution stage; and 4) a write-back stage. The instruction pipeline repetitively executes a loop of instructions by fetching and decoding a first instruction associated with the loop during a first iteration of the loop, storing first decoded instruction information associated with the first instruction during the first iteration of the loop, and using the stored first decoded instruction information during at least a second iteration of the loop without further fetching and decoding of the first instruction during the at least a second iteration of the loop.

Type: Grant

Filed: June 10, 2005

Date of Patent: February 23, 2010

Assignee: Samsung Electronics Co., Ltd.

Inventors: Eran Pisek, Jasmin Oz, Yan Wang
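Reusing decoded information across loop iterations amounts to keeping the decoder's output from the first pass. A minimal sketch with caller-supplied `decode` and `execute` callbacks (hypothetical names); the fetch, decode, execute and write-back stages of patent 7669042 are not modeled individually.

```python
def run_loop(body, iterations, decode, execute):
    """Run `body` `iterations` times, fetching and decoding each instruction once."""
    if iterations <= 0:
        return
    stored = []
    for insn in body:                # first iteration: fetch and decode
        info = decode(insn)
        stored.append(info)          # keep the decoded information
        execute(info)
    for _ in range(iterations - 1):  # later iterations: no fetch or decode
        for info in stored:
            execute(info)
```

The decoder is invoked once per instruction regardless of the iteration count, which is where the fetch and decode energy saving comes from.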
Program execution method using an optimizing just-in-time compiler

Patent number: 7665079

Abstract: It is one object of the present invention to provide a program execution method that enables greater optimization. A program execution apparatus according to the present invention transfers from an interpreter process to a compiled code process in the course of executing a method. At this time, if moving a transfer point to the top of a loop causes no problem, the transfer point is moved there. When a transfer point is located inside a loop, a point that post-dominates the top of the loop and the transfer point is copied to a position immediately preceding the loop. Then, information for generating recalculation code is provided for the transfer point, and a recalculation is performed.

Type: Grant

Filed: November 8, 2000

Date of Patent: February 16, 2010

Assignee: International Business Machines Corporation

Inventors: Toshiaki Yasue, Kazunori Ogata, Kazuaki Ishizaki, Hideaki Komatsu
Register allocation method and system for program compiling

Patent number: 7660970

Abstract: Disclosed is a data processing system and method. The data processing method determines the number of static registers and the number of rotating registers for assigning registers to the variables of a program, assigns registers to the variables based on the number of static registers and the number of rotating registers, and compiles the program. Further, during compilation the method stores in a special register a value corresponding to the number of rotating registers, and obtains a physical register address from a logical register address based on that value. Accordingly, the present invention uses register files efficiently by dynamically controlling the numbers of rotating and static registers for a software pipelined loop, and can minimize the generation of unnecessary spill/fill code during program execution.

Type: Grant

Filed: August 21, 2006

Date of Patent: February 9, 2010

Assignee: Samsung Electronics Co., Ltd.

Inventors: Suk-jin Kim, Jeong-wook Kim, Hong-seok Kim, Soo-jung Ryu
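Splitting a register file into static and rotating regions usually relies on a modulo mapping from logical to physical register numbers, similar in spirit to the rotating registers of the Itanium architecture. The function below is an assumed scheme for illustration, not the mapping claimed in patent 7660970. `rrb` (a rotating register base) is a hypothetical parameter, and `n_rotating` plays the role of the value kept in the special register.

```python
def physical_register(logical, n_static, n_rotating, rrb):
    """Map a logical register number to a physical one (assumed scheme).

    Registers 0..n_static-1 are static and map to themselves. The next
    n_rotating registers rotate: their offset is shifted by the rotating
    register base `rrb` modulo n_rotating, so each loop iteration can see
    renamed registers without copying values.
    """
    if logical < n_static:
        return logical
    offset = logical - n_static
    if offset >= n_rotating:
        raise ValueError("register outside the static and rotating ranges")
    return n_static + (offset + rrb) % n_rotating
```

Changing `n_static` and `n_rotating` per loop is what lets a compiler trade static registers against rotating ones instead of fixing the split in hardware.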
System for forming a critical update loop to continuously reload active thread state from a register storing thread state until another active thread is detected

Patent number: 7653904

Abstract: A method, apparatus, and system are provided for a multi-threaded virtual state mechanism. According to one embodiment, active thread state of a first active thread is received using a virtual state mechanism, and virtual thread state is generated in accordance with the active thread state of the first active thread, and the virtual thread state corresponding to the first active thread is forwarded to state update logic.

Type: Grant

Filed: September 26, 2003

Date of Patent: January 26, 2010

Assignee: Intel Corporation

Inventor: Nicholas G. Samra
INFORMATION PROCESSING DEVICE AND METHOD OF CONTROLLING INSTRUCTION FETCH

Publication number: 20100005276

Abstract: An information processing device includes an instruction fetch unit, an instruction buffer, an instruction executing unit, and an instruction fetch control unit. The instruction fetch unit supplies a fetch address to an instruction memory. The instruction buffer stores an instruction read out from the instruction memory. The instruction executing unit decodes and executes the instruction supplied from the instruction buffer. The instruction fetch control unit stops supply of the fetch address to the instruction memory by the instruction fetch unit when the fetch address corresponds to a first address or an address after the first address while the instruction executing unit executes loop processing. The loop processing is repeatedly executed for a predetermined number of times in accordance with decoding of the loop instruction by the instruction executing unit. The first address is an address after an address of an end instruction included in the loop processing.

Type: Application

Filed: June 11, 2009

Publication date: January 7, 2010

Applicant: NEC ELECTRONICS CORPORATION

Inventor: Hideyuki MIWA
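
The fetch-gating rule in this abstract can be modeled in a few lines. This is a minimal sketch under assumptions, not the applicant's circuit. It assumes one instruction is fetched and one is executed per cycle, that the loop body stays in the instruction buffer after the first pass, and that fetching resumes on the final iteration. The function `simulate_fetch` and its parameters are hypothetical.

```python
from typing import List


def simulate_fetch(start: int, end: int, iterations: int, gated: bool) -> List[int]:
    """Return the fetch addresses sent to instruction memory while a loop
    whose body spans [start, end] runs `iterations` times.

    Hypothetical model: one fetch and one executed instruction per cycle;
    after the first pass the body is replayed from the instruction buffer.
    """
    body_len = end - start + 1
    first_addr = end + 1            # the address after the loop's end instruction
    fetched = []
    fetch_addr = start
    for cycle in range(body_len * iterations):
        iteration = cycle // body_len
        looping = iteration < iterations - 1   # the loop will branch back again
        if gated and looping and fetch_addr >= first_addr:
            continue                # fetch control unit withholds this address
        fetched.append(fetch_addr)
        fetch_addr += 1
    return fetched


ungated = simulate_fetch(0x100, 0x103, iterations=5, gated=False)
gated = simulate_fetch(0x100, 0x103, iterations=5, gated=True)
print(len(ungated), len(gated))   # 20 8
```

Without gating, the fetch unit keeps running past the end instruction on every pass. With gating, it fetches the body once, then fetches the instructions after the loop only on the last pass.
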
Loop Control System and Method

Publication number: 20090327674

Abstract: Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value.

Type: Application

Filed: June 27, 2008

Publication date: December 31, 2009

Applicant: QUALCOMM INCORPORATED

Inventors: Lucian Codrescu, Erich James Plondke, Lin Wang, Suresh K. Venkumahanti
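
The following behavioral sketch models the detection, decrement, and comparison units described in this abstract. It assumes the predicate starts false and is set once the predicate trigger counter has been decremented down to the reference value (zero by default). One plausible use is enabling stores only after a software-pipeline prologue. The function and parameter names are hypothetical.

```python
from typing import List


def hardware_loop(loop_count: int, trigger_count: int, reference: int = 0) -> List[bool]:
    """Return the predicate value seen by the loop body on each iteration.

    Hypothetical model: at each end-of-loop indicator the decrement unit lowers
    the loop count and the predicate trigger counter, and the comparison unit
    sets the predicate when the trigger counter reaches `reference`.
    """
    predicate = trigger_count <= reference
    seen = []
    while loop_count > 0:
        seen.append(predicate)               # loop body executes with current predicate
        # Detection unit: end-of-loop indicator reached.
        loop_count -= 1                      # decrement unit: loop count
        if trigger_count > reference:
            trigger_count -= 1               # decrement unit: predicate trigger counter
            if trigger_count == reference:   # comparison unit
                predicate = True
    return seen


print(hardware_loop(5, 2))   # [False, False, True, True, True]
```
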
Method and Apparatus for Nested Instruction Looping Using Implicit Predicates

Publication number: 20090307472

Abstract: A method and apparatus for executing a nested program loop on a vector processor, the loop comprising outer-pre, inner and outer-post portions. An input stream unit of the vector processor provides a data value to a data path and sets an associated data validity tag to ‘valid’ once per outer loop iteration, as indicated by an inner counter of the input stream unit. The tag is set to ‘invalid’ in other iterations. Functional units of the vector processor operate on data values in the data path, each functional unit producing a valid result if the data validity tags associated with its input data values are set to ‘valid’. An output stream unit of the vector processor sinks a data value from the data path once per outer loop iteration if an associated data validity tag indicates that the data value is valid.

Type: Application

Filed: June 5, 2008

Publication date: December 10, 2009

Applicant: MOTOROLA, INC.

Inventors: Raymond B. Essick IV, Kent D. Moat, Michael A. Schuette
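
The validity-tag mechanism can be sketched in software. This sketch makes several assumptions: the nested loop is flattened into outer × inner iterations on the datapath, the inner counter marks data valid on the first inner iteration of each outer iteration, and a single add unit stands in for the functional units. All names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Tagged = Tuple[int, bool]   # (data value, data validity tag)


def input_stream(outer_values: Iterable[int], inner_trips: int) -> Iterator[Tagged]:
    """Emit one tagged value per flattened (outer, inner) iteration.

    The inner counter marks the value 'valid' only on the first inner
    iteration of each outer iteration; all other copies are 'invalid'.
    """
    for value in outer_values:
        for inner in range(inner_trips):
            yield (value, inner == 0)


def add_unit(a: Tagged, b: Tagged) -> Tagged:
    """Functional unit: the result is valid only if every input tag is valid."""
    return (a[0] + b[0], a[1] and b[1])


def output_stream(results: Iterable[Tagged]) -> List[int]:
    """Sink only values whose validity tag is set (once per outer iteration)."""
    return [value for value, valid in results if valid]


bias: Tagged = (10, True)   # loop-invariant operand, always valid
results = [add_unit(x, bias) for x in input_stream([1, 2, 3], inner_trips=4)]
print(len(results), output_stream(results))   # 12 [11, 12, 13]
```
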
Macroscalar processor architecture

Patent number: 7617496

Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Type: Grant

Filed: September 1, 2005

Date of Patent: November 10, 2009

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Fairness, Performance, and Livelock Assessment Using a Loop Manager With Comparative Parallel Looping

Publication number: 20090265534

Abstract: A method, apparatus, and computer program are provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. Multiple loop macros are generated, the multiple loop macros respectively correspond to multiple processor threads, and the multiple loop macros are parallel comparative loop macros. The multiple processor threads for the multiple loop macros are executed in which a common resource is accessed. A forward performance of each of the multiple processor threads is verified. The forward performance of the multiple processor threads is compared with each other. It is determined whether any of the multiple processor threads fails to meet a minimum loop count or a minimum loop time. It is determined whether any of the multiple processor threads exceeds a maximum loop count or a maximum loop time. It is recognized whether fairness is maintained during the execution of the multiple processor threads.

Type: Application

Filed: April 17, 2008

Publication date: October 22, 2009

Inventors: Duane A. Averill, Anthony D. Drumm, Christopher T. Phan, Brian T. Vanderpool, Sharon D. Vincent
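
The following sketch shows the post-run checks listed in this abstract, applied to per-thread results collected from the loop macros. `LoopResult`, `assess`, the thresholds, and the `fairness_ratio` metric (the fastest thread may complete at most that many times as many iterations as the slowest) are all hypothetical assumptions, not the applicants' criteria.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LoopResult:
    loop_count: int     # iterations the thread's loop macro completed
    loop_time: float    # seconds the thread spent looping


def assess(results: Dict[str, LoopResult], min_count: int, max_count: int,
           min_time: float, max_time: float, fairness_ratio: float = 2.0) -> List[str]:
    """Return a list of findings; an empty list means every check passed."""
    findings = []
    for name, r in results.items():
        # Forward progress: too few iterations suggests starvation or livelock.
        if r.loop_count < min_count or r.loop_time < min_time:
            findings.append(f"{name}: below minimum loop count/time")
        if r.loop_count > max_count or r.loop_time > max_time:
            findings.append(f"{name}: exceeds maximum loop count/time")
    counts = [r.loop_count for r in results.values()]
    # Comparative check: no thread should progress far faster than another.
    if counts and max(counts) > fairness_ratio * min(counts):
        findings.append("fairness violated: thread progress differs too much")
    return findings


balanced = {"t0": LoopResult(100, 1.0), "t1": LoopResult(95, 1.0)}
starved = {"t0": LoopResult(100, 1.0), "t1": LoopResult(3, 1.0)}
print(assess(balanced, 10, 1000, 0.1, 10.0))   # []
print(assess(starved, 10, 1000, 0.1, 10.0))
# ['t1: below minimum loop count/time', 'fairness violated: thread progress differs too much']
```
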
Computation spreading utilizing dithering for spur reduction in a digital phase lock loop

Publication number: 20090262877

Abstract: A novel and useful apparatus for and method of spur reduction using computation spreading with dithering in a digital phase locked loop (DPLL) architecture. A software based PLL incorporates a reconfigurable calculation unit (RCU) that is optimized and programmed to sequentially perform all the atomic operations of a PLL or any other desired task in a time sharing manner. An application specific instruction-set processor (ASIP) incorporating the RCU is adapted to spread the computation of the atomic operations out over a PLL reference clock period wherein each computation is performed at a much higher processor clock frequency than the PLL reference clock rate. This significantly reduces the per cycle current transient generated by the computations. The frequency content of the current transients is at the higher processor clock frequency which results in a significant reduction in spurs within sensitive portions of the output spectrum.

Type: Application

Filed: April 17, 2008

Publication date: October 22, 2009

Inventors: Fuqiang Shi, Roman Staszewski, Robert B. Staszewski
RETARGETTING AN APPLICATION PROGRAM FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR

Publication number: 20090259832

Abstract: One embodiment of the present invention sets forth a technique for translating application programs, written using a parallel programming model for execution on a multi-core graphics processing unit (GPU), so that they can be executed by a general-purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general-purpose CPU. The application program is partitioned into regions of synchronization-independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general-purpose CPU.

Type: Application

Filed: March 19, 2009

Publication date: October 15, 2009

Inventors: Vinod GROVER, Bastiaan Joannes Matheus AARTS, Michael MURPHY, Boris BEYLIN, Jayant B. KOLHE, Douglas SAYLOR
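
The thread-loop idea can be illustrated on a small kernel. The sketch below hand-translates a hypothetical two-region GPU kernel, written per thread with one barrier, into CPU code. Each region becomes a loop over thread IDs, and `local`, a per-thread value that is live across the barrier, is replicated into an array. This is not the applicants' translator.

```python
from typing import List


# Hypothetical GPU kernel, written per thread (tid) with a barrier:
#     local = inp[tid] + 1
#     shared[tid] = inp[tid] * 2
#     __syncthreads()
#     out[tid] = shared[(tid + 1) % n] + local
def kernel_on_cpu(inp: List[int]) -> List[int]:
    n = len(inp)                  # one thread per input element
    shared = [0] * n
    local = [0] * n               # per-thread value live across the barrier: replicated
    # Region 1: thread loop over every tid.
    for tid in range(n):
        local[tid] = inp[tid] + 1
        shared[tid] = inp[tid] * 2
    # The barrier is implicit: region 1 has finished for all threads here.
    out = [0] * n
    # Region 2: a second thread loop that reads values written by other threads.
    for tid in range(n):
        out[tid] = shared[(tid + 1) % n] + local[tid]
    return out


print(kernel_on_cpu([1, 2, 3, 4]))   # [6, 9, 12, 7]
```

Fusing both regions into one thread loop would let thread 0 read `shared[1]` before thread 1 has written it. The loop is therefore split at the barrier.
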
OBFUSCATION DEVICE, PROCESSING DEVICE, METHOD, PROGRAM, AND INTEGRATED CIRCUIT THEREOF

Publication number: 20090254738

Abstract: It is an object of the present invention to provide an obfuscation device that achieves sufficient obfuscation while still ensuring that the appropriate instruction block is executed. In the obfuscation device, a first instruction generating unit, for each of a first process and a second process, generates an initialization instruction for securing a management area that manages identification information indicating the instruction block that should be executed next for the process to proceed, and stores the initialization instruction in a storage unit.

Type: Application

Filed: March 24, 2009

Publication date: October 8, 2009

Inventors: Taichi SATO, Tomoyuki Haga, Kenichi Matsumoto, Akito Monden, Haruaki Tamada
Data Processing Device and Electronic Equipment

Publication number: 20090235052

Abstract: A data processing device is provided using pipeline architecture to reduce a time loss due to a branch without causing an increase in circuit scale. The data processing device uses pipeline control. The data processing device includes an instruction queue in which a plurality of instruction codes can be fetched, a fetch address operation circuit which calculates a fetch address, a fetch circuit which fetches an instruction code based on the fetch address, and a branch information setting circuit which decodes a branch setting instruction, stores a branch address in a branch address storage register, and stores a branch target address in a branch target address storage register. The fetch address operation circuit compares either a previous fetch address or an expected next fetch address with a value stored in the branch address storage register, and determines a next fetch address to be output, based on the comparison result.

Type: Application

Filed: April 2, 2009

Publication date: September 17, 2009

Applicant: SEIKO EPSON CORPORATION

Inventor: Makoto Kudo
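
The sketch below models the fetch address operation circuit in this abstract. It compares the previous fetch address with the value held in the branch address register. It assumes fixed 4-byte instructions, that the branch address register holds the last address fetched before the redirect, and that the branch is always taken (loop exit is not modeled). The class and method names are hypothetical.

```python
from typing import List, Optional


class FetchAddressUnit:
    """Hypothetical model of the fetch address operation circuit."""

    def __init__(self, step: int = 4) -> None:
        self.step = step                          # assumed fixed instruction size in bytes
        self.branch_addr: Optional[int] = None    # branch address storage register
        self.target_addr: Optional[int] = None    # branch target address storage register

    def set_branch(self, branch_addr: int, target_addr: int) -> None:
        # Effect of decoding the branch setting instruction.
        self.branch_addr = branch_addr
        self.target_addr = target_addr

    def next_fetch(self, prev_addr: int) -> int:
        # Compare the previous fetch address with the stored branch address;
        # on a match, redirect fetch without waiting for a branch to execute.
        if (self.branch_addr is not None and self.target_addr is not None
                and prev_addr == self.branch_addr):
            return self.target_addr
        return prev_addr + self.step


def fetch_trace(unit: FetchAddressUnit, start: int, count: int) -> List[int]:
    addrs = [start]
    while len(addrs) < count:
        addrs.append(unit.next_fetch(addrs[-1]))
    return addrs


unit = FetchAddressUnit()
unit.set_branch(branch_addr=0x10C, target_addr=0x100)
print([hex(a) for a in fetch_trace(unit, 0x100, 6)])
# ['0x100', '0x104', '0x108', '0x10c', '0x100', '0x104']
```
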
Loop accelerator and data processing system having the same

Patent number: 7590831

Abstract: Provided are a loop accelerator and a data processing system having the loop accelerator. The data processing system includes a loop accelerator which executes a loop part of a program, a processor core which processes a remaining part of the program except the loop part, and a central register file which transmits data between the processor core and the loop accelerator. The loop accelerator includes a plurality of processing elements (PEs) each of which performs an operation on each word to execute the program, a configuration memory which stores configuration bits indicating operations, states, etc. of the PEs, and a plurality of context memories, installed in a column or row direction of the PEs, which transmit the configuration bits along a direction toward which the PEs are arrayed. Thus, a connection structure between the configuration memory and the PEs can be simplified to easily modify a structure of the loop accelerator so as to extend the loop accelerator.

Type: Grant

Filed: September 5, 2006

Date of Patent: September 15, 2009

Assignee: Samsung Electronics Co., Ltd.

Inventors: Soo-jung Ryu, Jeong-wook Kim, Suk-jin Kim, Hong-Seok Kim, Jun-jin Kong
DIGITAL DATA PROCESSING METHOD AND SYSTEM

Publication number: 20090228677

Abstract: A method and system for processing generic formatted data, including first data describing a sequence of generic operations without any loops, in order to provide specific formatted data for a determined platform that includes Q processor(s) and at least one memory, the platform being configured to process, directly or indirectly according to the specific formatted data, an object made up of elementary information of the same type, each elementary information being represented by at least one numerical value.

Type: Application

Filed: December 19, 2006

Publication date: September 10, 2009

Applicant: DXO LABS

Inventor: Bruno Liege
METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR MINIMIZING BRANCH PREDICTION LATENCY

Publication number: 20090217017

Abstract: A method, system, and computer program product for minimizing branch prediction latency in a pipelined computer processing environment are provided. The method includes detecting a branch loop utilizing branch instruction addresses and corresponding target addresses stored in a branch target buffer (BTB). The method also includes fetching the branch loop into a pre-decode instruction buffer and qualifying the branch loop for loop lockdown. The method further includes locking an instruction stream that forms the branch loop in the pre-decode instruction buffer and processing qualified branch loop instructions from the buffer and powering down instruction fetching and branch prediction logic (BPL) associated with the BTB.

Type: Application

Filed: February 26, 2008

Publication date: August 27, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Khary J. Alexander, David S. Hutton, Brian R. Prasky, Anthony Saporito, Robert J. Sonnelitter, III, John W. Ward, III
Reusing a buffer memory as a microcache for program instructions of a detected program loop

Patent number: 7571305

Abstract: A data processing system 2 includes an instruction cache 6 having an associated buffer memory 18, 8. The buffer memory 18, 8 can operate in a buffer mode or in a microcache mode. The buffer memory is switched into the microcache mode upon program loop detection performed by loop detector circuitry 20. When operating in the microcache mode, instruction data is read from the buffer memory 18, 8 without requiring an access to the instruction cache 6.

Type: Grant

Filed: January 11, 2007

Date of Patent: August 4, 2009

Assignee: ARM Limited

Inventors: Fredrick Claude Marie Piry, Louis-Marie Vincent Mouton, Stephane Eric Sabastien Brochier, Gilles Eric Grandou
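
The mode switch can be shown by counting instruction-cache reads for a fetch trace. This is a minimal sketch, not ARM's loop detector circuitry. Several details are assumptions: 4-byte instructions, a buffer that keeps the most recent `capacity` instructions, a loop detected on a backward jump whose whole body is still buffered, and a return to buffer mode on leaving the loop's address range. `icache_reads` is a hypothetical name.

```python
from collections import OrderedDict
from typing import Dict, List


def icache_reads(trace: List[int], capacity: int, icache: Dict[int, str]) -> int:
    """Count instruction-cache reads for a fetch-address trace (hypothetical model)."""
    buffer: "OrderedDict[int, str]" = OrderedDict()
    loop = None            # (start, end) of the cached loop while in microcache mode
    reads = 0
    prev = None
    for addr in trace:
        # Loop detector: backward (or self) jump with the body fully buffered.
        if loop is None and prev is not None and addr <= prev:
            body = range(addr, prev + 4, 4)
            if len(body) <= capacity and all(a in buffer for a in body):
                loop = (addr, prev)
        # Leaving the loop's address range returns the buffer to buffer mode.
        if loop is not None and not (loop[0] <= addr <= loop[1]):
            loop = None
        if loop is None:
            reads += 1                        # buffer mode: access the I-cache
            buffer[addr] = icache[addr]
            buffer.move_to_end(addr)
            if len(buffer) > capacity:
                buffer.popitem(last=False)    # evict the oldest entry
        else:
            instruction = buffer[addr]        # microcache mode: no I-cache access
        prev = addr
    return reads


icache = {a: f"insn@{a:#x}" for a in range(0, 24, 4)}
trace = [0, 4, 8, 12] * 10 + [16, 20]
print(icache_reads(trace, capacity=8, icache=icache))   # 6
print(icache_reads(trace, capacity=2, icache=icache))   # 42
```

With an 8-entry buffer, only the first pass through the loop and the two instructions after it reach the cache. A 2-entry buffer cannot hold the 4-instruction body, so every fetch goes to the cache.
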
Method for providing zero overhead looping using carry chain masking

Patent number: 7558948

Abstract: A method for reducing overhead on a loop of a plurality of instructions is disclosed. The method includes providing a carry mask, the carry mask having a first value for the loop being performed at least the particular number of times minus one and a second value for at least a last instruction of the loop being performed a last time, providing addition logic, wherein the carry mask and a current instruction address of the plurality of instructions correspond to inputs of the addition logic and determining which of the plurality of instructions is to be executed using the carry mask to provide a resultant of the addition logic based on the carry mask and the current instruction address of the plurality of instructions.

Type: Grant

Filed: September 20, 2004

Date of Patent: July 7, 2009

Assignee: International Business Machines Corporation

Inventors: Anthony J. Bybell, Richard W. Doing, David D. Dukro
Method and System for Auto Parallelization of Zero-Trip Loops Through the Induction Variable Substitution

Publication number: 20090158018

Abstract: A method and system for auto-parallelization of zero-trip loops, which uses a parallelizing compiler to substitute nested basic linear induction variables, is provided. For a typical loop iterating from 1 to N, where N is loop invariant, a max{0,N} expression is used as the iteration count when nothing is known about the value of N. For nested basic induction variables, induction variable substitution is applied to the nested loops from the innermost loop to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. This eliminates the loop's dependency on the induction variable and gives a parallelizing compiler the opportunity to parallelize the outermost loop.

Type: Application

Filed: January 21, 2009

Publication date: June 18, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
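
The substitution can be illustrated on a loop nest of the form `for i in 1..N: for j in 1..M: k = k + 1; a[k] = ...`. The basic induction variable `k` can be replaced by a closed form in `i` and `j`. The value after the loop needs `max(0, ...)` so that it stays correct when either loop runs zero times. This sketch only illustrates the arithmetic and is not the IBM compiler's implementation. The function names are hypothetical.

```python
from typing import List, Tuple

Write = Tuple[int, int, int]   # (i, j, value of k used in that iteration)


def original(n: int, m: int, k0: int = 0) -> Tuple[List[Write], int]:
    """Sequential loop nest: k carries a dependence across every iteration."""
    k = k0
    writes = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            k += 1
            writes.append((i, j, k))
    return writes, k


def substituted(n: int, m: int, k0: int = 0) -> Tuple[List[Write], int]:
    """k computed from (i, j) alone, so iterations no longer depend on each other."""
    trips_m = max(0, m)                    # inner trip count, zero-trip safe
    writes = [(i, j, k0 + (i - 1) * trips_m + j)
              for i in range(1, n + 1) for j in range(1, m + 1)]
    k_final = k0 + max(0, n) * trips_m     # last value of k after the nest
    return writes, k_final


for n, m in [(3, 2), (0, 4), (-2, 5), (3, -1)]:
    assert original(n, m, k0=7) == substituted(n, m, k0=7)
print(substituted(-2, 5, k0=7)[1])   # 7
```

Without the `max`, the final value for `n = -2, m = 5, k0 = 7` would be `7 + (-2) * 5 = -3`, even though the loop never executes.
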
Processor and Signal Processing Method

Publication number: 20090150658

Abstract: This invention combines a loop support mechanism and a branch prediction mechanism. After an instruction execution unit executes an end block instruction of a block repeat, the loop control unit branches to the first instruction in the loop and sends a pseudo branch instruction to the instruction execution unit. The instruction execution unit acts as if the last instruction in the block is an instruction for branching to the start address of the block. This is stored in the branch prediction unit and branch prediction is performed thereafter.

Type: Application

Filed: December 5, 2008

Publication date: June 11, 2009

Inventor: Hiroyuki Mizumo
SIMD Code Generation For Loops With Mixed Data Lengths

Publication number: 20090144529

Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless of whether the alignments or offsets are known at compile time. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.

Type: Application

Filed: December 4, 2008

Publication date: June 4, 2009

Applicant: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Meta-Architecture Defined Programmable Instruction Fetch Functions Supporting Assembled Variable Length Instruction Processors

Publication number: 20090144502

Abstract: In an implementation, a processing system includes an instruction fetch (IF) memory storing IF instructions; an arithmetic/logic (AL) instruction memory (IMemory) storing AL instructions; and a programmable instruction fetch mechanism to generate IMemory instruction addresses, from IF instructions fetched from the IF memory, to select AL instructions to be fetched from the IMemory for execution, wherein at least one IF instruction includes a loop count field indicating a number of iterations of a loop to be performed, a loop start address of the loop, and a loop end address of the loop.

Type: Application

Filed: February 6, 2009

Publication date: June 4, 2009

Applicant: Renesky Tap III, Limited Liability Company

Inventor: Gerald George Pechanek
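
The sketch below expands a short IF-instruction program into the stream of IMemory addresses it would select. It assumes word-addressed IMemory, an inclusive loop end address, and a hypothetical `Seq` IF instruction for straight-line code alongside the `Loop` instruction carrying the three fields named in the abstract.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Seq:
    """Hypothetical IF instruction: fetch AL instructions start..end once."""
    start: int
    end: int


@dataclass
class Loop:
    """IF instruction with loop count, loop start address and loop end address fields."""
    count: int
    start: int
    end: int


def imemory_addresses(program: List[Union[Seq, Loop]]) -> List[int]:
    """Expand IF instructions into the IMemory address stream (word addresses)."""
    addrs: List[int] = []
    for ins in program:
        repeats = ins.count if isinstance(ins, Loop) else 1
        for _ in range(repeats):
            addrs.extend(range(ins.start, ins.end + 1))   # end address is inclusive
    return addrs


program = [Seq(0, 1), Loop(count=3, start=2, end=4), Seq(5, 5)]
print(imemory_addresses(program))   # [0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4, 5]
```
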
High performance memory and system organization for digital signal processing

Publication number: 20090125912

Abstract: An innovative approach for constructing optimum, high-performance, efficient DSP systems may include a system organization that matches the compute execution rate to the data availability rate and organizes DSP operations as loop iterations such that there is maximal reuse of data between multiple consecutive iterations. Independent setup and preparation of data before it is required, through suitable mechanisms such as data prefetching, may be used. This technique may be useful for devices that require cost-effective, high-performance, power-efficient VLSI ICs.

Type: Application

Filed: January 15, 2009

Publication date: May 14, 2009

Inventor: SIAMACK HAGHIGHI
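
Data reuse between consecutive loop iterations can be illustrated with an FIR filter, chosen here as an assumed example of a DSP kernel. The sketch compares a direct loop, which reloads every input sample an output needs, with a version that keeps a sliding register window so each iteration loads only one new sample. "Loads" counts input-sample reads only. Both function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple


def fir_naive(x: Sequence[int], h: Sequence[int]) -> Tuple[List[int], int]:
    """Direct FIR: every output re-loads all of its input samples."""
    y, loads = [], 0
    for n in range(len(x)):
        acc = 0
        for k, coeff in enumerate(h):
            if n - k >= 0:                 # samples before x[0] are treated as zero
                acc += coeff * x[n - k]
                loads += 1
        y.append(acc)
    return y, loads


def fir_reuse(x: Sequence[int], h: Sequence[int]) -> Tuple[List[int], int]:
    """Same filter; consecutive iterations share a window of len(h) samples."""
    window = deque([0] * len(h), maxlen=len(h))   # window[k] holds x[n - k]
    y, loads = [], 0
    for sample in x:
        window.appendleft(sample)   # one new load per iteration; the oldest sample drops off
        loads += 1
        y.append(sum(c * s for c, s in zip(h, window)))
    return y, loads


x, h = [1, 2, 3, 4], [1, 1, 1]
print(fir_naive(x, h))   # ([1, 3, 6, 9], 9)
print(fir_reuse(x, h))   # ([1, 3, 6, 9], 4)
```
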
