Loop Execution Patents (Class 712/241)
  • Patent number: 7822949
    Abstract: A command supply device supplies a command sequence that forms a loop. A loop command buffer accumulates a first partial command sequence. The first partial command sequence is a head part of a first command sequence repeatedly supplied to a CPU from among command sequences stored in a main memory, and is accumulated before the first command sequence is supplied to the CPU again. A linking command buffer accumulates a second partial command sequence. The second partial command sequence follows the first partial command sequence in the first command sequence, and is accumulated while the accumulated first partial command sequence in the loop command buffer is supplied to the CPU. A selection circuit supplies, to the CPU, a command from the accumulated second partial command sequence in the linking command buffer when the entirety of the first partial command sequence has been supplied to the CPU.
    Type: Grant
    Filed: May 9, 2005
    Date of Patent: October 26, 2010
    Assignee: Panasonic Corporation
    Inventor: Satoshi Ogura
  • Publication number: 20100250854
    Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented in linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.
    Type: Application
    Filed: March 16, 2010
    Publication date: September 30, 2010
    Inventor: Dz-ching Ju
  • Publication number: 20100235612
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.
    Type: Application
    Filed: May 26, 2010
    Publication date: September 16, 2010
    Inventor: Jeffry E. Gonion
  • Patent number: 7797692
    Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.
    Type: Grant
    Filed: May 12, 2006
    Date of Patent: September 14, 2010
    Assignee: Google Inc.
    Inventor: Grzegorz J. Czajkowski
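    Illustration: the selection-and-aggregation flow in the abstract above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented method: the input format (a nesting level plus a per-resource cost estimate for each basic block) and the majority-vote aggregation are hypothetical, since the abstract does not say how the per-block estimates are combined.

```python
from collections import Counter

def estimate_dominant_resource(blocks, threshold):
    """Estimate a program's dominant resource from its deeply nested blocks.

    blocks: list of (nesting_level, {resource_name: estimated_cost}) tuples.
            This input format is an assumption made for illustration.
    threshold: only blocks with nesting_level > threshold are analyzed.
    """
    votes = Counter()
    for nesting_level, costs in blocks:
        if nesting_level > threshold and costs:
            # Dominant resource of one block: the resource with the largest cost.
            votes[max(costs, key=costs.get)] += 1
    if not votes:
        return None  # no block was nested deeply enough to be selected
    # Assumed aggregation: the resource that dominates the most selected blocks.
    return votes.most_common(1)[0][0]
```

    A weighted vote (for example, by estimated iteration count) would be an equally plausible aggregation; the abstract leaves that choice open.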
  • Publication number: 20100211762
    Abstract: A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having an N number of execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count.
    Type: Application
    Filed: February 18, 2010
    Publication date: August 19, 2010
    Applicant: SAANKHYA LABS PVT LTD
    Inventors: Anindya Saha, Manish Kumar, Hemant Mallapur, Santhosh Billava, Viji Rajangam
  • Patent number: 7774766
    Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: August 10, 2010
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Daniel M Lavery
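    Illustration: the effect of reassociation on a recurrence can be shown with a small accumulation loop. This is a hedged sketch, not the compiler logic of the patent: the function names are hypothetical, and the example uses integers because reassociating floating-point additions can change rounding.

```python
def loop_original(a, b):
    """Accumulate a[i] and b[i]; both additions sit on the recurrence through s."""
    s = 0
    for x, y in zip(a, b):
        s = (s + x) + y  # two dependent additions per iteration on the cycle
    return s

def loop_reassociated(a, b):
    """Same sum after reassociation; one addition remains on the recurrence."""
    s = 0
    for x, y in zip(a, b):
        t = x + y  # does not depend on s, so it is off the critical cycle
        s = s + t  # the only operation that still depends on the previous s
    return s
```

    Both functions return the same value for integer inputs, while the reassociated form shortens the chain of operations that each iteration must wait on.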
  • Publication number: 20100199076
    Abstract: A computing apparatus and method of handling an interrupt are provided. The computing apparatus includes a coarse-grained array, a host processor, and an interrupt supervisor. When an interrupt occurs in the coarse-grained array while performing a loop operation, the host processor processes the interrupt, and the interrupt supervisor may perform mode switching between the coarse-grained array and the host processor.
    Type: Application
    Filed: December 16, 2009
    Publication date: August 5, 2010
    Inventors: Dong-hoon YOO, Soo-jung Ryu, Yeon-gon Cho, Bernhard Egger, Il-hyun Park
  • Publication number: 20100185839
    Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based on the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.
    Type: Application
    Filed: November 2, 2009
    Publication date: July 22, 2010
    Inventors: Tae-wook OH, Won-sub Kim, Bernhard Egger
  • Publication number: 20100180102
    Abstract: A processor includes one or more processing units, an execution pipeline and control circuitry. The execution pipeline includes at least first and second pipeline stages that are cascaded so that program instructions, specifying operations to be performed by the processing units in successive cycles of the pipeline, are fetched from a memory by the first pipeline stage and conveyed to the second pipeline stage, which causes the processing units to perform the specified operations. The control circuitry is coupled, upon determining that a program instruction that is present in the second pipeline stage in a first cycle of the pipeline is to be executed again in a subsequent cycle of the pipeline, to cause the execution pipeline to reuse the program instruction in one of the pipeline stages without re-fetching the program instruction from the memory.
    Type: Application
    Filed: January 15, 2009
    Publication date: July 15, 2010
    Applicant: ALTAIR SEMICONDUCTORS
    Inventors: Edan Almog, Nohik Semel, Yigal Bitran, Nadav Cohen, Yoel Livne, Eli Zyss
  • Patent number: 7757067
    Abstract: A processor (e.g., a co-processor) comprising a decoder coupled to a pre-decoder, in which the decoder decodes a current instruction in parallel with the pre-decoder pre-decoding a subsequent instruction. In particular, the pre-decoder examines at least five Bytecodes in parallel with the decoder decoding a current instruction. The pre-decoder determines if a subsequent instruction contains a prefix. If a prefix is detected in at least one of the five Bytecodes, a program counter skips the prefix and changes the behavior of the decoder during the decoding of the subsequent instruction.
    Type: Grant
    Filed: July 31, 2003
    Date of Patent: July 13, 2010
    Assignee: Texas Instruments Incorporated
    Inventors: Gerard Chauvel, Serge Lasserre, Maija Kuusela
  • Publication number: 20100175056
    Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
    Type: Application
    Filed: February 16, 2010
    Publication date: July 8, 2010
    Inventors: Hajime OGAWA, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
  • Publication number: 20100169612
    Abstract: A data-processing unit has a fetching circuitry (20) and execution circuitry (30a, 30b). The data-processing unit has an instruction set comprising a nested-loop instruction. The fetching circuitry is arranged to fetch the nested-loop instruction, and the execution circuitry is arranged to execute the nested-loop instruction. The nested-loop instruction comprises at least one instruction field that is adapted to indicate a number of iterations of an outer loop of the nested loop and one or more operations to be performed by the outer loop. Moreover, the at least one instruction field is further adapted to indicate a number of iterations of an inner loop of the nested loop and one or more operations to be performed by the inner loop. A method for fetching, decoding, and executing the nested-loop instruction is also described as well as the structure of the nested-loop instruction.
    Type: Application
    Filed: June 25, 2008
    Publication date: July 1, 2010
    Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)
    Inventors: Per Persson, Harald Gustafsson
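    Illustration: the semantics of a single instruction that encodes both loop levels can be modeled in software as below. This is only a sketch under assumptions: the field names are hypothetical, and running the outer-loop operations before the inner loop in each outer iteration is an assumed ordering that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class NestedLoopInstruction:
    """Hypothetical software model of the instruction fields named in the abstract."""
    outer_count: int                             # iterations of the outer loop
    inner_count: int                             # iterations of the inner loop
    outer_ops: List[Callable[[int], None]]       # operations of the outer loop
    inner_ops: List[Callable[[int, int], None]]  # operations of the inner loop

def execute(instr: NestedLoopInstruction) -> None:
    """Run the whole loop nest described by one instruction."""
    for i in range(instr.outer_count):
        for op in instr.outer_ops:   # assumed: outer ops run before the inner loop
            op(i)
        for j in range(instr.inner_count):
            for op in instr.inner_ops:
                op(i, j)
```

    With an outer count of 2 and an inner count of 3, one outer operation and one inner operation are invoked 2 and 6 times respectively.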
  • Publication number: 20100153688
    Abstract: An exemplary aspect of the present invention is a data processing apparatus for processing a loop in a pipeline that includes an instruction memory and a fetch circuit that fetches an instruction stored in the instruction memory. The fetch circuit includes an instruction queue that stores an instruction to be output from the fetch circuit, an evacuation queue that stores an instruction fetched from the instruction memory, a selector that selects one of the instruction output from the instruction queue and the instruction output from the evacuation queue, and a loop queue that stores the instruction selected by the selector and outputs to the instruction queue.
    Type: Application
    Filed: December 11, 2009
    Publication date: June 17, 2010
    Applicant: NEC Electronics Corporation
    Inventor: Satoshi CHIBA
  • Patent number: 7739442
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.
    Type: Grant
    Filed: May 23, 2008
    Date of Patent: June 15, 2010
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 7725696
    Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution.
    Type: Grant
    Filed: October 4, 2007
    Date of Patent: May 25, 2010
    Inventors: Wen-mei W. Hwu, Matthew C. Merten
  • Patent number: 7719825
    Abstract: A faceplate for a housing of a computing device including a processor includes a bezel with interior and exterior surfaces. The bezel removably covers at least a portion of an opening defined by the housing. A wireless local area network (LAN) unit is mechanically coupled to the interior surface of the bezel and comprises a data interface that enables data transfer between the wireless LAN unit and the processor.
    Type: Grant
    Filed: August 28, 2008
    Date of Patent: May 18, 2010
    Assignee: Marvell International Ltd.
    Inventors: Joseph Knapp, George Chien
  • Publication number: 20100122066
    Abstract: Instruction set techniques have been developed to identify explicitly the beginning of a loop body and to code a conditional loop-end in ways that allow a processor implementation to efficiently manage an instruction fetch buffer and/or entries in an instruction cache. In particular, for some computations and processor implementations, a machine instruction is defined that identifies a loop start, stores a corresponding loop start address on a return stack (or in other suitable storage) and directs fetch logic to take advantage of the identification by retaining in a fetch buffer or instruction cache the instruction(s) beginning at the loop start address, thereby avoiding usual branch delays on subsequent iterations of the loop. A conditional loop-end instruction can be used in conjunction with the loop start instruction to discard (or simply mark as no longer needed) the loop start address and the loop body instructions retained in the fetch buffer or instruction cache.
    Type: Application
    Filed: November 12, 2008
    Publication date: May 13, 2010
    Applicant: FREESCALE SEMICONDUCTOR, INC.
    Inventor: Michael A. Fischer
  • Publication number: 20100122069
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
    Type: Application
    Filed: November 6, 2009
    Publication date: May 13, 2010
    Inventor: Jeffry E. Gonion
  • Patent number: 7712091
    Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: May 4, 2010
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
  • Patent number: 7698449
    Abstract: Method and apparatus for configuring a processor embedded in an integrated circuit for use as a logic element is described. In one example, a processing apparatus in an integrated circuit includes a point-to-point data streaming interface and arithmetic logic unit (ALU) circuitry. The ALU circuitry includes at least one input port in communication with the point-to-point data streaming interface. The processor may also include a register file and multiplexer logic. The multiplexer logic is configured to selectively couple the register file and the point-to-point streaming interface to the at least one input port of the ALU circuitry.
    Type: Grant
    Filed: February 23, 2005
    Date of Patent: April 13, 2010
    Assignee: XILINX, Inc.
    Inventors: Eric R. Keller, Philip B. James-Roxby
  • Patent number: 7698696
    Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
    Type: Grant
    Filed: June 30, 2003
    Date of Patent: April 13, 2010
    Assignee: Panasonic Corporation
    Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
  • Patent number: 7685411
    Abstract: An instruction memory unit comprises a first memory structure operable to store program instructions, and a second memory structure operable to store program instructions fetched from the first memory structure, and to issue stored program instructions for execution. The second memory structure is operable to identify a repeated issuance of a forward program redirect construct, and issue a next program instruction already stored in the second memory structure if a resolution of the forward branching instruction is identical to a last resolution of the same. The second memory structure is further operable to issue a backward program redirect construct, determine whether a target instruction is stored in the second memory structure, issue the target instruction if the target instruction is stored in the second memory structure, and fetch the target instruction from the first memory structure if the target instruction is not stored in the second memory structure.
    Type: Grant
    Filed: April 11, 2005
    Date of Patent: March 23, 2010
    Assignee: QUALCOMM Incorporated
    Inventors: Muhammad Ahmed, Lucian Codrescu, Erich Plondke, William C. Anderson, Robert Allan Lester, Phillip M. Jones
  • Publication number: 20100064106
    Abstract: The present invention provides a data processor capable of automatically discriminating a loop program and performing a reduction in power by size-variable lock control on an instruction buffer. The instruction buffer of the data processor includes a buffer controller for controlling a memory unit that stores each fetched instruction therein. When an execution history of a fetched condition branch instruction suggests condition establishment, and in the case that the branch direction of the fetched condition branch instruction is a direction opposite to the order of an instruction execution and the difference of instruction addresses from the branch source to the branch target based on the condition branch instruction is a range held in the storage capacity of the instruction buffer, the buffer controller retains an instruction sequence from a branch source to a branch target based on the condition branch instruction in the instruction buffer.
    Type: Application
    Filed: August 24, 2009
    Publication date: March 11, 2010
    Inventors: Tetsuya YAMADA, Naoki KATO
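    Illustration: the three conditions under which the buffer controller retains a loop can be written as one predicate. This is a minimal sketch, not the circuit described in the application: the function and parameter names are hypothetical, and measuring the span as the plain address difference (without adding the size of the branch instruction itself) is an assumption.

```python
def should_retain_loop(branch_addr, target_addr, predicted_taken, buffer_capacity):
    """Decide whether to keep the loop body in the instruction buffer.

    branch_addr:     address of the conditional branch (branch source)
    target_addr:     address the branch jumps to (branch target)
    predicted_taken: True if the execution history suggests the condition holds
    buffer_capacity: storage capacity of the instruction buffer, in address units
    """
    is_backward = target_addr < branch_addr  # opposite to normal execution order
    span = branch_addr - target_addr         # assumed measure of the loop size
    return predicted_taken and is_backward and span <= buffer_capacity
```

    A backward branch from 0x120 to 0x100 with a taken history and a 64-unit buffer satisfies all three conditions; a forward branch, or a loop larger than the buffer, does not.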
  • Patent number: 7676650
    Abstract: When an instruction stored in a specific instruction buffer is the same as another instruction stored in another instruction buffer and logically subsequent to the instruction in the specific instruction buffer, a connection is made from the instruction buffer storing a logically and immediately preceding instruction, not the instruction in the other instruction buffer, to the specific instruction buffer without the instruction in the other instruction buffer, and a loop is generated by instruction buffers, thereby performing a short loop in an instruction buffer system capable of arbitrarily connecting a plurality of instruction buffers.
    Type: Grant
    Filed: January 21, 2003
    Date of Patent: March 9, 2010
    Assignee: Fujitsu Limited
    Inventor: Masaki Ukai
  • Patent number: 7676663
    Abstract: A method and apparatus enable supplementing a Branch Target Buffer (BTB) table with a recent entry queue that prevents unnecessary removal of valuable BTB table data of multiple entries for another entry. The recent entry queue detects when the startup latency of the BTB table prevents it from asynchronously aiding the microprocessor pipeline as designed for and thereby can delay the pipeline in the required situations such that the BTB table latency on startup can be overcome. The recent entry queue provides a quick access to BTB table entries that are accessed in a tight loop pattern where the throughput of the standalone BTB table cannot track the throughput of the microprocessor execution pipeline. By using the recent entry queue, the modified BTB table processes information at the rate of the execution pipeline which provides acceleration thereof.
    Type: Grant
    Filed: March 9, 2004
    Date of Patent: March 9, 2010
    Assignee: International Business Machines Corporation
    Inventors: Brian Robert Prasky, Thomas Roberts Puzak, Allan Mark Hartstein
  • Publication number: 20100058039
    Abstract: A next program counter (PC) value generator. The next PC value generator includes a discontinuity decoder that is provided to detect a discontinuity instruction among a plurality of instructions and a tight loop decoder that is provided to: a) detect a tight loop instruction, and b) provide a tight loop instruction target address. The next PC value generator further includes a next PC value logic having a plurality of inputs: a first input coupled to an output of the discontinuity decoder, and a second input coupled to an output of the tight loop decoder. The next PC value logic provides as an output, without a stall, a control signal that a next PC value is to be loaded with the tight loop instruction target address if: the discontinuity decoder detects a discontinuity instruction, and the tight loop decoder detects a tight loop instruction.
    Type: Application
    Filed: September 4, 2008
    Publication date: March 4, 2010
    Applicant: VeriSilicon Holdings Company, Limited
    Inventors: Vijayanand Angarai, Michelle Y. Che, Asheesh Kashyap, Tracy Nguyen
  • Publication number: 20100049958
    Abstract: A method for managing a hardware instruction loop includes: (i) detecting, by a branch prediction unit, an instruction loop; wherein a size of the instruction loop exceeds a size of a storage space allocated in a fetch unit for storing fetched instructions; (ii) requesting from the fetch unit to fetch instructions of the instruction loop that follow the first instructions of the instruction loop; and (iii) selecting, during iterations of the instruction loop, whether to provide to a dispatch unit one of the first instructions of the instruction loop or another instruction that is fetched by the fetch unit; wherein the first instructions of the instruction loop are stored at the dispatch unit.
    Type: Application
    Filed: August 19, 2008
    Publication date: February 25, 2010
    Inventors: Lev Vaskevich, Itzhak Barak, Amir Paran, Yuval Peled, Idan Rozenberg, Doron Schupper
  • Patent number: 7669042
    Abstract: An instruction execution pipeline for use in a data processor. The instruction execution pipeline comprises: 1) an instruction fetch stage; 2) a decode stage; 3) an execution stage; and 4) a write-back stage. The instruction pipeline repetitively executes a loop of instructions by fetching and decoding a first instruction associated with the loop during a first iteration of the loop, storing first decoded instruction information associated with the first instruction during the first iteration of the loop, and using the stored first decoded instruction information during at least a second iteration of the loop without further fetching and decoding of the first instruction during the at least a second iteration of the loop.
    Type: Grant
    Filed: June 10, 2005
    Date of Patent: February 23, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Eran Pisek, Jasmin Oz, Yan Wang
  • Patent number: 7665079
    Abstract: It is one object of the present invention to provide a program execution method for performing greater optimization. A program execution apparatus according to the present invention performs a transfer from an interpreter process to a compiled code process in the course of the execution of a method. At this time, if no problem occurs when a transfer point is moved to the top of a loop, the transfer point for code is so moved. And when a transfer point is located inside a loop, a point that post-dominates the top of the loop and the transfer point is copied to a position immediately preceding the loop. Then, information for generating recalculation code is provided for the transfer point, and a recalculation is performed.
    Type: Grant
    Filed: November 8, 2000
    Date of Patent: February 16, 2010
    Assignee: International Business Machines Corporation
    Inventors: Toshiaki Yasue, Kazunori Ogata, Kazuaki Ishizaki, Hideaki Komatsu
  • Patent number: 7660970
    Abstract: Disclosed is a data processing system and method. The data processing method determines the number of static registers and the number of rotating registers for assigning a register to a variable contained in a certain program, assigns the register to the variable based on the number of the static registers and the number of the rotating registers, and compiles the program. Further, the method stores in a special register a value corresponding to the number of the rotating registers in the compiling operation, and obtains a physical address from a logical address of the register based on the value. Accordingly, the present invention uses register files efficiently by dynamically controlling the number of rotating registers and the number of static registers for a software pipelined loop, and can minimize the generation of unnecessary spill/fill code during program execution.
    Type: Grant
    Filed: August 21, 2006
    Date of Patent: February 9, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Suk-jin Kim, Jeong-wook Kim, Hong-seok Kim, Soo-jung Ryu
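    Illustration: one way a logical register number could be translated to a physical one from the stored rotating-register count is sketched below. Everything beyond "a value corresponding to the number of rotating registers" is an assumption: the layout (static registers first, rotating registers last), the `rotation_base` parameter, and the modulo mapping are hypothetical and follow the general idea of rotating register files rather than anything stated in the abstract.

```python
def physical_register(logical, num_rotating, total_registers, rotation_base):
    """Map a logical register number to a physical register number.

    num_rotating:  value held in the special register (count of rotating registers)
    rotation_base: hypothetical offset that advances as loop iterations rotate
    """
    if not 0 <= num_rotating <= total_registers:
        raise ValueError("num_rotating must be between 0 and total_registers")
    if not 0 <= logical < total_registers:
        raise ValueError("logical register number out of range")
    num_static = total_registers - num_rotating
    if logical < num_static:
        return logical  # static registers map to themselves
    offset = logical - num_static
    # Rotating registers wrap around inside their own region.
    return num_static + (offset + rotation_base) % num_rotating
```

    Changing `num_rotating` moves the boundary between the two regions, which is the dynamic control the abstract refers to; for example, logical register 5 of 8 is rotating when four registers rotate but static when only two do.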
  • Patent number: 7653904
    Abstract: A method, apparatus, and system are provided for a multi-threaded virtual state mechanism. According to one embodiment, active thread state of a first active thread is received using a virtual state mechanism, and virtual thread state is generated in accordance with the active thread state of the first active thread, and the virtual thread state corresponding to the first active thread is forwarded to state update logic.
    Type: Grant
    Filed: September 26, 2003
    Date of Patent: January 26, 2010
    Assignee: Intel Corporation
    Inventor: Nicholas G. Samra
  • Publication number: 20100005276
    Abstract: An information processing device includes an instruction fetch unit, an instruction buffer, an instruction executing unit, and an instruction fetch control unit. The instruction fetch unit supplies a fetch address to an instruction memory. The instruction buffer stores an instruction read out from the instruction memory. The instruction executing unit decodes and executes the instruction supplied from the instruction buffer. The instruction fetch control unit stops supply of the fetch address to the instruction memory by the instruction fetch unit when the fetch address corresponds to a first address or an address after the first address while the instruction executing unit executes loop processing. The loop processing is repeatedly executed for a predetermined number of times in accordance with decoding of the loop instruction by the instruction executing unit. The first address is an address after an address of an end instruction included in the loop processing.
    Type: Application
    Filed: June 11, 2009
    Publication date: January 7, 2010
    Applicant: NEC ELECTRONICS CORPORATION
    Inventor: Hideyuki MIWA
  • Publication number: 20090327674
    Abstract: Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value.
    Type: Application
    Filed: June 27, 2008
    Publication date: December 31, 2009
    Applicant: QUALCOMM INCORPORATED
    Inventors: Lucian Codrescu, Erich James Plondke, Lin Wang, Suresh K. Venkumahanti
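    Illustration: the detect, decrement, and compare steps can be modeled in software as below. This is a sketch under assumptions, not the circuit of the application: the class and method names are hypothetical, the predicate is assumed to be set when the trigger counter reaches the reference, and the trigger counter is assumed to stop decrementing once it has reached it.

```python
class LoopControl:
    """Hypothetical software model of the hardware loop control logic."""

    def __init__(self, loop_count, trigger_count, reference=0):
        self.loop_count = loop_count  # remaining loop iterations
        self.trigger = trigger_count  # predicate trigger counter
        self.reference = reference    # value the trigger counter is compared to
        self.predicate = False

    def end_of_loop(self):
        """Call when an end-of-loop indicator is detected.

        Returns True if the loop should branch back for another iteration.
        """
        self.loop_count -= 1                    # decrement unit: loop count
        if self.trigger > self.reference:
            self.trigger -= 1                   # decrement unit: trigger counter
            if self.trigger == self.reference:  # comparison unit
                self.predicate = True
        return self.loop_count > 0
```

    With a loop count of 5 and a trigger count of 2, the predicate is false during the first two iterations and true during the remaining three.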
  • Publication number: 20090307472
    Abstract: A method and apparatus for executing a nested program loop on a vector processor, the loop comprising outer-pre, inner and outer-post portions. An input stream unit of the vector processor provides a data value to a data path and sets an associated data validity tag to ‘valid’ once per outer loop iteration, as indicated by an inner counter of the input stream unit. The tag is set to ‘invalid’ in other iterations. Functional units of the vector processor operate on data values in the data path, each functional unit producing a valid result if the data validity tags associated with input data values are set to ‘valid’. An output stream unit of the vector processor sinks a data value from the data path once per outer loop iteration if an associated data validity tag indicates that the data value is valid.
    Type: Application
    Filed: June 5, 2008
    Publication date: December 10, 2009
    Applicant: MOTOROLA, INC.
    Inventors: Raymond B. Essick IV, Kent D. Moat, Michael A. Schuette
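    The validity-tag idea in this abstract can be sketched in software as follows. Running the nested loop as one flat loop, the `Tagged` type, the single scaling functional unit, and the 0.0 placeholder for invalid slots are assumptions for illustration, not details from the application.

```python
from typing import List, NamedTuple


class Tagged(NamedTuple):
    value: float
    valid: bool  # data validity tag


def fu_scale(x: Tagged, k: float) -> Tagged:
    """A functional unit: the result is valid only if its input is valid."""
    return Tagged(x.value * k, x.valid)


def run_flattened(outer_inputs: List[float], inner_trips: int, k: float) -> List[float]:
    """Run outer * inner iterations as one flat loop and return the sunk values."""
    sunk = []
    pending = iter(outer_inputs)
    inner_counter = 0
    for _ in range(len(outer_inputs) * inner_trips):
        if inner_counter == 0:
            # Input stream unit: supply a value tagged 'valid' once per outer iteration.
            tagged = Tagged(next(pending), True)
        else:
            # All other iterations carry a placeholder tagged 'invalid'.
            tagged = Tagged(0.0, False)
        result = fu_scale(tagged, k)
        if result.valid:
            # Output stream unit: sink only values whose tag is 'valid'.
            sunk.append(result.value)
        inner_counter = (inner_counter + 1) % inner_trips
    return sunk
```

    For example, `run_flattened([1.0, 2.0, 3.0], 4, 10.0)` executes twelve flat iterations and sinks `[10.0, 20.0, 30.0]`, one value per outer iteration.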
  • Patent number: 7617496
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
    Type: Grant
    Filed: September 1, 2005
    Date of Patent: November 10, 2009
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20090265534
    Abstract: A method, apparatus, and computer program are provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. Multiple loop macros are generated, the multiple loop macros respectively correspond to multiple processor threads, and the multiple loop macros are parallel comparative loop macros. The multiple processor threads for the multiple loop macros are executed in which a common resource is accessed. A forward performance of each of the multiple processor threads is verified. The forward performance of the multiple processor threads is compared with each other. It is determined whether any of the multiple processor threads fails to meet a minimum loop count or a minimum loop time. It is determined whether any of the multiple processor threads exceeds a maximum loop count or a maximum loop time. It is recognized whether fairness is maintained during the execution of the multiple processor threads.
    Type: Application
    Filed: April 17, 2008
    Publication date: October 22, 2009
    Inventors: Duane A. Averill, Anthony D. Drumm, Christopher T. Phan, Brian T. Vanderpool, Sharon D. Vincent
  • Publication number: 20090262877
    Abstract: A novel and useful apparatus for and method of spur reduction using computation spreading with dithering in a digital phase locked loop (DPLL) architecture. A software based PLL incorporates a reconfigurable calculation unit (RCU) that is optimized and programmed to sequentially perform all the atomic operations of a PLL or any other desired task in a time sharing manner. An application specific instruction-set processor (ASIP) incorporating the RCU is adapted to spread the computation of the atomic operations out over a PLL reference clock period wherein each computation is performed at a much higher processor clock frequency than the PLL reference clock rate. This significantly reduces the per cycle current transient generated by the computations. The frequency content of the current transients is at the higher processor clock frequency which results in a significant reduction in spurs within sensitive portions of the output spectrum.
    Type: Application
    Filed: April 17, 2008
    Publication date: October 22, 2009
    Inventors: Fuqiang Shi, Roman Staszewski, Robert B. Staszewski
  • Publication number: 20090259832
    Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on a multi-core graphics processing unit (GPU) so that they can be executed by a general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent, and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.
    Type: Application
    Filed: March 19, 2009
    Publication date: October 15, 2009
    Inventors: Vinod GROVER, Bastiaan Joannes Matheus AARTS, Michael MURPHY, Boris BEYLIN, Jayant B. KOLHE, Douglas SAYLOR
  • Publication number: 20090254738
    Abstract: It is an object of the present invention to provide an obfuscation device that can achieve both sufficient obfuscation and the appropriate instruction block to be executed. In the obfuscation device, a first instruction generating unit, for each of the first process and the second process, generates an initialization instruction for securing a management area for managing the identification information indicating an instruction block that should be executed next so as to proceed with the process, and stores the initialization instruction in said storage unit.
    Type: Application
    Filed: March 24, 2009
    Publication date: October 8, 2009
    Inventors: Taichi SATO, Tomoyuki Haga, Kenichi Matsumoto, Akito Monden, Haruaki Tamada
  • Publication number: 20090235052
    Abstract: A data processing device is provided using pipeline architecture to reduce a time loss due to a branch without causing an increase in circuit scale. The data processing device uses pipeline control. The data processing device includes an instruction queue in which a plurality of instruction codes can be fetched, a fetch address operation circuit which calculates a fetch address, a fetch circuit which fetches an instruction code based on the fetch address, and a branch information setting circuit which decodes a branch setting instruction, stores a branch address in a branch address storage register, and stores a branch target address in a branch target address storage register. The fetch address operation circuit compares either a previous fetch address or an expected next fetch address with a value stored in the branch address storage register, and determines a next fetch address to be output, based on the comparison result.
    Type: Application
    Filed: April 2, 2009
    Publication date: September 17, 2009
    Applicant: SEIKO EPSON CORPORATION
    Inventor: Makoto Kudo
  • Patent number: 7590831
    Abstract: Provided are a loop accelerator and a data processing system having the loop accelerator. The data processing system includes a loop accelerator which executes a loop part of a program, a processor core which processes a remaining part of the program except the loop part, and a central register file which transmits data between the processor core and the loop accelerator. The loop accelerator includes a plurality of processing elements (PEs) each of which performs an operation on each word to execute the program, a configuration memory which stores configuration bits indicating operations, states, etc. of the PEs, and a plurality of context memories, installed in a column or row direction of the PEs, which transmit the configuration bits along a direction toward which the PEs are arrayed. Thus, a connection structure between the configuration memory and the PEs can be simplified to easily modify a structure of the loop accelerator so as to extend the loop accelerator.
    Type: Grant
    Filed: September 5, 2006
    Date of Patent: September 15, 2009
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Soo-jung Ryu, Jeong-wook Kim, Suk-jin Kim, Hong-Seok Kim, Jun-jin Kong
  • Publication number: 20090228677
    Abstract: A method and system for processing generic formatted data, including first data describing a sequence of generic operations without any loops, in view of providing specific formatted data, for a determined platform including Q processor(s) and at least one memory, the platform configured to process, according, directly or indirectly, to specific formatted data, an object made up of elementary information of same type, each elementary information being represented by at least one numerical value.
    Type: Application
    Filed: December 19, 2006
    Publication date: September 10, 2009
    Applicant: DXO LABS
    Inventor: Bruno Liege
  • Publication number: 20090217017
    Abstract: A method, system, and computer program product for minimizing branch prediction latency in a pipelined computer processing environment are provided. The method includes detecting a branch loop utilizing branch instruction addresses and corresponding target addresses stored in a branch target buffer (BTB). The method also includes fetching the branch loop into a pre-decode instruction buffer and qualifying the branch loop for loop lockdown. The method further includes locking an instruction stream that forms the branch loop in the pre-decode instruction buffer and processing qualified branch loop instructions from the buffer and powering down instruction fetching and branch prediction logic (BPL) associated with the BTB.
    Type: Application
    Filed: February 26, 2008
    Publication date: August 27, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Khary J. Alexander, David S. Hutton, Brian R. Prasky, Anthony Saporito, Robert J. Sonnelitter, III, John W. Ward, III
  • Patent number: 7571305
    Abstract: A data processing system 2 includes an instruction cache 6 having an associated buffer memory 18, 8. The buffer memory 18, 8 can operate in a buffer mode or in a microcache mode. The buffer memory is switched into the microcache mode upon program loop detection performed by loop detector circuitry 20. When operating in the microcache mode, instruction data is read from the buffer memory 18, 8 without requiring an access to the instruction cache 6.
    Type: Grant
    Filed: January 11, 2007
    Date of Patent: August 4, 2009
    Assignee: ARM Limited
    Inventors: Fredrick Claude Marie Piry, Louis-Marie Vincent Mouton, Stephane Eric Sabastien Brochier, Gilles Eric Grandou
  • Patent number: 7558948
    Abstract: A method for reducing overhead on a loop of a plurality of instructions is disclosed. The method includes providing a carry mask, the carry mask having a first value for the loop being performed at least the particular number of times minus one and a second value for at least a last instruction of the loop being performed a last time, providing addition logic, wherein the carry mask and a current instruction address of the plurality of instructions correspond to inputs of the addition logic and determining which of the plurality of instructions is to be executed using the carry mask to provide a resultant of the addition logic based on the carry mask and the current instruction address of the plurality of instructions.
    Type: Grant
    Filed: September 20, 2004
    Date of Patent: July 7, 2009
    Assignee: International Business Machines Corporation
    Inventors: Anthony J. Bybell, Richard W. Doing, David D. Dukro
  • Publication number: 20090158018
    Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. A max{0,N} variable is used for the loop iteration count when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.
    Type: Application
    Filed: January 21, 2009
    Publication date: June 18, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
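    The role of max{0,N} in this abstract can be shown with a small worked example. The loop nest, the function names, and the unit increment are assumptions chosen for illustration; the sketch covers only the substitution step, not the later removal of the max operator.

```python
from typing import List


def sequential(n: int, m: int) -> List[int]:
    """Original form: k is a basic induction variable carried across iterations."""
    k = 0
    after_inner = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):  # zero-trip when m <= 0
            k += 1
        after_inner.append(k)      # value of k after the inner loop
    return after_inner


def substituted(n: int, m: int) -> List[int]:
    """After induction variable substitution: k is a closed form in i.

    max(0, m) is the inner trip count when nothing is known about m, so each
    outer iteration computes its value independently and could run in parallel.
    """
    trips = max(0, m)
    return [i * trips for i in range(1, n + 1)]
```

    Both functions return `[2, 4, 6]` for `n=3, m=2`. For `n=3, m=-1` both return `[0, 0, 0]`, whereas a closed form of `i * m` without the max would wrongly give `[-1, -2, -3]`.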
  • Publication number: 20090150658
    Abstract: This invention combines a loop support mechanism and a branch prediction mechanism. After an instruction execution unit executes an end block instruction of a block repeat, the loop control unit branches to the first instruction in the loop and sends a pseudo branch instruction to the instruction execution unit. The instruction execution unit acts as if the last instruction in the block is an instruction for branching to the start address of the block. This is stored in the branch prediction unit and branch prediction is performed thereafter.
    Type: Application
    Filed: December 5, 2008
    Publication date: June 11, 2009
    Inventor: Hiroyuki Mizumo
  • Publication number: 20090144529
    Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.
    Type: Application
    Filed: December 4, 2008
    Publication date: June 4, 2009
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Publication number: 20090144502
    Abstract: In an implementation, a processing system includes an instruction fetch (IF) memory storing IF instructions; an arithmetic/logic (AL) instruction memory (IMemory) storing AL instructions; and a programmable instruction fetch mechanism to generate IMemory instruction addresses, from IF instructions fetched from the IF memory, to select AL instructions to be fetched from the IMemory for execution, wherein at least one IF instruction includes a loop count field indicating a number of iterations of a loop to be performed, a loop start address of the loop, and a loop end address of the loop.
    Type: Application
    Filed: February 6, 2009
    Publication date: June 4, 2009
    Applicant: Renesky Tap III, Limited Liability Company
    Inventor: Gerald George Pechanek
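    As a rough illustration of an IF instruction carrying a loop count, start address, and end address, the sketch below expands such an instruction into the IMemory addresses it would select. The `LoopIF` type, the inclusive end address, and the step of one address per AL instruction are assumptions, not details from the application.

```python
from typing import List, NamedTuple


class LoopIF(NamedTuple):
    count: int  # loop count field: number of iterations
    start: int  # loop start address in IMemory
    end: int    # loop end address in IMemory (assumed inclusive)


def imemory_addresses(instr: LoopIF) -> List[int]:
    """Generate the sequence of IMemory addresses selected by one loop IF instruction."""
    addrs = []
    for _ in range(instr.count):
        addrs.extend(range(instr.start, instr.end + 1))
    return addrs
```

    Under these assumptions, `imemory_addresses(LoopIF(count=2, start=4, end=6))` returns `[4, 5, 6, 4, 5, 6]`.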
  • Publication number: 20090125912
    Abstract: An innovative approach for constructing optimum, high-performance, efficient DSP systems may include a system organization to match compute execution and data availability rate and to organize DSP operations as loop iterations such that there is maximal reuse of data between multiple consecutive iterations. Independent set up and preparation of data before it is required through suitable mechanisms such as data pre-fetching may be used. This technique may be useful and important for devices that require cost-effective, high-performance, power consumption efficient VLSI IC.
    Type: Application
    Filed: January 15, 2009
    Publication date: May 14, 2009
    Inventor: SIAMACK HAGHIGHI