Loop Execution Patents (Class 712/241)
-
Patent number: 7822949Abstract: A command supply device supplies a command sequence that forms a loop. A loop command buffer accumulates a first partial command sequence. The first partial command sequence is a head part of a first command sequence repeatedly supplied to a CPU from among command sequences stored in a main memory, and is accumulated before the first command sequence is supplied to the CPU again. A linking command buffer accumulates a second partial command sequence. The second partial command sequence follows the first partial command sequence in the first command sequence, and is accumulated while the accumulated first partial command sequence in the loop command buffer is supplied to the CPU. A selection circuit supplies, to the CPU, a command from the accumulated second partial command sequence in the linking command buffer when the entirety of the first partial command sequence has been supplied to the CPU.Type: GrantFiled: May 9, 2005Date of Patent: October 26, 2010Assignee: Panasonic CorporationInventor: Satoshi Ogura
-
Publication number: 20100250854Abstract: An efficient and effective compiler data prefetching technique is disclosed in which memory accesses may be prefetched are represented in linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched.Type: ApplicationFiled: March 16, 2010Publication date: September 30, 2010Inventor: Dz-ching Ju
-
Publication number: 20100235612Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.Type: ApplicationFiled: May 26, 2010Publication date: September 16, 2010Inventor: Jeffry E. Gonion
-
Patent number: 7797692Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.Type: GrantFiled: May 12, 2006Date of Patent: September 14, 2010Assignee: Google Inc.Inventor: Grzegorz J. Czajkowski
-
Publication number: 20100211762Abstract: A system to implement a zero overhead software pipelined (SFP) loop includes a Very Long Instruction Word (VLIW) processor having an N number of execution slots. The VLIW processor executes a plurality of instructions in parallel without any limitation of an instruction buffer size. A program memory receives a Program Memory address to fetch an instruction packet. The program memory is closely coupled with the instruction buffer size to implement the zero overhead software pipelined (SFP) loop. The size of the zero overhead software pipelined (SFP) loop can exceed the instruction buffer size. A CPU control register includes a block count and an iteration count. The block count is loaded into a block counter and counts the plurality of instructions executed in the SFP loop, and the iteration count is loaded into an iteration counter and counts a number of iterations of the SFP loop based on the block count.Type: ApplicationFiled: February 18, 2010Publication date: August 19, 2010Applicant: SAANKHYA LABS PVT LTDInventors: Anindya Saha, Manish Kumar, Hemant Mallapur, Santhosh Billava, Viji Rajangam
-
Patent number: 7774766Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.Type: GrantFiled: September 29, 2005Date of Patent: August 10, 2010Assignee: Intel CorporationInventors: Kalyan Muthukumar, Daniel M Lavery
-
Publication number: 20100199076Abstract: A computing apparatus and method of handling an interrupt are provided. The computing apparatus includes a coarse-grained array, a host processor, and an interrupt supervisor. When an interrupt occurs in the coarse-grained array while performing a loop operation, the host processor processes the interrupt, and the interrupt supervisor may perform mode switching between the coarse-grained array and the host processor.Type: ApplicationFiled: December 16, 2009Publication date: August 5, 2010Inventors: Dong-hoon YOO, Soo-jung Ryu, Yeon-gon Cho, Bernhard Egger, Il-hyun Park
-
Publication number: 20100185839Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.Type: ApplicationFiled: November 2, 2009Publication date: July 22, 2010Inventors: Tae-wook OH, Won-sub Kim, Bernhard Egger
-
Publication number: 20100180102Abstract: A processor includes one or more processing units, an execution pipeline and control circuitry. The execution pipeline includes at least first and second pipeline stages that are cascaded so that program instructions, specifying operations to be performed by the processing units in successive cycles of the pipeline, are fetched from a memory by the first pipeline stage and conveyed to the second pipeline stage, which causes the processing units to perform the specified operations. The control circuitry is coupled, upon determining that a program instruction that is present in the second pipeline stage in a first cycle of the pipeline is to be executed again in a subsequent cycle of the pipeline, to cause the execution pipeline to reuse the program instruction in one of the pipeline stages without re-fetching the program instruction from the memory.Type: ApplicationFiled: January 15, 2009Publication date: July 15, 2010Applicant: ALTAIR SEMICONDUCTORSInventors: Edan Almog, Nohik Semel, Yigal Bitran, Nadav Cohen, Yoel Livne, Eli Zyss
-
Patent number: 7757067Abstract: A processor (e.g., a co-processor) comprising a decoder coupled to a pre-decoder, in which the decoder decodes a current instruction in parallel with the pre-decoder pre-decoding a subsequent instruction. In particular, the pre-decoder examines at least five Bytecodes in parallel with the decoder decoding a current instruction. The pre-decoder determines if a subsequent instruction contains a prefix. If a prefix is detected in at least one of the five Bytecodes, a program counter skips the prefix and changes the behavior of the decoder during the decoding of the subsequent instruction.Type: GrantFiled: July 31, 2003Date of Patent: July 13, 2010Assignee: Texas Instruments IncorporatedInventors: Gerard Chauvel, Serge Lasserre, Maija Kuusela
-
Publication number: 20100175056Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, a “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.Type: ApplicationFiled: February 16, 2010Publication date: July 8, 2010Inventors: Hajime OGAWA, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
-
Publication number: 20100169612Abstract: A data-processing unit has a fetching circuitry (20) and execution circuitry (30a, 30b). The data-processing unit has an instruction set comprising a nested-loop instruction. The fetching circuitry is arranged to fetch the nested-loop instruction, and the execution circuitry is arranged to execute the nested-loop instruction. The nested-loop instruction comprises at least one instruction field that is adapted to indicate a number of iterations of an outer loop of the nested loop and one or more operations to be performed by the outer loop. Moreover, the at least one instruction field is further adapted to indicate a number of iterations of an inner loop of the nested loop and one or more operations to be performed by the inner loop. A method for fetching, decoding, and executing the nested-loop instruction is also described as well as the structure of the nested-loop instruction.Type: ApplicationFiled: June 25, 2008Publication date: July 1, 2010Applicant: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)Inventors: Per Persson, Harald Gustafsson
-
Publication number: 20100153688Abstract: An exemplary aspect of the present invention is a data processing apparatus for processing a loop in a pipeline that includes an instruction memory and a fetch circuit that fetches an instruction stored in the instruction memory. The fetch circuit includes an instruction queue that stores an instruction to be output from the fetch circuit, an evacuation queue that stores an instruction fetched from the instruction memory, a selector that selects one of the instruction output from the instruction queue and the instruction output from the evacuation queue, and a loop queue that stores the instruction selected by the selector and outputs to the instruction queue.Type: ApplicationFiled: December 11, 2009Publication date: June 17, 2010Applicant: NEC Electronics CorporationInventor: Satoshi CHIBA
-
Patent number: 7739442Abstract: A macroscalar processor architecture is described herein. In one embodiment, an exemplary processor includes one or more execution units to execute instructions and one or more iteration units coupled to the execution units. The one or more iteration units receive one or more primary instructions of a program loop that comprise a machine executable program. For each of the primary instructions received, at least one of the iteration units generates multiple secondary instructions that correspond to multiple loop iterations of the task of the respective primary instruction when executed by the one or more execution units. Other methods and apparatuses are also described.Type: GrantFiled: May 23, 2008Date of Patent: June 15, 2010Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 7725696Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution.Type: GrantFiled: October 4, 2007Date of Patent: May 25, 2010Inventors: Wen-mei W. Hwu, Matthew C. Merten
-
Patent number: 7719825Abstract: A faceplate for a housing of a computing device including a processor includes a bezel with interior and exterior surfaces. The bezel removeably covers at least a portion of an opening defined by the housing. A wireless local area network (LAN) unit is mechanically coupled to the interior surface of the bezel and comprises a data interface that enables data transfer between the wireless LAN unit and the processor.Type: GrantFiled: August 28, 2008Date of Patent: May 18, 2010Assignee: Marvell International Ltd.Inventors: Joseph Knapp, George Chien
-
Publication number: 20100122066Abstract: Instruction set techniques have been developed to identify explicitly the beginning of a loop body and to code a conditional loop-end in ways that allow a processor implementation to efficiently manage an instruction fetch buffer and/or entries in an instruction cache. In particular, for some computations and processor implementations, a machine instruction is defined that identifies a loop start, stores a corresponding loop start address on a return stack (or in other suitable storage) and directs fetch logic to take advantage of the identification by retaining in a fetch buffer or instruction cache the instruction(s) beginning at the loop start address, thereby avoiding usual branch delays on subsequent iterations of the loop. A conditional loop-end instruction can be used in conjunction with the loop start instruction to discard (or simply mark as no longer needed) the loop start address and the loop body instructions retained in the fetch buffer or instruction cache.Type: ApplicationFiled: November 12, 2008Publication date: May 13, 2010Applicant: FREESCALE SEMICONDUCTOR, INC.Inventor: Michael A. Fischer
-
Publication number: 20100122069Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.Type: ApplicationFiled: November 6, 2009Publication date: May 13, 2010Inventor: Jeffry E. Gonion
-
Patent number: 7712091Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.Type: GrantFiled: September 30, 2005Date of Patent: May 4, 2010Assignee: Intel CorporationInventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
-
Patent number: 7698449Abstract: Method and apparatus for configuring a processor embedded in an integrated circuit for use as a logic element is described. In one example, a processing apparatus in an integrated circuit includes a point-to-point data streaming interface and arithmetic logic unit (ALU) circuitry. The ALU circuitry includes at least one input port in communication with the point-to-point data streaming interface. The processor may also include a register file and multiplexer logic. The multiplexer logic is configured to selectively couple the register file and the point-to-point streaming interface to the at least one input port of the ALU circuitry.Type: GrantFiled: February 23, 2005Date of Patent: April 13, 2010Assignee: XILINX, Inc.Inventors: Eric R. Keller, Philip B. James-Roxby
-
Patent number: 7698696Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, a “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.Type: GrantFiled: June 30, 2003Date of Patent: April 13, 2010Assignee: Panasonic CorporationInventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
-
Patent number: 7685411Abstract: An instruction memory unit comprises a first memory structure operable to store program instructions, and a second memory structure operable to store program instructions fetched from the first memory structure, and to issue stored program instructions for execution. The second memory structure is operable to identify a repeated issuance of a forward program redirect construct, and issue a next program instruction already stored in the second memory structure if a resolution of the forward branching instruction is identical to a last resolution of the same. The second memory structure is further operable to issue a backward program redirect construct, determine whether a target instruction is stored in the second memory structure, issue the target instruction if the target instruction is stored in the second memory structure, and fetch the target instruction from the first memory structure if the target instruction is not stored in the second memory structure.Type: GrantFiled: April 11, 2005Date of Patent: March 23, 2010Assignee: QUALCOMM IncorporatedInventors: Muhammad Ahmed, Lucian Codrescu, Erich Plondke, William C. Anderson, Robert Allan Lester, Phillip M. Jones
-
Publication number: 20100064106Abstract: The present invention provides a data processor capable of automatically discriminating a loop program and performing a reduction in power by size-variable lock control on an instruction buffer. The instruction buffer of the data processor includes a buffer controller for controlling a memory unit that stores each fetched instruction therein. When an execution history of a fetched condition branch instruction suggests condition establishment, and in the case that the branch direction of the fetched condition branch instruction is a direction opposite to the order of an instruction execution and the difference of instruction addresses from the branch source to the branch target based on the condition branch instruction is a range held in the storage capacity of the instruction buffer, the buffer controller retains an instruction sequence from a branch source to a branch target based on the condition branch instruction in the instruction buffer.Type: ApplicationFiled: August 24, 2009Publication date: March 11, 2010Inventors: Tetsuya YAMADA, Naoki KATO
-
Patent number: 7676650Abstract: When an instruction stored in a specific instruction buffer is the same as another instruction stored in another instruction buffer and logically subsequent to the instruction in the specific instruction buffer, a connection is made from the instruction buffer storing a logically and immediately preceding instruction, not the instruction in the other instruction buffer, to the specific instruction buffer without the instruction in the other instruction buffer, and a loop is generated by instruction buffers, thereby performing a short loop in an instruction buffer system capable of arbitrarily connecting a plurality of instruction buffers.Type: GrantFiled: January 21, 2003Date of Patent: March 9, 2010Assignee: Fujitsu LimitedInventor: Masaki Ukai
-
Patent number: 7676663Abstract: A method and apparatus enable supplementing a Branch Target Buffer (BTB) table with a recent entry queue that prevents unnecessary removal of valuable BTB table data of multiple entries for another entry. The recent entry queue detects when the startup latency of the BTB table prevents it from asynchronously aiding the microprocessor pipeline as designed for and thereby can delay the pipeline in the required situations such that the BTB table latency on startup can be overcome. The recent entry queue provides a quick access to BTB table entries that are accessed in a tight loop pattern where the throughput of the standalone BTB table cannot track the throughput of the microprocessor execution pipeline. By using the recent entry queue, the modified BTB table processes information at the rate of the execution pipeline which provides acceleration thereof.Type: GrantFiled: March 9, 2004Date of Patent: March 9, 2010Assignee: International Business Machines CorporationInventors: Brian Robert Prasky, Thomas Roberts Puzak, Allan Mark Hartstein
-
INSTRUCTION FETCH PIPELINE FOR SUPERSCALAR DIGITAL SIGNAL PROCESSORS AND METHOD OF OPERATION THEREOF
Publication number: 20100058039Abstract: A next program counter (PC) value generator. The next PC value generator includes a discontinuity decoder that is provide to detect a discontinuity instruction among a plurality of instructions and a tight loop decoder that is provide to: a) detect a tight loop instruction, and b) provide a tight loop instruction target address. The next PC value generator further includes a next PC value logic having a plurality of inputs: a first input coupled to an output of the discontinuity decoder, and a second input coupled to an output of the tight loop decoder. The next PC value logic provides as an output, without a stall, a control signal that a next PC value is to be loaded with the tight loop instruction target address if: the discontinuity decoder detects a discontinuity instruction, and the tight loop decoder detects a tight loop instruction.Type: ApplicationFiled: September 4, 2008Publication date: March 4, 2010Applicant: VeriSilicon Holdings Company, LimitedInventors: Vijayanand Angarai, Michelle Y. Che, Asheesh Kashyap, Tracy Nguyen -
Publication number: 20100049958Abstract: A method for managing a hardware instruction loop, the method includes: (i) detecting, by a branch prediction unit, an instruction loop; wherein a size of the instruction loop exceeds a size of a storage space allocated in a fetch unit for storing fetched instructions; (ii) requesting from the fetch unit to fetch instructions of the instruction loop that follow the first instructions of the instruction loop; and (iii) selecting, during iterations of the instruction loop, whether to provide to a dispatch unit one of the first instructions of the instruction loop or another instruction that is fetched by the fetch unit; wherein the first instructions of the instruction loop are stored at the dispatch unit.Type: ApplicationFiled: August 19, 2008Publication date: February 25, 2010Inventors: Lev Vaskevich, Itzhak Barak, Amir Paran, Yuval Peled, Idan Rozenberg, Doron Schupper
-
Patent number: 7669042Abstract: An instruction execution pipeline for use in a data processor. The instruction execution pipeline comprises: 1) an instruction fetch stage; 2) a decode stage; 3) an execution stage; and 4) a write-back stage. The instruction pipeline repetitively executes a loop of instructions by fetching and decoding a first instruction associated with the loop during a first iteration of the loop, storing first decoded instruction information associated with the first instruction during the first iteration of the loop, and using the stored first decoded instruction information during at least a second iteration of the loop without further fetching and decoding of the first instruction during the at least a second iteration of the loop.Type: GrantFiled: June 10, 2005Date of Patent: February 23, 2010Assignee: Samsung Electronics Co., Ltd.Inventors: Eran Pisek, Jasmin Oz, Yan Wang
-
Patent number: 7665079Abstract: It is one object of the present invention to provide a program execution method for performing greater optimization. A program execution apparatus according to the present invention performs a transfer from an interpreter process to a compiled code process in the course of the execution of a method. At this time, if no problem occurs when a transfer point is moved to the top of a loop, the transfer point for code is so moved. And when a transfer point is located inside a loop, a point that post-dominates the top of the loop and the transfer point is copied to a position immediately preceding the loop. Then, information for generating recalculation code is provided for the transfer point, and a recalculation is performed.Type: GrantFiled: November 8, 2000Date of Patent: February 16, 2010Assignee: International Business Machines CorporationInventors: Toshiaki Yasue, Kazunori Ogata, Kazuaki Ishizaki, Hideaki Komatsu
-
Patent number: 7660970Abstract: Disclosed is a data processing system and method. The data processing method determines the number of static registers and the number of rotating registers for assigning a register to a variable contained in a certain program, assigns the register to the variable based on the number of the static registers and the number of the rotating registers, and compiles the program. Further, the method stores in the special register a value corresponding to the number of the rotating registers in the compiling operation, and obtains a physical address from a logical address of the register based on the value. Accordingly, the present invention provides an aspect of efficiently using register files by dynamically controlling the number of rotating registers and the number of static registers for a software pipelined loop, and has an effect capable of reducing the generations of spill/fill codes unnecessary during program execution to a minimum.Type: GrantFiled: August 21, 2006Date of Patent: February 9, 2010Assignee: Samsung Electronics Co., Ltd.Inventors: Suk-jin Kim, Jeong-wook Kim, Hong-seok Kim, Soo-jung Ryu
-
Patent number: 7653904Abstract: A method, apparatus, and system are provided for a multi-threaded virtual state mechanism. According to one embodiment, active thread state of a first active thread is received using a virtual state mechanism, and virtual thread state is generated in accordance with the active thread state of the first active thread, and the virtual thread state corresponding to the first active thread is forwarded to state update logic.Type: GrantFiled: September 26, 2003Date of Patent: January 26, 2010Assignee: Intel CorporationInventor: Nicholas G. Samra
-
Publication number: 20100005276Abstract: An information processing device includes an instruction fetch unit, an instruction buffer, an instruction executing unit, and an instruction fetch control unit. The instruction fetch unit supplies a fetch address to an instruction memory. The instruction buffer stores an instruction read out from the instruction memory. The instruction executing unit decodes and executes the instruction supplied from the instruction buffer. The instruction fetch control unit stops supply of the fetch address to the instruction memory by the instruction fetch unit when the fetch address corresponds to a first address or an address after the first address while the instruction executing unit executes loop processing. The loop processing is repeatedly executed for a predetermined number of times in accordance with decoding of the loop instruction by the instruction executing unit. The first address is an address after an address of an end instruction included in the loop processing.Type: ApplicationFiled: June 11, 2009Publication date: January 7, 2010Applicant: NEC ELECTRONICS CORPORATIONInventor: Hideyuki MIWA
-
Publication number: 20090327674Abstract: Loop control systems and methods are disclosed. In a particular embodiment, a hardware loop control logic circuit includes a detection unit to detect an end of loop indicator of a program loop. The hardware loop control logic circuit also includes a decrement unit to decrement a loop count and to decrement a predicate trigger counter. The hardware loop control logic circuit further includes a comparison unit to compare the predicate trigger counter to a reference to determine when to set a predicate value.Type: ApplicationFiled: June 27, 2008Publication date: December 31, 2009Applicant: QUALCOMM INCORPORATEDInventors: Lucian Codrescu, Erich James Plondke, Lin Wang, Suresh K. Venkumahanti
-
Publication number: 20090307472Abstract: A method and apparatus for executing a nested program loop on a vector processor, the loop comprising outer-pre, inner and outer-post portions. An input stream unit of the vector processor provides a data value to a data path and sets an associated data validity tag to ‘valid’ once per outer loop iteration, as indicated by an inner counter of the input stream unit. The tag is set to ‘invalid’ in other iterations. Functional units of the vector processor operate on data values in the data path, each functional unit producing a valid result if the data validity tags associated with inputs data values are set to ‘valid’. An output stream unit of the vector processor sinks a data value from the data path once per outer loop iteration if an associated data validity tag indicates that the data value is valid.Type: ApplicationFiled: June 5, 2008Publication date: December 10, 2009Applicant: MOTOROLA, INC.Inventors: Raymond B. Essick IV, Kent D. Moat, Michael A. Schuette
-
Patent number: 7617496Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.Type: GrantFiled: September 1, 2005Date of Patent: November 10, 2009Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Publication number: 20090265534Abstract: A method, apparatus, and computer program are provided for assessing fairness, performance, and livelock in a logic development process utilizing comparative parallel looping. Multiple loop macros are generated, the multiple loop macros respectively correspond to multiple processor threads, and the multiple loop macros are parallel comparative loop macros. The multiple processor threads for the multiple loop macros are executed in which a common resource is accessed. A forward performance of each of the multiple processor threads is verified. The forward performance of the multiple processor threads is compared with each other. It is determined whether any of the multiple processor threads fails to meet a minimum loop count or a minimum loop time. It is determined whether any of the multiple processor threads exceeds a maximum loop count or a maximum loop time. It is recognized whether fairness is maintained during the execution of the multiple processor threads.Type: ApplicationFiled: April 17, 2008Publication date: October 22, 2009Inventors: Duane A. Averill, Anthony D. Drumm, Christopher T. Phan, Brian T. Vanderpool, Sharon D. Vincent
-
Publication number: 20090262877Abstract: A novel and useful apparatus for and method of spur reduction using computation spreading with dithering in a digital phase locked loop (DPLL) architecture. A software based PLL incorporates a reconfigurable calculation unit (RCU) that is optimized and programmed to sequentially perform all the atomic operations of a PLL or any other desired task in a time sharing manner. An application specific instruction-set processor (ASIP) incorporating the RCU is adapted to spread the computation of the atomic operations out over a PLL reference clock period wherein each computation is performed at a much higher processor clock frequency than the PLL reference clock rate. This significantly reduces the per cycle current transient generated by the computations. The frequency content of the current transients is at the higher processor clock frequency which results in a significant reduction in spurs within sensitive portions of the output spectrum.Type: ApplicationFiled: April 17, 2008Publication date: October 22, 2009Inventors: Fuqiang Shi, Roman Staszewski, Robert B. Staszewski
-
Publication number: 20090259832Abstract: One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.Type: ApplicationFiled: March 19, 2009Publication date: October 15, 2009Inventors: Vinod GROVER, Bastiaan Joannes Matheus AARTS, Michael MURPHY, Boris BEYLIN, Jayant B. KOLHE, Douglas SAYLOR
-
Publication number: 20090254738Abstract: It is an object of the present invention to provide an obfuscation device that can achieve both sufficient obfuscation and the appropriate instruction block to be executed. In the obfuscation device, a first instruction generating unit, for each of the first process and the second process, generates an initialization instruction for securing a management area for managing the identification information indicating an instruction block that should be executed next so as to proceed with the process, and to store the initialization instruction in said storage unit.Type: ApplicationFiled: March 24, 2009Publication date: October 8, 2009Inventors: Taichi SATO, Tomoyuki Haga, Kenichi Matsumoto, Akito Monden, Haruaki Tamada
-
Publication number: 20090235052Abstract: A data processing device is provided using pipeline architecture to reduce a time loss due to a branch without causing an increase in circuit scale. The data processing device uses pipeline control. The data processing device includes an instruction queue in which a plurality of instruction codes can be fetched, a fetch address operation circuit which calculates a fetch address, a fetch circuit which fetches an instruction code based on the fetch address, and a branch information setting circuit which decodes a branch setting instruction, stores a branch address in a branch address storage register, and stores a branch target address in a branch target address storage register. The fetch address operation circuit compares either a previous fetch address or an expected next fetch address with a value stored in the branch address storage register, and determines a next fetch address to be output, based on the comparison result.Type: ApplicationFiled: April 2, 2009Publication date: September 17, 2009Applicant: SEIKO EPSON CORPORATIONInventor: Makoto Kudo
-
Patent number: 7590831Abstract: Provided are a loop accelerator and a data processing system having the loop accelerator. The data processing system includes a loop accelerator which executes a loop part of a program, a processor core which processes a remaining part of the program except the loop part, and a central register file which transmits data between the processor core and the loop accelerator. The loop accelerator includes a plurality of processing elements (PEs) each of which performs an operation on each word to execute the program, a configuration memory which stores configuration bits indicating operations, states, etc. of the PEs, and a plurality of context memories, installed in a column or row direction of the PEs, which transmit the configuration bits along a direction toward which the PEs are arrayed. Thus, a connection structure between the configuration memory and the PEs can be simplified to easily modify a structure of the loop accelerator so as to extend the loop accelerator.Type: GrantFiled: September 5, 2006Date of Patent: September 15, 2009Assignee: Samsung Electronics Co., Ltd.Inventors: Soo-jung Ryu, Jeong-wook Kim, Suk-jin Kim, Hong-Seok Kim, Jun-jin Kong
-
Publication number: 20090228677Abstract: A method and system for processing generic formatted data, including first data describing a sequence of generic operations without any loops, in view of providing specific formatted data, for a determined platform including Q processor(s) and at least one memory, the platform configured to process, according, directly or indirectly, to specific formatted data, an object made up of elementary information of same type, each elementary information being represented by at least one numerical value.Type: ApplicationFiled: December 19, 2006Publication date: September 10, 2009Applicant: DXO LABSInventor: Bruno Liege
-
Publication number: 20090217017Abstract: A method, system, and computer program product for minimizing branch prediction latency in a pipelined computer processing environment are provided. The method includes detecting a branch loop utilizing branch instruction addresses and corresponding target addresses stored in a branch target buffer (BTB). The method also includes fetching the branch loop into a pre-decode instruction buffer and qualifying the branch loop for loop lockdown. The method further includes locking an instruction stream that forms the branch loop in the pre-decode instruction buffer and processing qualified branch loop instructions from the buffer and powering down instruction fetching and branch prediction logic (BPL) associated with the BTB.Type: ApplicationFiled: February 26, 2008Publication date: August 27, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Khary J. Alexander, David S. Hutton, Brian R. Prasky, Anthony Saporito, Robert J. Sonnelitter, III, John W. Ward, III
-
Patent number: 7571305Abstract: A data processing system 2 includes an instruction cache 6 having an associated buffer memory 18, 8. The buffer memory 18, 8 can operate in a buffer mode or in a microcache mode. The buffer memory is switched into the microcache mode upon program loop detection performed by loop detector circuitry 20. When operating in the microcache mode, instruction data is read from the buffer memory 18, 8 without requiring an access to the instruction cache 6.Type: GrantFiled: January 11, 2007Date of Patent: August 4, 2009Assignee: ARM LimitedInventors: Fredrick Claude Marie Piry, Louis-Marie Vincent Mouton, Stephane Eric Sabastien Brochier, Gilles Eric Grandou
-
Patent number: 7558948Abstract: A method for reducing overhead on a loop of a plurality of instructions is disclosed. The method includes providing a carry mask, the carry mask having a first value for the loop being performed at least the particular number of times minus one and a second value for at least a last instruction of the loop being performed a last time, providing addition logic, wherein the carry mask and a current instruction address of the plurality of instructions correspond to inputs of the addition logic and determining which of the plurality of instructions is to be executed using the carry mask to provide a resultant of the addition logic based on the carry mask and the current instruction address of the plurality of instructions.Type: GrantFiled: September 20, 2004Date of Patent: July 7, 2009Assignee: International Business Machines CorporationInventors: Anthony J. Bybell, Richard W. Doing, David D. Dukro
-
Publication number: 20090158018Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.Type: ApplicationFiled: January 21, 2009Publication date: June 18, 2009Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
-
Publication number: 20090150658Abstract: This invention combines a loop support mechanism and a branch prediction mechanism. After an instruction execution unit executes an end block instruction of a block repeat, the loop control unit branches to the first instruction in the loop and sends a pseudo branch instruction to the instruction execution unit. The instruction execution unit acts as if the last instruction in the block is an instruction for branching to the start address of the block. This is stored in the branch prediction unit and branch prediction is performed thereafter.Type: ApplicationFiled: December 5, 2008Publication date: June 11, 2009Inventor: Hiroyuki Mizumo
-
Publication number: 20090144529Abstract: Generating loop code to execute on Single-Instruction Multiple-Datapath (SIMD) architectures, where the loop operates on datatypes having different lengths, is disclosed. Further, a preferred embodiment of the present invention includes a novel technique to efficiently realign or shift arbitrary streams to an arbitrary offset, regardless whether the alignments or offsets are known at the compile time or not. This technique enables the application of advanced alignment optimizations to runtime alignment. Length conversion operations, for packing and unpacking data values, are included in the alignment handling framework. These operations are formally defined in terms of standard SIMD instructions that are readily available on various SIMD platforms. This allows sequential loop code operating on datatypes of disparate length to be transformed (“simdized”) into optimized SIMD code through a fully automated process.Type: ApplicationFiled: December 4, 2008Publication date: June 4, 2009Applicant: International Business Machines CorporationInventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Publication number: 20090144502Abstract: In an implementation, a processing system includes an instruction fetch (IF) memory storing IF instructions; an arithmetic/logic (AL) instruction memory (IMemory) storing AL instructions; and a programmable instruction fetch mechanism to generate IMemory instruction addresses, from IF instructions fetched from the IF memory, to select AL instructions to be fetched from the IMemory for execution, wherein at least one IF instruction includes a loop count field indicating a number of iterations of a loop to be performed, a loop start address of the loop, and a loop end address of the loop.Type: ApplicationFiled: February 6, 2009Publication date: June 4, 2009Applicant: Renesky Tap III, Limited Liability CompnayInventor: Gerald George Pechanek
-
Publication number: 20090125912Abstract: An innovative approach for constructing optimum, high-performance, efficient DSP systems may include a system organization to match compute execution and data availability rate and to organize DSP operations as loop iterations such that there is maximal reuse of data between multiple consecutive iterations. Independent set up and preparation of data before it is required through suitable mechanisms such as data pre-fetching may be used. This technique may be useful and important for devices that require cost-effective, high-performance, power consumption efficient VLSI IC.Type: ApplicationFiled: January 15, 2009Publication date: May 14, 2009Inventor: SIAMACK HAGHIGHI