Loop Execution Patents (Class 712/241)
  • Patent number: 9971695
    Abstract: An apparatus is connected to a main memory, includes a cache memory holding data and a memory storing prediction information in plural areas thereof. The prediction information is referenced to determine whether to execute prefetch, which holds data from the main memory to the cache memory, in a case where a plurality of unrolled instructions produced by unrolling a target instruction included in a loop sentence are executed individually, and corresponds to individual memory accesses executed at certain address intervals in accordance with the respective unrolled instructions. The apparatus executes memory access to the main memory, and executes the prefetch. When the plurality of unrolled instructions are executed individually, the apparatus consolidates a plurality of pieces of prediction information respectively stored in the plural areas of the memory into one based on the number-of-unrolling information, and stores the consolidated prediction information into any one of the plural areas.
    Type: Grant
    Filed: September 9, 2015
    Date of Patent: May 15, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Tomoyuki Watahiki
  • Patent number: 9946539
    Abstract: Methods, systems, and apparatus, including an apparatus for accessing a N-dimensional tensor, the apparatus including, for each dimension of the N-dimensional tensor, a partial address offset value element that stores a partial address offset value for the dimension based at least on an initial value for the dimension, a step value for the dimension, and a number of iterations of a loop for the dimension. The apparatus includes a hardware adder and a processor. The processor obtains an instruction to access a particular element of the N-dimensional tensor. The N-dimensional tensor has multiple elements arranged across each of the N dimensions, where N is an integer that is equal to or greater than one. The processor determines, using the partial address offset value elements and the hardware adder, an address of the particular element and outputs data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: May 23, 2017
    Date of Patent: April 17, 2018
    Assignee: Google LLC
    Inventors: Olivier Temam, Harshit Khaitan, Ravi Narayanaswami, Dong Hyuk Woo
  • Patent number: 9875104
    Abstract: Methods, systems, and apparatus, including an apparatus for processing an instruction for accessing a N-dimensional tensor, the apparatus including multiple tensor index elements and multiple dimension multiplier elements, where each of the dimension multiplier elements has a corresponding tensor index element. The apparatus includes one or more processors configured to obtain an instruction to access a particular element of a N-dimensional tensor, where the N-dimensional tensor has multiple elements arranged across each of the N dimensions, and where N is an integer that is equal to or greater than one; determine, using one or more tensor index elements of the multiple tensor index elements and one or more dimension multiplier elements of the multiple dimension multiplier elements, an address of the particular element; and output data indicating the determined address for accessing the particular element of the N-dimensional tensor.
    Type: Grant
    Filed: February 3, 2016
    Date of Patent: January 23, 2018
    Assignee: Google LLC
    Inventors: Dong Hyuk Woo, Andrew Everett Phelps
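
The address computation in the abstract above (patent 9875104) can be illustrated with a small sketch. It assumes, as one plausible reading and not as the patented design, that the address is a base plus the sum of each tensor index multiplied by its dimension multiplier. The function names, the `base` parameter, and the row-major layout are hypothetical and chosen only for illustration.

```python
def row_major_multipliers(shape):
    """Assumed layout: the multiplier of a dimension is the product of all later extents."""
    multipliers = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        multipliers[d] = multipliers[d + 1] * shape[d + 1]
    return multipliers


def tensor_element_address(indices, multipliers, base=0):
    """Combine per-dimension index values with per-dimension multipliers."""
    if len(indices) != len(multipliers):
        raise ValueError("one multiplier is required per index")
    return base + sum(i * m for i, m in zip(indices, multipliers))


# Element (1, 2, 3) of a 2 x 3 x 4 tensor stored in row-major order.
multipliers = row_major_multipliers([2, 3, 4])
address = tensor_element_address([1, 2, 3], multipliers)
```

Keeping the multipliers in a separate table means the same index values can address a differently laid-out tensor by swapping the table only.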
  • Patent number: 9760356
    Abstract: Systems and methods may provide for identifying a nested loop iteration space in user code, wherein the nested loop iteration space includes a plurality of outer loop iterations, and distributing iterations from the nested loop iteration space across a plurality of threads, wherein each thread is assigned a group of outer loop iterations. Additionally, a compiler output may be automatically generated, wherein the compiler output contains serial code corresponding to each group of outer loop iterations and de-linearization code to be executed outside the plurality of outer loop iterations. In one example, the de-linearization code includes index recovery code that is positioned before one or more instances of the serial code in the compiler output.
    Type: Grant
    Filed: September 23, 2014
    Date of Patent: September 12, 2017
    Assignee: Intel Corporation
    Inventor: Alejandro Duran Gonzalez
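
A loose sketch of the index recovery idea in the abstract above (patent 9760356): a two-level loop nest is treated as one linear iteration space, each thread receives a contiguous chunk, and the (outer, inner) indices are recovered once before the serial loop instead of on every iteration. The chunking policy and all names here are assumptions made for illustration, not the compiler output the patent describes.

```python
def chunk_bounds(total, num_threads, tid):
    """Split `total` iterations into near-equal contiguous chunks (assumed policy)."""
    base, extra = divmod(total, num_threads)
    start = tid * base + min(tid, extra)
    end = start + base + (1 if tid < extra else 0)
    return start, end


def iterate_chunk(n_outer, n_inner, num_threads, tid):
    """Return the (i, j) pairs one thread would visit, in order."""
    start, end = chunk_bounds(n_outer * n_inner, num_threads, tid)
    if start >= end:
        return []
    # Index recovery (de-linearization) happens once, outside the loop body.
    i, j = divmod(start, n_inner)
    visited = []
    for _ in range(end - start):
        visited.append((i, j))  # stand-in for the serial loop body
        j += 1
        if j == n_inner:        # advance to the next outer iteration
            j = 0
            i += 1
    return visited
```

Concatenating the results for every thread id reproduces the original nested iteration order, which is a quick way to check the chunking.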
  • Patent number: 9753707
    Abstract: A method and a computing device for reducing deoptimization in a virtual machine are provided. Source code of a dynamically-typed program is compiled. A context-free type-state recorder records a first data type of a value associated with a particular named memory location within the source code. Optimized code may be generated based on the first data type of the value being a matching data type for global values associated with the particular named memory location. One or more global values associated with the particular named memory location may be type-checked. The context-free type-state recorder may record, if one or more of the global values associated with the particular named memory location is a different data type than the first data type, one or more different data types associated with the particular named memory location. New optimized code may then be generated.
    Type: Grant
    Filed: July 24, 2015
    Date of Patent: September 5, 2017
    Assignee: QUALCOMM Innovation Center, Inc.
    Inventor: Derek Jay Conrod
  • Patent number: 9753733
    Abstract: Methods, apparatuses, and processors for packing multiple iterations of a loop in a loop buffer. A loop candidate that meets the criteria for buffering is detected in the instruction stream being executed by a processor. When the loop is being written to the loop buffer and the end of the loop is detected, another iteration of the loop is written to the loop buffer if the loop buffer is not yet halfway full. In this way, short loops are written to the loop buffer multiple times to maximize the instruction operations per cycle throughput out of the loop buffer when the processor is in loop buffer mode.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: September 5, 2017
    Assignee: Apple Inc.
    Inventors: Conrado Blasco-Allue, Ian D. Kountanis
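
The packing rule in the abstract above (patent 9753733) can be sketched in a few lines: after each complete iteration is written, another copy is written only if the buffer is not yet halfway full. The list-based buffer, the `capacity` parameter, and the function name are hypothetical; the sketch models the policy only, not the hardware.

```python
def pack_loop_buffer(loop_ops, capacity):
    """Return (buffer contents, number of iterations written)."""
    if not loop_ops:
        raise ValueError("loop body must contain at least one operation")
    if len(loop_ops) > capacity:
        raise ValueError("loop does not fit in the buffer")
    buffer = list(loop_ops)
    iterations = 1
    # At the end of each iteration, write another one while less than half full.
    # A further copy always fits: less than half used implies the body is
    # shorter than half the capacity.
    while 2 * len(buffer) < capacity:
        buffer.extend(loop_ops)
        iterations += 1
    return buffer, iterations


# A 3-operation loop in a 16-entry buffer is written three times (9 entries).
packed, count = pack_loop_buffer(["load", "add", "branch"], 16)
```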
  • Patent number: 9733717
    Abstract: A computer-implemented method for a gesture-based user interface and a gesture-based user interface system are described. The method comprises receiving image data from a multi-aperture image sensor in said electronic device, said image sensor being configured to simultaneously expose an image sensor to at least a first part of the electromagnetic (EM) spectrum using a first aperture and at least a second part of the EM spectrum using one or more second apertures; determining sharpness information in at least one area of said image data associated with at least part of an object imaged by said first aperture and said one or more second apertures onto the image plane of said image sensor; generating depth information on the basis of at least part of said sharpness information; and, recognizing on the basis of said depth information, at least part of a gesture associated with a movement of said object.
    Type: Grant
    Filed: July 12, 2012
    Date of Patent: August 15, 2017
    Assignee: DUAL APERTURE INTERNATIONAL CO. LTD.
    Inventor: Andrew Augustine Wajs
  • Patent number: 9720667
    Abstract: Technologies for automatic loop vectorization include a computing device with an optimizing compiler. During an optimization pass, the compiler identifies a loop and generates a transactional code segment including a vectorized implementation of the loop body including one or more vector memory read instructions capable of generating an exception. The compiler also generates a non-transactional fallback code segment including a scalar implementation of the loop body that is executed in response to an exception generated within the transactional code segment. The compiler may detect whether the loop contains a memory read dependent on a condition that may be updated in a previous iteration or whether the loop contains a potential data dependence between two iterations. The compiler may generate a dynamic check for an actual data dependence and an explicit transactional abort instruction to be executed when an actual data dependence exists. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 21, 2014
    Date of Patent: August 1, 2017
    Assignee: Intel Corporation
    Inventors: Sara S. Baghsorkhi, Albert Hartono, Youfeng Wu, Nalini Vasudevan, Cheng Wang
  • Patent number: 9703559
    Abstract: When the branch condition of a branch command for a loop process is satisfied and enters the loop mode, the relative branch address is saved in a branch relative address save circuit that points to the branch command for loop processing, and the loop state flag is set in a loop state save circuit. When the loop state flag is set, if the absolute value of the value outputted by a command code counter circuit matches the absolute value of the relative branch address outputted by the branch relative address save circuit, a program counter sum value switching circuit outputs the relative branch address to a program counter adder. If the absolute values do not match, the program counter sum value switching circuit outputs the value ‘1’ to the program counter adder. With this, the branch penalty during loop processing is eliminated even with little hardware.
    Type: Grant
    Filed: November 2, 2012
    Date of Patent: July 11, 2017
    Assignee: NEC CORPORATION
    Inventor: Hiroyuki Igura
  • Patent number: 9697005
    Abstract: In an example, there is disclosed a digital signal processor having a register containing a modular integer configured for use as a thread offset counter. In a multi-stage, pipelined loop, which may be implemented in microcode, the main body of the loop has only one repeating stage. On each stage, the operation executed by each thread of the single repeating stage is identified by the sum of a fixed integer and the thread offset counter. After each pass through the loop, the thread offset counter is incremented, thus maintaining pipelined operation of the single repeating stage.
    Type: Grant
    Filed: December 4, 2013
    Date of Patent: July 4, 2017
    Assignee: ANALOG DEVICES, INC.
    Inventor: Boris Lerner
  • Patent number: 9632786
    Abstract: A method and circuit arrangement selectively repurpose bits from a primary opcode portion of an instruction for use in decoding one or more operands for the instruction. Decode logic of a processor, for example, may be placed in a predetermined mode that decodes a primary opcode for an instruction that is different from that specified in the primary opcode portion of the instruction, and then utilize one or more bits in the primary opcode portion to decode one or more operands for the instruction. By doing so, additional space is freed up in the instruction to support a larger register file and/or additional instruction types, e.g., as specified by a secondary or extended opcode.
    Type: Grant
    Filed: December 20, 2011
    Date of Patent: April 25, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9619234
    Abstract: A circuit arrangement and program product selectively predicate instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: April 11, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9619229
    Abstract: In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: April 11, 2017
    Assignee: Intel Corporation
    Inventors: Mikhail Plotnikov, Andrey Naraikin, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 9582277
    Abstract: A method for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.
    Type: Grant
    Filed: March 22, 2016
    Date of Patent: February 28, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9557999
    Abstract: Methods, apparatuses, and processors for tracking loop candidates in an instruction stream. A load buffer control unit detects a backwards taken branch and starts tracking the loop candidate. The control unit tracks taken branches of the loop candidate, and keeps track of the distance to each taken branch from the start of the loop. If the distance to each taken branch stays the same over multiple iterations of the loop, then the loop is stored in a loop buffer. The loop is then dispatched from the loop buffer, and the front-end of the processor is powered down until the loop terminates.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: January 31, 2017
    Assignee: Apple Inc.
    Inventors: Conrado Blasco-Allue, Ian D. Kountanis
  • Patent number: 9542184
    Abstract: A circuit arrangement utilizes a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: January 10, 2017
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9529599
    Abstract: Systems, apparatuses, methods, and software for processing data in pipeline architectures are provided herein. In one example, a pipeline architecture is presented. The pipeline architecture includes a plurality of processing stages, linked in series, that iteratively process data as the data propagates through the plurality of processing stages. The pipeline architecture includes at least one other processing stage linked in series with and preceded by the plurality of processing stages and configured to iteratively process the data a number of times based at least on an iteration count comprising how many times the data was iteratively processed as the data propagated through the plurality of processing stages.
    Type: Grant
    Filed: February 12, 2013
    Date of Patent: December 27, 2016
    Inventor: William Erik Anderson
  • Patent number: 9513922
    Abstract: A computer system for generating an optimized program code from a program code having a loop with an exit branch, wherein the computer system comprises a processing unit, wherein the processing unit is arranged to convert an exit instruction of the exit branch into a predicated exit instruction, wherein the processing unit is arranged to determine common dependencies within the loop, wherein the processing unit is arranged to generate modified dependencies by adding additional dependencies to the common dependencies, and wherein the processing unit is arranged to apply an algorithm that uses software pipelining for generating an optimized program code for the loop based on the modified dependencies.
    Type: Grant
    Filed: April 20, 2012
    Date of Patent: December 6, 2016
    Assignee: FREESCALE SEMICONDUCTOR, INC.
    Inventor: Rene Catalin Palalau
  • Patent number: 9501279
    Abstract: A method utilizes a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: November 22, 2016
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9495168
    Abstract: In an embodiment, a system includes a processor including one or more cores and a plurality of alias registers to store memory range information associated with a plurality of operations of a loop. The memory range information references one or more memory locations within a memory. The system also includes register assignment means for assigning each of the alias registers to a corresponding operation of the loop, where the assignments are made according to a rotation schedule, and one of the alias registers is assigned to a first operation in a first iteration of the loop and to a second operation in a subsequent iteration of the loop. The system also includes the memory coupled to the processor. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 30, 2013
    Date of Patent: November 15, 2016
    Assignee: Intel Corporation
    Inventors: Hongbo Rong, Cheng Wang, Hyunchul Park, Youfeng Wu
  • Patent number: 9454507
    Abstract: Embodiments of systems, apparatuses, and methods for performing in a computer processor conversion of a mask register into a list of index values in response to a single vector packed convert a mask register into a list of index values instruction that includes a destination vector register operand, a source writemask register operand, and an opcode are described.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: September 27, 2016
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Thomas Willhalm, Garrett T. Drysdale
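
The operation named in the abstract above (patent 9454507), converting a mask register into a list of index values, can be modelled in software as follows. The sketch assumes the mask is an integer bit field with lane 0 in the least significant bit and that only the indices of set bits are produced; how the instruction fills unused destination elements is not stated in the abstract and is not modelled. The function name and parameters are hypothetical.

```python
def mask_to_indices(mask, num_lanes):
    """Return the lane numbers whose mask bit is set, in ascending order."""
    return [lane for lane in range(num_lanes) if (mask >> lane) & 1]


# Lanes 1, 2 and 4 are active in an 8-lane mask.
active = mask_to_indices(0b00010110, 8)
```

A list like this lets a loop visit only the active lanes instead of testing every mask bit per iteration.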
  • Patent number: 9395962
    Abstract: A technology for executing an external operation from a software-pipelined loop is provided. Code performance efficiency can be improved by overlapping the execution of the external operations of the loop and the iterations of the loop.
    Type: Grant
    Filed: August 14, 2012
    Date of Patent: July 19, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Min-Wook Ahn, Won-Sub Kim, Dong-Hoon Yoo
  • Patent number: 9391645
    Abstract: Apparatuses and methods for determining soft data using a classification code are provided. One example apparatus can include a classification code (CC) decoder and an outer code decoder coupled to the CC decoder. The CC decoder is configured to receive a CC codeword. The CC codeword includes a piece of an outer code codeword and corresponding CC parity digits. The CC decoder is configured to determine soft data associated with the piece of the outer code codeword, at least partially, using the corresponding CC digits.
    Type: Grant
    Filed: June 9, 2015
    Date of Patent: July 12, 2016
    Assignee: Micron Technology, Inc.
    Inventors: Sivagnanam Parthasarathy, Patrick R. Khayat, Mustafa N. Kaynak
  • Patent number: 9354881
    Abstract: Embodiments of systems, apparatuses, and methods of performing in a computer processor dependency index vector calculation in response to an instruction that includes a first and second source writemask register operands, a destination vector register operand, and an opcode are described.
    Type: Grant
    Filed: December 27, 2011
    Date of Patent: May 31, 2016
    Assignee: Intel Corporation
    Inventor: Jayashankar Bharadwaj
  • Patent number: 9323530
    Abstract: Embodiments of the invention relate to a computer system for storing an internal instruction loop in a loop buffer. The computer system includes a loop buffer and a processor. The computer system is configured to perform a method including fetching instructions from memory to generate an internal instruction to be executed, detecting a beginning of a first instruction loop in the instructions, determining that a first internal instruction loop corresponding to the first instruction loop is not stored in the loop buffer, fetching the first instruction loop, optimizing one or more instructions corresponding to the first instruction loop to generate a first optimized internal instruction loop, and storing the first optimized internal instruction loop in the loop buffer based on the determination that the first internal instruction loop is not stored in the loop buffer.
    Type: Grant
    Filed: March 28, 2012
    Date of Patent: April 26, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9317291
    Abstract: A method and circuit arrangement utilize a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.
    Type: Grant
    Filed: February 14, 2013
    Date of Patent: April 19, 2016
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9311096
    Abstract: A method and circuit arrangement utilize a register file of an execution unit as a local instruction loop buffer to enable suitable algorithms, such as DSP algorithms, to be fetched and executed directly within the execution unit, and often enabling other logic circuits utilized for other, general purpose workloads to either be powered down or freed up to handle other workloads.
    Type: Grant
    Filed: March 12, 2013
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9311090
    Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9304771
    Abstract: A method, circuit arrangement, and program product for selectively predicating instructions in an instruction stream by determining a first register address from an instruction, determining a second register address based on a value stored at the first register address, and determining whether to predicate the instruction based at least in part on a value stored at the second register address. Predication logic may analyze the instruction to determine the first register address, analyze a register corresponding to the first register address to determine the second register address, and communicate a predication signal to an execution unit based at least in part on the value stored at the second register address.
    Type: Grant
    Filed: February 13, 2013
    Date of Patent: April 5, 2016
    Assignee: International Business Machines Corporation
    Inventors: Adam J. Muff, Paul E. Schardt, Robert A. Shearer, Matthew R. Tubbs
  • Patent number: 9280344
    Abstract: A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.
    Type: Grant
    Filed: September 27, 2012
    Date of Patent: March 8, 2016
    Assignee: TEXAS INSTRUMENTS INCORPORATED
    Inventors: Horst Diewald, Johann Zipperer
  • Patent number: 9244883
    Abstract: A technology for controlling a reconfigurable processor is provided. The reconfigurable processor dynamically loads configuration data from a peripheral memory to a configuration memory while a program is being executed, in place of loading all compiled configuration data in advance into the configuration memory when booting commences. Accordingly, a reduction in capacity of a configuration memory may be achieved.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: January 26, 2016
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jae-un Park, Ki-seok Kwon, Sang-suk Lee
  • Patent number: 9235417
    Abstract: In accordance with embodiments disclosed herein, there are provided methods, systems, mechanisms, techniques, and apparatuses for implementing Real Time Instruction Tracing compression of RET instructions. For example, in one embodiment, such means may include an integrated circuit having means for initiating instruction tracing for instructions of a traced application, mode, or code region, as the instructions are executed by the integrated circuit; means for generating a plurality of packets describing the instruction tracing; and means for compressing a multi-bit RET instruction (RETurn instruction) to a single bit RET instruction.
    Type: Grant
    Filed: December 31, 2011
    Date of Patent: January 12, 2016
    Assignee: Intel Corporation
    Inventors: Jason Brandt, Jonathan Tyler, John Zurawski, Dennis Lastor
  • Patent number: 9170816
    Abstract: A processor includes one or more processing units, an execution pipeline and control circuitry. The execution pipeline includes at least first and second pipeline stages that are cascaded so that program instructions, specifying operations to be performed by the processing units in successive cycles of the pipeline, are fetched from a memory by the first pipeline stage and conveyed to the second pipeline stage, which causes the processing units to perform the specified operations. The control circuitry is coupled, upon determining that a program instruction that is present in the second pipeline stage in a first cycle of the pipeline is to be executed again in a subsequent cycle of the pipeline, to cause the execution pipeline to reuse the program instruction in one of the pipeline stages without re-fetching the program instruction from the memory.
    Type: Grant
    Filed: January 15, 2009
    Date of Patent: October 27, 2015
    Assignee: ALTAIR SEMICONDUCTOR LTD.
    Inventors: Edan Almog, Nohik Semel, Yigal Bitran, Nadav Cohen, Yoel Livne, Eli Zyss
  • Patent number: 9170811
    Abstract: The structured control instruction fetch unit is a structured instruction stream controller that processes expand (XP), expand register indirect (XPR), loop (LOOP), and break (BRK) instructions for structured control. The fetch unit processes stop bits that mark the end of instruction blocks. Any instruction can be marked with a stop bit to indicate that it is the last one in an instruction block. All instructions are encoded with a predicate to reduce the use of control instructions and to simplify the control. A control stack guides instruction fetching by storing return addresses, loop block addresses, loop predicates, and loop counters. Control instructions and stop bits manage operation of the control stack. An instruction unit feeds execution units and includes a set-associative instruction cache, a control stack, an instruction buffer that decouples instruction fetching from execution, instruction decoders, and program counter (PC) control logic.
    Type: Grant
    Filed: January 9, 2013
    Date of Patent: October 27, 2015
    Assignees: KING FAHD UNIVERSITY OF PETROLEUM AND MINERALS, KING ABDULAZIZ CITY FOR SCIENCE AND TECHNOLOGY
    Inventor: Muhamed Fawzi Mudawar
  • Patent number: 9087152
    Abstract: A verification supporting apparatus and a verification supporting method of a reconfigurable processor is provided. The verification supporting apparatus includes an invalid operation determiner configured to detect an invalid operation from a result of scheduling on a source code, and a masking hint generator configured to generate a masking hint for the detected invalid operation.
    Type: Grant
    Filed: February 21, 2013
    Date of Patent: July 21, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Young-Chul Cho, Tai-Song Jin, Dong-Kwan Suh, Yen-Jo Han
  • Patent number: 9052910
    Abstract: A design structure provides instruction fetching within a processor instruction unit, utilizing a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. During instruction fetch, modified instruction buffers coupled to an instruction cache (I-cache) temporarily store instructions from a single branch, backwards short loop. The modified instruction buffers may be a loop buffer, one or more virtual loop buffers, and/or an instruction buffer. Instructions are stored in the modified instruction buffers for the length of the loop cycle. The instruction fetch within the instruction unit of a processor retrieves the instructions for the short loop from the modified buffers during the loop cycle, rather than from the instruction cache.
    Type: Grant
    Filed: June 3, 2008
    Date of Patent: June 9, 2015
    Assignee: International Business Machines Corporation
    Inventors: Ronald Hall, Michael L. Karm, Brian R. Mestan, David Mui
  • Publication number: 20150149747
    Abstract: Provided is a loop scheduling method including scheduling a first loop using execution units, and scheduling a second loop using execution units available as a result of the scheduling of the first loop. An n-th loop (n>2) may be scheduled using a result of scheduling an (n−1)-th loop, similar to the (n−1)-th loop. The first loop may be a higher priority loop than the second loop.
    Type: Application
    Filed: July 14, 2014
    Publication date: May 28, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yeon Bok LEE, Young Hwan PARK, Ho YANG, Keshava PRASAD
  • Patent number: 9038074
    Abstract: In accordance with embodiments, there are provided mechanisms and methods for controlling a process using a process map. These mechanisms and methods for controlling a process using a process map can enable process operations to execute in order without necessarily having knowledge of one another. The ability to provide the process map can avoid a requirement that the operations themselves be programmed to follow a particular sequence, as can further improve the ease by which the sequence of operations may be changed.
    Type: Grant
    Filed: May 15, 2012
    Date of Patent: May 19, 2015
    Assignee: salesforce.com, inc.
    Inventor: Richard Haven
  • Patent number: 9026769
    Abstract: A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: May 5, 2015
    Assignee: Marvell International Ltd.
    Inventors: Sujat Jamil, R. Frank O'Bleness, Joseph Delgross, Tom Hameenanttila
  • Publication number: 20150121051
    Abstract: A debugging system and method, referred to as a kernel functionality checker, is described for enabling debugging of software written for device-specific APIs (application program interfaces) without requiring support or changes in the software driver or hardware. Specific example embodiments are described for OpenCL, but the disclosed methods may also be used to enable debugging capabilities for other device-specific APIs such as DirectX® and OpenGL®.
    Type: Application
    Filed: March 14, 2013
    Publication date: April 30, 2015
    Inventors: Jeremy Bottleson, Alfredo Gimenez
  • Publication number: 20150113229
    Abstract: Code versioning for enabling transactional memory region promotion may include receiving a portion of candidate source code; outlining the portion of candidate source code received for parallel execution; wrapping a critical region with entry and exit routines to enter into a speculation sub-process, wherein the entry and exit routines also gather conflict statistics at run time; and generating an outlined code portion comprising multiple loop versions using a processor.
    Type: Application
    Filed: October 2, 2014
    Publication date: April 23, 2015
    Inventors: Hans Boettiger, Yaoqing Gao, Martin Ohmacht, Kai-Ting Amy Wang
  • Publication number: 20150106603
    Abstract: A modulo scheduling method including calculating at least two candidate initiation intervals between adjacent iterations, searching for schedules of the instructions in parallel by using the candidate initiation intervals, and selecting a schedule determined to be valid from among the searched schedules.
    Type: Application
    Filed: October 7, 2014
    Publication date: April 16, 2015
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Min-wook AHN, Won-sub KIM, Tai Song JIN, Seung-won LEE, Jin-seok LEE, Chae-seok IM
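    Sketch: The abstract above describes trying several candidate initiation intervals (IIs) in parallel and keeping a schedule that proves valid. The following is only a minimal illustration of that idea, not the claimed method. The names `try_schedule` and `modulo_schedule`, the greedy placement, the single pool of identical execution units, and the rule of preferring the smallest valid II are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def try_schedule(ii, ops, deps, carried, units):
    """Greedy modulo schedule for one candidate II (hypothetical helper).

    ops     : operation names, assumed to be listed in topological order
    deps    : (src, dst, latency) dependencies within one iteration
    carried : (src, dst, latency, distance) loop-carried dependencies
    units   : number of identical execution units (an assumption)
    Returns {op: start_cycle}, or None if this II yields no valid schedule.
    """
    start = {}
    rows = [0] * ii  # modulo reservation table: units in use per row
    for op in ops:
        earliest = max((start[s] + lat for s, d, lat in deps if d == op),
                       default=0)
        # Trying ii consecutive cycles visits every row of the table once.
        for t in range(earliest, earliest + ii):
            if rows[t % ii] < units:
                rows[t % ii] += 1
                start[op] = t
                break
        else:
            return None  # no free unit in any row
    # A loop-carried dependency must be satisfied `distance` iterations later.
    for s, d, lat, dist in carried:
        if start[d] + ii * dist < start[s] + lat:
            return None
    return start


def modulo_schedule(ops, deps, carried, units, n_candidates=5):
    """Search several candidate IIs concurrently; keep the smallest valid one."""
    min_ii = -(-len(ops) // units)  # resource-bound lower limit (ceiling)
    candidates = list(range(min_ii, min_ii + n_candidates))
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda ii: try_schedule(ii, ops, deps, carried, units),
            candidates))
    for ii, sched in zip(candidates, results):
        if sched is not None:
            return ii, sched
    return None


if __name__ == "__main__":
    ops = ["a", "b", "c"]
    deps = [("a", "b", 2), ("b", "c", 2)]
    carried = [("c", "a", 1, 1)]  # c feeds a in the next iteration
    print(modulo_schedule(ops, deps, carried, units=2))
```

    In this toy loop the recurrence a, b, c, a needs five cycles per iteration, so the candidates 2, 3 and 4 are rejected and 5 is selected.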
  • Publication number: 20150026434
    Abstract: Techniques are described herein for using configurable logic constructs in a loop buffer. In an embodiment, a configurable hardware block is programmed based on one or more target functions within a loop. The configurable hardware block is associated with a plurality of registers, including a loopcount register and a first output register. For each iteration of the loop, a counter value in the loopcount register is updated and a target value in the first output register is updated using the programmed configurable hardware block. For each iteration of the loop, a set of one or more instructions may be fetched from the instruction buffer and executed based on the updated target value in the first output value.
    Type: Application
    Filed: July 22, 2013
    Publication date: January 22, 2015
    Applicant: Oracle International Corporation
    Inventors: Aarti Basant, Brian Gold, Erik Schlanger
  • Patent number: 8892853
    Abstract: An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: November 18, 2014
    Assignee: Mobileye Technologies Limited
    Inventors: Yosef Kreinin, Gil Dogon, Emmanuel Sixsou, Yosi Arbeli, Mois Navon, Roman Sajman
  • Publication number: 20140337606
    Abstract: When the branch condition of a branch command for a loop process is satisfied and enters the loop mode, the relative branch address is saved in a branch relative address save circuit that points to the branch command for loop processing, and the loop state flag is set in a loop state save circuit. When the loop state flag is set, if the absolute value of the value outputted by a command code counter circuit matches the absolute value of the relative branch address outputted by the branch relative address save circuit, a program counter sum value switching circuit outputs the relative branch address to a program counter adder. If the absolute values do not match, the program counter sum value switching circuit outputs the value ‘1’ to the program counter adder. With this, the branch penalty during loop processing is eliminated even with little hardware.
    Type: Application
    Filed: November 2, 2012
    Publication date: November 13, 2014
    Applicant: NEC CORPORATION
    Inventor: Hiroyuki Igura
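    Sketch: The switching rule in the abstract above can be modeled in a few lines. This is a behavioral illustration only. The function names are hypothetical, and the assumption that the command code counter starts at 0 on loop entry and resets whenever the branch offset is selected is not stated in the abstract.

```python
def pc_addend(loop_flag, code_counter, rel_branch):
    """Value the switching circuit feeds to the program counter adder.

    Returns (addend, taken). The relative branch address is selected only
    when the loop state flag is set and the absolute values match;
    otherwise the program counter simply advances by 1.
    """
    taken = loop_flag and abs(code_counter) == abs(rel_branch)
    return (rel_branch if taken else 1), taken


def trace(start_pc, rel_branch, steps):
    """List the program counter values over `steps` cycles in loop mode."""
    pc, counter, visited = start_pc, 0, []
    for _ in range(steps):
        visited.append(pc)
        addend, taken = pc_addend(True, counter, rel_branch)
        pc += addend
        # Assumed counter behavior: reset after the jump back, else count up.
        counter = 0 if taken else counter + 1
    return visited


if __name__ == "__main__":
    # Loop body at addresses 10..13, branch at 13 with relative address -3.
    print(trace(10, -3, 9))
```

    With a relative branch address of -3, the program counter cycles through 10, 11, 12, 13 and returns to 10 without a separate branch evaluation step.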
  • Patent number: 8869129
    Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based on the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.
    Type: Grant
    Filed: November 2, 2009
    Date of Patent: October 21, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Tae-wook Oh, Won-sub Kim, Bernhard Egger
  • Patent number: 8856499
    Abstract: An apparatus is disclosed. The apparatus comprises an instruction mapping table, which includes a plurality of instruction counts and a plurality of instruction pointers each corresponding with one of the instruction counts. Each instruction pointer identifies a next instruction for execution. Further, each instruction count specifies a number of instructions to execute beginning with the next instruction. The apparatus also has a data operation unit adapted to receive a data group and adapted to execute on the received data group the number of instructions specified by a current instruction count of the instruction mapping table beginning with the next instruction identified by a current instruction pointer of the instruction mapping table before proceeding with another data group.
    Type: Grant
    Filed: August 15, 2007
    Date of Patent: October 7, 2014
    Assignee: Nvidia Corporation
    Inventors: Michael J. M. Toksvig, Justin M. Mahan, Edward A. Hutchins, Tyson J. Bergland, James T. Battle, Ashok Srinivasan
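    Sketch: The instruction mapping table in the abstract above pairs an instruction pointer with an instruction count. A rough software analogy follows, assuming that instructions can be modeled as Python callables applied to each element of a data group and that a table entry is a `(pointer, count)` tuple. The name `execute_entry` and this data layout are hypothetical.

```python
def execute_entry(program, table, entry, data_groups):
    """Run one mapping-table entry over each data group in turn.

    program     : list of callables standing in for instructions
    table       : list of (instruction_pointer, instruction_count) pairs
    entry       : index of the current table entry
    data_groups : list of data groups, each a list of values
    """
    pointer, count = table[entry]
    results = []
    for group in data_groups:
        # Execute `count` instructions starting at `pointer` on this group
        # before moving on to the next group.
        for instr in program[pointer:pointer + count]:
            group = [instr(x) for x in group]
        results.append(group)
    return results


if __name__ == "__main__":
    program = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    table = [(0, 2), (2, 1)]  # entry 0: two instructions from index 0
    print(execute_entry(program, table, 0, [[1, 2], [3]]))
    print(execute_entry(program, table, 1, [[1, 2]]))
```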
  • Publication number: 20140297997
    Abstract: Various embodiments are generally directed to techniques for reducing syntax requirements in application code to cause concurrent execution of multiple iterations of at least a portion of a loop thereof to reduce overall execution time in solving a large scale problem. At least one non-transitory machine-readable storage medium includes instructions that when executed by a computing device, cause the computing device to parse an application code to identify a loop instruction indicative of an instruction block that includes instructions that define a loop of which multiple iterations are capable of concurrent execution, the instructions including at least one call instruction to an executable routine capable of concurrent execution; and insert at least one coordinating instruction into an instruction sub-block of the instruction block to cause sequential execution of instructions of the instruction sub-block across the multiple iterations based on identification of the loop instruction.
    Type: Application
    Filed: December 30, 2013
    Publication date: October 2, 2014
    Applicant: SAS Institute Inc.
    Inventors: Jack Joseph Rouse, Leonardo Bezerra Lopes, Robert William Pratt
  • Patent number: 8850171
    Abstract: When a temporary data storage unit 104 stores a value of “3” and an iteration number of “3”, and a data updating management unit 103 receives a value of “2” in combination with an iteration number of “2”, a data updating management unit 103 determines not to overwrite information in the temporary data storage unit 104 with the received information by comparing the relative sizes of the iteration numbers. Subsequently, upon receiving information from the multithreaded execution unit 102 indicating that parallel execution is complete, the data updating management unit 103 copies the value of “3”, stored by the temporary data storage unit 104, into the final data storage unit 105.
    Type: Grant
    Filed: June 3, 2011
    Date of Patent: September 30, 2014
    Assignee: Panasonic Corporation
    Inventor: Kyoko Ueda
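    Sketch: The update rule in the abstract above keeps the value from the latest iteration even when threads report out of order. A minimal model follows, assuming that a strictly larger iteration number wins (the abstract does not say what happens on a tie). The class and method names are hypothetical.

```python
class DataUpdateManager:
    """Toy model of the temporary and final data storage units."""

    def __init__(self):
        self.temp = None   # (value, iteration) held in temporary storage
        self.final = None  # value committed after parallel execution ends

    def receive(self, value, iteration):
        """Overwrite temporary storage only for a later iteration number."""
        if self.temp is None or iteration > self.temp[1]:
            self.temp = (value, iteration)

    def complete(self):
        """On completion of parallel execution, copy the value to final storage."""
        if self.temp is not None:
            self.final = self.temp[0]


if __name__ == "__main__":
    mgr = DataUpdateManager()
    mgr.receive(3, 3)   # iteration 3 reports first
    mgr.receive(2, 2)   # the older iteration 2 arrives late and is ignored
    mgr.complete()
    print(mgr.final)
```

    This mirrors the example in the abstract: the stored value "3" from iteration "3" survives the late arrival of value "2" from iteration "2" and is the one copied to final storage.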
  • Patent number: 8850170
    Abstract: An apparatus and method for dynamically determining the execution mode of a reconfigurable array are provided. Performance information of a loop may be obtained before and/or during the execution of the loop. The performance information may be used to determine whether to operate the apparatus in a very long instruction word (VLIW) mode or in a coarse grained array (CGA) mode.
    Type: Grant
    Filed: August 25, 2011
    Date of Patent: September 30, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Bernhard Egger, Dong-Hoon Yoo, Tai-Song Jin, Won-Sub Kim, Min-Wook Ahn, Jin-Seok Lee, Hee-Jin Ahn