Patents Examined by Corey S Faherty
  • Patent number: 10120685
    Abstract: An apparatus and method for supporting simultaneous multiple iterations (SMI) in a coarse-grained reconfigurable architecture (CGRA). In support of SMI, the apparatus includes: hardware structures that connect all of multiple processing engines (PEs) to a load-store unit (LSU) configured to keep track of which compiled program code iterations have completed, which ones are in flight and which are yet to begin, and a control unit including hardware structures that are used to maintain synchronization and initiate and terminate loops within the PEs. SMI permits execution of the next instruction within any iteration (in flight). If instructions from multiple iterations are ready for execution (and are pre-decoded), then the hardware selects the lowest iteration number ready for execution. If in a particular clock cycle, a loop iteration with a lower iteration number is stalled (i.e.
    Type: Grant
    Filed: November 4, 2015
    Date of Patent: November 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Chia-yu Chen, Kailash Gopalakrishnan, Jinwook Oh, Sunil K. Shukla, Vijayalakshmi Srinivasan
  • Patent number: 10114639
    Abstract: An arithmetic device which controls a parallel arithmetic operation includes a global memory, a plurality of compute units, each of the compute units including a local memory and a plurality of processing elements, and each of the processing elements including a private memory and processing data blocks stored in the private memory, an attribute group holding unit which includes a specific attribute which includes a parameter indicative of a size of the data block, an arithmetic attribute which includes a parameter indicating whether the data block is data relevant to processing, and indicating a transfer order when the data block is data relevant to processing, and a policy attribute which includes a parameter indicative of how to execute a transfer of the data block and how to execute processing of the data block.
    Type: Grant
    Filed: April 28, 2017
    Date of Patent: October 30, 2018
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Shorin Kyo
  • Patent number: 10108417
    Abstract: Storing narrow produced values for instruction operands directly in a register map in an out-of-order processor (OoP) is provided. An OoP is provided that includes an instruction processing system. The instruction processing system includes a number of instruction processing stages configured to pipeline the processing and execution of instructions according to a dataflow execution. The instruction processing system also includes a register map table (RMT) configured to store address pointers mapping logical registers to physical registers in a physical register file (PRF) for storing produced data for use by consumer instructions without overwriting logical registers for later executed, out-of-order instructions. In certain aspects, the instruction processing system is configured to write back (i.e., store) narrow values produced by executed instructions directly into the RMT, as opposed to writing the narrow produced values into the PRF in a write back stage.
    Type: Grant
    Filed: September 21, 2015
    Date of Patent: October 23, 2018
    Assignee: QUALCOMM Incorporated
    Inventors: Anil Krishna, Rodney Wayne Smith, Sandeep Suresh Navada, Shivam Priyadarshi, Raguram Damodaran
  • Patent number: 10102017
    Abstract: A computing system in which a software component executing on a platform can reliably and efficiently obtain state information about a component supported by the platform through the use of a shared memory page. State information may be supplied by the platform, but any state translation information needed to map the state information as supplied to a format as used may be provided through the shared page. In a virtualized environment, the state translation information can be used to map the value of a virtual timer counter or other component from a value provided by a virtual processor to a normalized reference time that will yield the same result, regardless of whether the software component is migrated to or from another virtual processor. Use of a shared page avoids the inefficiency of an intercept into a virtualized environment or a system call in native mode operation.
    Type: Grant
    Filed: February 19, 2013
    Date of Patent: October 16, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Shuvabrata Ganguly, Jason S. Wohlgemuth, Allen Marshall
  • Patent number: 10095584
    Abstract: The amount of data to be backed up and recovered is reduced when supply of power to a semiconductor device is stopped and restarted. A backup need determination circuit provided in the semiconductor device reads the kind of instruction decoded by a decoder and determines whether data needs to be backed up from a volatile register to a nonvolatile register. With a structure according to one embodiment of the present invention, it is possible to select necessary data from data used for operation in a logic circuit before the power supply is stopped and after the power supply is restarted. Data that is necessary after the power supply is restarted can be backed up from the volatile register to the nonvolatile register before the power supply is stopped. Data that is unnecessary is not backed up from the volatile register to the nonvolatile register before the power supply is stopped.
    Type: Grant
    Filed: April 23, 2014
    Date of Patent: October 9, 2018
    Assignee: Semiconductor Energy Laboratory Co., Ltd.
    Inventor: Seiichi Yoneda
  • Patent number: 10061609
    Abstract: A method and system uses exceptions for code specialization in a system that supports transactions. The method and system includes inserting one or more branchless instructions into a sequence of computer instructions. The branchless instructions include one or more instructions that are executable if a commonly occurring condition is satisfied and include one or more instructions that are configured to raise an exception if the commonly occurring condition is not satisfied.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventors: Arvind Krishnaswamy, Daniel M. Lavery
  • Patent number: 10061705
    Abstract: A technique for processing instructions includes examining instructions in an instruction stream of a processor to determine properties of the instructions. The properties indicate whether the instructions may belong in an instruction sequence subject to decode-time instruction optimization (DTIO). Whether the properties of multiple ones of the instructions are compatible for inclusion within an instruction sequence of a same group is determined. The instructions with compatible ones of the properties are grouped into a first instruction group. The instructions of the first instruction group are decoded subsequent to formation of the first instruction group. Whether the first instruction group actually includes a DTIO sequence is verified based on the decoding. Based on the verifying, DTIO is performed on the instructions of the first instruction group or is not performed on the instructions of the first instruction group.
    Type: Grant
    Filed: June 9, 2015
    Date of Patent: August 28, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10061587
    Abstract: A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventors: David Pardo Keppel, Denis M. Khartikov, Fernando LaTorre, Marc Lupon, Grigorios Magklis, Naveen Neelakantam, Georgios Tournavitis, Polychronis Xekalakis
  • Patent number: 10055226
    Abstract: A system and process for managing thread transitions includes determining that a transition is to be made regarding the relative use of two data register sets where the two data register sets are used by a processor as first-level registers for thread execution. Based on the transition determination, a determination is made whether to move thread data in at least one of the first-level registers to second-level registers. Responsive to determining to move the thread data, a portion of main memory or cache memory is assigned as the second-level registers where the second-level registers serve as registers of at least one of the two data register sets for executing a thread. The thread data from the at least one first-level register is moved to the second-level registers based on the move determination.
    Type: Grant
    Filed: July 2, 2017
    Date of Patent: August 21, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher M. Abernathy, Mary D. Brown, Susan E. Eisen, James A. Kahle, Hung Q. Le, Dung Q. Nguyen
  • Patent number: 10042813
    Abstract: Methods and apparatus relating to improved SIMD (Single Instruction, Multiple Data) K-nearest-neighbors implementations are described. An embodiment provides a technique for improving SIMD implementations of the multidimensional K-Nearest-Neighbors (KNN) techniques. One embodiment replaces the non-SIMD friendly part of the KNN algorithm with a sequence of SIMD operations. For example, in order to avoid branches in the algorithm hotspot (e.g., the inner loop), SIMD operations may be used to update the list of nearest distances (and neighbors) after each iteration. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 15, 2014
    Date of Patent: August 7, 2018
    Assignee: Intel Corporation
    Inventor: Amos Goldman
  • Patent number: 10019266
    Abstract: A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: July 10, 2018
    Assignee: RAMBUS INC.
    Inventors: William C. Moyer, Jeffrey W. Scott
  • Patent number: 10013257
    Abstract: Embodiments relate to register comparison for operand store compare (OSC) prediction. An aspect includes, for each instruction in an instruction group of a processor pipeline: determining a base register value of the instruction; determining an index register value of the instruction; and determining a displacement of the instruction. Another aspect includes comparing the base register value, index register value, and displacement of each instruction in the instruction group to the base register value, index register value, and displacement of all other instructions in the instruction group. Another aspect includes, based on the comparison, determining that a load instruction of the instruction group has a probable OSC conflict with a store instruction of the instruction group. Yet another aspect includes delaying the load instruction based on the determined probable OSC conflict.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 3, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David Hutton, Wen Li, Eric Schwarz
  • Patent number: 10013290
    Abstract: A system and method are provided for synchronizing threads in a divergent region of code within a multi-threaded parallel processing system. The method includes, prior to any thread entering a divergent region, generating a count that represents a number of threads that will enter the divergent region. The method also includes using the count within the divergent region to synchronize the threads in the divergent region.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: July 3, 2018
    Assignee: Nvidia Corporation
    Inventor: Stephen Jones
  • Patent number: 10007524
    Abstract: Branch history information characterizes results of branch instructions previously executed by a processor. A count is stored of a number of consecutive branch instructions previously executed by the processor whose results all indicate a not taken branch. In a first pipeline stage, a predicted branch result is provided based on at least a portion of the branch history information, and one or more of the branch history information and the count is updated based on the predicted branch result. In a second pipeline stage, an actual branch result is provided based on an executed branch instruction, and the branch history information is updated based on the actual branch result. If the predicted branch result indicates a taken branch, the branch history information is updated based on the count, and if the predicted branch result indicates a not taken branch, the count is updated but not the branch history information.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: June 26, 2018
    Assignee: Cavium, Inc.
    Inventor: David Albert Carlson
  • Patent number: 10001992
    Abstract: A method includes: calculating a percentage of an instruction belonging to a certain instruction type among instruction types included in each of a plurality of blocks partitioned from a program; extracting an execution address and a number of execution instructions from an arithmetic processing unit that executes the program and performs sampling of the execution address and the number of execution instructions at a plurality of time points; calculating a first execution frequency of the instruction included in each of the plurality of blocks based on the extracted execution address and the number of execution instructions; calculating a second execution frequency of the instruction belonging to the instruction type by multiplying the first execution frequency of the block by the percentage of the instruction in the block; and calculating a total number of second execution frequencies calculated for each of the plurality of blocks.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: June 19, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Masao Yamamoto
  • Patent number: 9996348
    Abstract: A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
    Type: Grant
    Filed: June 14, 2012
    Date of Patent: June 12, 2018
    Assignee: Apple Inc.
    Inventors: Gerard R. Williams, III, John H. Mylius, Conrade Blasco-Allue
  • Patent number: 9996358
    Abstract: A system and method of coupling a Branch Target Buffer (BTB) content of a BTB with an instruction cache content of an instruction cache. The method includes: tagging a plurality of target buffer entries that belong to branches within a same instruction block with a corresponding instruction block address and a branch bitmap to indicate individual branches in the block; coupling an overflow buffer with the BTB to accommodate further target buffer entries of instruction blocks, distinct from the plurality of target buffer entries, which have more branches than the bundle is configured to accommodate in the corresponding instruction's bundle in the BTB; and predicting the instructions or the instruction blocks that are likely to be fetched by the core in the future and fetching those instructions from the lower levels of the memory hierarchy proactively by means of a prefetcher.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: June 12, 2018
    Assignee: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE
    Inventors: Babak Falsafi, Ilknur Cansu Kaynak, Boris Robert Grot
  • Patent number: 9996355
    Abstract: An instruction for parsing a buffer to be utilized within a data processing system including: an operation code field, the operation code field identifies the instruction; a control field, the control field controls operation of the instruction; and one or more general registers, wherein a first general register stores an argument address, a second general register stores a function code, a third general register stores the length of an argument-character buffer, and a fourth general register contains the address of the function-code data structure.
    Type: Grant
    Filed: February 2, 2017
    Date of Patent: June 12, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: John R. Ehrman, Dan F. Greiner
  • Patent number: 9996350
    Abstract: Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: June 12, 2018
    Assignee: INTEL CORPORATION
    Inventors: Victor Lee, Mikhail Smelyanskiy, Alexander Heinecke
  • Patent number: 9996352
    Abstract: Systems, methods, and other embodiments associated with a processor that includes selectively enabled features are described. According to one embodiment, a processor includes a plurality of processing routines embedded within the processor that when executed cause the processor to implement corresponding processor features. The processor includes a processor engine configured to determine whether a processing routine of the plurality of processing routines is enabled based, at least in part, on a corresponding value in a control register. The processor engine is configured to selectively execute the processing routine based, at least in part, on whether the value indicates that the processing routine is enabled.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: June 12, 2018
    Assignee: MARVELL INTERNATIONAL LTD.
    Inventor: Kapil Jain
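
The selective enabling described in the abstract of patent 9996352 can be illustrated in software. The following is only a minimal sketch, assuming that each processing routine corresponds to one bit of a control register; the bit positions, routine names, and the functions `is_enabled` and `run_enabled` are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict, List

# Hypothetical bit positions in the control register (illustrative only).
FEATURE_PREFETCH = 0
FEATURE_CHECKSUM = 1

# Hypothetical embedded routines, keyed by their control-register bit.
ROUTINES: Dict[int, Callable[[], str]] = {
    FEATURE_PREFETCH: lambda: "prefetch",
    FEATURE_CHECKSUM: lambda: "checksum",
}


def is_enabled(control_register: int, bit: int) -> bool:
    """Return True if the given bit of the control register is set."""
    return (control_register >> bit) & 1 == 1


def run_enabled(control_register: int,
                routines: Dict[int, Callable[[], str]]) -> List[str]:
    """Execute only the routines whose control-register bit is set.

    Routines are visited in ascending bit order; disabled routines are skipped.
    """
    results = []
    for bit in sorted(routines):
        if is_enabled(control_register, bit):
            results.append(routines[bit]())
    return results


if __name__ == "__main__":
    # Only bit 1 is set, so only the checksum routine runs.
    print(run_enabled(0b10, ROUTINES))
```

In this sketch, changing the register value is the only step needed to turn a feature on or off; the routines themselves stay in place, which mirrors the idea of features that are embedded but selectively executed.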