Patents Examined by Corey S Faherty
  • Patent number: 10061705
    Abstract: A technique for processing instructions includes examining instructions in an instruction stream of a processor to determine properties of the instructions. The properties indicate whether the instructions may belong in an instruction sequence subject to decode-time instruction optimization (DTIO). The technique determines whether the properties of multiple instructions are compatible for inclusion in the same group, and groups the instructions with compatible properties into a first instruction group. The instructions of the first instruction group are decoded after the group is formed, and the decoding is used to verify whether the group actually includes a DTIO sequence. Based on this verification, DTIO is either performed or not performed on the instructions of the first instruction group.
    Type: Grant
    Filed: June 9, 2015
    Date of Patent: August 28, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
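The 10061705 abstract describes a two-phase flow: group instructions using cheap pre-decode properties, then verify the group after full decode before optimizing it. A minimal Python sketch of that flow follows. The `Insn` type, the `MAY_START`/`MAY_END` opcode sets, and the verification rule (the second instruction reads and overwrites the first one's destination, as in an illustrative `addis` then `addi` pair) are assumptions for illustration, not the patent's actual criteria; immediates are omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Insn:
    op: str
    dst: str
    srcs: tuple

# Hypothetical pre-decode properties: opcodes that *may* begin or end a DTIO sequence.
MAY_START = {"addis"}
MAY_END = {"addi", "ld"}

def form_groups(stream):
    """Pair adjacent instructions whose pre-decode properties are compatible."""
    groups, i = [], 0
    while i < len(stream):
        if (i + 1 < len(stream) and stream[i].op in MAY_START
                and stream[i + 1].op in MAY_END):
            groups.append([stream[i], stream[i + 1]])
            i += 2
        else:
            groups.append([stream[i]])
            i += 1
    return groups

def decode_group(group):
    """Decode a group, fusing it only if it is verified to be a real DTIO sequence."""
    if len(group) == 2:
        head, tail = group
        # Verification: tail must read head's result and overwrite the same register.
        if head.dst in tail.srcs and tail.dst == head.dst:
            srcs = head.srcs + tuple(s for s in tail.srcs if s != head.dst)
            return [Insn(f"{head.op}+{tail.op}", tail.dst, srcs)]
    return list(group)  # verification failed: decode the instructions unchanged
```

For the stream `addis r3,r2 / addi r3,r3 / addis r4,r2 / addi r5,r6`, both pairs are grouped, but only the first is fused; the second fails verification because `addi r5` does not read `r4`.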
  • Patent number: 10061609
    Abstract: A method and system use exceptions for code specialization in a system that supports transactions. The method and system include inserting one or more branchless instructions into a sequence of computer instructions. The branchless instructions include one or more instructions that are executable if a commonly occurring condition is satisfied and one or more instructions that are configured to raise an exception if the commonly occurring condition is not satisfied.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventors: Arvind Krishnaswamy, Daniel M. Lavery
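Python has no branchless hardware instructions, but the control structure described for 10061609 can be mimicked: a specialized path that assumes a common condition and faults when it does not hold, wrapped in a handler that falls back to general code. The squares table and function names below are hypothetical; the dict lookup stands in for an instruction that raises when the assumed condition fails, and the `try` block loosely plays the role of the transaction.

```python
# Table valid only for the common case 0 <= x < 256. A dict (rather than a list) is
# used so out-of-range keys, including negative ones, raise instead of wrapping around.
SQUARES = {x: x * x for x in range(256)}

def sum_squares_specialized(values):
    """Fast path: no per-element range test; an uncommon input raises KeyError."""
    total = 0
    for x in values:
        total += SQUARES[x]
    return total

def sum_squares(values):
    try:
        # Like an aborted transaction, the partial result is discarded on exception.
        return sum_squares_specialized(values)
    except KeyError:
        return sum(x * x for x in values)  # general, unspecialized path
```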
  • Patent number: 10061587
    Abstract: A processor includes a front end, a decoder, an allocator, and a retirement unit. The decoder includes logic to identify an end-of-live-range (EOLR) indicator. The EOLR indicator specifies an architectural register and a location in code for which the architectural register is unused. The allocator includes logic to scan for a mapping of the architectural register to a physical register, based upon the EOLR indicator. The allocator also includes logic to generate a request to disassociate the architectural register from the physical register. The retirement unit includes logic to disassociate the architectural register from the physical register.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: August 28, 2018
    Assignee: Intel Corporation
    Inventors: David Pardo Keppel, Denis M. Khartikov, Fernando LaTorre, Marc Lupon, Grigorios Magklis, Naveen Neelakantam, Georgios Tournavitis, Polychronis Xekalakis
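The EOLR mechanism in 10061587 lets a physical register be released as soon as the code signals that an architectural register's value is dead, instead of waiting for the register to be redefined. The toy register alias table below is a sketch under simplifying assumptions: it collapses the allocator's scan, the disassociation request, and the retirement-time disassociation into a single `end_of_live_range` call, which a real pipeline would perform in separate stages.

```python
class RenameTable:
    """Toy register alias table with an end-of-live-range (EOLR) hook."""

    def __init__(self, num_physical):
        self.free = list(range(num_physical))  # free physical registers
        self.map = {}                          # architectural name -> physical register

    def rename_dest(self, arch):
        phys = self.free.pop(0)
        previous = self.map.get(arch)  # normally freed when the redefining insn retires
        self.map[arch] = phys
        return phys, previous

    def end_of_live_range(self, arch):
        """EOLR: the value in `arch` is dead, so release its physical register early."""
        phys = self.map.pop(arch, None)
        if phys is not None:
            self.free.append(phys)
        return phys
```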
  • Patent number: 10055226
    Abstract: A system and process for managing thread transitions includes determining that a transition is to be made regarding the relative use of two data register sets where the two data register sets are used by a processor as first-level registers for thread execution. Based on the transition determination, a determination is made whether to move thread data in at least one of the first-level registers to second-level registers. Responsive to determining to move the thread data, a portion of main memory or cache memory is assigned as the second-level registers where the second-level registers serve as registers of at least one of the two data register sets for executing a thread. The thread data from the at least one first-level register is moved to the second-level registers based on the move determination.
    Type: Grant
    Filed: July 2, 2017
    Date of Patent: August 21, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Christopher M. Abernathy, Mary D. Brown, Susan E. Eisen, James A. Kahle, Hung Q. Le, Dung Q. Nguyen
  • Patent number: 10042813
    Abstract: Methods and apparatus relating to improved SIMD (Single Instruction, Multiple Data) K-nearest-neighbors implementations are described. An embodiment provides a technique for improving SIMD implementations of the multidimensional K-Nearest-Neighbors (KNN) techniques. One embodiment replaces the non-SIMD friendly part of the KNN algorithm with a sequence of SIMD operations. For example, in order to avoid branches in the algorithm hotspot (e.g., the inner loop), SIMD operations may be used to update the list of nearest distances (and neighbors) after each iteration. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 15, 2014
    Date of Patent: August 7, 2018
    Assignee: Intel Corporation
    Inventor: Amos Goldman
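The branch-free update of the nearest-distance list mentioned in the 10042813 abstract can be illustrated with a min/max network, a standard branchless way to insert one value into a sorted array. This Python sketch is an assumption about what such an update might look like, not the patented SIMD sequence: each output lane depends only on two neighboring old lanes and the candidate, so on SIMD hardware the per-lane `min`/`max` would become vector min/max operations. Entries are `(distance, index)` tuples so each neighbor's index travels with its distance.

```python
import math

def insert_branchless(best, cand):
    """Insert cand into best (ascending, fixed length k); the largest entry falls off.

    Lane i uses only best[i-1], best[i] and cand, so all k lanes are independent
    and map onto vector min/max operations with no data-dependent branch.
    """
    prev = [(-math.inf, -1)] + best[:-1]  # best shifted right by one lane
    return [min(b, max(p, cand)) for p, b in zip(prev, best)]

def knn(query, points, k):
    best = [(math.inf, -1)] * k  # sentinel entries: infinitely far, no index
    for idx, p in enumerate(points):
        d = sum((a - b) ** 2 for a, b in zip(query, p))  # squared Euclidean distance
        best = insert_branchless(best, (d, idx))
    return [i for _, i in best]
```

For example, `knn((0, 0), [(0, 0), (5, 5), (1, 0), (0, 2)], 2)` returns `[0, 2]`.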
  • Patent number: 10019266
    Abstract: A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: July 10, 2018
    Assignee: RAMBUS INC.
    Inventors: William C. Moyer, Jeffrey W. Scott
  • Patent number: 10013257
    Abstract: Embodiments relate to register comparison for operand store compare (OSC) prediction. An aspect includes, for each instruction in an instruction group of a processor pipeline: determining a base register value of the instruction; determining an index register value of the instruction; and determining a displacement of the instruction. Another aspect includes comparing the base register value, index register value, and displacement of each instruction in the instruction group to the base register value, index register value, and displacement of all other instructions in the instruction group. Another aspect includes, based on the comparison, determining that a load instruction of the instruction group has a probable OSC conflict with a store instruction of the instruction group. Yet another aspect includes delaying the load instruction based on the determined probable OSC conflict.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 3, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David Hutton, Wen Li, Eric Schwarz
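The comparison in 10013257 works on register fields and displacements, which are known at decode time before any address is generated; that is why a match indicates only a *probable* conflict (the registers could be modified in between, and differently encoded operands could still alias). The sketch below is illustrative: the `MemOp` type is hypothetical, and it assumes that only a store older than the load, earlier in the same group, can cause the load to be delayed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MemOp:
    kind: str   # "load" or "store"
    base: int   # base register number
    index: int  # index register number
    disp: int   # displacement

def loads_to_delay(group):
    """Positions of loads whose (base, index, displacement) match an older store in the group."""
    def key(op):
        return (op.base, op.index, op.disp)
    return [i for i, op in enumerate(group)
            if op.kind == "load"
            and any(o.kind == "store" and key(o) == key(op) for o in group[:i])]
```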
  • Patent number: 10013290
    Abstract: A system and method are provided for synchronizing threads in a divergent region of code within a multi-threaded parallel processing system. The method includes, prior to any thread entering a divergent region, generating a count that represents a number of threads that will enter the divergent region. The method also includes using the count within the divergent region to synchronize the threads in the divergent region.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: July 3, 2018
    Assignee: Nvidia Corporation
    Inventor: Stephen Jones
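The point of the count in 10013290 is that a barrier sized for every thread would deadlock inside a divergent region, because threads that skip the region never arrive. The sketch below uses Python threads as stand-ins for GPU threads (an analogy, not the patented GPU mechanism): the number of threads that will enter the region is computed before any thread enters, and `threading.Barrier` is sized with that count.

```python
import threading

def run_divergent(thread_ids, takes_branch, work):
    """Run work(t) only for threads that take the branch, synchronizing just those threads."""
    thread_ids = list(thread_ids)
    # Before any thread enters the region, count how many will enter it.
    count = sum(1 for t in thread_ids if takes_branch(t))
    barrier = threading.Barrier(count) if count else None
    results, lock = {}, threading.Lock()

    def body(t):
        if takes_branch(t):   # the divergent region
            value = work(t)
            barrier.wait()    # waits for exactly `count` threads, not all of them
            with lock:
                results[t] = value

    threads = [threading.Thread(target=body, args=(t,)) for t in thread_ids]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```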
  • Patent number: 10007524
    Abstract: Branch history information characterizes the results of branch instructions previously executed by a processor. A count is stored of the number of consecutive previously executed branch instructions whose results all indicate a not-taken branch. In a first pipeline stage, a predicted branch result is provided based on at least a portion of the branch history information, and one or both of the branch history information and the count are updated based on the predicted branch result. In a second pipeline stage, an actual branch result is provided based on an executed branch instruction, and the branch history information is updated based on the actual branch result. If the predicted branch result indicates a taken branch, the branch history information is updated based on the count; if the predicted branch result indicates a not-taken branch, the count is updated but the branch history information is not.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: June 26, 2018
    Assignee: Cavium, Inc.
    Inventor: David Albert Carlson
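The 10007524 abstract does not specify how the count is folded into the history. Purely as an illustration, the sketch below assumes a hypothetical encoding in which the history holds one 4-bit field per predicted-taken branch, recording how many not-taken branches preceded it. Long runs of not-taken branches then no longer push older, more useful history out. Only the first-stage (predicted) update is modeled; the second-stage repair from actual results is omitted.

```python
class CompressedHistory:
    """Hypothetical history: one saturating 4-bit run-length field per predicted-taken branch."""
    FIELD_BITS = 4
    FIELDS = 4

    def __init__(self):
        self.history = 0
        self.not_taken_count = 0

    def update_predicted(self, taken):
        if taken:
            # Fold the (saturated) not-taken run length into the history, then restart it.
            run = min(self.not_taken_count, (1 << self.FIELD_BITS) - 1)
            mask = (1 << (self.FIELD_BITS * self.FIELDS)) - 1
            self.history = ((self.history << self.FIELD_BITS) | run) & mask
            self.not_taken_count = 0
        else:
            self.not_taken_count += 1  # the history itself is left unchanged
```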
  • Patent number: 10001992
    Abstract: A method includes: calculating, for each of a plurality of blocks partitioned from a program, the percentage of the block's instructions that belong to a certain instruction type; extracting execution addresses and numbers of executed instructions from an arithmetic processing unit that executes the program and samples the execution address and the number of executed instructions at a plurality of time points; calculating a first execution frequency of the instructions included in each of the plurality of blocks based on the extracted execution addresses and numbers of executed instructions; calculating a second execution frequency of the instructions belonging to the instruction type by multiplying the first execution frequency of each block by the percentage for that block; and calculating the total of the second execution frequencies calculated for the plurality of blocks.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: June 19, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Masao Yamamoto
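The arithmetic in 10001992 can be sketched directly. The data layout below is an assumption: each block is an address range with the fraction of its static instructions that belong to the target type, and each sample is a program counter plus the number of instructions executed since the previous sample, attributed to the block containing that program counter.

```python
def estimate_type_executions(blocks, samples):
    """
    blocks:  list of (start, end, fraction); fraction is the share of the block's
             instructions belonging to the target type (e.g. floating-point ops).
    samples: list of (pc, n_insns); n_insns is attributed to the block holding pc.
    """
    first = [0] * len(blocks)  # first execution frequency per block
    for pc, n in samples:
        for b, (start, end, _) in enumerate(blocks):
            if start <= pc < end:
                first[b] += n
                break
    # Second execution frequency: block frequency scaled by the type's percentage.
    second = [f * frac for f, (_, _, frac) in zip(first, blocks)]
    return sum(second)
```

With blocks `[(0x100, 0x140, 0.25), (0x140, 0x200, 0.5)]` and samples `[(0x104, 1000), (0x150, 2000), (0x120, 1000)]`, each block's first frequency is 2000, so the estimate is 2000 × 0.25 + 2000 × 0.5 = 1500.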
  • Patent number: 9996348
    Abstract: A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
    Type: Grant
    Filed: June 14, 2012
    Date of Patent: June 12, 2018
    Assignee: Apple Inc.
    Inventors: Gerard R. Williams, III, John H. Mylius, Conrade Blasco-Allue
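The zero-cycle load in 9996348 is a renaming trick: the load's destination is mapped to the physical register that already holds the data the older store wrote, so dependents need not wait for memory. The sketch below assumes addresses are symbolic keys (such as `"sp+8"`) that can be compared at rename time and an unbounded supply of physical registers; both are simplifications, and the instruction tuple format is hypothetical.

```python
from itertools import count

def rename(insns):
    """insns: ("alu", dst, [srcs]) | ("st", src, addr) | ("ld", dst, addr)."""
    phys = count()    # unbounded physical register supply (simplification)
    rat = {}          # architectural -> physical register
    store_src = {}    # address key -> physical register holding the stored data
    out = []

    def lookup(r):
        if r not in rat:
            rat[r] = next(phys)  # live-in value gets a physical register on first use
        return rat[r]

    for insn in insns:
        if insn[0] == "alu":
            _, dst, srcs = insn
            ps = [lookup(s) for s in srcs]
            rat[dst] = next(phys)
            out.append(("alu", rat[dst], ps))
        elif insn[0] == "st":
            _, src, addr = insn
            store_src[addr] = lookup(src)
            out.append(("st", rat[src], addr))
        else:  # "ld"
            _, dst, addr = insn
            if addr in store_src:
                # Zero-cycle load: alias dst to the store's source register and mark
                # the load so it does not read the data from memory.
                rat[dst] = store_src[addr]
                out.append(("ld-elided", rat[dst], addr))
            else:
                rat[dst] = next(phys)
                out.append(("ld", rat[dst], addr))
    return out, rat
```

Because two architectural registers can now share one physical register, a real design would also need to track sharing (for example with reference counts) before freeing it; the sketch ignores freeing entirely.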
  • Patent number: 9996358
    Abstract: A system and method couple the content of a Branch Target Buffer (BTB) with the content of an instruction cache. The method includes: tagging a plurality of target buffer entries that belong to branches within the same instruction block with a corresponding instruction block address and a branch bitmap that indicates the individual branches in the block; coupling an overflow buffer with the BTB to hold further target buffer entries, distinct from the plurality of target buffer entries, for instruction blocks that have more branches than their bundle in the BTB is configured to accommodate; and predicting the instructions or instruction blocks that the core is likely to fetch in the future and proactively fetching them from the lower levels of the memory hierarchy by means of a prefetcher.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: June 12, 2018
    Assignee: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE
    Inventors: Babak Falsafi, Ilknur Cansu Kaynak, Boris Robert Grot
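The block-organized BTB described for 9996358 can be sketched as a map from instruction-block address to a bundle holding a branch bitmap and a limited number of targets, with extra branches spilling into an overflow buffer. The block size, instruction size, and bundle capacity (`ways`) below are hypothetical parameters, and the prefetcher is not modeled.

```python
class BlockBTB:
    """Toy block-based BTB: one bundle per instruction block plus an overflow buffer."""

    def __init__(self, block_bytes=64, insn_bytes=4, ways=3):
        self.block_bytes, self.insn_bytes, self.ways = block_bytes, insn_bytes, ways
        self.bundles = {}   # block address -> (branch bitmap, {slot: target})
        self.overflow = {}  # (block address, slot) -> target

    def _locate(self, pc):
        block = pc - pc % self.block_bytes
        slot = (pc % self.block_bytes) // self.insn_bytes
        return block, slot

    def insert(self, pc, target):
        block, slot = self._locate(pc)
        bitmap, targets = self.bundles.get(block, (0, {}))
        bitmap |= 1 << slot  # mark this instruction slot as a branch
        if slot in targets or len(targets) < self.ways:
            targets[slot] = target
        else:
            self.overflow[(block, slot)] = target  # bundle full: spill
        self.bundles[block] = (bitmap, targets)

    def lookup(self, pc):
        block, slot = self._locate(pc)
        bitmap, targets = self.bundles.get(block, (0, {}))
        if not (bitmap >> slot) & 1:
            return None  # no known branch at this pc
        return targets.get(slot, self.overflow.get((block, slot)))
```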
  • Patent number: 9996355
    Abstract: An instruction for parsing a buffer, to be utilized within a data processing system, includes: an operation code field that identifies the instruction; a control field that controls operation of the instruction; and one or more general registers, wherein a first general register stores an argument address, a second general register stores a function code, a third general register stores the length of an argument-character buffer, and a fourth general register contains the address of the function-code data structure.
    Type: Grant
    Filed: February 2, 2017
    Date of Patent: June 12, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: John R. Ehrman, Dan F. Greiner
  • Patent number: 9996350
    Abstract: Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: June 12, 2018
    Assignee: INTEL CORPORATION
    Inventors: Victor Lee, Mikhail Smelyanskiy, Alexander Heinecke
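The execution step in 9996350 generates the addresses of the other elements of the block from one element's address, a stride, and the block boundaries. The sketch below assumes one stride per dimension (in bytes), per-dimension extents as the "boundaries", the given element as the block's first element, and 64-byte cache lines; the function names are hypothetical.

```python
from itertools import product

def block_addresses(base, strides, shape):
    """Byte addresses of every element of a block starting at `base`.

    strides[d] is the byte distance between neighbors along dimension d and
    shape[d] is the block's extent along d (outermost dimension first).
    """
    return [base + sum(i * s for i, s in zip(idx, strides))
            for idx in product(*(range(n) for n in shape))]

def lines_to_prefetch(addresses, line_bytes=64):
    """Distinct cache-line addresses, in first-touch order."""
    return list(dict.fromkeys(a - a % line_bytes for a in addresses))
```

For a 2 × 3 block of 8-byte elements in an array with 1024-byte rows, `block_addresses(0x1000, (1024, 8), (2, 3))` touches two cache lines, `0x1000` and `0x1400`.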
  • Patent number: 9996352
    Abstract: Systems, methods, and other embodiments associated with a processor that includes selectively enabled features are described. According to one embodiment, a processor includes a plurality of processing routines embedded within the processor that, when executed, cause the processor to implement corresponding processor features. The processor includes a processing engine configured to determine whether a processing routine of the plurality of processing routines is enabled based, at least in part, on a corresponding value in a control register. The processing engine is configured to selectively execute the processing routine based, at least in part, on whether the value indicates that the processing routine is enabled.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: June 12, 2018
    Assignee: MARVELL INTERNATIONAL LTD.
    Inventor: Kapil Jain
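The enable check in 9996352 amounts to testing a control-register bit before dispatching to an embedded routine. In the sketch below, the feature names, the bit assignment, and the choice to raise an error for a disabled feature (much as an illegal-instruction trap might) are all hypothetical.

```python
# Hypothetical feature table: bit i of the control register enables FEATURES[i].
FEATURES = ["popcount", "byte_reverse"]
ROUTINES = {
    "popcount": lambda data: sum(bin(b).count("1") for b in data),
    "byte_reverse": lambda data: data[::-1],
}

def execute(feature, data, control_register):
    """Run an embedded routine only if its enable bit is set in control_register."""
    bit = FEATURES.index(feature)
    if not (control_register >> bit) & 1:
        # Assumption: a disabled feature traps rather than silently doing nothing.
        raise PermissionError(f"feature {feature!r} is disabled")
    return ROUTINES[feature](data)
```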
  • Patent number: 9996347
    Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: June 12, 2018
    Assignee: INTEL CORPORATION
    Inventors: Victor Lee, Ugonna Echeruo, George Chrysos, Naveen Mellempudi
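The operand construction in 9996347 takes (total − offset) elements from the named register and `offset` elements from the next logical register. The abstract does not say which elements are taken; the sketch below assumes a sliding window (the tail of register `r` followed by the head of register `r + 1`), which is the kind of unaligned window a stencil loop needs. The register dictionary and function names are hypothetical.

```python
def offset_operand(regs, r, offset):
    """Data vector for register operand r with an elemental offset.

    Takes the last len(regs[r]) - offset elements of register r, then the first
    `offset` elements of the next logical register r + 1.
    """
    n = len(regs[r])
    if not 0 <= offset <= n:
        raise ValueError("offset must be between 0 and the number of elements")
    return regs[r][offset:] + regs[r + 1][:offset]

def vadd(a, b):
    """Element-wise vector add, standing in for the executed vector instruction."""
    return [x + y for x, y in zip(a, b)]
```

With `regs = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}`, an offset of 1 yields `[1, 2, 3, 4]`, so `vadd(regs[0], offset_operand(regs, 0, 1))` sums each element with its right-hand neighbor: `[1, 3, 5, 7]`.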
  • Patent number: 9990199
    Abstract: A method and system are disclosed. The method may include receiving instructions in a hardware accelerator coupled to a computing device. The instructions may describe operations and data dependencies between the operations. The operations and the data dependencies may be predetermined. The method may include performing a splitter operation in the hardware accelerator, performing an operation in each of a plurality of branches, and performing a combiner operation in the hardware accelerator.
    Type: Grant
    Filed: September 18, 2015
    Date of Patent: June 5, 2018
    Assignee: Axis AB
    Inventors: Niclas Danielsson, Mikael Asker, Hans-Peter Nilsson, Markus Skans, Mikael Pendse
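The splitter, per-branch operation, and combiner in 9990199 form a fixed dataflow graph. The software sketch below uses a thread pool as a stand-in for the accelerator's parallel branches; the contiguous chunking rule and the function names are assumptions, not the accelerator's actual behavior.

```python
from concurrent.futures import ThreadPoolExecutor

def split_apply_combine(data, n_branches, branch_op, combine):
    """Splitter -> the same operation in each branch -> combiner."""
    size = max(1, -(-len(data) // n_branches))                      # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]  # splitter
    with ThreadPoolExecutor(max_workers=n_branches) as pool:
        partials = list(pool.map(branch_op, chunks))  # map preserves chunk order
    return combine(partials)                          # combiner
```

Because `pool.map` returns results in input order, an order-sensitive combiner (such as concatenation) reassembles the output correctly.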
  • Patent number: 9977679
    Abstract: An apparatus and method are provided for processing instructions from a plurality of threads. The apparatus comprises a processing pipeline to process instructions, including fetch circuitry to fetch instructions from a plurality of threads for processing by the processing pipeline, and execution circuitry to execute the fetched instructions. Execution hint instruction handling circuitry is then responsive to the fetch circuitry fetching an execution hint instruction for a first thread, to treat the execution hint instruction, at least in a presence of a suspension condition, as a predicted branch instruction with a predicted behavior, and to cause the fetch circuitry to suspend fetching of instructions for the first thread. The execution circuitry is then arranged to execute the predicted branch instruction with a behavior different to the predicted behavior, in order to trigger a misprediction condition.
    Type: Grant
    Filed: November 9, 2015
    Date of Patent: May 22, 2018
    Assignee: ARM Limited
    Inventors: Ian Michael Caulfield, Antony John Penton, Robert Gwilym Dimond
  • Patent number: 9977619
    Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
    Type: Grant
    Filed: November 6, 2015
    Date of Patent: May 22, 2018
    Assignee: Vivante Corporation
    Inventor: Mankit Lo
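The instruction format in 9977619 lets either operand be a plain memory address or a reference to a transfer descriptor that is executed first. In the sketch below, descriptors are modeled as Python callables keyed by address, the type tags are `"mem"` and `"desc"`, and the opcode table is hypothetical: a source descriptor yields the intermediate value, and a destination descriptor yields the location where the result is stored.

```python
# Hypothetical opcode table; each entry maps an input value to a result.
OPS = {"copy": lambda v: v, "negate": lambda v: -v}

def execute_insn(insn, memory, descriptors):
    """insn = (opcode, src_type, src_addr, dst_type, dst_addr); returns the store address."""
    op, src_type, src_addr, dst_type, dst_addr = insn
    if src_type == "desc":
        value = descriptors[src_addr](memory)     # descriptor yields an intermediate result
    else:
        value = memory[src_addr]                  # plain memory read
    result = OPS[op](value)
    if dst_type == "desc":
        dst_addr = descriptors[dst_addr](memory)  # descriptor yields the store location
    memory[dst_addr] = result
    return dst_addr
```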
  • Patent number: 9960907
    Abstract: Instructions and logic provide general-purpose GF(2^8) SIMD cryptographic arithmetic functionality. Embodiments include a processor to decode an instruction for a SIMD affine transformation specifying a source data operand, a transformation matrix operand, and a translation vector. The transformation matrix is applied to each element of the source data operand, and the translation vector is applied to each of the transformed elements. A result of the instruction is stored in a SIMD destination register. Some embodiments also decode an instruction for a SIMD binary finite field multiplicative inverse to compute an inverse in a binary finite field modulo an irreducible polynomial for each element of the source data operand. Some embodiments also decode an instruction for a SIMD binary finite field multiplication specifying first and second source data operands to multiply each corresponding pair of elements of the first and second source data operands modulo an irreducible polynomial.
    Type: Grant
    Filed: June 26, 2014
    Date of Patent: May 1, 2018
    Assignee: Intel Corporation
    Inventor: Shay Gueron
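The three per-element operations named in the 9960907 abstract (GF(2^8) multiplication, multiplicative inverse, and an affine transformation over GF(2)) can be sketched in scalar Python, with `simd` applying a function lane by lane as a stand-in for a SIMD instruction. The bit ordering of the matrix operand in real hardware may differ; here row `i` of the matrix produces output bit `i`, which is this sketch's own convention. As a self-check, the AES S-box is the affine transformation of the GF(2^8) inverse modulo x^8 + x^4 + x^3 + x + 1.

```python
AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, the irreducible polynomial used by AES

def gf_mul(a, b, poly=AES_POLY):
    """Multiply two GF(2^8) elements modulo the irreducible polynomial `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a   # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= poly     # reduce once the degree reaches 8
        b >>= 1
    return result

def gf_inv(a, poly=AES_POLY):
    """Inverse as a^254 (the multiplicative group has order 255); maps 0 to 0."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base, poly)
        base = gf_mul(base, base, poly)
        e >>= 1
    return result

def affine(x, rows, b):
    """y = A*x + b over GF(2): bit i of y is parity(rows[i] & x) XOR bit i of b."""
    y = 0
    for i, row in enumerate(rows):
        y |= (bin(row & x).count("1") & 1) << i
    return y ^ b

def simd(fn, *vectors):
    """Apply fn lane by lane, standing in for a SIMD instruction."""
    return [fn(*lane) for lane in zip(*vectors)]

# FIPS-197 affine matrix: output bit i = x_i ^ x_(i+4) ^ x_(i+5) ^ x_(i+6) ^ x_(i+7), mod 8.
AES_ROWS = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]

def sbox(x):
    return affine(gf_inv(x), AES_ROWS, 0x63)
```

For example, `simd(sbox, [0x00, 0x01, 0x53])` gives `[0x63, 0x7C, 0xED]`, matching the AES S-box.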