Patents Examined by Corey S Faherty
  • Patent number: 10019266
    Abstract: A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: July 10, 2018
    Assignee: RAMBUS INC.
    Inventors: William C. Moyer, Jeffrey W. Scott
  • Patent number: 10013257
    Abstract: Embodiments relate to register comparison for operand store compare (OSC) prediction. An aspect includes, for each instruction in an instruction group of a processor pipeline: determining a base register value of the instruction; determining an index register value of the instruction; and determining a displacement of the instruction. Another aspect includes comparing the base register value, index register value, and displacement of each instruction in the instruction group to the base register value, index register value, and displacement of all other instructions in the instruction group. Another aspect includes, based on the comparison, determining that a load instruction of the instruction group has a probable OSC conflict with a store instruction of the instruction group. Yet another aspect includes delaying the load instruction based on the determined probable OSC conflict.
    Type: Grant
    Filed: August 16, 2017
    Date of Patent: July 3, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: David Hutton, Wen Li, Eric Schwarz
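As a rough illustration of the comparison step described in this abstract, the Python sketch below compares (base, index, displacement) triples within one instruction group and flags loads that match a store. The `Instr` record, the function name `loads_to_delay`, the rule that only an exact match on all three fields counts as a probable conflict, and the restriction to older stores are assumptions made for this example, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    kind: str   # "load" or "store" (hypothetical encoding)
    base: int   # base register value
    index: int  # index register value
    disp: int   # displacement


def loads_to_delay(group):
    """Return positions of loads that probably conflict with a store."""
    delayed = []
    for i, instr in enumerate(group):
        if instr.kind != "load":
            continue
        key = (instr.base, instr.index, instr.disp)
        # Assumption: only stores that are older in the group are checked.
        for older in group[:i]:
            if older.kind == "store" and (older.base, older.index, older.disp) == key:
                delayed.append(i)
                break
    return delayed


group = [
    Instr("store", base=0x1000, index=8, disp=16),
    Instr("load", base=0x2000, index=0, disp=4),
    Instr("load", base=0x1000, index=8, disp=16),
]
print(loads_to_delay(group))  # positions of loads that would be delayed
```

Only the last load shares all three fields with the store, so it is the only one flagged.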
  • Patent number: 10013290
    Abstract: A system and method are provided for synchronizing threads in a divergent region of code within a multi-threaded parallel processing system. The method includes, prior to any thread entering a divergent region, generating a count that represents a number of threads that will enter the divergent region. The method also includes using the count within the divergent region to synchronize the threads in the divergent region.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: July 3, 2018
    Assignee: Nvidia Corporation
    Inventor: Stephen Jones
  • Patent number: 10007524
    Abstract: Branch history information characterizes results of branch instructions previously executed by a processor. A count is stored of a number of consecutive branch instructions previously executed by the processor whose results all indicate a not taken branch. In a first pipeline stage, a predicted branch result is provided based on at least a portion of the branch history information, and one or more of the branch history information, and the count, is updated based on the predicted branch result. In a second pipeline stage an actual branch result is provided based on an executed branch instruction, and the branch history information is updated based on the actual branch result. If the predicted branch result indicates a taken branch, the branch history information is updated based on the count, and if the predicted branch result indicates a not taken branch, the count is updated but not the branch history information.
    Type: Grant
    Filed: November 14, 2014
    Date of Patent: June 26, 2018
    Assignee: Cavium, Inc.
    Inventor: David Albert Carlson
  • Patent number: 10001992
    Abstract: A method includes: calculating a percentage of an instruction belonging to a certain instruction type among instruction types included in each of a plurality of blocks partitioned from a program; extracting an execution address and a number of execution instructions from an arithmetic processing unit that executes the program and performs sampling of the execution address and the number of execution instructions at a plurality of time points, calculating a first execution frequency of the instruction included in each of the plurality of blocks based on the extracted execution address and the number of execution instructions; calculating a second execution frequency of the instruction belonging to the instruction type by multiplying the first execution frequency of the block by the percentage of the instruction in the block; calculating total number of second execution frequencies calculated for each of the plurality of blocks.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: June 19, 2018
    Assignee: FUJITSU LIMITED
    Inventor: Masao Yamamoto
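The arithmetic in this abstract can be sketched in a few lines of Python: each block's sampled execution frequency is multiplied by the fraction of its instructions that belong to the instruction type of interest, and the products are summed over all blocks. The dictionary keys `exec_freq` and `type_fraction`, the function name, and the sample numbers are hypothetical and chosen only for illustration.

```python
def estimate_type_executions(blocks):
    """Sum, over all blocks, of block frequency times type fraction."""
    total = 0.0
    for block in blocks:
        # "Second execution frequency" for this block.
        second_freq = block["exec_freq"] * block["type_fraction"]
        total += second_freq
    return total


# Made-up sample data: two blocks with different instruction mixes.
blocks = [
    {"exec_freq": 1000, "type_fraction": 0.25},
    {"exec_freq": 400, "type_fraction": 0.5},
]
print(estimate_type_executions(blocks))
```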
  • Patent number: 9996352
    Abstract: Systems, methods, and other embodiments associated with a processor that includes selectively enabled features are described. According to one embodiment, a processor includes a plurality of processing routines embedded within the processor that when executed cause the processor to implement corresponding processor features. The processor includes a processor engine configured to determine whether a processing routine of the plurality of processing routines is enabled based, at least in part, on a corresponding value in a control register. The processing engine is configured to selectively execute the processing routine based, at least in part, on whether the value indicates that the processing routine is enabled.
    Type: Grant
    Filed: February 24, 2016
    Date of Patent: June 12, 2018
    Assignee: MARVELL INTERNATIONAL LTD.
    Inventor: Kapil Jain
  • Patent number: 9996348
    Abstract: A system and method for reducing the latency of load operations. A register rename unit within a processor determines whether a decoded load instruction is eligible for conversion to a zero-cycle load operation. If so, control logic assigns a physical register identifier associated with a source operand of an older dependent store instruction to the destination operand of the load instruction. Additionally, the register rename unit marks the load instruction to prevent it from reading data associated with the source operand of the store instruction from memory. Due to the duplicate renaming, this data may be forwarded from a physical register file to instructions that are younger and dependent on the load instruction.
    Type: Grant
    Filed: June 14, 2012
    Date of Patent: June 12, 2018
    Assignee: Apple Inc.
    Inventors: Gerard R. Williams, III, John H. Mylius, Conrade Blasco-Allue
  • Patent number: 9996347
    Abstract: Methods and apparatuses relating to a vector instruction with a register operand with an elemental offset are described. In one embodiment, a hardware processor includes a decode unit to decode a vector instruction with a register operand with an elemental offset to access a first number of elements in a register specified by the register operand, wherein the first number is a total number of elements in the register minus the elemental offset, access a second number of elements in a next logical register, wherein the second number is the elemental offset, and combine the first number of elements and the second number of elements as a data vector, and an execution unit to execute the vector instruction on the data vector.
    Type: Grant
    Filed: December 24, 2014
    Date of Patent: June 12, 2018
    Assignee: INTEL CORPORATION
    Inventors: Victor Lee, Ugonna Echeruo, George Chrysos, Naveen Mellempudi
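One possible reading of the operand access in this abstract is sketched below in Python: with N elements per register and an elemental offset k, the data vector is built from N - k elements of the named register followed by k elements of the next logical register. Taking the elements from position k onward in the first register and from position 0 in the next one, and wrapping the register number around at the end of the register file, are assumptions of this sketch; the abstract does not spell them out.

```python
def read_with_offset(regfile, reg, offset):
    """Build a data vector from register `reg` shifted by `offset` elements."""
    n = len(regfile[reg])
    if not 0 <= offset < n:
        raise ValueError("offset must be in [0, elements per register)")
    next_reg = (reg + 1) % len(regfile)  # assumption: register numbers wrap
    first = regfile[reg][offset:]        # n - offset elements
    second = regfile[next_reg][:offset]  # offset elements
    return first + second


# Hypothetical register file: two registers of four elements each.
regfile = [[0, 1, 2, 3], [4, 5, 6, 7]]
print(read_with_offset(regfile, reg=0, offset=1))
```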
  • Patent number: 9996350
    Abstract: Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses.
    Type: Grant
    Filed: December 27, 2014
    Date of Patent: June 12, 2018
    Assignee: INTEL CORPORATION
    Inventors: Victor Lee, Mikhail Smelyanskiy, Alexander Heinecke
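The address generation step in this abstract can be illustrated for the two-dimensional case. The Python sketch below assumes the operand supplies the address of the block's first element, a row stride in bytes, and the block boundaries as a row count and a column count; the parameter names and the fixed element size are assumptions of this example rather than the encoding used by the patented instruction.

```python
def block_addresses(base, row_stride, rows, cols, elem_size):
    """List the byte address of every element in a rows x cols block."""
    return [
        base + r * row_stride + c * elem_size
        for r in range(rows)
        for c in range(cols)
    ]


# A 2 x 3 block of 4-byte elements inside an array with 64-byte rows.
addrs = block_addresses(base=0x1000, row_stride=64, rows=2, cols=3, elem_size=4)
print([hex(a) for a in addrs])
```

A prefetcher built along these lines would then load the cache lines covering these addresses.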
  • Patent number: 9996355
    Abstract: An instruction for parsing a buffer to be utilized within a data processing system including: an operation code field, the operation code field identifies the instruction; a control field, the control field controls operation of the instruction; and one or more general registers, wherein a first general register stores an argument address, a second general register stores a function code, a third general register stores the length of an argument-character buffer, and a fourth general register stores the address of the function-code data structure.
    Type: Grant
    Filed: February 2, 2017
    Date of Patent: June 12, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: John R. Ehrman, Dan F. Greiner
  • Patent number: 9996358
    Abstract: A system and method of coupling a Branch Target Buffer (BTB) content of a BTB with an instruction cache content of an instruction cache. The method includes: tagging a plurality of target buffer entries that belong to branches within a same instruction block with a corresponding instruction block address and a branch bitmap to indicate individual branches in the block; coupling an overflow buffer with the BTB to accommodate further target buffer entries of instruction blocks, distinct from the plurality of target buffer entries, which have more branches than the bundle is configured to accommodate in the corresponding instruction's bundle in the BTB; and predicting the instructions or the instruction blocks that are likely to be fetched by the core in the future and fetch those instructions from the lower levels of the memory hierarchy proactively by means of a prefetcher.
    Type: Grant
    Filed: September 30, 2015
    Date of Patent: June 12, 2018
    Assignee: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE
    Inventors: Babak Falsafi, Ilknur Cansu Kaynak, Boris Robert Grot
  • Patent number: 9990199
    Abstract: A method and system are disclosed. The method may include receiving instructions in a hardware accelerator coupled to a computing device. The instructions may describe operations and data dependencies between the operations. The operations and the data dependencies may be predetermined. The method may include performing a splitter operation in the hardware accelerator, performing an operation in each of a plurality of branches, and performing a combiner operation in the hardware accelerator.
    Type: Grant
    Filed: September 18, 2015
    Date of Patent: June 5, 2018
    Assignee: Axis AB
    Inventors: Niclas Danielsson, Mikael Asker, Hans-Peter Nilsson, Markus Skans, Mikael Pendse
  • Patent number: 9977679
    Abstract: An apparatus and method are provided for processing instructions from a plurality of threads. The apparatus comprises a processing pipeline to process instructions, including fetch circuitry to fetch instructions from a plurality of threads for processing by the processing pipeline, and execution circuitry to execute the fetched instructions. Execution hint instruction handling circuitry is then responsive to the fetch circuitry fetching an execution hint instruction for a first thread, to treat the execution hint instruction, at least in a presence of a suspension condition, as a predicted branch instruction with a predicted behavior, and to cause the fetch circuitry to suspend fetching of instructions for the first thread. The execution circuitry is then arranged to execute the predicted branch instruction with a behavior different to the predicted behavior, in order to trigger a misprediction condition.
    Type: Grant
    Filed: November 9, 2015
    Date of Patent: May 22, 2018
    Assignee: ARM Limited
    Inventors: Ian Michael Caulfield, Antony John Penton, Robert Gwilym Dimond
  • Patent number: 9977619
    Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
    Type: Grant
    Filed: November 6, 2015
    Date of Patent: May 22, 2018
    Assignee: Vivante Corporation
    Inventor: Mankit Lo
  • Patent number: 9960907
    Abstract: Instructions and logic provide general purpose GF(2^8) SIMD cryptographic arithmetic functionality. Embodiments include a processor to decode an instruction for a SIMD affine transformation specifying a source data operand, a transformation matrix operand, and a translation vector. The transformation matrix is applied to each element of the source data operand, and the translation vector is applied to each of the transformed elements. A result of the instruction is stored in a SIMD destination register. Some embodiments also decode an instruction for a SIMD binary finite field multiplicative inverse to compute an inverse in a binary finite field modulo an irreducible polynomial for each element of the source data operand. Some embodiments also decode an instruction for a SIMD binary finite field multiplication specifying first and second source data operands to multiply each corresponding pair of elements of the first and second source data operand modulo an irreducible polynomial.
    Type: Grant
    Filed: June 26, 2014
    Date of Patent: May 1, 2018
    Assignee: Intel Corporation
    Inventor: Shay Gueron
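The three operations named in this abstract (multiplication, multiplicative inverse, and affine transformation over GF(2^8)) can be sketched per element in Python. The sketch assumes the AES irreducible polynomial x^8 + x^4 + x^3 + x + 1 (0x11B) and uses the AES S-box matrix and constant as sample operands; the abstract itself does not fix a polynomial, and the function names here are hypothetical. A SIMD instruction would apply the same step to every byte of the source operand, which the final list comprehension imitates.

```python
AES_POLY = 0x11B  # assumed irreducible polynomial: x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b, poly=AES_POLY):
    """Multiply two bytes in GF(2^8) modulo `poly` (shift-and-xor)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # reduce when the degree reaches 8
            a ^= poly
        b >>= 1
    return result


def gf_inv(a, poly=AES_POLY):
    """Inverse via a^254; 0 maps to 0 by the usual convention."""
    result, base, exp = 1, a, 254
    while exp:
        if exp & 1:
            result = gf_mul(result, base, poly)
        base = gf_mul(base, base, poly)
        exp >>= 1
    return result


def affine(x, rows, vec):
    """Apply an 8x8 bit matrix (one row mask per output bit), then XOR vec."""
    out = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        out |= parity << i
    return out ^ vec


# Sample operands: the AES S-box matrix rows and translation constant.
AES_ROWS = [0xF1, 0xE3, 0xC7, 0x8F, 0x1F, 0x3E, 0x7C, 0xF8]
AES_VEC = 0x63

source = [0x00, 0x53]
sbox = [affine(gf_inv(b), AES_ROWS, AES_VEC) for b in source]
print([hex(b) for b in sbox])
```

Composing the inverse with this particular affine transformation reproduces AES S-box entries, which gives a convenient check of the sketch against published values.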
  • Patent number: 9946541
    Abstract: Systems, methods, and apparatuses for strided access are described. In some embodiments, a plurality of registers are loaded with data from an array of structures. Then data elements that are not needed in a permute operation are overwritten with index values with a write mask. The register now contains a mix of data and index values. When this same write mask is passed to the permute instruction which overwrites the index register as destination, the data values are preserved and index values are overwritten with data coming from the other two source registers as controlled by the index values.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: April 17, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Patent number: 9946550
    Abstract: A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.
    Type: Grant
    Filed: September 17, 2007
    Date of Patent: April 17, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ram Rangan, William E. Speight, Mark W. Stephenson, Lixin Zhang
  • Patent number: 9946547
    Abstract: A load/store unit for a processor, and applications thereof. In an embodiment, the load/store unit includes a load/store queue configured to store information and data associated with a particular class of instructions. Data stored in the load/store queue can be bypassed to dependent instructions. When an instruction belonging to the particular class of instructions graduates and the instruction is associated with a cache miss, control logic causes a pointer to be stored in a load/store graduation buffer that points to an entry in the load/store queue associated with the instruction. The load/store graduation buffer ensures that graduated instructions access a shared resource of the load/store unit in program order.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: April 17, 2018
    Assignee: ARM Finance Overseas Limited
    Inventors: Meng-Bing Yu, Era K. Nangia, Michael Ni
  • Patent number: 9940132
    Abstract: Techniques are disclosed relating to suspending execution of a processor thread while monitoring for a write to a specified memory location. An execution subsystem may be configured to perform a load instruction that causes the processor to retrieve data from a specified memory location and atomically begin monitoring for a write to the specified location. The load instruction may be a load-monitor instruction. The execution subsystem may be further configured to perform a wait instruction that causes the processor to suspend execution of a processor thread during at least a portion of an interval specified by the wait instruction and to resume execution of the processor thread at the end of the interval. The wait instruction may be a monitor-wait instruction. The processor may be further configured to resume execution of the processor thread in response to detecting a write to a memory location specified by a previous monitor instruction.
    Type: Grant
    Filed: December 14, 2015
    Date of Patent: April 10, 2018
    Assignee: Oracle International Corporation
    Inventors: Paul N. Loewenstein, Mark A. Luttrell, Paul J. Jordan
  • Patent number: 9940137
    Abstract: Data processing apparatus comprises a processor configured to execute instructions, the processor having a pipelined instruction fetching unit configured to fetch instructions from memory during a pipeline period of two or more processor clock cycles prior to execution of those instructions by the processor; exception logic configured to respond to a detected processing exception having an exception type selected from a plurality of exception types, by storing a current processor status and diverting program flow to an exception address dependent upon the exception type so as to control the instruction fetching unit to initiate fetching of an exception instruction at the exception address; and an exception cache configured to cache information, for at least one of the exception types, relating to execution of the exception instruction at the exception address corresponding to that exception type and to provide the cached information to the processor in response to detection of an exception of that exception t
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: April 10, 2018
    Assignee: ARM Limited
    Inventors: Matthew Lee Winrow, Antony John Penton