Patents Examined by Corey S Faherty
  • Patent number: 9946541
    Abstract: Systems, methods, and apparatuses for strided access are described. In some embodiments, a plurality of registers are loaded with data from an array of structures. Then data elements that are not needed in a permute operation are overwritten with index values with a write mask. The register now contains a mix of data and index values. When this same write mask is passed to the permute instruction which overwrites the index register as destination, the data values are preserved and index values are overwritten with data coming from the other two source registers as controlled by the index values.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: April 17, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Suleyman Sair, Joonmoo Huh
  • Patent number: 9946547
    Abstract: A load/store unit for a processor, and applications thereof. In an embodiment, the load/store unit includes a load/store queue configured to store information and data associated with a particular class of instructions. Data stored in the load/store queue can be bypassed to dependent instructions. When an instruction belonging to the particular class of instructions graduates and the instruction is associated with a cache miss, control logic causes a pointer to be stored in a load/store graduation buffer that points to an entry in the load/store queue associated with the instruction. The load/store graduation buffer ensures that graduated instructions access a shared resource of the load/store unit in program order.
    Type: Grant
    Filed: September 29, 2006
    Date of Patent: April 17, 2018
    Assignee: ARM Finance Overseas Limited
    Inventors: Meng-Bing Yu, Era K. Nangia, Michael Ni
  • Patent number: 9946550
    Abstract: A technique for handling predicated code in an out-of-order processor includes detecting a predicate defining instruction associated with a predicated code region. Renaming of predicated instructions, within the predicated code region, is then stalled until a predicate of the predicate defining instruction is resolved.
    Type: Grant
    Filed: September 17, 2007
    Date of Patent: April 17, 2018
    Assignee: International Business Machines Corporation
    Inventors: Ram Rangan, William E. Speight, Mark W. Stephenson, Lixin Zhang
  • Patent number: 9940137
    Abstract: Data processing apparatus comprises a processor configured to execute instructions, the processor having a pipelined instruction fetching unit configured to fetch instructions from memory during a pipeline period of two or more processor clock cycles prior to execution of those instructions by the processor; exception logic configured to respond to a detected processing exception having an exception type selected from a plurality of exception types, by storing a current processor status and diverting program flow to an exception address dependent upon the exception type so as to control the instruction fetching unit to initiate fetching of an exception instruction at the exception address; and an exception cache configured to cache information, for at least one of the exception types, relating to execution of the exception instruction at the exception address corresponding to that exception type and to provide the cached information to the processor in response to detection of an exception of that exception t
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: April 10, 2018
    Assignee: ARM Limited
    Inventors: Matthew Lee Winrow, Antony John Penton
  • Patent number: 9940132
    Abstract: Techniques are disclosed relating to suspending execution of a processor thread while monitoring for a write to a specified memory location. An execution subsystem may be configured to perform a load instruction that causes the processor to retrieve data from a specified memory location and atomically begin monitoring for a write to the specified location. The load instruction may be a load-monitor instruction. The execution subsystem may be further configured to perform a wait instruction that causes the processor to suspend execution of a processor thread during at least a portion of an interval specified by the wait instruction and to resume execution of the processor thread at the end of the interval. The wait instruction may be a monitor-wait instruction. The processor may be further configured to resume execution of the processor thread in response to detecting a write to a memory location specified by a previous monitor instruction.
    Type: Grant
    Filed: December 14, 2015
    Date of Patent: April 10, 2018
    Assignee: Oracle International Corporation
    Inventors: Paul N. Loewenstein, Mark A. Luttrell, Paul J. Jordan
  • Patent number: 9940242
    Abstract: A technique for processing instructions includes examining instructions in an instruction stream of a processor to determine properties of the instructions. The properties indicate whether the instructions may belong in an instruction sequence subject to decode-time instruction optimization (DTIO). Whether the properties of multiple ones of the instructions are compatible for inclusion within an instruction sequence of a same group is determined. The instructions with compatible ones of the properties are grouped into a first instruction group. The instructions of the first instruction group are decoded subsequent to formation of the first instruction group. Whether the first instruction group actually includes a DTIO sequence is verified based on the decoding. Based on the verifying, DTIO is performed on the instructions of the first instruction group or is not performed on the instructions of the first instruction group.
    Type: Grant
    Filed: November 17, 2014
    Date of Patent: April 10, 2018
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 9934039
    Abstract: Methods of predicting stack pointer values of variables stored in a stack are described. When an instruction is seen which stores a variable in the stack in a position offset from the stack pointer, an entry is added to a data structure which identifies the physical register which currently stores the stack pointer, the physical register which stores the value of the variable and the offset value. Subsequently when an instruction to load a variable from the stack from a position which is identified by reference to the stack pointer is seen, the data structure is searched to see if there is a corresponding entry which includes the same offset and the same physical register storing the stack pointer as the load instruction. If a corresponding entry is found the architectural register in the load instruction is mapped to the physical register storing the value of the variable from the entry.
    Type: Grant
    Filed: January 16, 2015
    Date of Patent: April 3, 2018
    Assignee: MIPS Tech Limited
    Inventor: Hugh Jackson
  • Patent number: 9934035
    Abstract: A data processing device for executing a program is described. The program comprises one or more instruction groups and one or more predicates, each instruction group comprising one or more instructions. The data processing device comprises a processing unit and a trace unit connected to or integrated in the processing unit. The trace unit generates a predicate trace for tracing the values of the one or more predicates. The processing unit executes, in each of a series of execution periods, one of the instruction groups and updates the values of none, one, or more of the predicates in dependence on the respective instruction group. The trace unit appends the updated values of the none, one, or more predicates to the predicate trace and does not append any non-updated values of the predicates. A method of reporting predicate values and a data carrier are also disclosed.
    Type: Grant
    Filed: March 21, 2013
    Date of Patent: April 3, 2018
    Assignee: NXP USA, Inc.
    Inventors: Uri Dayan, Erez Arbel-Meirovich, Liron Artsi, Doron Schupper
  • Patent number: 9928065
    Abstract: A method and apparatus provide means for compressing instruction code size. An Instruction Set Architecture (ISA) encodes instructions in compact, usual or extended bit lengths. Commonly used instructions are encoded having both compact and usual bit lengths, with compact or usual bit length instructions chosen based on power, performance or code size requirements. Instructions of the ISA can be used in both privileged and non-privileged operating modes of a microprocessor. The instruction encodings can be used interchangeably in software applications. Instructions from the ISA may be executed on any programmable device enabled for the ISA, including a single instruction set architecture processor or a multi-instruction set architecture processor.
    Type: Grant
    Filed: February 1, 2016
    Date of Patent: March 27, 2018
    Assignee: ARM Finance Overseas Limited
    Inventor: Erik K. Norden
  • Patent number: 9921848
    Abstract: Embodiments relate to address expansion and contraction in a multithreading computer system. According to one aspect, a computer system includes a configuration with a core configurable between a single thread (ST) mode and a multithreading (MT) mode. The ST mode addresses a primary thread and the MT mode addresses the primary thread and one or more secondary threads on shared resources of the core. A multithreading facility is configured to control utilization of the configuration to perform a method that includes accessing the primary thread in the ST mode using a core address value and switching from the ST mode to the MT mode. The primary thread or one of the one or more secondary threads is accessed in the MT mode using an expanded address value, where the expanded address value includes the core address value concatenated with a thread address value.
    Type: Grant
    Filed: March 27, 2014
    Date of Patent: March 20, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Fadi Y. Busaba, Mark S. Farrell, Charles W. Gainey, Jr., Dan F. Greiner, Lisa Cranton Heller, Jeffrey P. Kubala, Damian L. Osisek, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 9921849
    Abstract: Embodiments relate to address expansion and contraction in a multithreading computer system. According to one aspect, a computer implemented method for address adjustment in a configuration is provided. The configuration includes a core configurable between an ST mode and an MT mode, where the ST mode addresses a primary thread and the MT mode addresses the primary thread and one or more secondary threads on shared resources of the core. The primary thread is accessed in the ST mode using a core address value. Switching from the ST mode to the MT mode is performed. The primary thread or one of the one or more secondary threads is accessed in the MT mode using an expanded address value. The expanded address value includes the core address value concatenated with a thread address value.
    Type: Grant
    Filed: August 18, 2015
    Date of Patent: March 20, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jonathan D. Bradbury, Fadi Y. Busaba, Mark S. Farrell, Charles W. Gainey, Jr., Dan F. Greiner, Lisa Cranton Heller, Jeffrey P. Kubala, Damian L. Osisek, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 9904547
    Abstract: A method of an aspect includes receiving a packed data rearrangement control indexes generation instruction. The packed data rearrangement control indexes generation instruction indicates a destination storage location. A result is stored in the destination storage location in response to the packed data rearrangement control indexes generation instruction. The result includes a sequence of at least four non-negative integers representing packed data rearrangement control indexes. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 27, 2018
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Seth Abraham, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Patent number: 9904545
    Abstract: According to one general aspect, an apparatus may include a monolithic shifter configured to receive a plurality of bytes of data, and, for each byte of data, a number of bits to shift the respective byte of data, wherein the number of bits for each byte of data need not be the same as for any other byte of data. The monolithic shifter may be configured to shift each byte of data by the respective number of bits. The apparatus may include a mask generator configured to compute a mask for each byte of data, wherein each mask indicates which bits, if any, are to be prevented from being polluted by a neighboring shifted byte of data. The apparatus may include a masking circuit configured to combine the shifted byte of data with a respective mask to create an unpolluted shifted byte of data.
    Type: Grant
    Filed: September 16, 2015
    Date of Patent: February 27, 2018
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Eric C. Quinnell
  • Patent number: 9906359
    Abstract: Instructions and logic provide general purpose GF(2^8) SIMD cryptographic arithmetic functionality. Embodiments include a processor to decode an instruction for a SIMD affine transformation specifying a source data operand, a transformation matrix operand, and a translation vector. The transformation matrix is applied to each element of the source data operand, and the translation vector is applied to each of the transformed elements. A result of the instruction is stored in a SIMD destination register. Some embodiments also decode an instruction for a SIMD binary finite field multiplicative inverse to compute an inverse in a binary finite field modulo an irreducible polynomial for each element of the source data operand. Some embodiments also decode an instruction for a SIMD binary finite field multiplication specifying first and second source data operands to multiply each corresponding pair of elements of the first and second source data operand modulo an irreducible polynomial.
    Type: Grant
    Filed: January 13, 2017
    Date of Patent: February 27, 2018
    Assignee: Intel Corporation
    Inventor: Shay Gueron
  • Patent number: 9898283
    Abstract: A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: February 20, 2018
    Assignee: Intel Corporation
    Inventors: Seth Abraham, Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Zeev Sperber, Amit Gradstein
  • Patent number: 9886281
    Abstract: A SIMD processor with a versatile hardware configuration performs efficient range determination that is frequently used in image processing and recognition. A SIMD processor includes a range determination arithmetic unit including first and second registers that can store two values. The SIMD processor uses three values, namely, these two values and the value of source data input from a register file unit, to flexibly set the processing target data for range determination and the two boundaries defining the processing target range of the range determination.
    Type: Grant
    Filed: March 17, 2015
    Date of Patent: February 6, 2018
    Assignee: MegaChips Corporation
    Inventor: Shohei Nomoto
  • Patent number: 9886278
    Abstract: A processing device includes an execute processor configured to execute data processing instructions; and an access processor configured to be coupled with a memory system to execute memory access instructions; wherein the execute processor and the access processor are logically separated units, the execute processor having an execute processor input register file with input registers, and a data processing instruction is executed as soon as all operands for the respective data processing instruction are available in the input registers.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: February 6, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Jan Van Lunteren
  • Patent number: 9886325
    Abstract: A large-scale data processing system and method including a plurality of processes, wherein a master process assigns input data blocks to respective map processes and partitions of intermediate data are assigned to respective reduce processes. In each of the plurality of map processes, an application-independent map program retrieves a sequence of input data blocks assigned thereto by the master process, applies an application-specific map function to each input data block in the sequence to produce the intermediate data, and stores the intermediate data in high speed memory of the interconnected processors. Each of the plurality of reduce processes receives a respective partition of the intermediate data from the high speed memory of the interconnected processors while the map processes continue to process input data blocks, and an application-specific reduce function is applied to the respective partition of the intermediate data to produce output values.
    Type: Grant
    Filed: July 18, 2016
    Date of Patent: February 6, 2018
    Assignee: GOOGLE LLC
    Inventors: Grzegorz Malewicz, Marian Dvorsky, Christopher B. Colohan, Derek P. Thomson, Joshua Louis Levenberg
  • Patent number: 9886416
    Abstract: A matrix of execution blocks forms a set of rows and columns. The rows support parallel execution of instructions and the columns support execution of dependent instructions. The matrix of execution blocks processes a single block of instructions specifying parallel and dependent instructions.
    Type: Grant
    Filed: June 8, 2015
    Date of Patent: February 6, 2018
    Assignee: INTEL CORPORATION
    Inventor: Mohammad A. Abdallah
  • Patent number: 9870225
    Abstract: A processor comprises a decoder for decoding an instruction based both on an explicit opcode identifier and on metadata encoded in the instruction. For example, a relative order of source register names may be used to decode the instruction. As an example, an instruction set may have a Branch Equal (BEQ) specifying two registers (r1 and r2) that store values that are compared for equality. An instruction set can provide a single opcode identifier for BEQ and a processor can determine whether to decode a particular instance of that opcode identifier as BEQ or another instruction, in dependence on an order of appearance of the source registers in that instance. For example, the BEQ opcode can be interpreted as a branch not equal, if a higher numbered register appears before a lower numbered register. Additional forms of metadata can include interpreting a constant included in an instruction, as well as determining equality of source registers, among other forms of metadata.
    Type: Grant
    Filed: December 16, 2014
    Date of Patent: January 16, 2018
    Assignee: MIPS Tech, LLC
    Inventor: Ranganathan Sudhakar
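
The decoding idea in the last entry above (patent 9870225) can be illustrated with a minimal sketch. It is not taken from the patent: the `Instr` record, the `decode` and `branch_taken` helpers, and the register-file dictionary are hypothetical names chosen for illustration. The sketch assumes a single explicit `BEQ` opcode identifier and reads it as a branch-not-equal when the higher-numbered source register appears first, as the abstract describes. Treating two identical register numbers as a plain `BEQ` is an assumption here; the abstract lists equality of source registers as a further form of metadata without saying how it is decoded.

```python
from typing import Dict, NamedTuple


class Instr(NamedTuple):
    """Hypothetical instruction record used only for this illustration."""
    opcode: str  # explicit opcode identifier, e.g. "BEQ"
    rs: int      # number of the source register that appears first
    rt: int      # number of the source register that appears second


def decode(instr: Instr) -> str:
    """Return the effective operation from the opcode plus register order."""
    if instr.opcode != "BEQ":
        # Other opcodes pass through unchanged in this sketch.
        return instr.opcode
    if instr.rs > instr.rt:
        # Higher-numbered register first: read the BEQ opcode as branch-not-equal.
        return "BNE"
    # Ascending order (or equal numbers, by assumption): ordinary branch-equal.
    return "BEQ"


def branch_taken(instr: Instr, regs: Dict[int, int]) -> bool:
    """Evaluate the decoded branch condition against a register file."""
    op = decode(instr)
    a, b = regs[instr.rs], regs[instr.rt]
    if op == "BEQ":
        return a == b
    if op == "BNE":
        return a != b
    raise ValueError(f"not a branch handled by this sketch: {op}")


if __name__ == "__main__":
    regs = {1: 5, 2: 7}
    print(decode(Instr("BEQ", 1, 2)), branch_taken(Instr("BEQ", 1, 2), regs))
    print(decode(Instr("BEQ", 2, 1)), branch_taken(Instr("BEQ", 2, 1), regs))
```

Because an equality comparison is symmetric, the order of the two source registers carries no information for a plain BEQ, which is what frees it to select between two operations under one opcode identifier.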