Patents Examined by William Nguyen
  • Patent number: 9996360
    Abstract: A TRANSACTION ABORT instruction is used to abort a transaction that is executing in a computing environment. The TRANSACTION ABORT instruction includes at least one field used to specify a user-defined abort code that indicates the specific reason for aborting the transaction. Based on executing the TRANSACTION ABORT instruction, a condition code is provided that indicates whether re-execution of the transaction is recommended.
    Type: Grant
    Filed: August 9, 2016
    Date of Patent: June 12, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Timothy J. Slegel
  • Patent number: 9983883
    Abstract: A TRANSACTION ABORT instruction is used to abort a transaction that is executing in a computing environment. The TRANSACTION ABORT instruction includes at least one field used to specify a user-defined abort code that indicates the specific reason for aborting the transaction. Based on executing the TRANSACTION ABORT instruction, a condition code is provided that indicates whether re-execution of the transaction is recommended.
    Type: Grant
    Filed: August 9, 2016
    Date of Patent: May 29, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Timothy J. Slegel
  • Patent number: 9959119
    Abstract: A computer processor including an instruction buffer configured to store at least one variable-length instruction having a bit bundle bounded by a head end and a tail end with a plurality of slots each defining a corresponding operation, wherein the plurality of slots and corresponding operations are logically partitioned into a plurality of distinct blocks with a first group of blocks extending from the head end of the bit bundle toward the tail end of the bit bundle and a second group of blocks extending from the tail end of the bit bundle toward the head end of the bit bundle, wherein the second group of blocks includes a tail end block disposed adjacent the tail end of the bit bundle. A decode stage is operably coupled to the instruction buffer and configured to process a given variable-length instruction stored by the instruction buffer by decoding at least one operation of a particular block belonging to the first group of blocks in parallel with decoding at least one operation of the tail end block.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: May 1, 2018
    Assignee: MILL COMPUTING, INC.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, David Arthur Yost
  • Patent number: 9946540
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: May 22, 2017
    Date of Patent: April 17, 2018
    Assignee: INTEL CORPORATION
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9921847
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: March 20, 2018
    Assignee: NVIDIA Corporation
    Inventor: John Erik Lindholm
  • Patent number: 9904616
    Abstract: Generating instructions, in particular for mailbox verification in a simulation environment. A sequence of instructions is received, as well as selection data representative of a plurality of commands including a special command. Repeatedly selecting one of the plurality of commands and outputting an instruction based on the selected command. The outputting of an instruction includes outputting a next instruction in the sequence of instructions if the selected command is the special command, and outputting an instruction associated with the command if the selected command is not the special command.
    Type: Grant
    Filed: November 1, 2012
    Date of Patent: February 27, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joerg Deutschle, Ursel Hahn, Joerg Walter, Ernst-Dieter Weissenberger
  • Patent number: 9858111
    Abstract: Technologies are generally described for systems, devices and methods relating to multicore processors. The multicore processors may include first and second tiles with first and second caches, respectively. The first cache may include first magnetoresistive random access memory (MRAM) cells with first storage characteristics. The second cache may include second MRAM cells with second storage characteristics different from the first storage characteristics. In some examples, an interconnect structure may be coupled to the first and second tiles and may be configured to provide communication between the first tile and the second tile. Methods for handling migration between tiles and cores are also described.
    Type: Grant
    Filed: June 18, 2014
    Date of Patent: January 2, 2018
    Assignee: EMPIRE TECHNOLOGY DEVELOPMENT LLC
    Inventor: Yan Solihin
  • Patent number: 9830161
    Abstract: In one embodiment of the present invention, a streaming multiprocessor (SM) uses a tree of nodes to manage threads. Each node specifies a set of active threads and a program counter. Upon encountering a conditional instruction that causes an execution path to diverge, the SM creates child nodes corresponding to each of the divergent execution paths. Based on the conditional instruction, the SM assigns each active thread included in the parent node to at most one child node, and the SM temporarily discontinues executing instructions specified by the parent node. Instead, the SM concurrently executes instructions specified by the child nodes. After all the divergent paths reconverge to the parent path, the SM resumes executing instructions specified by the parent node. Advantageously, the disclosed techniques enable the SM to execute divergent paths in parallel, thereby reducing undesirable program behavior associated with conventional techniques that serialize divergent paths across thread groups.
    Type: Grant
    Filed: January 21, 2014
    Date of Patent: November 28, 2017
    Assignee: NVIDIA Corporation
    Inventors: John Erik Lindholm, Michael C. Shebanow
  • Patent number: 9785441
    Abstract: A computer processor that operates on distinct first and second instruction streams that have a predefined timed semantic relationship. At least one of the first and second instruction streams includes variable-length instructions having a header and associated bundle bounded by a head end and a tail end. An alignment hole within the bundle encodes information representing at least one nop operation. The computer processor includes first and second multi-stage instruction processing components configured to process in parallel the first and second instruction streams. At least one of the first and second multi-stage instruction processing components includes an instruction buffer operably coupled to a decode stage. The decode stage is configured to process a variable-length instruction by isolating and interpreting the alignment hole of the variable length instruction in order to initiate zero or more nop operations that follow the timed semantic relationship between the first and second instruction streams.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: October 10, 2017
    Assignee: Mill Computing, Inc.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, David Arthur Yost
  • Patent number: 9760371
    Abstract: A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed.
    Type: Grant
    Filed: December 22, 2011
    Date of Patent: September 12, 2017
    Assignee: Intel Corporation
    Inventors: Bret L. Toll, Robert Valentine, Jesus Corbal San Adrian, Elmoustapha Ould-Ahmed-Vall, Mark J. Charney
  • Patent number: 9658850
    Abstract: An apparatus is described having instruction execution logic circuitry. The instruction execution logic circuitry has input vector element routing circuitry to perform the following for each of three different instructions: for each of a plurality of output vector element locations, route into an output vector element location an input vector element from one of a plurality of input vector element locations that are available to source the output vector element. The output vector element and each of the input vector element locations are one of three available bit widths for the three different instructions. The apparatus further includes masking layer circuitry coupled to the input vector element routing circuitry to mask a data structure created by the input vector element routing circuitry. The masking layer circuitry is designed to mask at three different levels of granularity that correspond to the three available bit widths.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: May 23, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9619236
    Abstract: An apparatus is described having instruction execution logic circuitry to execute first, second, third and fourth instructions. Both the first instruction and the second instruction insert a first group of input vector elements to one of multiple first non-overlapping sections of respective first and second resultant vectors. The first group has a first bit width. Each of the multiple first non-overlapping sections has a same bit width as the first group. Both the third instruction and the fourth instruction insert a second group of input vector elements to one of multiple second non-overlapping sections of respective third and fourth resultant vectors. The second group has a second bit width that is larger than said first bit width. Each of the multiple second non-overlapping sections has a same bit width as the second group.
    Type: Grant
    Filed: December 23, 2011
    Date of Patent: April 11, 2017
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Bret L. Toll, Mark J. Charney, Zeev Sperber, Amit Gradstein
  • Patent number: 9529598
    Abstract: A TRANSACTION ABORT instruction is used to abort a transaction that is executing in a computing environment. The TRANSACTION ABORT instruction includes at least one field used to specify a user-defined abort code that indicates the specific reason for aborting the transaction. Based on executing the TRANSACTION ABORT instruction, a condition code is provided that indicates whether re-execution of the transaction is recommended.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: December 27, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Timothy J. Slegel
  • Patent number: 9436477
    Abstract: A TRANSACTION ABORT instruction is used to abort a transaction that is executing in a computing environment. The TRANSACTION ABORT instruction includes at least one field used to specify a user-defined abort code that indicates the specific reason for aborting the transaction. Based on executing the TRANSACTION ABORT instruction, a condition code is provided that indicates whether re-execution of the transaction is recommended.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: September 6, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel M. Mitran, Timothy J. Slegel
  • Patent number: 9424031
    Abstract: Various embodiments are generally directed to overcoming limitations of vector registers in their use with bit-parallel string matching algorithms. An apparatus includes a processor element; and logic to receive a pattern comprising a first string of elements to employ in a string matching operation, instantiate a test bitmask in a first vector register of the processor element, the first vector register comprising multiple lanes, copy bit values at MSB bit positions of the multiple lanes of the first vector register to a first vector mask as a vector value, bit-shift the vector value as a scalar value, bit-shift the first vector register, employ the vector value of the first vector mask to selectively fill LSB bit positions of lanes of a second vector register of the processor element; and OR the second vector register into the first vector register. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: August 23, 2016
    Assignee: INTEL CORPORATION
    Inventors: Hariharan Thantry, Mani Azimi
  • Patent number: 9390054
    Abstract: In a parallel computer, a largest logical plane from a plurality of logical planes formed of compute nodes of a subcommunicator may be identified by: identifying, by each compute node of the subcommunicator, all logical planes that include the compute node; calculating, by each compute node for each identified logical plane that includes the compute node, an area of the identified logical plane; initiating, by a root node of the subcommunicator, a gather operation; receiving, by the root node from each compute node of the subcommunicator, each node's calculated areas as contribution data to the gather operation; and identifying, by the root node in dependence upon the received calculated areas, a logical plane of the subcommunicator having the greatest area.
    Type: Grant
    Filed: October 14, 2013
    Date of Patent: July 12, 2016
    Assignee: International Business Machines Corporation
    Inventors: Kristan D. Davis, Daniel A. Faraj
  • Patent number: 9389868
    Abstract: An apparatus includes a network interface, memory, and a processor. The processor is coupled with the network interface and memory. The processor is configured to determine that an instruction instance is a branch instruction instance. Responsive to a determination that an instruction instance is a branch instruction instance, the processor is configured to obtain a branch prediction for the branch instruction instance and a confidence value of the branch prediction. The processor is further configured to determine that the confidence for the branch prediction is low based on the confidence value, and responsive to such a determination, generate predicated instruction instances based on the branch instruction instance.
    Type: Grant
    Filed: November 1, 2012
    Date of Patent: July 12, 2016
    Assignee: International Business Machines Corporation
    Inventor: Michael Karl Gschwind
  • Patent number: 9367323
    Abstract: An operation is provided to signal a processor that action is to be taken to facilitate execution of a transaction that has aborted one or more times. The operation is specified within an instruction or is itself an instruction. The instruction is executed based on detecting an abort of the transaction, and includes a field indicating how many times the transaction has aborted. The processor uses this information to determine what action is to be taken.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: June 14, 2016
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Randall W. Philley, Peter J. Relson, Timothy J. Slegel
  • Patent number: 9354875
    Abstract: An enhanced loop streaming detection mechanism is provided in a processor to reduce power consumption. The processor includes a decoder to decode instructions in a loop into micro-operations, and a loop streaming detector to detect the presence of the loop in the micro-operations. The processor also includes a loop characteristic tracker unit to identify hardware components downstream from the decoder that are not to be used by the micro-operations in the loop, and to disable the identified hardware components. The processor also includes execution circuitry to execute the micro-operations in the loop with the identified hardware components disabled.
    Type: Grant
    Filed: December 27, 2012
    Date of Patent: May 31, 2016
    Assignee: Intel Corporation
    Inventors: Matthew C. Merten, Justin M. Deinlein, Yury N. Ilin, Alexandre J. Farcy, Tong Li, Srikanth T. Srinivasan
  • Patent number: 9354884
    Abstract: A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.
    Type: Grant
    Filed: March 13, 2013
    Date of Patent: May 31, 2016
    Assignee: International Business Machines Corporation
    Inventors: Miguel Comparan, Andrew D. Hilton, Hans M. Jacobson, Brian M. Rogers, Robert A. Shearer, Ken V. Vu, Alfred T. Watson, III
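
The mode switching described in the abstract of patent 9354884 can be pictured with a small software model. This is only an illustrative sketch, not the patented circuit: the stage names, the `needed_in_order` flag, and the `HybridPipeline` class are all hypothetical, chosen to show stages being deactivated and bypassed in in-order mode and reactivated in out-of-order mode.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    needed_in_order: bool  # hypothetical flag: stage is also used by the in-order mode
    active: bool = True


class HybridPipeline:
    """Toy model of a pipeline that switches between two modes."""

    MODES = ("out-of-order", "in-order")

    def __init__(self, stages):
        self.stages = list(stages)
        self.mode = "out-of-order"  # every stage starts active

    def switch_mode(self, mode):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        for stage in self.stages:
            # Out-of-order mode uses every stage; in-order mode keeps only
            # the stages it needs and deactivates the rest.
            stage.active = (mode == "out-of-order") or stage.needed_in_order

    def path(self):
        # Instructions bypass deactivated stages.
        return [stage.name for stage in self.stages if stage.active]


def make_pipeline():
    # Assumed stage layout, for illustration only.
    return HybridPipeline([
        Stage("fetch", True),
        Stage("decode", True),
        Stage("rename", False),
        Stage("issue-queue", False),
        Stage("execute", True),
        Stage("reorder-buffer", False),
        Stage("writeback", True),
    ])
```

Calling `switch_mode("in-order")` on a pipeline from `make_pipeline()` leaves only the four stages flagged as needed in the path; switching back to `"out-of-order"` restores all seven.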