Patents Examined by Jacob A. Petranek
  • Patent number: 10936314
    Abstract: Branch prediction is suppressed for branch instructions executing in a transaction of a transactional memory (TM) environment in transactions that are re-executions of previously aborted transactions.
    Type: Grant
    Filed: April 9, 2019
    Date of Patent: March 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura, Chung-Lung Shum
  • Patent number: 10929135
    Abstract: Predicting a predicted value to be used in register-indirect branching. The predicted value is stored in a first selected location and a second selected location accessible to one or more instructions of a computing environment. The storing is performed concurrently with processing a register-indirect branch. Further, the first selected location and the second selected location are in addition to another location used to store an instruction address. The predicted value is used in speculative processing that includes the register-indirect branch.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: February 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10922080
    Abstract: Disclosed embodiments relate to systems and methods for performing instructions structured to compute a min/max value of a vector. In one example, a processor executes a decoded single instruction to determine, for each data element position of the identified first and second operands, a maximum or minimum, store the determined maximums or minimums in corresponding data element positions of the identified first operand, and determine and store, in each data element position of the identified third operand, an indication of where the maximum or minimum came from.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: February 16, 2021
    Assignee: Intel Corporation
    Inventors: Sunny L. Gogar, Rama Kishan V. Malladi, Elmoustapha Ould-Ahmed-Vall, Christopher J. Hughes
  • Patent number: 10922086
    Abstract: To perform a reduction operation to combine data values for threads in a thread group using a data processor, the data processor performs combining steps that each combine the stored combined data value result of a previous combining operation for a thread with the combined data value result of the previous combining operation for a selected another execution lane that has not yet contributed to the stored combined data value result for the thread. The data processor selects as the another execution lane of the execution processing circuitry that has not yet contributed to the combined data value result for the thread, an execution lane from a group of execution lanes whose values have been combined in the previous combining step and that have not yet contributed to the combined data value result for the thread, and having a particular relative position in the group of execution lanes.
    Type: Grant
    Filed: June 15, 2019
    Date of Patent: February 16, 2021
    Assignee: Arm Limited
    Inventor: Kevin Petit
  • Patent number: 10922098
    Abstract: Apparatuses and methods are disclosed for an FPGA architecture that may improve processing speed and efficiency in processing less complex operands. Some applications may utilize operands that are less complex, such as operands that are 1, 2, or 4 bits, for example. In some examples, the DSP architecture may skip or avoid processing all received operands or may process a common operand more frequently than other operands. An example apparatus may include a first configurable logic unit configured to receive a first operand and a second operand; a second configurable logic unit configured to receive a third operand and the first calculated operand; a first switch configured to receive the first operand and a fourth operand and to output a first selected operand; and a second switch configured to receive the second calculated operand and the first selected operand.
    Type: Grant
    Filed: October 5, 2017
    Date of Patent: February 16, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Gregory Edvenson, Jeremy Chritz, David Hulton
  • Patent number: 10915327
    Abstract: Aspects of the present disclosure relate to an apparatus comprising a plurality of clusters, each cluster having a plurality of execution units to execute instructions. The apparatus comprises dispatch circuitry to determine, for each instruction to be executed, a chosen cluster from amongst the plurality of clusters to which to dispatch that instruction for execution. This determination is performed by selecting between a default dispatch policy wherein said chosen cluster is a cluster to which an earlier instruction to generate at least one source operand of said instruction was dispatched for execution, and an alternative dispatch policy for selecting said chosen cluster. Said selecting is based on a selection parameter. The dispatch circuitry is further configured to dispatch said instruction to the chosen cluster for execution.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 9, 2021
    Assignee: Arm Limited
    Inventors: Luca Nassi, Remi Marius Teyssier, François Donati, Damian Maiorano
  • Patent number: 10915328
    Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.
    Type: Grant
    Filed: December 14, 2018
    Date of Patent: February 9, 2021
    Assignee: Intel Corporation
    Inventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
  • Patent number: 10915488
    Abstract: An inter-processor synchronization method using point-to-point links comprises the steps of: defining a point-to-point synchronization channel between a source processor and a target processor; executing in the source processor a wait command expecting a notification associated with the synchronization channel, wherein the wait command is designed to stop the source processor until the notification is received; executing in the target processor a notification command designed to transmit through the point-to-point link the notification expected by the source processor; executing in the target processor a wait command expecting a notification associated with the synchronization channel, wherein the wait command is designed to stop the target processor until the notification is received; and executing in the source processor a notification command designed to transmit through the point-to-point link the notification expected by the target processor.
    Type: Grant
    Filed: May 19, 2015
    Date of Patent: February 9, 2021
    Assignee: KALRAY
    Inventors: Benoît Dupont De Dinechin, Vincent Ray
  • Patent number: 10908911
    Abstract: Predicting a predicted value to be used in register-indirect branching. The predicted value is stored in a first selected location and a second selected location accessible to one or more instructions of a computing environment. The storing is performed concurrently with processing a register-indirect branch. Further, the first selected location and the second selected location are in addition to another location used to store an instruction address. The predicted value is used in speculative processing that includes the register-indirect branch.
    Type: Grant
    Filed: August 18, 2017
    Date of Patent: February 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10901738
    Abstract: Bulk store and load operations of configuration state registers. An instruction to perform a bulk operation for a group of configuration state registers having a common characteristic is executed. To perform the bulk operation for the group of configuration state registers, a plurality of operations is performed, and based on performing the plurality of operations, the instruction is completed.
    Type: Grant
    Filed: November 14, 2017
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10901940
    Abstract: A processor includes a widest set of data registers that corresponds to a given logical processor. Each of the data registers of the widest set has a first width in bits. A decode unit that corresponds to the given logical processor is to decode instructions that specify the data registers of the widest set, and is to decode an atomic store to memory instruction. The atomic store to memory instruction is to indicate data that is to have a second width in bits that is wider than the first width in bits. The atomic store to memory instruction is to indicate memory address information associated with a memory location. An execution unit is coupled with the decode unit. The execution unit, in response to the atomic store to memory instruction, is to atomically store the indicated data to the memory location.
    Type: Grant
    Filed: April 2, 2016
    Date of Patent: January 26, 2021
    Assignee: Intel Corporation
    Inventors: Vedvyas Shanbhogue, Stephen J. Robinson, Christopher D. Bryant, Jason W. Brandt
  • Patent number: 10901741
    Abstract: A fusion opportunity is detected for a sequence of instructions. The sequence of instructions includes an indication of an affiliated location and an indication of an affiliated derived location. Based on the detecting, a value to be stored in the affiliated derived location is generated. The value is a predicted value. The value is stored in the affiliated derived location, and the affiliated derived location is accessed to use the value by one or more instructions executing within the computing environment.
    Type: Grant
    Filed: November 21, 2017
    Date of Patent: January 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, Valentina Salapura
  • Patent number: 10896044
    Abstract: The techniques described herein provide an instruction fetch and decode unit having an operation cache with low latency in switching between fetching decoded operations from the operation cache and fetching and decoding instructions using a decode unit. This low latency is accomplished through a synchronization mechanism that allows work to flow through both the operation cache path and the instruction cache path until that work is stopped due to needing to wait on output from the opposite path. The existence of decoupling buffers in the operation cache path and the instruction cache path allows work to be held until that work is cleared to proceed. Other improvements, such as a specially configured operation cache tag array that allows for detection of multiple hits in a single cycle, also improve latency by, for example, improving the speed at which entries are consumed from a prediction queue that stores predicted address blocks.
    Type: Grant
    Filed: June 21, 2018
    Date of Patent: January 19, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Marius Evers, Dhanaraj Bapurao Tavare, Ashok Tirupathy Venkatachar, Arunachalam Annamalai, Donald A. Priore, Douglas R. Williams
  • Patent number: 10891230
    Abstract: Systems, methods, and apparatuses relating to linear address masking architecture are described. In one embodiment, a hardware processor includes an address generation unit to generate a linear address for a memory access request to a memory, at least one control register comprising a user mode masking bit and a supervisor mode masking bit, a register comprising a current privilege level indication, and a memory management unit to mask out a proper subset of bits inside an address space of the linear address for the memory access request based on the current privilege level indication and either of the user mode masking bit or the supervisor mode masking bit to produce a resultant linear address, and output the resultant linear address.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: January 12, 2021
    Assignee: Intel Corporation
    Inventors: Ron Gabor, Igor Yanover
  • Patent number: 10891254
    Abstract: Embodiments relate to a computational device including multiple processor tiles on a die that may have multiple switchable topologies. A topology of the computational device may include one or more virtual circuits. A virtual circuit may include multiple processor tiles. A processor tile of a virtual circuit of a topology may include a configuration vector to control a connection between the processor tile and a neighboring processor tile. A first topology of the computation device may correspond to a first phase of a computation of a program, and a second topology of the computation device may correspond to a second phase of the computation of the program. Other embodiments may be described and/or claimed.
    Type: Grant
    Filed: June 29, 2017
    Date of Patent: January 12, 2021
    Assignee: Intel Corporation
    Inventors: William J. Butera, Simon C. Steely, Jr., Richard J. Dischler
  • Patent number: 10877910
    Abstract: Method, apparatus, and program means for a programmable event driven yield mechanism that may activate other threads. In one embodiment, an apparatus includes execution resources to execute a plurality of instructions and a monitor to detect a condition indicating a low level of progress. The monitor can disrupt processing of a program by transferring to a handler in response to detecting the condition indicating a low level of progress. In another embodiment, thread switch logic may be coupled to a plurality of event monitors which monitor events within the multithreading execution logic. The thread switch logic switches threads based at least partially on a programmable condition of one or more of the performance monitors.
    Type: Grant
    Filed: March 31, 2017
    Date of Patent: December 29, 2020
    Assignee: Intel Corporation
    Inventors: Hong Wang, Per Hammarlund, Xiang Zou, John P. Shen, Xinmin Tian, Milind Girkar, Perry H. Wang, Piyush N. Desai
  • Patent number: 10877751
    Abstract: Embodiments of processors, methods, and systems for a processor core supporting processor identification instruction spoofing are described. In an embodiment, a processor includes an instruction decoder and processor identification instruction spoofing logic. The processor identification spoofing logic is to respond to a processor identification instruction by reporting processor identification information from a processor identification spoofing data structure. The processor identification spoofing data structure is to include processor identification information of one or more other processors.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: December 29, 2020
    Assignee: Intel Corporation
    Inventors: Toby Opferman, Russell C. Arnold, Vedvyas Shanbhogue
  • Patent number: 10877763
    Abstract: A computer system, processor, and method for processing information are disclosed that include a Dispatch Unit for dispatching instructions; an Issue Queue for receiving instructions dispatched from the Dispatch Unit; and a queue for receiving instructions issued from the Issue Queue, the queue having a plurality of entry locations for storing data. In an embodiment, instructions are dispatched with a virtual indicator, and the virtual indicator is set to a first mode for instructions dispatched where an entry location is available, and to a second mode where an entry location is not available, in the queue to receive the dispatched instruction. In addition to virtual tagging of dispatched instructions, a system, processor, and method are disclosed for regional partitioning of queues, region based deallocation of queue entries, and circular thread based assignment of queue entries.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: December 29, 2020
    Assignee: International Business Machines Corporation
    Inventors: Bryan Lloyd, Brian D. Barrick, Kurt A. Feiste, Hung Q. Le, Dung Q. Nguyen, Kenneth L. Ward
  • Patent number: 10853074
    Abstract: A pipelined run-to-completion processor includes no instruction counter and only fetches instructions either: as a result of being prompted from the outside by an input data value and/or an initial fetch information value, or as a result of execution of a fetch instruction. Initially the processor is not clocking. An incoming value kick-starts the processor to start clocking and to fetch a block of instructions from a section of code in a table. The input data value and/or the initial fetch information value determines the section and table from which the block is fetched. A LUT converts a table number in the initial fetch information value into a base address where the table is found. Fetch instructions at the ends of sections of code cause program execution to jump from section to section. A finished instruction causes an output data value to be output and stops clocking of the processor.
    Type: Grant
    Filed: May 1, 2014
    Date of Patent: December 1, 2020
    Assignee: Netronome Systems, Inc.
    Inventor: Gavin J. Stark
  • Patent number: 10838719
    Abstract: Examples of a carry chain for performing an operation on operands each including elements of a selectable size are provided. Advantageously, the carry chain adapts to elements of different sizes. The carry chain determines a mask based on a selected size of an element. The carry chain selects, based on the mask, whether to carry a partial result of an operation performed on corresponding first portions of a first operand and a second operand into a next operation. The next operation is performed on corresponding second portions of the first operand and the second operand, and, based on the selection, the partial result of the operation. The carry chain stores, in a memory, a result formed from outputs of the operation and the next operation.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: November 17, 2020
    Assignee: Marvell Asia Pte, Ltd
    Inventor: David Kravitz
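
Illustrative sketch for patent 10838719 (carry chain with selectable element size). The idea in that abstract, a mask derived from the element size that decides whether a partial result is carried into the next portion, can be modeled in a few lines of software. This is not the patented circuit: it is a minimal Python model that assumes a 64-bit operand processed in 8-bit portions and uses addition as the operation. The names PORTION_BITS, WORD_BITS, carry_mask, and packed_add are hypothetical and do not come from the patent.

```python
PORTION_BITS = 8                       # assumed width of one carry-chain stage
WORD_BITS = 64                         # assumed width of each packed operand
PORTION_MASK = (1 << PORTION_BITS) - 1
NUM_PORTIONS = WORD_BITS // PORTION_BITS


def carry_mask(element_bits):
    """Return a mask whose bit i is 1 when the carry out of portion i
    is allowed to enter portion i + 1 (both lie in the same element)."""
    if element_bits % PORTION_BITS or WORD_BITS % element_bits:
        raise ValueError("unsupported element size")
    portions_per_element = element_bits // PORTION_BITS
    mask = 0
    for i in range(NUM_PORTIONS - 1):
        # Block the carry at the last portion of every element.
        if (i + 1) % portions_per_element != 0:
            mask |= 1 << i
    return mask


def packed_add(a, b, element_bits):
    """Add two packed operands element by element, one portion at a time."""
    mask = carry_mask(element_bits)
    result = 0
    carry = 0
    for i in range(NUM_PORTIONS):
        shift = i * PORTION_BITS
        total = ((a >> shift) & PORTION_MASK) + ((b >> shift) & PORTION_MASK) + carry
        result |= (total & PORTION_MASK) << shift
        # The mask bit selects whether the partial result's carry moves on.
        carry = (total >> PORTION_BITS) & ((mask >> i) & 1)
    return result


# With 8-bit elements the carry out of the low byte is dropped;
# with 16-bit elements the same inputs carry into the second byte.
print(hex(packed_add(0x00FF, 0x0001, 8)))
print(hex(packed_add(0x00FF, 0x0001, 16)))
```

In this model the same loop serves every supported element size (8, 16, 32, or 64 bits); only the mask changes, which is the adaptation the abstract describes.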