Patents Examined by Courtney P Carmichael-Moody
  • Patent number: 11188338
    Abstract: A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes examples of retrieving values represented by one or more previous symbols needed for decoding a current symbol before or in parallel with the insertion of the values represented by the one or more previous symbols in the data stream.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: November 30, 2021
    Assignee: Fungible, Inc.
    Inventors: Gurumani Senthil Nayakam, Satyanarayana Lakshmipathi Billa, Rajan Goyal
  • Patent number: 11163579
    Abstract: Instructions are generated, in particular for mailbox verification in a simulation environment. A sequence of instructions is received, as well as selection data representative of a plurality of commands including a special command. One of the plurality of commands is repeatedly selected, and an instruction is output based on the selected command. The outputting of an instruction includes outputting a next instruction in the sequence of instructions if the selected command is the special command, and outputting an instruction associated with the command if the selected command is not the special command.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: November 2, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joerg Deutschle, Ursel Hahn, Joerg Walter, Ernst-Dieter Weissenberger
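The selection loop described in the abstract for patent 11163579 can be illustrated roughly as follows. This is a minimal sketch, not the patented implementation: the names `generate_instructions` and `SPECIAL`, the example command set, the random selection policy, and the choice to stop once the received sequence is exhausted are all assumptions made for illustration.

```python
import random
from typing import Dict, Iterator, List, Sequence

SPECIAL = "SPECIAL"  # hypothetical marker for the special command


def generate_instructions(
    sequence: Sequence[str],
    commands: List[str],
    command_instructions: Dict[str, str],
    rng: random.Random,
) -> Iterator[str]:
    """Interleave a fixed instruction sequence with command-driven instructions.

    Assumption: generation stops once the received sequence is exhausted.
    """
    position = 0
    while position < len(sequence):
        command = rng.choice(commands)  # repeatedly select one command
        if command == SPECIAL:
            # Special command: output the next instruction of the sequence.
            yield sequence[position]
            position += 1
        else:
            # Any other command: output the instruction associated with it.
            yield command_instructions[command]


sequence = ["write_mailbox", "read_mailbox", "check_status"]
commands = [SPECIAL, "nop", "interrupt"]
command_instructions = {"nop": "NOP", "interrupt": "RAISE_IRQ"}
stream = list(
    generate_instructions(sequence, commands, command_instructions, random.Random(0))
)
```

Whatever the random choices are, the instructions of the received sequence appear in `stream` in their original order, with the command-associated instructions interleaved between them.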
  • Patent number: 11157277
    Abstract: Data processing apparatus comprises a processing element configured to access an architectural register representing a given system register; mapping circuitry to map the architectural register representing the given system register to a physical register selected from a set of physical registers; a register bank having a set of two or more respective banked versions of the given system register, in which a respective one of the banked versions of the system register is associated with each of a plurality of current operating states of the processing element; in which, when the processing element changes operating state from a first operating state associated with a first one of the banked versions of the system register to a second operating state associated with a second, different, one of the banked versions of the system register, the processing element is configured to store the current contents of the architectural register representing the given system register to the first one of the banked versions o
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: October 26, 2021
    Assignee: Arm Limited
    Inventors: Cedric Denis Robert Airaud, Albin Pierrick Tonnerre, Luca Nassi, Remi Marius Teyssier
  • Patent number: 11157276
    Abstract: A computer system, processor, and method for processing information is disclosed. The system, processor and/or method includes at least one computer processor; a register file associated with the at least one processor, the register file having a plurality of entries for storing data where a whole entry has two halves, the register file having multiple ports to write data to the register file and multiple ports to read data from the register file; and one or more execution units associated with the register file, the execution units configured to read data from the register file and to write data to the register file, wherein the processor is configured to write either scalar data or vector data to a single register file entry.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: October 26, 2021
    Assignee: International Business Machines Corporation
    Inventors: Steven J. Battle, Maarten J. Boersma, Niels Fricke, Hung Q. Le, Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 11157441
    Abstract: A microprocessor system comprises a computational array and a hardware data formatter. The computational array includes a plurality of computation units that each operates on a corresponding value addressed from memory. The values operated by the computation units are synchronously provided together to the computational array as a group of values to be processed in parallel. The hardware data formatter is configured to gather the group of values, wherein the group of values includes a first subset of values located consecutively in memory and a second subset of values located consecutively in memory. The first subset of values is not required to be located consecutively in the memory from the second subset of values.
    Type: Grant
    Filed: March 13, 2018
    Date of Patent: October 26, 2021
    Assignee: Tesla, Inc.
    Inventors: Emil Talpes, William McGee, Peter Joseph Bannon
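The gathering step described in the abstract for patent 11157441 can be pictured in software as follows. This is a minimal sketch under stated assumptions: `gather_group`, its parameters, and the example memory contents are hypothetical, and a real hardware data formatter would feed a computational array rather than build a Python list.

```python
from typing import List, Sequence


def gather_group(
    memory: Sequence[int],
    first_start: int,
    first_len: int,
    second_start: int,
    second_len: int,
) -> List[int]:
    """Collect one group of values from two runs of memory.

    Each run is consecutive in memory, but the two runs do not have to be
    adjacent to each other.
    """
    first = list(memory[first_start:first_start + first_len])
    second = list(memory[second_start:second_start + second_len])
    return first + second  # the group is handed over together


memory = list(range(100, 132))             # hypothetical memory contents
group = gather_group(memory, 0, 4, 16, 4)  # runs at offsets 0..3 and 16..19
print(group)  # [100, 101, 102, 103, 116, 117, 118, 119]
```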
  • Patent number: 11132198
    Abstract: A computer system, processor, and method for processing information is disclosed that includes at least one computer processor; a main register file associated with the at least one processor, the main register file having a plurality of entries for storing data, one or more write ports to write data to the main register file entries, and one or more read ports to read data from the main register file entries; one or more execution units including a dense math execution unit; and at least one accumulator register file having a plurality of entries for storing data. The results of the dense math execution unit in an aspect are written to the accumulator register file, preferably to the same accumulator register file entry multiple times, and the data from the accumulator register file is written to the main register file.
    Type: Grant
    Filed: August 29, 2019
    Date of Patent: September 28, 2021
    Assignee: International Business Machines Corporation
    Inventors: Brian W. Thompto, Maarten J. Boersma, Andreas Wagner, Jose E. Moreira, Hung Q. Le, Silvia Melitta Mueller, Dung Q. Nguyen
  • Patent number: 11113068
    Abstract: Performing flush recovery using parallel walks of sliced reorder buffers (SROBs) is disclosed herein. In one exemplary embodiment, a register mapping circuit provides a rename mapping table (RMT) comprising RMT entries representing logical register number (LRN) to physical register number (PRN) mappings. The register mapping circuit also provides an SROB comprising multiple SROB slices that each corresponds to a respective LRN. Each SROB slice tracks uncommitted instructions that write to the LRN corresponding to that SROB slice, and maintains those instructions in program order with respect to each other. Upon detecting an uncommitted instruction writing to an LRN, the register mapping circuit allocates an SROB slice entry in the SROB slice corresponding to the LRN. When a pipeline flush from a target instruction occurs, the register mapping circuit restores RMT entries of the RMT to their prior mapping states based on parallel walks of the SROB slices of the SROB.
    Type: Grant
    Filed: August 6, 2020
    Date of Patent: September 7, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yusuf Cagatay Tekmen, Rodney Wayne Smith, Kiran Ravi Seth, Shivam Priyadarshi
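The flush recovery described in the abstract for patent 11113068 can be modeled in software along these lines. This is a minimal sketch, not the circuit in the patent: the class and field names are hypothetical, each slice entry is assumed to record the PRN that its LRN mapped to before the rename, and the flush is assumed to undo the target instruction as well as everything younger. The slices are walked one after another here; because they share no state, hardware could walk them in parallel.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SliceEntry:
    age: int        # program-order position of the uncommitted instruction
    prior_prn: int  # PRN the LRN mapped to before this instruction (assumed field)


@dataclass
class RegisterMapper:
    rmt: Dict[int, int]  # rename mapping table: LRN -> PRN
    slices: Dict[int, List[SliceEntry]] = field(default_factory=dict)  # one slice per LRN

    def rename(self, age: int, lrn: int, new_prn: int) -> None:
        """Allocate a slice entry for an uncommitted write to `lrn`."""
        self.slices.setdefault(lrn, []).append(SliceEntry(age, self.rmt[lrn]))
        self.rmt[lrn] = new_prn

    def flush_from(self, target_age: int) -> None:
        """Restore the RMT to its state before the target instruction."""
        for lrn, entries in self.slices.items():
            # Walk this slice from youngest to oldest, undoing each rename.
            while entries and entries[-1].age >= target_age:
                self.rmt[lrn] = entries.pop().prior_prn


mapper = RegisterMapper(rmt={0: 10, 1: 11})
mapper.rename(age=1, lrn=0, new_prn=20)
mapper.rename(age=2, lrn=1, new_prn=21)
mapper.rename(age=3, lrn=0, new_prn=22)
mapper.flush_from(target_age=2)
print(mapper.rmt)  # {0: 20, 1: 11}
```

Only the rename at age 1 survives the flush, so LRN 0 maps to PRN 20 again and LRN 1 returns to its original PRN 11.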
  • Patent number: 11106469
    Abstract: Methods and systems for implementing an instruction selection mechanism with class-dependent age-array are described. In an example, a system can include a processor that may sequence instructions. The system can further include a memory operatively coupled to the processor. The system can further include an array allocated on the memory. The array can be operable to store instruction age designations associated with a plurality of instructions sequenced by the processor. The array can be further operable to store the instruction age designations based on instruction classes. The processor can be operable to fetch an instruction from the memory. The processor can be operable to dispatch the instruction to a queue. The processor can be operable to store the instruction age designations associated with the instruction, in the array, based on an instruction class of the instruction.
    Type: Grant
    Filed: August 14, 2019
    Date of Patent: August 31, 2021
    Assignee: International Business Machines Corporation
    Inventor: Joel A. Silberman
  • Patent number: 11099850
    Abstract: Branch prediction circuitry comprises: a return address prediction structure to store at least one predicted return address; and a branch target buffer (BTB) structure comprising entries each for specifying predicted branch information for a corresponding block of instructions. Within at least a subset of entries of the BTB structure, each entry specifies the predicted branch information with an encoding incapable of simultaneously indicating both: that the corresponding block of instructions is predicted to include a return branch instruction (for which the return address prediction structure is used to predict the target address); and the predicted target address for the return branch instruction. This can provide a more efficient BTB structure which requires less circuit area and power for a given level of branch prediction performance.
    Type: Grant
    Filed: August 15, 2019
    Date of Patent: August 24, 2021
    Assignee: Arm Limited
    Inventors: Luc Orion, Houdhaifa Bouzguarrou, Guillaume Bolbenes, Eddy Lapeyre
  • Patent number: 11080054
    Abstract: Data processing apparatus comprises processing circuitry to selectively apply vector processing operations to one or more data items of one or more data vectors each comprising an ordered plurality of data items at respective vector positions in the data vector, according to the state of respective predicate indicators associated with the vector positions; predicate generation circuitry to apply a processing operation to generate an ordered set of predicate indicators, each associated with a respective one of the vector positions, the ordered set of predicate indicators being associated with an ordered set of active indicators each having an active or an inactive state; and a detector to detect a status flag indicative of whether a predicate indicator at a position, in the ordered set of predicate indicators, corresponding to the position of an outermost active indicator having the active state, has a given state; in which the detector comprises: first and second circuitry to combine the ordered set of predic
    Type: Grant
    Filed: August 15, 2016
    Date of Patent: August 3, 2021
    Assignee: ARM LIMITED
    Inventors: Neil Burgess, Lee Evan Eisen, Gary Alan Gorman, Daniel Arulraj
  • Patent number: 11030147
    Abstract: Hardware acceleration using a self-programmable coprocessor architecture may include determining that an instruction cache comprises an accelerable instruction sequence; instead of executing the accelerable instruction sequence, providing, to an accelerator block of an accelerator complex comprising a plurality of accelerator blocks, a complex instruction corresponding to the accelerable instruction sequence, wherein the accelerator block comprises one or more reprogrammable logic elements configured to execute the complex instruction; and receiving, from the accelerator complex, a result of the complex instruction.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: June 8, 2021
    Assignee: International Business Machines Corporation
    Inventors: Justin Ginn, Tony E. Sawan
  • Patent number: 11016779
    Abstract: Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. A first address generator unit may be configured to perform an arithmetic operation dependent upon a first field of the plurality of fields. A second address generator unit may be configured to generate at least one address of a plurality of addresses, wherein each address is dependent upon a respective field of the plurality of fields. A parallel assembly language may be used to control the plurality of address generator units and the plurality of pipelined datapaths.
    Type: Grant
    Filed: August 13, 2019
    Date of Patent: May 25, 2021
    Assignee: Coherent Logix, Incorporated
    Inventors: Michael B. Doerr, Carl S. Dobbs, Michael B. Solka, Michael R. Trocino, Kenneth R. Faulkner, Keith M. Bindloss, Sumeer Arya, John Mark Beardslee, David A. Gibson
  • Patent number: 11003450
    Abstract: A vector data transfer instruction is provided for triggering a data transfer between storage locations corresponding to a contiguous block of addresses and multiple data elements of at least one vector register. The instruction specifies a start address of the contiguous block using a base register and an immediate offset value specified as a multiple of the size of the contiguous block of addresses. This is useful for loop unrolling which can help to improve performance of vectorised code by combining multiple iterations of a loop into a single iteration of an unrolled loop, to reduce the loop control overhead.
    Type: Grant
    Filed: September 14, 2016
    Date of Patent: May 11, 2021
    Assignee: ARM Limited
    Inventor: Nigel John Stephens
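The addressing scheme in the abstract for patent 11003450 comes down to simple arithmetic, sketched below. The function names and the 32-byte block size are assumptions for illustration; the abstract does not fix a block size.

```python
from typing import List


def block_start_address(base: int, immediate: int, block_bytes: int) -> int:
    """Start address of the contiguous block for one vector transfer.

    The immediate offset counts whole blocks rather than bytes.
    """
    return base + immediate * block_bytes


def unrolled_block_addresses(base: int, block_bytes: int, unroll_factor: int) -> List[int]:
    """Start addresses touched by one iteration of an unrolled loop.

    Every transfer reuses the same base register and differs only in its
    immediate, so no extra address arithmetic is needed between transfers.
    """
    return [block_start_address(base, i, block_bytes) for i in range(unroll_factor)]


addresses = unrolled_block_addresses(base=0x1000, block_bytes=32, unroll_factor=4)
print([hex(a) for a in addresses])  # ['0x1000', '0x1020', '0x1040', '0x1060']
```

Because the immediate scales with the block size, the same immediate values 0 to 3 would remain valid for a different block size; only the resulting byte addresses would change.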
  • Patent number: 10997116
    Abstract: A computing system is described herein that expedites deep neural network (DNN) operations or other processing operations using a hardware accelerator. The hardware accelerator, in turn, includes a tensor-processing engine that works in conjunction with a scalar-processing unit (SPU). The tensor-processing engine handles various kinds of tensor-based operations required by the DNN, such as multiplying vectors by matrices, combining vectors with other vectors, transforming individual vectors, etc. The SPU performs scalar-based operations, such as forming the reciprocal of a scalar, generating the square root of a scalar, etc. According to one illustrative implementation, the computing system uses the same vector-based programmatic interface to interact with both the tensor-processing engine and the SPU.
    Type: Grant
    Filed: August 6, 2019
    Date of Patent: May 4, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Steven Karl Reinhardt, Joseph Anthony Mayer, II, Dan Zhang
  • Patent number: 10990409
    Abstract: An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes a plurality of execution units to execute single instruction, multiple data (SIMD) instructions, and flow control logic to detect a diverging control flow in a plurality of SIMD channels and reduce the execution of the control flow to a subset of the SIMD channels.
    Type: Grant
    Filed: April 21, 2017
    Date of Patent: April 27, 2021
    Assignee: INTEL CORPORATION
    Inventors: Subramaniam M. Maiyuran, Guei-Yuan Lueh, Supratim Pal, Gang Chen, Ananda V. Kommaraju, Joy Chandra, Altug Koker, Prasoonkumar Surti, David Puffer, Hong Bin Liao, Joydeep Ray, Abhishek R. Appu, Ankur N. Shah, Travis T. Schluessler, Jonathan Kennedy, Devan Burke
  • Patent number: 10983799
    Abstract: Techniques are disclosed relating to selection circuitry configured to select instruction operations to issue to one or more execution circuits of a processor. In some embodiments, an apparatus includes a plurality of execution circuits configured to perform one or more instruction operations. The apparatus may further include a plurality of instruction queues configured to store information indicative of the one or more instruction operations. In some embodiments, the apparatus may include a selection circuit configured to select a first plurality of instruction operations from a first instruction queue. The selection circuit may be configured to select a first instruction operation from the first plurality of instruction operations to issue to a first execution circuit.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: April 20, 2021
    Assignee: Apple Inc.
    Inventors: Sean M. Reynolds, Gokul V. Ganesan
  • Patent number: 10977044
    Abstract: An apparatus comprising processing circuitry is provided, the processing circuitry comprising execution circuitry, commit circuitry, issue circuitry comprising an issue queue and selection circuitry, and a branch predictor. The processing circuitry is configured to identify a speculation barrier instruction in the commit queue. While an entry in the commit queue identifies a speculation barrier instruction, when a branch instruction that follows the speculation barrier instruction in the program order is selected for issue, the processing circuitry performs a first execution of the instruction, inhibiting updating of branch prediction data items associated with the branch instruction and inhibiting the selection circuitry from invalidating the associated issue queue entry.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: April 13, 2021
    Assignee: Arm Limited
    Inventors: Remi Marius Teyssier, Luca Nassi, Albin Pierrick Tonnerre, François Donati
  • Patent number: 10963265
    Abstract: Examples described herein include systems and methods which include an apparatus comprising a plurality of configurable logic units and a plurality of switches, with each switch being coupled to at least one configurable logic unit of the plurality of configurable logic units. The apparatus further includes an instruction register configured to provide respective switch instructions of a plurality of switch instructions to each switch based on a computation to be implemented among the plurality of configurable logic units. For example, the switch instructions may include allocating the plurality of configurable logic units to perform the computation and activating an input of the switch and an output of the switch to couple at least a first configurable logic unit and a second configurable logic unit. In various embodiments, configurable logic units can include arithmetic logic units (ALUs), bit manipulation units (BMUs), and multiplier-accumulator units (MACs).
    Type: Grant
    Filed: April 21, 2017
    Date of Patent: March 30, 2021
    Assignee: Micron Technology, Inc.
    Inventors: Fa-Long Luo, Tamara Schmitz, Jeremy Chritz, Jaime Cummins
  • Patent number: 10956156
    Abstract: A Conditional Transaction End (CTEND) instruction is provided that allows a program executing in a nonconstrained transactional execution mode to inspect a storage location that is modified by either another central processing unit or the Input/Output subsystem. Based on the inspected data, transactional execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs. For instance, when the instruction executes, the processor is in a nonconstrained transaction execution mode, and the transaction nesting depth is one at the beginning of the instruction, a second operand of the instruction is inspected, and based on the inspected data, transaction execution may be ended or aborted, or the decision to end/abort may be delayed, e.g., until a predefined event occurs, such as the value of the second operand becomes a prespecified value or a time interval is exceeded.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: March 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Dan F. Greiner, Christian Jacobi, Marcel Mitran, Donald W. Schmidt, Timothy J. Slegel
  • Patent number: 10929144
    Abstract: A computer system, processor, and method for processing information are disclosed that include determining whether an instruction is a designated instruction, determining whether an instruction following the designated instruction is a subsequent store instruction, and speculatively releasing the subsequent store instruction while the designated instruction is pending and before the subsequent store instruction is complete. Preferably, in response to determining that an instruction is the designated instruction, a speculative tail pointer in an instruction completion table (ICT) is initiated or advanced to look through the instructions in the ICT following the designated instruction.
    Type: Grant
    Filed: February 6, 2019
    Date of Patent: February 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kenneth L. Ward, Hung Q. Le, Dung Q. Nguyen, Bryan Lloyd