Patents Examined by Courtney P Carmichael-Moody
  • Patent number: 11977936
    Abstract: A differential multiplier-accumulator accepts A and B digital inputs and generates a dot product P by applying the bits of the A input and the bits of the B inputs to respective positive and negative unit elements comprised of groups of AND gates coupled to charge transfer lines through a capacitor Cu. Each positive and negative unit element receives one bit of the B input applied to all of the AND gates of the unit element, and each positive and negative unit element having the bits of A applied to each associated AND gate input of each unit element. The AND gates are coupled to charge transfer lines through a capacitor Cu, and the charge transfer lines couple to binary weighted charge summing capacitors and to an analog to digital converter to generate a digital output product. The charge transfer lines may span multiple unit elements.
    Type: Grant
    Filed: December 31, 2020
    Date of Patent: May 7, 2024
    Assignee: Ceremorphic, Inc.
    Inventors: Martin Kraemer, Ryan Boesch, Wei Xiong
  • Patent number: 11977893
    Abstract: An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.
    Type: Grant
    Filed: June 8, 2022
    Date of Patent: May 7, 2024
    Assignee: Ventana Micro Systems Inc.
    Inventors: John G. Favor, Michael N. Michael, Vihar Soneji
  • Patent number: 11972264
    Abstract: Processing circuitry performs processing operations in response to micro-operations. Front end circuitry supplies the micro-operations to be processed by the processing circuitry. Prediction circuitry generates a prediction of a number of loop iterations for which one or more micro-operations per loop iteration are to be supplied by the front end circuitry, where an actual number of loop iterations to be processed by the processing circuitry is resolvable by the processing circuitry based on at least one operand corresponding to a first loop iteration to be processed by the processing circuitry. The front end circuitry varies, based on a level of confidence in the prediction of the number of loop iterations, a supply rate with which the one or more micro-operations for at least a subset of the loop iterations are supplied to the processing circuitry.
    Type: Grant
    Filed: June 13, 2022
    Date of Patent: April 30, 2024
    Inventors: Guillaume Bolbenes, Thibaut Elie Lanois, Houdhaifa Bouzguarrou, Luca Nassi
  • Patent number: 11960893
    Abstract: A method, programming product, and/or system for prefetching instructions includes an instruction prefetch table that has a plurality of entries, each entry for storing a first portion of an indirect branch instruction address and a target address, wherein the indirect branch instruction has multiple target addresses and the instruction prefetch table is accessed by an index obtained by hashing a second portion of bits of the indirect branch instruction address with an information vector of the indirect branch instruction. A further embodiment includes a first prefetch table for uni-target branch instructions and a second prefetch table for multi-target branch instructions. In operation it is determined whether a branch instruction hits in one of the multiple prefetch tables; a target address for the branch instruction is read from the respective prefetch table in which the branch instruction hit; and the branch instruction is prefetched to an instruction cache.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: April 16, 2024
    Assignee: International Business Machines Corporation
    Inventors: Naga P. Gorti, Mohit Karve
  • Patent number: 11941401
    Abstract: An apparatus includes a processor circuit that includes a return address stack circuit, a return prediction circuit, and a fetch control circuit. The return prediction circuit is configured to store, for previously accessed return addresses, fetch parameters for next fetch addresses. The fetch control circuit is configured to in response to a fetch of a call instruction, push a return address onto the return address stack circuit. In response to a fetch of a return instruction that corresponds to the call instruction, the fetch control circuit is further configured to retrieve the return address from the return address stack circuit, and to create, using the return address and fetch parameters retrieved from the return prediction circuit, a next fetch request to retrieve instructions subsequent to the return instruction.
    Type: Grant
    Filed: June 9, 2022
    Date of Patent: March 26, 2024
    Assignee: Apple Inc.
    Inventors: Pruthivi Vuyyuru, Ian D. Kountanis
  • Patent number: 11922168
    Abstract: A program is executed using a call stack and shadow stack. The call stack includes frames having respective return addresses. The frames may also store variables and/or parameters. The shadow stack stores duplicates of the return addresses in the call stack. The call stack and the shadow stack are maintained by, (i) each time a function is called, adding a corresponding stack frame to the call stack and adding a corresponding return address to the shadow stack, and (ii) each time a function is exited, removing a corresponding frame from the call stack and removing a corresponding return address from the shadow stack. A backtrace of the program's current call chain is generated by accessing the return addresses in the shadow stack. The outputted backtrace includes the return addresses from the shadow stack and/or information about the traced functions that is derived from the shadow stack's return addresses.
    Type: Grant
    Filed: March 23, 2022
    Date of Patent: March 5, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Ben Niu, Gregory John Colombo, Weidong Cui, Jason Lin, Kenneth Dean Johnson
  • Patent number: 11915006
    Abstract: A method, system and device for pipeline processing of instructions and a computer storage medium. The method comprises: acquiring a target instruction set (S101); acquiring a target prediction result, wherein the target prediction result is a result obtained by predicting a jump mode of the target instruction set (S102); performing pipeline processing on the target instruction set according to the target prediction result (S103); determining if a pipeline flushing request is received (S104); and if so, correspondingly saving the target instruction set and a corresponding pipeline processing result, so as to perform pipeline processing on the target instruction set again on the basis of the pipeline processing result (S105).
    Type: Grant
    Filed: November 28, 2019
    Date of Patent: February 27, 2024
    Assignee: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.
    Inventors: Yulong Zhou, Tongqiang Liu, Xiaofeng Zou
  • Patent number: 11915002
    Abstract: Providing extended branch target buffer (BTB) entries for storing trunk branch metadata and leaf branch metadata is disclosed herein. In one aspect, a processor comprises a BTB circuit comprising a BTB comprising a plurality of extended BTB entries. The BTB circuit is configured to store trunk branch metadata for a first branch instruction in an extended BTB entry of the plurality of extended BTB entries, wherein the extended BTB entry corresponds to a first aligned memory block containing an address of the first branch instruction. The BTB circuit is also configured to store leaf branch metadata for a second branch instruction in the extended BTB entry in association with the trunk branch metadata, wherein an address of the second branch instruction is subsequent to a target address of the first branch instruction within a second aligned memory block.
    Type: Grant
    Filed: June 24, 2022
    Date of Patent: February 27, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Saransh Jain, Rami Mohammad Al Sheikh, Daren Eugene Streett, Michael Scott McIlvaine
  • Patent number: 11900124
    Abstract: Various embodiments are disclosed of a multiprocessor system with processing elements optimized for high performance and low power dissipation and an associated method of programming the processing elements. Each processing element may comprise a fetch unit and a plurality of address generator units and a plurality of pipelined datapaths. The fetch unit may be configured to receive a multi-part instruction, wherein the multi-part instruction includes a plurality of fields. First and second address generator units may generate, based on different fields of the multi-part instruction, addresses from which to retrieve first and second data for use by an execution unit for the multi-part instruction or a subsequent multi-part instruction. The execution units may perform operations using a single pipeline or multiple pipelines based on third and fourth fields of the multi-part instruction.
    Type: Grant
    Filed: January 3, 2023
    Date of Patent: February 13, 2024
    Assignee: Coherent Logix, Incorporated
    Inventors: Michael B Doerr, Carl S. Dobbs, Michael B. Solka, Michael R. Trocino, Kenneth R. Faulkner, Keith M. Bindloss, Sumeer Arya, John Mark Beardslee, David A. Gibson
  • Patent number: 11900120
    Abstract: Systems and methods of selecting a collection of compatible issue-ready instructions for parallel execution by functional units in a superscalar processor in a single clock cycle. All possible instructions (opcodes) to be executed by the functional units are pre-arranged into several scenarios based on potential resource conflicts among the instructions. Each scenario includes multiple groups of predefined instructions. During operation, concurrently for all the groups, an issue-ready instruction is identified with reference to each group based on group-specific selection policies. Further, based on the identified instructions, predefined policies are applied to select one or more scenarios and select among the picks of the selected scenarios. As a result, the output instructions of the selected scenarios are issued for parallel execution by the functional units.
    Type: Grant
    Filed: March 24, 2022
    Date of Patent: February 13, 2024
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: David Carlson
  • Patent number: 11886875
    Abstract: Disclosed embodiments relate to systems and methods for performing nibble-sized operations on matrix elements. In one example, a processor includes fetch circuitry to fetch an instruction, decode circuitry to decode the fetched instruction the fetched instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode to indicate the processor is to, for each pair of corresponding elements of the first and second source matrices, logically partition each element into nibble-sized partitions, perform an operation indicated by the instruction on each partition, and store execution results to a corresponding nibble-sized partition of a corresponding element of the destination matrix. The exemplary processor includes execution circuitry to execute the decoded instruction as per the opcode.
    Type: Grant
    Filed: December 26, 2018
    Date of Patent: January 30, 2024
    Assignee: Intel Corporation
    Inventors: Elmoustapha Ould-Ahmed-Vall, Jonathan D. Pearce, Dan Baum, Guei-Yuan Lueh, Michael Espig, Christopher J. Hughes, Raanan Sade, Robert Valentine, Mark J. Charney, Alexander F. Heinecke
  • Patent number: 11880685
    Abstract: An instruction fetch pipeline includes first, second, and third sub-pipelines that respectively include: a TLB that receives a fetch virtual address, a tag random access memory (RAM) of a physically-indexed physically-tagged set associative instruction cache that receives a predicted set index, and a data RAM that receives the predicted set index and a predicted way number that specifies a way of the entry from which a block of instructions was previously fetched. The predicted set index specifies the instruction cache set that includes the entry. The three sub-pipelines respectively initiate in parallel: a TLB access using the fetch virtual address to obtain a translation thereof into a fetch physical address that includes a tag, a tag RAM access using the predicted set index to read a set of tags, and a data RAM access using the predicted set index and the predicted way number to fetch the block of instructions.
    Type: Grant
    Filed: June 8, 2022
    Date of Patent: January 23, 2024
    Assignee: Ventana Micro Systems Inc.
    Inventors: John G. Favor, Michael N. Michael, Vihar Soneji
  • Patent number: 11875155
    Abstract: An integrated circuit comprising instruction processing circuitry for processing a plurality of program instructions and instruction prediction circuitry. The instruction prediction circuitry comprises circuitry for detecting successive occurrences of a same program loop sequence of program instructions. The instruction prediction circuitry also comprises circuitry for predicting a number of iterations of the same program loop sequence of program instructions, in response to detecting, by the circuitry for detecting, that a second occurrence of the same program loop sequence of program instructions comprises a same number of iterations as a first occurrence of the same program loop sequence of program instructions.
    Type: Grant
    Filed: January 19, 2022
    Date of Patent: January 16, 2024
    Assignee: Texas Instruments Incorporated
    Inventors: Kai Chirca, Paul Daniel Gauvreau, David Edward Smith, Jr.
  • Patent number: 11861368
    Abstract: A first type of prediction, for controlling execution of at least one instruction by processing circuitry, is based at least on a first prediction table storing prediction information looked up based on at least a first portion of branch history information stored in branch history storage corresponding to a first predetermined number of branches. In response to detecting an execution state switch of the processing circuitry from a first execution state to a second, more privileged, execution state, use of the first prediction table for determining the first type of prediction is disabled. In response to detecting that a number of branches causing an update to the branch history storage since the execution state switch is greater than or equal to the first predetermined number, use of the first prediction table in determining the first type of prediction is re-enabled.
    Type: Grant
    Filed: May 24, 2022
    Date of Patent: January 2, 2024
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Michael Brian Schinzler, Yasuo Ishii, Jatin Bhartia, Sumanth Chengad Raghu
  • Patent number: 11853765
    Abstract: The disclosure includes a method of authenticating a processor that includes an arithmetic and logic unit. At least one decoded operand of at least a portion of a to-be-executed opcode is received on a first terminal of the arithmetic and logic unit. A signed instruction is received on a second terminal of the arithmetic and logic unit. The signed instruction combines a decoded instruction of the to-be-executed opcode and a previous calculation result of the arithmetic and logic unit.
    Type: Grant
    Filed: April 14, 2022
    Date of Patent: December 26, 2023
    Assignees: STMICROELECTRONICS (ROUSSET) SAS, PROTON WORLD INTERNATIONAL N.V.
    Inventors: Michael Peeters, Fabrice Marinet
  • Patent number: 11853763
    Abstract: A new device executing an application on a new central processing unit (CPU), determines whether the application is for a legacy device having a legacy CPU. When the new device determines that the application is for the legacy device, it executes the application on the new CPU with selected available resources of the new device restricted to approximate or match a processing behavior of the legacy CPU, e.g., by reducing a usable portion of a return address stack of the new CPU and thereby reducing a number of calls and associated returns that can be tracked.
    Type: Grant
    Filed: June 29, 2022
    Date of Patent: December 26, 2023
    Assignee: SONY INTERACTIVE ENTERTAINMENT LLC
    Inventors: Mark Evan Cerny, David Simpson
  • Patent number: 11847463
    Abstract: A processor includes a load/store unit and an execution pipeline to execute an instruction that represents a single-instruction-multiple-data (SIMD) operation, and which references a memory block storing operand data for one or more lanes of a plurality of lanes and a mask vector indicating which lanes of a plurality of lanes are enabled and which are disabled for the operation. The execution pipeline executes an instruction in a first execution mode unless a memory fault is generated during execution of the instruction in the first execution mode. In response to the memory fault, the execution pipeline re-executes the instruction in a second execution mode. In the first execution mode, a single load operation is attempted to access the memory block via the load/store unit. In the second execution mode, a separate load operation is performed by the load/store unit for each enabled lane of the plurality of lanes prior to executing the SIMD operation.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: December 19, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Kai Troester, Scott Thomas Bingham, John M. King, Michael Estlick, Erik Swanson, Robert Weidner
  • Patent number: 11847458
    Abstract: Methods and systems for determining a priority of a threads is described. A processor can execute branch instructions of the thread. The processor can predict branch instruction outcomes of the branch instructions of the thread. The processor can increment a misprediction count of the thread in response to an actual execution of a branch instruction of the thread being different from a corresponding branch instruction prediction outcome of the thread. The processor can determine the priority of the thread based on the misprediction count of the thread.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: December 19, 2023
    Assignee: International Business Machines Corporation
    Inventors: Richard J. Eickemeyer, Ehsan Fatehi, John B. Griswell, Jr., Naga P. Gorti
  • Patent number: 11846974
    Abstract: A processor has first, second and third ALUs. The first ALU has on a first side an input and an output. The second ALU has a first side facing the first side of the first ALU, an input and an output on the first side of the second ALU and being in a rotated orientation relative to the input and the output of the first side of the first ALU, and an output on a second side of the second ALU. The third ALU has a first side facing the second side of the second ALU, and an input and an output on the first side of the third ALU. The input of the first side of the first ALU is logically directly connected to the output of the first side of the second ALU.
    Type: Grant
    Filed: January 7, 2022
    Date of Patent: December 19, 2023
    Assignee: Tachyum Ltd.
    Inventor: Radoslav Danilak
  • Patent number: 11842200
    Abstract: An apparatus includes a plurality of load buses and a load store unit that includes a plurality of load ports to access the plurality of load buses. The load store unit performs a gather operation to concurrently gather a plurality of subsets of data from a memory via the plurality of load buses in a first mode. The apparatus also includes a register that is partitioned into a plurality of portions to hold the plurality of subsets of data provided by the load store unit. The load store unit ignores exceptions or faults while performing the gather operation in the first mode and transitions to a second mode in response to an exception or fault. Two lanes are dispatched to concurrently perform the gather operation per clock cycle in the first mode and a single lane is dispatched to perform the gather operation per clock cycle in the second mode.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: December 12, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John M. King, Magiting Talisayon, Michael Estlick