Patents Examined by Keith E Vicary
  • Patent number: 11210102
    Abstract: An apparatus comprises processing circuitry to execute instructions from one or more of a plurality of execution contexts each associated with a respective execution context identifier; a cache; and a speculative buffer. Control circuitry controls allocation of data to the cache and the speculative buffer. A speculative entry, for which allocation is caused by a speculative memory access associated with a given execution context, is allocated to the speculative buffer instead of to the cache while the speculatively executed memory access instruction remains speculative. The speculative entry specifies, as a tagged execution context identifier, the execution context identifier associated with the given execution context. Presence of the speculative entry in the speculative buffer is prevented from being observable to execution contexts other than the execution context identified by the tagged execution context identifier.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: December 28, 2021
    Assignee: Arm Limited
    Inventor: Roko Grubisic
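    The allocation rule in this abstract can be illustrated with a small software model. This is only a sketch, not the patented circuitry: the class and method names are hypothetical, and promoting an entry to the cache once the access commits is an assumption the abstract does not state.

```python
class SpeculativeCacheModel:
    """Toy model: speculative fills are tagged with a context id and hidden from other contexts."""

    def __init__(self):
        self.cache = set()        # committed addresses, visible to every context
        self.spec_buffer = set()  # (context id, address) pairs for still-speculative fills

    def speculative_access(self, ctx_id, addr):
        # Allocate to the speculative buffer instead of the cache.
        if addr not in self.cache:
            self.spec_buffer.add((ctx_id, addr))

    def is_hit(self, ctx_id, addr):
        # A speculative entry is observable only to the context named in its tag.
        return addr in self.cache or (ctx_id, addr) in self.spec_buffer

    def resolve(self, ctx_id, addr, committed):
        # Assumption: on commit the entry moves to the cache; on squash it is dropped.
        if (ctx_id, addr) in self.spec_buffer:
            self.spec_buffer.discard((ctx_id, addr))
            if committed:
                self.cache.add(addr)
```

    In this model, a second context probing the same address sees a miss until the first context's access commits.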
  • Patent number: 11204768
    Abstract: Instruction length based parallel instruction demarcators and methods for parallel instruction demarcation are included, wherein an instruction sequence is received at an instruction buffer, the instruction sequence comprising a plurality of instruction syllables, and the instruction sequence is stored at the instruction buffer. A length of instructions and at least one boundary are determined using one or more logic blocks arranged in a sequence. Additionally, using a controlling logic block, the sequence is demarcated into individual instructions.
    Type: Grant
    Filed: August 12, 2020
    Date of Patent: December 21, 2021
    Inventor: Sitaram Yadavalli
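    A rough software analogue of length-based demarcation is sketched below. The encoding (length held in the low two bits of an instruction's first syllable) and the function names are assumptions made for illustration; the abstract does not specify them.

```python
def length_at(syllable):
    # Hypothetical encoding: low two bits of the first syllable give length - 1.
    return (syllable & 0b11) + 1


def demarcate(syllables):
    """Split a buffered syllable sequence into individual instructions."""
    # Compute a candidate length at every position, as parallel logic blocks might.
    lengths = [length_at(s) for s in syllables]
    instructions = []
    pos = 0
    # The controlling step walks the boundaries implied by the lengths.
    while pos < len(syllables):
        end = pos + lengths[pos]
        if end > len(syllables):
            break  # incomplete trailing instruction stays in the buffer
        instructions.append(syllables[pos:end])
        pos = end
    return instructions
```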
  • Patent number: 11188331
    Abstract: A data processing system includes: a processor; a data interface for communication with a control unit, the processor being on one side of the data interface; internal storage accessible by the processor, the internal storage being on the same side of the data interface as the processor; and a register array accessible by the processor and comprising a plurality of registers, each register having a plurality of vector lanes. The storage is arranged to store control data indicating an ordered selection of vector lanes of one or more of the registers. The processor is arranged to, in response to receiving instruction data from a control unit, perform a swizzle operation in which data is selected from one or more source registers in the register array, and transferred to a destination register. The data is selected from vector lanes in accordance with control data stored in the internal storage.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: November 30, 2021
    Assignees: Arm Limited, Apical Limited
    Inventors: Daren Croxford, Michel Patrick Gabriel Emil Iwaniec, Rune Holm, Diego Lopez Recas
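    The swizzle operation described above amounts to a gather driven by stored control data. A minimal sketch, assuming the control data is an ordered list of (register name, lane index) pairs; that representation and the names below are hypothetical:

```python
def swizzle(registers, control):
    """Build a destination register by picking lanes in the order the control data lists them."""
    return [registers[reg][lane] for reg, lane in control]


registers = {"r0": [10, 11, 12, 13], "r1": [20, 21, 22, 23]}
# Control data: an ordered selection of vector lanes from one or more source registers.
control = [("r1", 3), ("r0", 0), ("r1", 0), ("r0", 2)]
destination = swizzle(registers, control)
```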
  • Patent number: 11182335
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of reconfigurable units that may include a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of reconfigurable units may comprise a configuration buffer and a reconfiguration counter. The processor may further comprise a sequencer coupled to the configuration buffer of each of the plurality of reconfigurable units and configured to distribute a plurality of configurations to the plurality of reconfigurable units for the plurality of PEs and the plurality of MPs to execute a sequence of instructions.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: November 23, 2021
    Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.
    Inventors: Jianbin Zhu, Yuan Li
  • Patent number: 11182336
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: November 23, 2021
    Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.
    Inventors: Yuan Li, Jianbin Zhu
  • Patent number: 11182333
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each PE may have a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a different memory bank in the memory unit.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: November 23, 2021
    Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.
    Inventors: Yuan Li, Jianbin Zhu
  • Patent number: 11182334
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) each having a plurality of arithmetic logic units (ALUs) that are configured to execute a same instruction in parallel threads and a plurality of memory ports (MPs) for the plurality of PEs to access a memory unit. Each of the plurality of MPs may comprise an address calculation unit configured to generate respective memory addresses for each thread to access a common area in the memory unit.
    Type: Grant
    Filed: July 16, 2020
    Date of Patent: November 23, 2021
    Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.
    Inventors: Jianbin Zhu, Yuan Li
  • Patent number: 11176085
    Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise a configuration buffer, a sequencer coupled to the configuration buffer of each of the plurality of PEs and configured to distribute one or more PE configurations to the plurality of PEs, and a gasket memory coupled to the plurality of PEs and being configured to store at least one PE execution result to be used by at least one of the plurality of PEs during a next PE configuration.
    Type: Grant
    Filed: July 17, 2020
    Date of Patent: November 16, 2021
    Assignee: AZURENGINE TECHNOLOGIES ZHUHAI INC.
    Inventors: Yuan Li, Jianbin Zhu
  • Patent number: 11157330
    Abstract: A barrier-free atomic transfer method of multiword data is described. In the barrier-free method, a producer processor deconstructs an original parameter set of data into a deconstructed parameter set; and performs a series of single-copy-atomic writes to a series of single-copy-atomic locations. Each single-copy-atomic location in the series of single-copy-atomic locations comprises a portion of the deconstructed parameter set and a sequence number. A consumer processor reads the series of single-copy-atomic locations; verifies that the sequence numbers for the single-copy-atomic locations in the series are consistent (e.g., all the same sequence number); and reconstructs the portions of the deconstructed parameter set into the original parameter set.
    Type: Grant
    Filed: May 15, 2019
    Date of Patent: October 26, 2021
    Assignee: ARM LIMITED
    Inventor: Alasdair Grant
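    The sequence-number check in this abstract can be sketched in a few lines. The layout here is an assumption for illustration only: each single-copy-atomic location is modeled as a 64-bit word holding a 32-bit sequence number and a 32-bit slice of the data. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def deconstruct(value, n_words, seq):
    """Producer side: split value into 32-bit portions, each paired with the sequence number."""
    return [((seq & MASK32) << 32) | ((value >> (32 * i)) & MASK32)
            for i in range(n_words)]


def reconstruct(words):
    """Consumer side: return the original value, or None if the sequence numbers disagree."""
    if len({w >> 32 for w in words}) != 1:
        return None  # torn read: portions come from different writes
    value = 0
    for i, w in enumerate(words):
        value |= (w & MASK32) << (32 * i)
    return value
```

    A consumer that gets None would simply re-read the locations; that retry policy is also an assumption.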
  • Patent number: 11157428
    Abstract: An integrated circuit includes a plurality of tiles. Each tile includes a processor, a switch including switching circuitry to forward data over data paths from other tiles to the processor and to switches of other tiles, and a switch memory that stores instruction streams that are able to operate independently for respective output ports of the switch. Also disclosed is a direct memory access (DMA) scheme in which sizes of DMA transfers are limited according to whether a cache miss has occurred.
    Type: Grant
    Filed: February 14, 2014
    Date of Patent: October 26, 2021
    Assignee: Massachusetts Institute of Technology
    Inventor: Anant Agarwal
  • Patent number: 11157286
    Abstract: Representative apparatus, method, and system embodiments are disclosed for a self-scheduling processor which also provides additional functionality. Representative embodiments include a self-scheduling processor, comprising: a processor core adapted to execute instructions; and a core control circuit adapted to automatically schedule an instruction for execution by the processor core in response to a received work descriptor data packet. In a representative embodiment, the processor core is further adapted to execute a non-cached load instruction to designate a general purpose register rather than a data cache for storage of data received from a memory circuit. The core control circuit is also adapted to schedule a fiber create instruction for execution by the processor core, and to generate one or more work descriptor data packets to another circuit for execution of corresponding execution threads.
    Type: Grant
    Filed: April 30, 2019
    Date of Patent: October 26, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11151077
    Abstract: A hardware accelerator for computers combines a stand-alone, high-speed, fixed program dataflow functional element with a stream processor, the latter of which may autonomously access memory in predefined access patterns after receiving simple stream instructions and provide them to the dataflow functional element. The result is a compact, high-speed processor that may exploit fixed program dataflow functional elements.
    Type: Grant
    Filed: June 28, 2017
    Date of Patent: October 19, 2021
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Karthikeyan Sankaralingam, Anthony Nowatzki, Vinay Gangadhar
  • Patent number: 11150721
    Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.
    Type: Grant
    Filed: November 7, 2012
    Date of Patent: October 19, 2021
    Assignee: NVIDIA Corporation
    Inventors: David Conrad Tannenbaum, Ming Y. Siu, Stuart F Oberman, Colin Sprinkle, Srinivasan Iyer, Ian Chi Yan Kwong
  • Patent number: 11126435
    Abstract: A processor device capable of raising a hit rate of branch destination prediction is provided. Every time a load instruction to a data cache is generated, an equivalent value judgment circuit judges accord/disaccord of present load data and previous load data from a corresponding line. In an N bit region, as history records, a judgment history record circuit records judgment results of N times by the equivalent value judgment circuit before a conditional branch instruction is generated. When the conditional branch instruction is generated, based on the history records in the N bit region, a branch prediction circuit predicts the same branch destination as the previous branch destination obtained by a previous execution result of the conditional branch instruction or a branch destination different from the previous destination. Further, the branch prediction circuit issues an instruction fetch direction of the predicted branch destination to a processor main-body circuit.
    Type: Grant
    Filed: March 8, 2018
    Date of Patent: September 21, 2021
    Assignee: RENESAS ELECTRONICS CORPORATION
    Inventor: Masanao Sasai
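    One way to picture the scheme: keep the last N equal/unequal judgments of load data, then use them to decide whether a branch repeats its previous outcome. The policy below (repeat the previous outcome only if every recorded load matched) and all names are assumptions; the abstract says only that the prediction is based on the history records.

```python
from collections import deque


class LoadHistoryPredictor:
    def __init__(self, n_bits=4):
        self.history = deque(maxlen=n_bits)  # N-bit region of judgment results
        self.last_load = {}                  # cache line -> previously loaded data
        self.last_taken = {}                 # branch pc -> previous outcome

    def on_load(self, line, data):
        # Judge whether the present load data equals the previous data from the same line.
        self.history.append(line in self.last_load and self.last_load[line] == data)
        self.last_load[line] = data

    def predict(self, pc):
        previous = self.last_taken.get(pc, False)
        # Assumed policy: unchanged inputs suggest the same branch destination as last time.
        return previous if all(self.history) else not previous

    def on_branch_resolved(self, pc, taken):
        self.last_taken[pc] = taken
        self.history.clear()
```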
  • Patent number: 11113064
    Abstract: A processor core receives a request to execute application code including a trigger instruction and an instruction block that reads a row of data values from a data structure and outputs a data value from a function using the row as input. The data structure is divided into multiple portions and the trigger instruction indicates that multiple instances of the instruction block are to be executed concurrently. In response to the request and to identification of the instruction block and trigger instruction, the processor core generates multiple instances of a support block that causes independent repetitive execution of each instance of the instruction block until all rows of the corresponding portion of the data structure are used as input. The processor core assigns instances of the instruction and support blocks to multiple processor cores, and provides each instance of the instruction block with the corresponding portion of the data structure.
    Type: Grant
    Filed: November 27, 2020
    Date of Patent: September 7, 2021
    Assignee: SAS INSTITUTE INC.
    Inventors: Jack Joseph Rouse, Robert William Pratt, Jared Carl Erickson, Manoj Keshavmurthi Chari
  • Patent number: 11099852
    Abstract: An example apparatus comprises instruction execution circuitry and fetch circuitry to fetch, from memory, instructions for execution by the instruction execution circuitry. The fetch circuitry comprises a plurality of prediction components, each prediction component being configured to predict instructions in anticipation of the predicted instructions being required for execution by the instruction execution circuitry. The fetch circuitry is configured to fetch instructions in dependence on the predicting. The apparatus further comprises prediction tracking circuitry to maintain, for each of a plurality of execution regions, a prediction performance metric for each prediction component. The fetch circuitry is configured, based on at least one of the prediction performance metrics for a given execution region, to implement a prediction adjustment action in respect of at least one of the prediction components.
    Type: Grant
    Filed: October 25, 2018
    Date of Patent: August 24, 2021
    Assignee: ARM LIMITED
    Inventors: Francisco João Feliciano Gaspar, Mohammadi Shabbirhussain Bharmal
  • Patent number: 11099849
    Abstract: An apparatus includes a branch target cache configured to store one or more branch addresses, a memory configured to store a return target stack, and a circuit. The circuit may be configured to determine, for a group of one or more fetched instructions, a prediction value indicative of whether the group includes a return instruction. In response to the prediction value indicating that the group includes a return instruction, the circuit may be further configured to select a return address from the return target stack. The circuit may also be configured to determine a hit or miss indication in the branch target cache for the group, and to, in response to receiving a miss indication from the branch target cache, select the return address as a target address for the return instruction.
    Type: Grant
    Filed: September 1, 2016
    Date of Patent: August 24, 2021
    Assignee: Oracle International Corporation
    Inventors: Yuan Chou, Manish Shah, Richa Aggarwal
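    The target selection described above can be sketched as a small function. The names, the dictionary-based branch target cache, the pop on use, and the behavior on a cache hit are assumptions for illustration; the abstract covers only the miss case.

```python
def next_fetch_target(pc, return_predicted, btb, return_stack, fallthrough):
    """Pick the next fetch address for a fetch group starting at pc."""
    if return_predicted and return_stack and pc not in btb:
        # Return predicted and the branch target cache missed: use the return target stack.
        return return_stack.pop()
    # Assumption: otherwise use the branch target cache entry, or fall through.
    return btb.get(pc, fallthrough)


btb = {0x10: 0x80}
stack = [0x200, 0x300]
first = next_fetch_target(0x20, True, btb, stack, 0x24)    # return predicted, miss
second = next_fetch_target(0x10, True, btb, stack, 0x14)   # hit in the target cache
third = next_fetch_target(0x30, False, btb, stack, 0x34)   # no return predicted
```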
  • Patent number: 11093249
    Abstract: In an embodiment, an apparatus includes a plurality of memories configured to store respective data in a plurality of branch prediction entries. Each branch prediction entry corresponds to at least one of a plurality of branch instructions. The apparatus also includes a control circuit configured to store first data associated with a first branch instruction into a corresponding branch prediction entry in at least one memory of the plurality of memories. The control circuit is further configured to select a first memory of the plurality of memories, to disconnect the first memory from a power supply in response to a detection of a first power mode signal, and to cease storing data in the plurality of memories in response to the detection of the first power mode signal.
    Type: Grant
    Filed: March 4, 2019
    Date of Patent: August 17, 2021
    Assignee: Apple Inc.
    Inventors: Conrado Blasco, Brett S. Feero, David Williamson, Ian D. Kountanis, Shih-Chieh Wen
  • Patent number: 11093251
    Abstract: Representative apparatus, method, and system embodiments are disclosed for configurable computing. In a representative embodiment, a system includes an interconnection network, a processor, a host interface, and a configurable circuit cluster. The configurable circuit cluster may include a plurality of configurable circuits arranged in an array; an asynchronous packet network and a synchronous network coupled to each configurable circuit of the array; and a memory interface circuit and a dispatch interface circuit coupled to the asynchronous packet network and to the interconnection network. Each configurable circuit includes instruction or configuration memories for selection of a current data path configuration, a master synchronous network input, and a data path configuration for a next configurable circuit.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: August 17, 2021
    Assignee: Micron Technology, Inc.
    Inventor: Tony M. Brewer
  • Patent number: 11093248
    Abstract: A computer system, processor, and method for processing information are disclosed that include allocating a prefetch stream; providing a protection bubble to a plurality of cachelines for the allocated prefetch stream; accessing a cacheline; and preventing allocation of a different prefetch stream if the accessed cacheline is within the protection bubble. In an aspect, the system, processor, and method further include providing a safety zone to a plurality of cachelines for the allocated prefetch stream, and advancing the prefetch stream if the accessed cacheline is one of the plurality of cachelines in the safety zone. In an embodiment, the number of cachelines within the safety zone is less than the number of cachelines in the protection bubble.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: August 17, 2021
    Assignee: International Business Machines Corporation
    Inventors: Vivek Britto, Mohit Karve, George W. Rohrbaugh, III, Brian W. Thompto
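    The interplay of the protection bubble and the safety zone can be modeled as follows. The geometry is assumed, not taken from the patent: the safety zone is the few cachelines just ahead of the stream head, and the bubble is a wider window on both sides of it. All names and default sizes are hypothetical.

```python
def on_access(streams, line, bubble=4, safety=2):
    """Handle one cacheline access; each stream is a dict with a 'head' line number."""
    for s in streams:
        # Safety zone: a hit here advances the existing stream.
        if s["head"] <= line < s["head"] + safety:
            s["head"] = line + 1
            return "advance"
    for s in streams:
        # Protection bubble: near an existing stream, so do not allocate a different one.
        if abs(line - s["head"]) <= bubble:
            return "suppressed"
    streams.append({"head": line + 1})
    return "allocate"
```

    With the defaults, the safety zone (2 lines) is smaller than the bubble, matching the embodiment in which the safety zone holds fewer cachelines than the protection bubble.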