Instruction Issuing Patents (Class 712/214)
  • Patent number: 11914524
    Abstract: An electronic device includes one or more processors for executing one or more virtual machines. In response to a request for initiating a synchronization event, a processor identifies a subset of speculative memory access requests in one or more memory access request queues. Automatically and in accordance with the identifying, the processor purges translations associated with the subset of speculative memory access requests. Subsequent to the purging, the processor initiates the synchronization event. In some implementations, memory access completion is forced in response to a context synchronization event that corresponds to a termination of a first application, a termination of a first virtual machine, or a system call for updating a system register. Alternatively, in some implementations, memory access completion is forced in an operating system level or an application level in response to a data synchronization event that is initiated on a hypervisor layer or a firmware layer.
    Type: Grant
    Filed: March 1, 2022
    Date of Patent: February 27, 2024
    Assignee: QUALCOMM Incorporated
    Inventors: Adrian Montero, Huzefa Sanjeliwala, Paul Kitchin, Prarthna Santhanakrishnan, Conrado Blasco, Pradeep Kanapathipillai
  • Patent number: 11886882
    Abstract: Described herein are systems and methods for secure multithread execution. For example, some methods include fetching an instruction of a first thread from a memory into a processor pipeline that is configured to execute instructions from two or more threads in parallel using execution units of the processor pipeline; detecting that the instruction has been designated as a sensitive instruction; responsive to detection of the sensitive instruction, disabling execution of instructions of threads other than the first thread in the processor pipeline during execution of the sensitive instruction by an execution unit of the processor pipeline; executing the sensitive instruction using an execution unit of the processor pipeline; and, responsive to completion of execution of the sensitive instruction, enabling execution of instructions of threads other than the first thread in the processor pipeline.
    Type: Grant
    Filed: April 5, 2022
    Date of Patent: January 30, 2024
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Shubhendu Sekhar Mukherjee
  • Patent number: 11822786
    Abstract: Techniques for maintaining cache coherency comprising storing data blocks associated with a main process in a cache line of a main cache memory, storing a first local copy of the data blocks in a first local cache memory of a first processor, storing a second local copy of the set of data blocks in a second local cache memory of a second processor executing a first child process of the main process to generate first output data, writing the first output data to the first data block of the first local copy as a write through, writing the first output data to the first data block of the main cache memory as a part of the write through, transmitting an invalidate request to the second local cache memory, marking the second local copy of the set of data blocks as delayed, and transmitting an acknowledgment to the invalidate request.
    Type: Grant
    Filed: February 1, 2022
    Date of Patent: November 21, 2023
    Assignee: Texas Instruments Incorporated
    Inventors: Kai Chirca, Timothy David Anderson
  • Patent number: 11822510
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: March 1, 2022
    Date of Patent: November 21, 2023
    Assignee: Groq, Inc.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 11816061
    Abstract: A system includes a processing device that includes a vector arithmetic logic unit comprising a plurality of arithmetic logic units (ALUs), and a first processor core operatively coupled to the vector arithmetic logic unit, the processing device to receive a first vector instruction from the first processor core, wherein the first vector instruction specifies at least one first input vector having a first vector length, identify a first subset of the ALUs in view of the first vector length and one or more allocation criteria, execute, using the first subset of the set of ALUs, one or more first ALU operations specified by the first vector instruction, wherein the vector arithmetic logic unit executes the first ALU operations in parallel with one or more second ALU operations specified by a second vector instruction received from a second processor core.
    Type: Grant
    Filed: December 18, 2020
    Date of Patent: November 14, 2023
    Assignee: Red Hat, Inc.
    Inventor: Ulrich Drepper
  • Patent number: 11803389
    Abstract: A reach matrix scheduler circuit for scheduling instructions to be executed in a processor is disclosed. The scheduler circuit includes an N×R matrix wake-up circuit, where ‘N’ is the instruction window size of the scheduler circuit, and ‘R’ is the “reach” within the instruction window of the matrix wake-up circuit, with ‘R’ being less than ‘N’. A grant line associated with each instruction request entry in the N×R matrix wake-up circuit is coupled to ‘R’ other instruction entries among the ‘N’ instruction entries. When a producer instruction in an instruction request entry is ready for issuance, the grant line associated with the instruction request entry is activated so that any other instruction entries coupled to the grant line (i.e., within the “reach” of the instruction request entry) that consume the produced value generated by the producer instruction are “woken-up” and subsequently indicated as ready to be issued.
    Type: Grant
    Filed: January 9, 2020
    Date of Patent: October 31, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yusuf Cagatay Tekmen, Rodney Wayne Smith, Douglas C. Burger, Gagan Gupta, Kiran Ravi Seth
  • Patent number: 11782721
    Abstract: Systems, apparatuses, and methods for organizing bits in a memory device are described. In a number of embodiments, an apparatus can include an array of memory cells, a data interface, a multiplexer coupled between the array of memory cells and the data interface, and a controller coupled to the array of memory cells, the controller configured to cause the apparatus to latch bits associated with a row of memory cells in the array in a number of sense amplifiers in a prefetch operation and send the bits from the sense amplifiers, through a multiplexer, to a data interface, which may include or be referred to as DQs. The bits may be sent to the DQs in a particular order that may correspond to a particular matrix configuration and may thus facilitate or reduce the complexity of arithmetic operations performed on the data.
    Type: Grant
    Filed: February 25, 2022
    Date of Patent: October 10, 2023
    Assignee: Micron Technology, Inc.
    Inventors: Glen E. Hush, Aaron P. Boehm, Fa-Long Luo
  • Patent number: 11755365
    Abstract: A method of scheduling tasks in a processor comprises receiving a plurality of tasks that are ready to be executed, i.e. all their dependencies have been met and all the resources required to execute the task are available, and adding the received tasks to a task queue (or “task pool”). The number of tasks that are executing is monitored and in response to determining that an additional task can be executed by the processor, a task is selected from the task pool based at least in part on a comparison of indications of resources used by tasks being executed and indications of resources used by individual tasks in the task pool and the selected task is then sent for execution.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: September 12, 2023
    Assignee: Imagination Technologies Limited
    Inventors: Isuru Herath, Richard Broadhurst
  • Patent number: 11734002
    Abstract: The present disclosure provides a counting device and counting method. The device includes a storage unit, a counting unit, and a register unit, where the storage unit may be connected to the counting unit for storing input data to be counted and storing a number of elements satisfying a given condition in the input data after counting; the register unit may be configured to store an address where input data to be counted is stored in the storage unit; and the counting unit may be connected to the register unit, and may be configured to acquire a counting instruction, read a storage address of the input data to be counted in the register unit according to the counting instruction, acquire corresponding input data to be counted in the storage unit, perform statistical counting on a number of elements in the input data to be counted that satisfy the given condition, and obtain a counting result.
    Type: Grant
    Filed: November 27, 2019
    Date of Patent: August 22, 2023
    Assignee: SHANGHAI CAMBRICON INFORMATION TECHNOLOGY CO., LTD
    Inventors: Tianshi Chen, Jie Wei, Tian Zhi, Zai Wang
  • Patent number: 11681567
    Abstract: The present disclosure relates to a method for a computer system comprising a plurality of processor cores including a first processor core and a second processor core, wherein a data item is exclusively assigned to the first processor core, of the plurality of processor cores, for executing an atomic primitive by the first processor core. The method includes receiving by the first processor core, from the second processor core, a request for accessing the data item, and in response to determining by the first processor core that the executing of the atomic primitive is not completed by the first processor core, returning a rejection message to the second processor core.
    Type: Grant
    Filed: May 9, 2019
    Date of Patent: June 20, 2023
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ralf Winkelmann, Michael Fee, Matthias Klein, Carsten Otte, Edward W. Chencinski, Hanno Eichelberger
  • Patent number: 11669333
    Abstract: In certain aspects of the disclosure, an apparatus comprises a first scheduling pool associated with a first minimum scheduling latency and a second scheduling pool associated with a second minimum scheduling latency, the second minimum scheduling latency greater than the first minimum scheduling latency. A common instruction picker is coupled to both the first scheduling pool and the second scheduling pool. The common instruction picker may be configured to select a first instruction from the first scheduling pool and a second instruction from the second scheduling pool, and then choose either the first instruction or second instruction for dispatch according to a picking policy.
    Type: Grant
    Filed: April 26, 2018
    Date of Patent: June 6, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Rodney Wayne Smith, Raghavan Madhavan, Luke Yen, Shivam Priyadarshi, Yusuf Cagatay Tekmen
  • Patent number: 11663130
    Abstract: Described herein are systems and methods for cache replacement mechanisms for speculative execution. For example, some systems include, a buffer comprising entries that are each configured to store a cache line of data and a tag that includes an indication of a status of the cache line stored in the entry, in an integrated circuit that is configured to: responsive to a cache miss caused by a load instruction that is speculatively executed by a processor pipeline, load a cache line of data corresponding to the cache miss into a first entry of the buffer and update the tag of the first entry to indicate the status is speculative; responsive to the load instruction being retired by the processor pipeline, update the tag to indicate the status is validated; and, responsive to the load instruction being flushed from the processor pipeline, update the tag to indicate the status is cancelled.
    Type: Grant
    Filed: April 30, 2021
    Date of Patent: May 30, 2023
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Rabin Sugumar
  • Patent number: 11599358
    Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: March 7, 2023
    Assignee: Tenstorrent Inc.
    Inventors: Miles Robert Dooley, Milos Trajkovic, Rakesh Shaji Lal, Stanislav Sokorac
  • Patent number: 11586267
    Abstract: Embodiments of the present disclosure relate to managing power provided to a semiconductor circuit to prevent undervoltage conditions. A measured voltage value describing a measured supply voltage at a first subcircuit of a semiconductor circuit can be received, the measured voltage value having a first resolution. A selected metric indicative of a supply voltage present at the first subcircuit can be received, the selected metric having a second resolution higher than the first resolution. The selected metric is calibrated to obtain a calibrated metric when a transition of the measured voltage value occurs.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: February 21, 2023
    Assignee: International Business Machines Corporation
    Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
  • Patent number: 11579944
    Abstract: In one embodiment, a processor includes: a plurality of cores each comprising a multi-threaded core to concurrently execute a plurality of threads; and a control circuit to concurrently enable at least one of the plurality of cores to operate in a single-threaded mode and at least one other of the plurality of cores to operate in a multi-threaded mode. Other embodiments are described and claimed.
    Type: Grant
    Filed: November 14, 2018
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Daniel J. Ragland, Guy M. Therien, Ankush Varma, Eric J. DeHaemer, David T. Mayo, Ariel Gur, Yoav Ben-Raphael, Mark P. Seconi
  • Patent number: 11567764
    Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
    Type: Grant
    Filed: August 12, 2021
    Date of Patent: January 31, 2023
    Assignee: Tenstorrent Inc.
    Inventors: Miles Robert Dooley, Milos Trajkovic, Rakesh Shaji Lal, Stanislav Sokorac
  • Patent number: 11561882
    Abstract: An apparatus and method are provided for generating and processing a trace stream indicative of instruction execution by processing circuitry. An apparatus has an input interface for receiving instruction execution information from the processing circuitry indicative of a sequence of instructions executed by the processing circuitry, and trace generation circuitry for generating from the instruction execution information a trace stream comprising a plurality of trace elements indicative of execution by the processing circuitry of instruction flow changing instructions within the sequence.
    Type: Grant
    Filed: August 9, 2017
    Date of Patent: January 24, 2023
    Assignee: Arm Limited
    Inventors: François Christopher Jacques Botman, Thomas Christopher Grocutt, John Michael Horley, Michael John Williams, Michael John Gibbs
  • Patent number: 11526361
    Abstract: Devices and techniques for variable pipeline length in a barrel-multithreaded processor are described herein. A completion time for an instruction can be determined prior to insertion into a pipeline of a processor. A conflict between the instruction and a different instruction based on the completion time can be detected. Here, the different instruction is already in the pipeline and the conflict detected when the completion time equals the previously determined completion time for the different instruction. A difference between the completion time and an unconflicted completion time can then be calculated and completion of the instruction delayed by the difference.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: December 13, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Tony Brewer
  • Patent number: 11513802
    Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
    Type: Grant
    Filed: September 27, 2020
    Date of Patent: November 29, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael W. Boyer, John Kalamatianos, Pritam Majumder
  • Patent number: 11507412
    Abstract: A disclosed example apparatus includes memory; and processor circuitry to: identify a lock-protected section of instructions in the memory; replace lock/unlock instructions with transactional lock acquire and transactional lock release instructions to form a transactional process; and execute the transactional process in a speculative execution.
    Type: Grant
    Filed: April 28, 2020
    Date of Patent: November 22, 2022
    Assignee: Intel Corporation
    Inventors: Keqiang Wu, Jiwei Lu, Koichi Yamada, Yong-Fong Lee
  • Patent number: 11500632
    Abstract: In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.
    Type: Grant
    Filed: April 23, 2019
    Date of Patent: November 15, 2022
    Assignee: ArchiTek Corporation
    Inventor: Shuichi Takada
  • Patent number: 11481216
    Abstract: Techniques for executing an atomic command in a distributed computing network are provided. A core cluster, including a plurality of processing cores that do not natively issue atomic commands to the distributed computing network, is coupled to a translation unit. To issue an atomic command, a core requests a location in the translation unit to write an opcode and operands for the atomic command. The translation unit identifies a location (a “window”) that is not in use by another atomic command and indicates the location to the processing core. The processing core writes the opcode and operands into the window and indicates to the translation unit that the atomic command is ready. The translation generates an atomic command and issues the command to the distributed computing network for execution. After execution, the distributed computing network provides a response to the translation unit, which provides that response to the core.
    Type: Grant
    Filed: September 10, 2018
    Date of Patent: October 25, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Stanley Ames Lackey, Jr.
  • Patent number: 11422817
    Abstract: A method and apparatus for executing an instruction are provided. In the method, an instruction queue is first generated, and an instruction from the instruction queue in preset order is acquired. Then, a sending step including: determining a type of the acquired instruction; determining, in response to determining that the acquired instruction is an arithmetic instruction, an executing component for executing the arithmetic instruction from an executing component set; and sending the arithmetic instruction to the determined executing component is executed. Last, in response to determining that the acquired instruction is a blocking instruction, a next instruction is acquired after receiving a signal for instructing an instruction associated with the blocking instruction being completely executed.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: August 23, 2022
    Assignee: Kunlunxin Technology (Beijing) Company Limited
    Inventors: Jing Wang, Wei Qi, Yupeng Li, Xiaozhang Gong
  • Patent number: 11392537
    Abstract: Exemplary reach-based explicit dataflow processors and related computer-readable media and methods. The reach-based explicit dataflow processors are configured to support execution of producer instructions encoded with explicit naming of consumer instructions intended to consume the values produced by the producer instructions. The reach-based explicit dataflow processors are configured to make available produced values as inputs to explicitly named consumer instructions as a result of processing producer instructions. The reach-based explicit dataflow processors support execution of a producer instruction that explicitly names a consumer instruction based on using the producer instruction as a relative reference point from the producer instruction.
    Type: Grant
    Filed: March 18, 2019
    Date of Patent: July 19, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gagan Gupta, Michael Scott McIlvaine, Rodney Wayne Smith, Thomas Philip Speier, David Tennyson Harper, III
  • Patent number: 11392387
    Abstract: Predicting load-based control independent (CI), register data independent (DI) (CIRDI) instructions as CI memory data dependent (DD) (CIMDD) instructions for replay in speculative misprediction recovery in a processor. The processor predicts if a source of a load-based CIRDI instruction will be forwarded by a store-based instruction (i.e. “store-forwarded”). If a load-based CIRDI instruction is predicted as store-forwarded, the load-based CIRDI instruction is considered a CIMDD instruction and is replayed in misprediction recovery. If a load-based CIRDI instruction is not predicted as store-forwarded, the processor considers such load-based CIRDI instruction as a pending load-based CIRDI instruction. If this pending load-based CIRDI instruction is determined in execution to be store-forwarded, the instruction pipeline is flushed and the pending load-based CIRDI instruction is also replayed in misprediction recovery.
    Type: Grant
    Filed: November 4, 2020
    Date of Patent: July 19, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Vignyan Reddy Kothinti Naresh, Arthur Perais, Rami Mohammad Al Sheikh, Shivam Priyadarshi
  • Patent number: 11366691
    Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.
    Type: Grant
    Filed: December 1, 2020
    Date of Patent: June 21, 2022
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
  • Patent number: 11360536
    Abstract: The vector data path is divided into smaller vector lanes. A register such as a memory mapped control register stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON of OFF state of the corresponding vector lane. This number of contiguous least significant vector lanes are powered. In the preferred embodiment the stored data VLX indicates that 2VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
    Type: Grant
    Filed: August 3, 2020
    Date of Patent: June 14, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Duc Quang Bui
  • Patent number: 11327760
    Abstract: A method for grouping computer instructions includes receiving a set of computer instructions, grouping the set of computer instructions by register dependencies, identifying a plurality of single-definition-use flow (SDF) bundles based on a burstization criteria and a chaining criteria; and based on the SDF bundles, transforming the set of computer instructions. The transformation may include splitting one of the set of computer instructions and setting a burst parameter for the one of the set of computer instruction. The transformation may include grouping a plurality of the set of computer instructions and replacing a pair of register file accesses with a pair of temporary register accesses.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: May 10, 2022
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Andrew Siu Doug Lee, Ahmed Mohammed Elshafiey Mohammed Eltantawy
  • Patent number: 11327791
    Abstract: An apparatus provides an issue queue having a first section and a second section. Each entry in each section stores operation information identifying an operation to be performed. Allocation circuitry allocates each item of received operation information to an entry in the first section or the second section. Selection circuitry selects from the issue queue, during a given selection iteration, an operation from amongst the operations whose required source operands are available. Availability update circuitry updates source operand availability for each entry whose operation information identifies as a source operand a destination operand of the selected operation in the given selection iteration. A deferral mechanism inhibits from selection, during a next selection iteration, any operation associated with an entry in the second section whose source operands are now available due to that operation having as a source operand the destination operand of the selected operation in the given selection iteration.
    Type: Grant
    Filed: August 21, 2019
    Date of Patent: May 10, 2022
    Assignee: Arm Limited
    Inventors: Michael David Achenbach, Robert Greg McDonald, Nicholas Andrew Pfister, Kelvin Domnic Goveas, Michael Filippo, . Abhishek Raja, Zachary Allen Kingsbury
  • Patent number: 11327766
    Abstract: A method of instruction dispatch routing comprises receiving an instruction for dispatch to one of a plurality of issue queues; determining a priority status of the instruction; selecting a rotation order based on the priority status, wherein a first rotation order is associated with priority instructions and a second rotation order, different from the first rotation order, is associated with non-priority instructions; selecting an issue queue of the plurality of issue queues based on the selected rotation order; and dispatching the instruction to the selected issue queue.
    Type: Grant
    Filed: July 31, 2020
    Date of Patent: May 10, 2022
    Assignee: International Business Machines Corporation
    Inventors: Eric Mark Schwarz, Brian W. Thompto, Kurt A. Feiste, Michael Joseph Genden, Dung Q. Nguyen, Susan E. Eisen
  • Patent number: 11321019
    Abstract: An event-processing unit for processing tokens associated with a state or state transition, herein also referred to as an event, of an external device is disclosed. The EPU allows token-processing schemes, in which the processing of incoming tokens and the further handling of a processing result by the EPU are determined not only by the token identifier, but also by the payload data of the incoming token or by data in the data memory. A flag-processing capability of a processing-control stage allows applying flag-processing operations such as logical operations to data obtained as a processing result of an ALU-processing operation. The result of these operations determines a subsequent handling of ALU-result data by the EPU. Thus, whether or not the ALU-result data is written to the data memory also influences the processing of any subsequent incoming tokens for which that data is used in the ALU-processing operation.
    Type: Grant
    Filed: September 11, 2020
    Date of Patent: May 3, 2022
    Assignee: ACCEMIC TECHNOLOGIES GMBH
    Inventor: Alexander Weiss
  • Patent number: 11301252
    Abstract: A data processing apparatus is provided comprising: a plurality of input lanes and a plurality of corresponding output lanes. Processing circuitry executes a first vector instruction and a second vector instruction. The first vector instruction specifies a target of output data from the corresponding output lanes that is specified as a source of input data to the input lanes by the second vector instruction. Mask circuitry stores a first mask that defines a first set of the output lanes that are valid for the first vector instruction, and stores a second mask that defines a second set of the output lanes that are valid for the second vector instruction. The first set and the second set are mutually exclusive. Issue circuitry begins processing of the second vector instruction at a lane index prior to completion of the first vector instruction at the lane index.
    Type: Grant
    Filed: January 15, 2020
    Date of Patent: April 12, 2022
    Assignee: Arm Limited
    Inventor: Kim Richard Schuttenberg
  • Patent number: 11275644
    Abstract: Techniques facilitating voltage droop reduction and/or mitigation in a processor core are provided. In one example, a system can comprise a memory that stores, and a processor that executes, computer executable components. The computer executable components can comprise an observation component that detects one or more events at a first stage of a processor pipeline. An event of the one or more events can be a defined event determined to increase a level of power consumed during a second stage of the processor pipeline. The computer executable components can also comprise an instruction component that applies a voltage droop mitigation countermeasure prior to the increase of the level of power consumed during the second stage of the processor pipeline and a feedback component that provides a notification to the instruction component that indicates a success or a failure of a result of the voltage droop mitigation countermeasure.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: March 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Giora Biran, Pradip Bose, Alper Buyuktosunoglu, Pierce I-Jen Chuang, Preetham M. Lobo, Ramon Bertran Monfort, Phillip John Restle, Christos Vezyrtzis, Tobias Webel
  • Patent number: 11269646
    Abstract: Apparatuses and methods for instruction scheduling in an out-of-order decoupled access-execute processor are disclosed. The instructions for the decoupled access-execute processor comprises access instructions and execute instructions, where access instructions comprise load instructions and instructions which provide operand values to load instructions. Schedule patterns of groups of linked execute instructions are monitored, where the execute instructions in a group of linked execute instructions are linked by data dependencies. On the basis of an identified repeating schedule pattern configurable execution circuitry adopts a configuration to perform the operations defined by the group of linked execute instructions of the repeating schedule pattern.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: March 8, 2022
    Assignee: Arm Limited
    Inventors: Mbou Eyole, Michiel Willem Van Tol
  • Patent number: 11212590
    Abstract: Approaches for performing all DOCSIS downstream and upstream data forwarding functions using executable software. DOCSIS data forwarding functions may be performed by classifying one or more packets, of a plurality of received packets, to a particular DOCSIS system component, and then processing the one or more packets classified to the same DOCSIS system component on a single CPU core. The one or more packets may be forwarded between a sequence of one or more software stages. The software stages may each be configured to execute on separate logical cores or on a single logical core.
    Type: Grant
    Filed: July 10, 2017
    Date of Patent: December 28, 2021
    Assignee: Harmonic, Inc.
    Inventors: Adam Levy, Pavlo Shcherbyna, Alex Muller, Vladyslav Buslov, Victoria Sinitsky, Michael W. Patrick, Nitin Sasi Kumar
  • Patent number: 11188341
    Abstract: In one embodiment, an apparatus includes: a plurality of execution lanes to perform parallel execution of instructions; and a unified symbolic store address buffer coupled to the plurality of execution lanes, the unified symbolic store address buffer comprising a plurality of entries each to store a symbolic store address for a store instruction to be executed by at least some of the plurality of execution lanes. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 26, 2019
    Date of Patent: November 30, 2021
    Assignee: Intel Corporation
    Inventors: Jeffrey J. Cook, Srikanth T. Srinivasan, Jonathan D. Pearce, David B. Sheffield
  • Patent number: 11188681
    Abstract: An approach is provided in which an information handling system loads a set of encrypted binary code into a processor that has been encrypted based upon a unique key of the processor. The processor includes an instruction decoder that transforms the set of encrypted binary code into a set of instruction control signals using the unique key. In turn, the processor executes a set of instructions based on the set of instruction control signals.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: November 30, 2021
    Assignee: International Business Machines Corporation
    Inventors: Guy M. Cohen, Shai Halevi, Lior Horesh
  • Patent number: 11182167
    Abstract: A method to determine an oldest instruction in an instruction queue of a processor with multiple instruction threads, wherein each of the multiple instruction threads have a unique thread identifier. The method includes tagging each instruction thread, of the multiple instruction threads, in the instruction queue with a unique tag number according to a round-robin scheme, wherein the unique tag number includes the unique thread identifier for each instruction thread and a round number in the round-robin scheme. The method further includes selecting, for each instruction thread, of the multiple instruction threads, the instruction thread with a lowest tag number from the multiple instruction threads in the instruction queue that are tagged with an oldest round number from the round-robin scheme.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: November 23, 2021
    Assignee: International Business Machines Corporation
    Inventors: Arni Ingimundarson, Maarten J. Boersma, Niels Fricke
  • Patent number: 11175916
    Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.
    Type: Grant
    Filed: December 19, 2017
    Date of Patent: November 16, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King
  • Patent number: 11163576
    Abstract: A system and method for efficiently preventing visible side-effects in the memory hierarchy during speculative execution is disclosed. Hiding the side-effects of executed instructions in the whole memory hierarchy is both expensive, in terms of performance and energy, and complicated. A system and method is disclosed to hide the side-effects of speculative loads in the cache(s) until the earliest time these speculative loads become non-speculative. A refinement is disclosed where loads that hit in the L1 cache are allowed to proceed by keeping their side-effects on the L1 cache hidden until these loads become non-speculative, and all other speculative loads that miss in the cache(s) are prevented from executing until they become non-speculative. To limit the performance deterioration caused by these delayed loads, a system and method is disclosed that augments the cache(s) with a value predictor or a re-computation engine that supplies predicted or recomputed values to the loads that missed in the cache(s).
    Type: Grant
    Filed: March 20, 2020
    Date of Patent: November 2, 2021
    Assignee: ETA SCALE AB
    Inventors: Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, Magnus Själander
  • Patent number: 11150961
    Abstract: Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.
    Type: Grant
    Filed: February 8, 2019
    Date of Patent: October 19, 2021
    Assignee: Blaize, Inc.
    Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
  • Patent number: 11144317
    Abstract: An AC parallelization circuit includes a transmitting circuit configured to transmit a stop signal to instruct a device for executing calculation in an iteration immediately preceding an iteration for which a concerned device is responsible to stop the calculation in loop-carried dependency calculation; and an estimating circuit configured to generate, as a result of executing the calculation in the preceding iteration, an estimated value to be provided to an arithmetic circuit when the transmitting circuit transmits the stop signal.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: October 12, 2021
    Assignee: FUJITSU LIMITED
    Inventor: Hisanao Akima
  • Patent number: 11119774
    Abstract: A system and/or method for processing information is disclosed that has at least one processor; a register file associated with the processor, the register file sliced into a plurality of STF blocks having a plurality of STF entries, and in an embodiment, each STF block is further partitioned into a plurality of sub-blocks, each sub-block having a different portion of the plurality of STF entries; and a plurality of execution units configured to read data from and write data to the register file, where the plurality of execution units are arranged in one or more execution slices. In one or more embodiments, the system is configured so that each execution slice has a plurality of STF blocks, and alternatively or additionally, each of the plurality of execution units in a single execution slice is assigned to write to one, and preferably only one, of the plurality of STF blocks.
    Type: Grant
    Filed: September 6, 2019
    Date of Patent: September 14, 2021
    Assignee: International Business Machines Corporation
    Inventors: Brian W. Thompto, Dung Q. Nguyen, Hung Q. Le, Sam Gat-Shang Chu
  • Patent number: 11115964
    Abstract: A system and method of auto-detection of WLAN packets includes transmitting in a 60 GHz frequency band a wireless packet comprising a first header, a second header, a payload, and a training field, the first header carrying a plurality of bits, a logical value of a subset of the plurality of bits in the first header indicating the presence of the second header in the wireless packet.
    Type: Grant
    Filed: February 12, 2016
    Date of Patent: September 7, 2021
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Yan Xin, Osama Aboul-Magd, Jung Hoon Suh
  • Patent number: 11112846
    Abstract: Embodiments of the present disclosure relate to detecting undervoltage conditions at a subcircuit. A power supply current of a first subcircuit is determined over a first number of previous clock cycles. A cross current flowing between the first subcircuit and a second subcircuit is determined over the first number of previous clock cycles. An estimated momentary supply voltage present at the first subcircuit is then determined based on the power supply current of the first subcircuit over the first number of previous clock cycles and the cross current flowing between the first subcircuit and the second subcircuit over the first number of previous clock cycles.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: September 7, 2021
    Assignee: International Business Machines Corporation
    Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
  • Patent number: 11100390
    Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a “fence,” or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence to be asserted is processed.
    Type: Grant
    Filed: April 11, 2018
    Date of Patent: August 24, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
  • Patent number: 11093245
    Abstract: A computer system and a memory access technology are provided. In the computer system, when load/store instructions having a dependency relationship is processed, dependency information between a producer load/store instruction and a consumer load/store instruction can be obtained from a processor. A consumer load/store request is sent to a memory controller in the computer system based on the obtained dependency information, so that the memory controller can terminate a dependency relationship between load/store requests in the memory controller locally based on the dependency information in the received consumer load/store request, and execute the consumer load/store request.
    Type: Grant
    Filed: June 12, 2019
    Date of Patent: August 17, 2021
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Lei Fang, Xi Chen, Weiguang Cai
  • Patent number: 11086628
    Abstract: A system and method for load queue (LDQ) and store queue (STQ) entry allocations at address generation time that maintains age-order of instructions is described. In particular, writing LDQ and STQ entries are delayed until address generation time. This allows the load and store operations to dispatch, and younger operations (which may not be store and load operations) to also dispatch and execute their instructions. The address generation of the load or store operation is held at an address generation scheduler queue (AGSQ) until a load or store queue entry is available for the operation. The tracking of load queue entries or store queue entries is effectively being done in the AGSQ instead of at the decode engine. The LDQ and STQ depth is not visible from a decode engine's perspective, and increases the effective processing and queue depth.
    Type: Grant
    Filed: August 15, 2016
    Date of Patent: August 10, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventor: John M. King
  • Patent number: 11080055
    Abstract: Techniques are disclosed relating to arbitration among register file accesses. In some embodiments, an apparatus includes a register file configured to store operands for multiple client circuits and arbitration circuitry configured to select from among multiple received requests to access the register file. In some embodiments, the apparatus includes first interface circuitry configured to provide access requests from a first client circuit to the arbitration circuitry and supplemental interface circuitry configured to receive unsuccessful requests from the first client circuit and provide the received unsuccessful requests to the arbitration circuitry. The supplemental interface circuitry may provide additional catch-up bandwidth to clients that lose arbitration, which may result in fairness during bandwidth shortages.
    Type: Grant
    Filed: August 22, 2019
    Date of Patent: August 3, 2021
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Terence M. Potter
  • Patent number: 11074079
    Abstract: A method of providing instructions to computer processing apparatus for improved event handling comprises the following. Instructions for execution on the computer processing apparatus are provided to an event processor generator. These instructions comprise a plurality of functional steps, a set of dependencies between the functional steps, and configuration data. The event processor generator creates instances of the functional steps from the instructions and represents the instances as directed acyclic graphs. The event processor generator identifies a plurality of event types and topologically sort is the directed acyclic graphs to determine a topologically ordered event path for each event type. The event processor generator then provides a revised set of instructions for execution on the computer processing apparatus in which original instructions have been replaced by instructions requiring each event type to be executed according to its topologically ordered event path.
    Type: Grant
    Filed: November 17, 2017
    Date of Patent: July 27, 2021
    Inventor: Greg Higgins