Instruction Issuing Patents (Class 712/214)
-
Patent number: 11599358
Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
Type: Grant
Filed: August 12, 2021
Date of Patent: March 7, 2023
Assignee: Tenstorrent Inc.
Inventors: Miles Robert Dooley, Milos Trajkovic, Rakesh Shaji Lal, Stanislav Sokorac
-
Patent number: 11586267
Abstract: Embodiments of the present disclosure relate to managing power provided to a semiconductor circuit to prevent undervoltage conditions. A measured voltage value describing a measured supply voltage at a first subcircuit of a semiconductor circuit can be received, the measured voltage value having a first resolution. A selected metric indicative of a supply voltage present at the first subcircuit can be received, the selected metric having a second resolution higher than the first resolution. The selected metric is calibrated to obtain a calibrated metric when a transition of the measured voltage value occurs.
Type: Grant
Filed: December 19, 2018
Date of Patent: February 21, 2023
Assignee: International Business Machines Corporation
Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
-
Patent number: 11579944
Abstract: In one embodiment, a processor includes: a plurality of cores each comprising a multi-threaded core to concurrently execute a plurality of threads; and a control circuit to concurrently enable at least one of the plurality of cores to operate in a single-threaded mode and at least one other of the plurality of cores to operate in a multi-threaded mode. Other embodiments are described and claimed.
Type: Grant
Filed: November 14, 2018
Date of Patent: February 14, 2023
Assignee: Intel Corporation
Inventors: Daniel J. Ragland, Guy M. Therien, Ankush Varma, Eric J. DeHaemer, David T. Mayo, Ariel Gur, Yoav Ben-Raphael, Mark P. Seconi
-
Patent number: 11567764
Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
Type: Grant
Filed: August 12, 2021
Date of Patent: January 31, 2023
Assignee: Tenstorrent Inc.
Inventors: Miles Robert Dooley, Milos Trajkovic, Rakesh Shaji Lal, Stanislav Sokorac
-
Patent number: 11561882
Abstract: An apparatus and method are provided for generating and processing a trace stream indicative of instruction execution by processing circuitry. An apparatus has an input interface for receiving instruction execution information from the processing circuitry indicative of a sequence of instructions executed by the processing circuitry, and trace generation circuitry for generating from the instruction execution information a trace stream comprising a plurality of trace elements indicative of execution by the processing circuitry of instruction flow changing instructions within the sequence.
Type: Grant
Filed: August 9, 2017
Date of Patent: January 24, 2023
Assignee: Arm Limited
Inventors: François Christopher Jacques Botman, Thomas Christopher Grocutt, John Michael Horley, Michael John Williams, Michael John Gibbs
-
Patent number: 11526361
Abstract: Devices and techniques for variable pipeline length in a barrel-multithreaded processor are described herein. A completion time for an instruction can be determined prior to insertion into a pipeline of a processor. A conflict between the instruction and a different instruction based on the completion time can be detected. Here, the different instruction is already in the pipeline, and the conflict is detected when the completion time equals the previously determined completion time for the different instruction. A difference between the completion time and an unconflicted completion time can then be calculated and completion of the instruction delayed by the difference.
Type: Grant
Filed: October 20, 2020
Date of Patent: December 13, 2022
Assignee: Micron Technology, Inc.
Inventor: Tony Brewer
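The conflict-resolution idea in the abstract above can be illustrated with a minimal sketch: before an instruction enters the pipeline, its natural completion time is checked against completion times already claimed by in-flight instructions, and the instruction is delayed until a free completion slot is found. The function name and data model are illustrative assumptions, not Micron's actual design.

```python
# Hypothetical model of completion-time conflict detection in a
# variable-pipeline-length barrel-multithreaded processor.

def schedule_completion(completion_time, claimed_times):
    """Return (scheduled_time, delay) for an instruction whose natural
    completion time may collide with already-claimed completion times."""
    scheduled = completion_time
    # A conflict exists while another in-flight instruction already
    # completes in that cycle; push back one cycle at a time.
    while scheduled in claimed_times:
        scheduled += 1
    claimed_times.add(scheduled)
    return scheduled, scheduled - completion_time

claimed = {10, 11}                      # cycles taken by in-flight instructions
t, delay = schedule_completion(10, claimed)  # collides twice -> cycle 12, delay 2
```

The delay returned here corresponds to the "difference between the completion time and an unconflicted completion time" that the abstract applies to the incoming instruction.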
-
Patent number: 11513802
Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
Type: Grant
Filed: September 27, 2020
Date of Patent: November 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael W. Boyer, John Kalamatianos, Pritam Majumder
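A small sketch can make the compression step concrete: walk the micro-operation queue and pack adjacent pairs that satisfy a "compressibility" predicate into a single scheduler entry holding both halves, while incompressible micro-ops get their own entry. The predicate shown is an invented example; the patent's actual compressibility rules are not specified here.

```python
# Illustrative sketch (not AMD's actual rules) of packing micro-op pairs
# into single scheduler entries.

def compressible(a, b):
    # Assumed example rule: two ops can share an entry if they target the
    # same execution unit and neither has more than two source operands.
    return a["unit"] == b["unit"] and len(a["srcs"]) <= 2 and len(b["srcs"]) <= 2

def fill_scheduler(queue):
    entries = []
    i = 0
    while i < len(queue):
        if i + 1 < len(queue) and compressible(queue[i], queue[i + 1]):
            entries.append((queue[i], queue[i + 1]))  # one entry, two portions
            i += 2
        else:
            entries.append((queue[i], None))          # uncompressed entry
            i += 1
    return entries

uop_queue = [
    {"unit": "alu", "srcs": [1, 2]},
    {"unit": "alu", "srcs": [3]},
    {"unit": "mem", "srcs": [1]},
]
entries = fill_scheduler(uop_queue)  # two ALU ops share one entry; mem op alone
```

The payoff described in the abstract is effective scheduler depth: three queued micro-ops occupy only two scheduler entries here.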
-
Patent number: 11507412
Abstract: A disclosed example apparatus includes memory; and processor circuitry to: identify a lock-protected section of instructions in the memory; replace lock/unlock instructions with transactional lock acquire and transactional lock release instructions to form a transactional process; and execute the transactional process in a speculative execution.
Type: Grant
Filed: April 28, 2020
Date of Patent: November 22, 2022
Assignee: Intel Corporation
Inventors: Keqiang Wu, Jiwei Lu, Koichi Yamada, Yong-Fong Lee
-
Patent number: 11500632
Abstract: In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.
Type: Grant
Filed: April 23, 2019
Date of Patent: November 15, 2022
Assignee: ArchiTek Corporation
Inventor: Shuichi Takada
-
Patent number: 11481216
Abstract: Techniques for executing an atomic command in a distributed computing network are provided. A core cluster, including a plurality of processing cores that do not natively issue atomic commands to the distributed computing network, is coupled to a translation unit. To issue an atomic command, a core requests a location in the translation unit to write an opcode and operands for the atomic command. The translation unit identifies a location (a "window") that is not in use by another atomic command and indicates the location to the processing core. The processing core writes the opcode and operands into the window and indicates to the translation unit that the atomic command is ready. The translation unit generates an atomic command and issues the command to the distributed computing network for execution. After execution, the distributed computing network provides a response to the translation unit, which provides that response to the core.
Type: Grant
Filed: September 10, 2018
Date of Patent: October 25, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Stanley Ames Lackey, Jr.
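The window protocol above follows a request/write/issue handshake that can be modeled in a few lines. All class and method names below are assumptions made for illustration; the sketch only captures the flow of reserving a free window, writing the opcode and operands, and freeing the window once the command is issued.

```python
# Hypothetical model of the translation-unit "window" handshake for
# issuing atomic commands from cores that cannot issue them natively.

class TranslationUnit:
    def __init__(self, n_windows):
        self.windows = [None] * n_windows     # None = free window

    def request_window(self):
        """Core asks for a window; return its index, or None if all busy."""
        for i, w in enumerate(self.windows):
            if w is None:
                self.windows[i] = "reserved"
                return i
        return None

    def write_and_issue(self, idx, opcode, operands):
        """Core has written opcode/operands and marked the command ready;
        the unit forms the atomic command, issues it, and frees the window."""
        self.windows[idx] = None
        return {"opcode": opcode, "operands": operands}

tu = TranslationUnit(n_windows=2)
win = tu.request_window()                       # window 0 reserved
resp = tu.write_and_issue(win, "fetch_add", [5])
```

In the real design the response comes back from the distributed computing network; here the issue step returns it directly to keep the sketch self-contained.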
-
Patent number: 11422817
Abstract: A method and apparatus for executing an instruction are provided. In the method, an instruction queue is first generated, and an instruction from the instruction queue in preset order is acquired. Then, a sending step is executed, including: determining a type of the acquired instruction; determining, in response to determining that the acquired instruction is an arithmetic instruction, an executing component for executing the arithmetic instruction from an executing component set; and sending the arithmetic instruction to the determined executing component. Last, in response to determining that the acquired instruction is a blocking instruction, a next instruction is acquired after receiving a signal indicating that an instruction associated with the blocking instruction has been completely executed.
Type: Grant
Filed: July 1, 2019
Date of Patent: August 23, 2022
Assignee: Kunlunxin Technology (Beijing) Company Limited
Inventors: Jing Wang, Wei Qi, Yupeng Li, Xiaozhang Gong
-
Patent number: 11392537
Abstract: Exemplary reach-based explicit dataflow processors and related computer-readable media and methods. The reach-based explicit dataflow processors are configured to support execution of producer instructions encoded with explicit naming of consumer instructions intended to consume the values produced by the producer instructions. The reach-based explicit dataflow processors are configured to make available produced values as inputs to explicitly named consumer instructions as a result of processing producer instructions. The reach-based explicit dataflow processors support execution of a producer instruction that explicitly names a consumer instruction based on using the producer instruction as a relative reference point from the producer instruction.
Type: Grant
Filed: March 18, 2019
Date of Patent: July 19, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gagan Gupta, Michael Scott McIlvaine, Rodney Wayne Smith, Thomas Philip Speier, David Tennyson Harper, III
-
Patent number: 11392387
Abstract: Predicting load-based control independent (CI), register data independent (DI) (CIRDI) instructions as CI memory data dependent (DD) (CIMDD) instructions for replay in speculative misprediction recovery in a processor. The processor predicts if a source of a load-based CIRDI instruction will be forwarded by a store-based instruction (i.e., "store-forwarded"). If a load-based CIRDI instruction is predicted as store-forwarded, the load-based CIRDI instruction is considered a CIMDD instruction and is replayed in misprediction recovery. If a load-based CIRDI instruction is not predicted as store-forwarded, the processor considers such load-based CIRDI instruction as a pending load-based CIRDI instruction. If this pending load-based CIRDI instruction is determined in execution to be store-forwarded, the instruction pipeline is flushed and the pending load-based CIRDI instruction is also replayed in misprediction recovery.
Type: Grant
Filed: November 4, 2020
Date of Patent: July 19, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Vignyan Reddy Kothinti Naresh, Arthur Perais, Rami Mohammad Al Sheikh, Shivam Priyadarshi
-
Patent number: 11366691
Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.
Type: Grant
Filed: December 1, 2020
Date of Patent: June 21, 2022
Assignee: Imagination Technologies Limited
Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
-
Patent number: 11360536
Abstract: The vector data path is divided into smaller vector lanes. A register such as a memory mapped control register stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON or OFF state of the corresponding vector lane. This number of contiguous least significant vector lanes are powered. In the preferred embodiment the stored data VLX indicates that 2^VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
Type: Grant
Filed: August 3, 2020
Date of Patent: June 14, 2022
Assignee: Texas Instruments Incorporated
Inventors: Timothy David Anderson, Duc Quang Bui
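The decode described in the abstract is a simple bit-manipulation step: expand the stored VLX field into a one-bit-per-lane control word in which the 2^VLX least significant, contiguous lanes are powered. The function name and the 16-lane width below are assumptions for illustration.

```python
# Sketch of decoding a vector lane number (VLX) into a lane control word,
# where the 2**VLX least significant lanes are powered ON.

def vlx_to_lane_control(vlx, total_lanes=16):
    powered = 1 << vlx                  # 2**VLX lanes are enabled
    assert powered <= total_lanes
    return (1 << powered) - 1           # contiguous low-order mask, 1 bit per lane

mask = vlx_to_lane_control(3)           # VLX = 3 -> 8 lanes -> 0b11111111
```

This shows why the encoding is compact: a 4-bit VLX field controls up to 16 lanes, at the cost of restricting the powered-lane count to powers of two.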
-
Patent number: 11327766
Abstract: A method of instruction dispatch routing comprises receiving an instruction for dispatch to one of a plurality of issue queues; determining a priority status of the instruction; selecting a rotation order based on the priority status, wherein a first rotation order is associated with priority instructions and a second rotation order, different from the first rotation order, is associated with non-priority instructions; selecting an issue queue of the plurality of issue queues based on the selected rotation order; and dispatching the instruction to the selected issue queue.
Type: Grant
Filed: July 31, 2020
Date of Patent: May 10, 2022
Assignee: International Business Machines Corporation
Inventors: Eric Mark Schwarz, Brian W. Thompto, Kurt A. Feiste, Michael Joseph Genden, Dung Q. Nguyen, Susan E. Eisen
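The routing policy above can be sketched with two independent round-robin rotations over the same issue queues, one consulted for priority instructions and one for non-priority instructions. The class name and the choice of reversed order for the second rotation are illustrative assumptions; the patent only requires that the two rotation orders differ.

```python
# Hypothetical sketch of priority-based dispatch routing with two
# distinct rotation orders over the issue queues.
from itertools import cycle

class DispatchRouter:
    def __init__(self, queues):
        # Assumed example: priority ops rotate forward through the queues,
        # non-priority ops rotate through them in reverse order.
        self.priority_rotation = cycle(queues)
        self.normal_rotation = cycle(reversed(queues))

    def route(self, is_priority):
        rotation = self.priority_rotation if is_priority else self.normal_rotation
        return next(rotation)

router = DispatchRouter(["IQ0", "IQ1", "IQ2"])
first = router.route(is_priority=True)    # priority rotation starts at IQ0
second = router.route(is_priority=False)  # non-priority rotation starts at IQ2
```

Because each class advances its own rotation, a burst of non-priority instructions cannot skew which queues the priority instructions land in.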
-
Patent number: 11327760
Abstract: A method for grouping computer instructions includes receiving a set of computer instructions, grouping the set of computer instructions by register dependencies, identifying a plurality of single-definition-use flow (SDF) bundles based on a burstization criteria and a chaining criteria; and based on the SDF bundles, transforming the set of computer instructions. The transformation may include splitting one of the set of computer instructions and setting a burst parameter for the one of the set of computer instructions. The transformation may include grouping a plurality of the set of computer instructions and replacing a pair of register file accesses with a pair of temporary register accesses.
Type: Grant
Filed: April 9, 2020
Date of Patent: May 10, 2022
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Andrew Siu Doug Lee, Ahmed Mohammed Elshafiey Mohammed Eltantawy
-
Patent number: 11327791
Abstract: An apparatus provides an issue queue having a first section and a second section. Each entry in each section stores operation information identifying an operation to be performed. Allocation circuitry allocates each item of received operation information to an entry in the first section or the second section. Selection circuitry selects from the issue queue, during a given selection iteration, an operation from amongst the operations whose required source operands are available. Availability update circuitry updates source operand availability for each entry whose operation information identifies as a source operand a destination operand of the selected operation in the given selection iteration. A deferral mechanism inhibits from selection, during a next selection iteration, any operation associated with an entry in the second section whose source operands are now available due to that operation having as a source operand the destination operand of the selected operation in the given selection iteration.
Type: Grant
Filed: August 21, 2019
Date of Patent: May 10, 2022
Assignee: Arm Limited
Inventors: Michael David Achenbach, Robert Greg McDonald, Nicholas Andrew Pfister, Kelvin Domnic Goveas, Michael Filippo, Abhishek Raja, Zachary Allen Kingsbury
-
Patent number: 11321019
Abstract: An event-processing unit (EPU) for processing tokens associated with a state or state transition, herein also referred to as an event, of an external device is disclosed. The EPU allows token-processing schemes in which the processing of incoming tokens and the further handling of a processing result by the EPU are determined not only by the token identifier, but also by the payload data of the incoming token or by data in the data memory. A flag-processing capability of a processing-control stage allows applying flag-processing operations such as logical operations to data obtained as a processing result of an ALU-processing operation. The result of these operations determines a subsequent handling of ALU-result data by the EPU. Thus, whether or not the ALU-result data is written to the data memory also influences the processing of any subsequent incoming tokens for which that data is used in the ALU-processing operation.
Type: Grant
Filed: September 11, 2020
Date of Patent: May 3, 2022
Assignee: ACCEMIC TECHNOLOGIES GMBH
Inventor: Alexander Weiss
-
Patent number: 11301252
Abstract: A data processing apparatus is provided comprising: a plurality of input lanes and a plurality of corresponding output lanes. Processing circuitry executes a first vector instruction and a second vector instruction. The first vector instruction specifies a target of output data from the corresponding output lanes that is specified as a source of input data to the input lanes by the second vector instruction. Mask circuitry stores a first mask that defines a first set of the output lanes that are valid for the first vector instruction, and stores a second mask that defines a second set of the output lanes that are valid for the second vector instruction. The first set and the second set are mutually exclusive. Issue circuitry begins processing of the second vector instruction at a lane index prior to completion of the first vector instruction at the lane index.
Type: Grant
Filed: January 15, 2020
Date of Patent: April 12, 2022
Assignee: Arm Limited
Inventor: Kim Richard Schuttenberg
-
Patent number: 11275644
Abstract: Techniques facilitating voltage droop reduction and/or mitigation in a processor core are provided. In one example, a system can comprise a memory that stores, and a processor that executes, computer executable components. The computer executable components can comprise an observation component that detects one or more events at a first stage of a processor pipeline. An event of the one or more events can be a defined event determined to increase a level of power consumed during a second stage of the processor pipeline. The computer executable components can also comprise an instruction component that applies a voltage droop mitigation countermeasure prior to the increase of the level of power consumed during the second stage of the processor pipeline and a feedback component that provides a notification to the instruction component that indicates a success or a failure of a result of the voltage droop mitigation countermeasure.
Type: Grant
Filed: December 6, 2019
Date of Patent: March 15, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Giora Biran, Pradip Bose, Alper Buyuktosunoglu, Pierce I-Jen Chuang, Preetham M. Lobo, Ramon Bertran Monfort, Phillip John Restle, Christos Vezyrtzis, Tobias Webel
-
Patent number: 11269646
Abstract: Apparatuses and methods for instruction scheduling in an out-of-order decoupled access-execute processor are disclosed. The instructions for the decoupled access-execute processor comprise access instructions and execute instructions, where access instructions comprise load instructions and instructions which provide operand values to load instructions. Schedule patterns of groups of linked execute instructions are monitored, where the execute instructions in a group of linked execute instructions are linked by data dependencies. On the basis of an identified repeating schedule pattern, configurable execution circuitry adopts a configuration to perform the operations defined by the group of linked execute instructions of the repeating schedule pattern.
Type: Grant
Filed: March 29, 2021
Date of Patent: March 8, 2022
Assignee: Arm Limited
Inventors: Mbou Eyole, Michiel Willem Van Tol
-
Patent number: 11212590
Abstract: Approaches for performing all DOCSIS downstream and upstream data forwarding functions using executable software. DOCSIS data forwarding functions may be performed by classifying one or more packets, of a plurality of received packets, to a particular DOCSIS system component, and then processing the one or more packets classified to the same DOCSIS system component on a single CPU core. The one or more packets may be forwarded between a sequence of one or more software stages. The software stages may each be configured to execute on separate logical cores or on a single logical core.
Type: Grant
Filed: July 10, 2017
Date of Patent: December 28, 2021
Assignee: Harmonic, Inc.
Inventors: Adam Levy, Pavlo Shcherbyna, Alex Muller, Vladyslav Buslov, Victoria Sinitsky, Michael W. Patrick, Nitin Sasi Kumar
-
Patent number: 11188341
Abstract: In one embodiment, an apparatus includes: a plurality of execution lanes to perform parallel execution of instructions; and a unified symbolic store address buffer coupled to the plurality of execution lanes, the unified symbolic store address buffer comprising a plurality of entries each to store a symbolic store address for a store instruction to be executed by at least some of the plurality of execution lanes. Other embodiments are described and claimed.
Type: Grant
Filed: March 26, 2019
Date of Patent: November 30, 2021
Assignee: Intel Corporation
Inventors: Jeffrey J. Cook, Srikanth T. Srinivasan, Jonathan D. Pearce, David B. Sheffield
-
Patent number: 11188681
Abstract: An approach is provided in which an information handling system loads into a processor a set of binary code that has been encrypted based upon a unique key of the processor. The processor includes an instruction decoder that transforms the set of encrypted binary code into a set of instruction control signals using the unique key. In turn, the processor executes a set of instructions based on the set of instruction control signals.
Type: Grant
Filed: April 8, 2019
Date of Patent: November 30, 2021
Assignee: International Business Machines Corporation
Inventors: Guy M. Cohen, Shai Halevi, Lior Horesh
-
Patent number: 11182167
Abstract: A method to determine an oldest instruction in an instruction queue of a processor with multiple instruction threads, wherein each of the multiple instruction threads has a unique thread identifier. The method includes tagging each instruction thread, of the multiple instruction threads, in the instruction queue with a unique tag number according to a round-robin scheme, wherein the unique tag number includes the unique thread identifier for each instruction thread and a round number in the round-robin scheme. The method further includes selecting, for each instruction thread, of the multiple instruction threads, the instruction thread with a lowest tag number from the multiple instruction threads in the instruction queue that are tagged with an oldest round number from the round-robin scheme.
Type: Grant
Filed: March 15, 2019
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Arni Ingimundarson, Maarten J. Boersma, Niels Fricke
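The tagging scheme above amounts to a lexicographic age comparison: each queued instruction carries a (round number, thread identifier) tag, and the oldest instruction is the one with the lowest thread identifier within the oldest round. The tag layout and field names below are illustrative assumptions.

```python
# Sketch of selecting the oldest instruction by round-robin tag:
# compare by round number first, then by thread identifier.

def oldest(instructions):
    """instructions: list of dicts carrying 'round' and 'tid' tag fields."""
    return min(instructions, key=lambda ins: (ins["round"], ins["tid"]))

queue = [
    {"round": 2, "tid": 1, "op": "add"},
    {"round": 1, "tid": 3, "op": "mul"},   # in the oldest round
    {"round": 1, "tid": 0, "op": "ld"},    # oldest round, lowest thread id
]
winner = oldest(queue)                      # -> the "ld" instruction
```

Encoding the round number in the high-order bits of a single tag would give the same ordering with one integer comparison in hardware.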
-
Patent number: 11175916
Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and, if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.
Type: Grant
Filed: December 19, 2017
Date of Patent: November 16, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Gregory W. Smaus, John M. King
-
Patent number: 11163576
Abstract: A system and method for efficiently preventing visible side-effects in the memory hierarchy during speculative execution is disclosed. Hiding the side-effects of executed instructions in the whole memory hierarchy is both expensive, in terms of performance and energy, and complicated. A system and method is disclosed to hide the side-effects of speculative loads in the cache(s) until the earliest time these speculative loads become non-speculative. A refinement is disclosed where loads that hit in the L1 cache are allowed to proceed by keeping their side-effects on the L1 cache hidden until these loads become non-speculative, and all other speculative loads that miss in the cache(s) are prevented from executing until they become non-speculative. To limit the performance deterioration caused by these delayed loads, a system and method is disclosed that augments the cache(s) with a value predictor or a re-computation engine that supplies predicted or recomputed values to the loads that missed in the cache(s).
Type: Grant
Filed: March 20, 2020
Date of Patent: November 2, 2021
Assignee: ETA SCALE AB
Inventors: Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, Magnus Själander
-
Patent number: 11150961
Abstract: Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.
Type: Grant
Filed: February 8, 2019
Date of Patent: October 19, 2021
Assignee: Blaize, Inc.
Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
-
Patent number: 11144317
Abstract: An AC parallelization circuit includes a transmitting circuit configured to transmit a stop signal to instruct a device for executing calculation in an iteration immediately preceding an iteration for which a concerned device is responsible to stop the calculation in loop-carried dependency calculation; and an estimating circuit configured to generate, as a result of executing the calculation in the preceding iteration, an estimated value to be provided to an arithmetic circuit when the transmitting circuit transmits the stop signal.
Type: Grant
Filed: August 20, 2020
Date of Patent: October 12, 2021
Assignee: FUJITSU LIMITED
Inventor: Hisanao Akima
-
Patent number: 11119774
Abstract: A system and/or method for processing information is disclosed that has at least one processor; a register file associated with the processor, the register file sliced into a plurality of STF blocks having a plurality of STF entries, and in an embodiment, each STF block is further partitioned into a plurality of sub-blocks, each sub-block having a different portion of the plurality of STF entries; and a plurality of execution units configured to read data from and write data to the register file, where the plurality of execution units are arranged in one or more execution slices. In one or more embodiments, the system is configured so that each execution slice has a plurality of STF blocks, and alternatively or additionally, each of the plurality of execution units in a single execution slice is assigned to write to one, and preferably only one, of the plurality of STF blocks.
Type: Grant
Filed: September 6, 2019
Date of Patent: September 14, 2021
Assignee: International Business Machines Corporation
Inventors: Brian W. Thompto, Dung Q. Nguyen, Hung Q. Le, Sam Gat-Shang Chu
-
Patent number: 11115964
Abstract: A system and method of auto-detection of WLAN packets includes transmitting in a 60 GHz frequency band a wireless packet comprising a first header, a second header, a payload, and a training field, the first header carrying a plurality of bits, a logical value of a subset of the plurality of bits in the first header indicating the presence of the second header in the wireless packet.
Type: Grant
Filed: February 12, 2016
Date of Patent: September 7, 2021
Assignee: Huawei Technologies Co., Ltd.
Inventors: Yan Xin, Osama Aboul-Magd, Jung Hoon Suh
-
Patent number: 11112846
Abstract: Embodiments of the present disclosure relate to detecting undervoltage conditions at a subcircuit. A power supply current of a first subcircuit is determined over a first number of previous clock cycles. A cross current flowing between the first subcircuit and a second subcircuit is determined over the first number of previous clock cycles. An estimated momentary supply voltage present at the first subcircuit is then determined based on the power supply current of the first subcircuit over the first number of previous clock cycles and the cross current flowing between the first subcircuit and the second subcircuit over the first number of previous clock cycles.
Type: Grant
Filed: December 19, 2018
Date of Patent: September 7, 2021
Assignee: International Business Machines Corporation
Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
-
Patent number: 11100390
Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a "fence," or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence to be asserted is processed.
Type: Grant
Filed: April 11, 2018
Date of Patent: August 24, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
-
Patent number: 11093245
Abstract: A computer system and a memory access technology are provided. In the computer system, when load/store instructions having a dependency relationship are processed, dependency information between a producer load/store instruction and a consumer load/store instruction can be obtained from a processor. A consumer load/store request is sent to a memory controller in the computer system based on the obtained dependency information, so that the memory controller can terminate a dependency relationship between load/store requests in the memory controller locally based on the dependency information in the received consumer load/store request, and execute the consumer load/store request.
Type: Grant
Filed: June 12, 2019
Date of Patent: August 17, 2021
Assignee: Huawei Technologies Co., Ltd.
Inventors: Lei Fang, Xi Chen, Weiguang Cai
-
Patent number: 11086628Abstract: A system and method for load queue (LDQ) and store queue (STQ) entry allocations at address generation time that maintains age-order of instructions is described. In particular, writing LDQ and STQ entries is delayed until address generation time. This allows the load and store operations to dispatch, and younger operations (which may not be store and load operations) to also dispatch and execute their instructions. The address generation of the load or store operation is held at an address generation scheduler queue (AGSQ) until a load or store queue entry is available for the operation. The tracking of load queue entries or store queue entries is effectively done in the AGSQ instead of at the decode engine. The LDQ and STQ depth is not visible from the decode engine's perspective, which increases the effective processing and queue depth.Type: GrantFiled: August 15, 2016Date of Patent: August 10, 2021Assignee: Advanced Micro Devices, Inc.Inventor: John M. King
-
Patent number: 11080055Abstract: Techniques are disclosed relating to arbitration among register file accesses. In some embodiments, an apparatus includes a register file configured to store operands for multiple client circuits and arbitration circuitry configured to select from among multiple received requests to access the register file. In some embodiments, the apparatus includes first interface circuitry configured to provide access requests from a first client circuit to the arbitration circuitry and supplemental interface circuitry configured to receive unsuccessful requests from the first client circuit and provide the received unsuccessful requests to the arbitration circuitry. The supplemental interface circuitry may provide additional catch-up bandwidth to clients that lose arbitration, which may result in fairness during bandwidth shortages.Type: GrantFiled: August 22, 2019Date of Patent: August 3, 2021Assignee: Apple Inc.Inventors: Robert D. Kenney, Terence M. Potter
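A behavioral sketch of the supplemental-interface idea above, assuming a simple replay queue (`RegisterFileArbiter` and its retry policy are illustrative, not the patented circuit): requests that lose arbitration are captured and replayed ahead of fresh requests, giving losing clients catch-up bandwidth.

```python
from collections import deque

class RegisterFileArbiter:
    def __init__(self):
        self.retry = deque()          # supplemental interface: lost requests

    def arbitrate(self, requests):
        """Pick one winner per cycle; losers are queued for replay.

        Replayed requests win over fresh ones, modeling the catch-up
        bandwidth for clients that previously lost arbitration.
        """
        if self.retry:
            self.retry.extend(requests)   # fresh requests queue behind
            return self.retry.popleft()
        if not requests:
            return None
        winner, losers = requests[0], list(requests[1:])
        self.retry.extend(losers)
        return winner
```

Over three cycles with requests `["A", "B"]`, `["C"]`, `[]`, the arbiter grants A, then the replayed B, then C.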
-
Patent number: 11074079Abstract: A method of providing instructions to computer processing apparatus for improved event handling comprises the following. Instructions for execution on the computer processing apparatus are provided to an event processor generator. These instructions comprise a plurality of functional steps, a set of dependencies between the functional steps, and configuration data. The event processor generator creates instances of the functional steps from the instructions and represents the instances as directed acyclic graphs. The event processor generator identifies a plurality of event types and topologically sorts the directed acyclic graphs to determine a topologically ordered event path for each event type. The event processor generator then provides a revised set of instructions for execution on the computer processing apparatus in which original instructions have been replaced by instructions requiring each event type to be executed according to its topologically ordered event path.Type: GrantFiled: November 17, 2017Date of Patent: July 27, 2021Inventor: Greg Higgins
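The core operation here is a standard topological sort of the functional-step DAG. A self-contained sketch using Kahn's algorithm (step names and the `deps` mapping are illustrative):

```python
from collections import defaultdict, deque

def topo_order(steps, deps):
    """Return a topologically ordered event path.

    steps: iterable of step names.
    deps:  mapping of step -> list of steps it depends on.
    """
    indegree = {s: 0 for s in steps}
    out = defaultdict(list)
    for step, requires in deps.items():
        for producer in requires:
            out[producer].append(step)
            indegree[step] += 1
    ready = deque(s for s in steps if indegree[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for dependent in out[s]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(indegree):
        raise ValueError("dependency cycle: not a DAG")
    return order
```

For example, steps `parse -> enrich -> publish` come out in dependency order regardless of input order.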
-
Patent number: 11036514Abstract: A method and apparatus for performing an indexed data dependency instruction wakeup is disclosed. A scheduler may issue one or more instruction operations from a number of entries therein, including a first instruction operation. In a second entry, a comparison operation may be performed between a dependency index and an index of the first instruction operation. A match between the index of the first instruction and the dependency index in the second entry indicates a dependency of the corresponding instruction on the first instruction, and further indicates that the first instruction operation has issued. The dependency may be determined based solely on the match between the dependency index and the index of the first instruction. Responsive to determining that the first instruction operation has issued in the second entry, an indication that a corresponding second instruction operation is ready to issue may be provided.Type: GrantFiled: August 23, 2016Date of Patent: June 15, 2021Assignee: Apple Inc.Inventors: Sean M. Reynolds, Gokul V. Ganesan
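A software model of the indexed wakeup described above (class and method names are assumptions, not the patented circuit): each entry carries the index of its producer, and when an instruction issues, its index is broadcast and any entry whose dependency index matches is marked ready — the match alone establishes both the dependency and the fact that the producer has issued.

```python
class SchedulerEntry:
    def __init__(self, index, dep_index=None):
        self.index = index
        self.dep_index = dep_index        # index of the producer, if any
        self.ready = dep_index is None    # no dependency -> ready at once

class Scheduler:
    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def issue_one(self):
        """Issue the oldest ready entry and broadcast its index."""
        for i, e in enumerate(self.entries):
            if e.ready:
                issued = self.entries.pop(i)
                for other in self.entries:
                    # Indexed wakeup: a match on the dependency index
                    # means the producer has issued; mark consumer ready.
                    if other.dep_index == issued.index:
                        other.ready = True
                return issued.index
        return None
```

With a producer (index 1) and a dependent consumer (index 2, `dep_index=1`), the consumer becomes issuable only after the producer issues.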
-
Patent number: 11023243Abstract: Latency-based instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of latency-based reservation circuits each having an assigned producer instruction cycle latency. Producer instructions with the same cycle latency can be clustered in the same latency-based reservation circuit. Thus, the number of reservation entries is distributed among the plurality of latency-based reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit to avoid or reduce an increase in scheduling latency. The scheduling path connections are reduced for a given number of reservation entries over a non-clustered pick circuit, because signals (e.g., wake-up signals, pick-up signals) used for scheduling instructions in each latency-based reservation circuit do not have to have the same clock cycle latency so as to not impact performance.Type: GrantFiled: July 22, 2019Date of Patent: June 1, 2021Assignee: Microsoft Technology Licensing, LLCInventors: Yusuf Cagatay Tekmen, Shivam Priyadarshi, Rodney Wayne Smith
-
Patent number: 10996954Abstract: A calculation processing apparatus includes a storing device that stores a plurality of memory access instructions decoded by a decoder and outputs the stored memory access instructions to a cache memory, a determiner that determines whether the storing device has capacity to store the plurality of memory access instructions, and an inhibitor. When the determiner determines that the storing device cannot store a first memory access instruction included in the plurality of memory access instructions, the inhibitor inhibits execution of a second memory access instruction, included in the plurality of memory access instructions and subsequent to the first memory access instruction, for a predetermined time period, regardless of the determiner's result for the second memory access instruction. The calculation processing apparatus thereby inhibits a switch of the order of a store instruction and a load instruction.Type: GrantFiled: October 7, 2019Date of Patent: May 4, 2021Assignee: FUJITSU LIMITEDInventors: Sota Sakashita, Yasunobu Akizuki
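A rough behavioral sketch of that inhibit window, under assumed semantics (the function name, buffer capacity, and window length are illustrative): when the buffer rejects an access, the following accesses are inhibited for a fixed number of cycles regardless of their own capacity check, so a later load cannot slip ahead of an earlier rejected store.

```python
def schedule_accesses(accesses, capacity, inhibit_cycles=2):
    """Model one access attempt per cycle against a bounded buffer.

    Returns a list of (op, outcome) pairs where outcome is
    "accepted", "rejected" (buffer full), or "inhibited" (blocked by
    the window opened when an earlier access was rejected).
    """
    buffer, outcomes = [], []
    inhibit_until, cycle = -1, 0
    for op in accesses:
        cycle += 1
        if cycle <= inhibit_until:
            outcomes.append((op, "inhibited"))   # blocked unconditionally
            continue
        if len(buffer) >= capacity:
            outcomes.append((op, "rejected"))
            inhibit_until = cycle + inhibit_cycles  # open the window
        else:
            buffer.append(op)
            outcomes.append((op, "accepted"))
    return outcomes
```

With capacity 1, a rejected second store inhibits the two following loads even though the buffer check is never consulted for them.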
-
Patent number: 10996994Abstract: A plurality of ordered lists of dispatch queues corresponding to a plurality of processing entities are maintained, wherein each dispatch queue includes one or more task control blocks or is empty. A determination is made as to whether a primary dispatch queue of a processing entity is empty in an ordered list of dispatch queues for the processing entity. In response to determining that the primary dispatch queue of the processing entity is empty, a task control block is selected for processing by the processing entity from another dispatch queue of the ordered list of dispatch queues for the processing entity, wherein the dispatch queue from which the task control block is selected meets a threshold criterion for the processing entity.Type: GrantFiled: February 22, 2019Date of Patent: May 4, 2021Assignee: International Business Machines CorporationInventors: Seamus J. Burke, Trung N. Nguyen, Louis A. Rasor
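A small sketch of the selection policy above, with an assumed threshold criterion (queue depth — the patent leaves the criterion abstract, so this is illustrative): serve the primary queue first, and steal from a later queue in the ordered list only when it is busy enough to justify stealing.

```python
def select_task(ordered_queues, steal_threshold=2):
    """Pick the next task control block for one processing entity.

    ordered_queues[0] is the primary dispatch queue; the rest are
    fallbacks consulted in order when the primary is empty.
    """
    primary = ordered_queues[0]
    if primary:
        return primary.pop(0)
    for q in ordered_queues[1:]:
        if len(q) >= steal_threshold:   # assumed threshold criterion
            return q.pop(0)
    return None
```

With an empty primary queue, a one-deep fallback is skipped and the first queue at or above the threshold is tapped.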
-
Patent number: 10990406Abstract: An instruction execution device includes a processor. The processor includes an instruction translator, a reorder buffer, an architecture register, and an execution unit. The instruction translator receives a macro-instruction and translates the macro-instruction into a first micro-instruction, a second micro-instruction and a third micro-instruction. The instruction translator marks the first micro-instruction and the second micro-instruction with the same atomic operation flag. The execution unit executes the first micro-instruction to generate a first execution result and to store the first execution result in a temporary register. The execution unit executes the second micro-instruction to generate a second execution result and to store the second execution result in the architecture register. The execution unit executes the third micro-instruction to read the first execution result from the temporary register and to store the first execution result in the architecture register.Type: GrantFiled: September 26, 2019Date of Patent: April 27, 2021Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.Inventors: Penghao Zou, Zhi Zhang
-
Patent number: 10983800Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. A plurality of load-store slices coupled to the execution slices provides access to a plurality of cache slices that partition the lowest level of cache memory among the load-store slices.Type: GrantFiled: June 6, 2018Date of Patent: April 20, 2021Assignee: International Business Machines CorporationInventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 10977762Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.Type: GrantFiled: March 30, 2020Date of Patent: April 13, 2021Assignee: Intel CorporationInventors: Balaji Vembu, Altug Koker, Joydeep Ray
-
Patent number: 10936402Abstract: Aspects include copying a plurality of input data into a buffer of a processor configured to perform speculatively executing pipelined streaming of the input data. A bit counter maintains a difference in a number of input bits from the input data entering a pipeline of the processor and a number of the input bits consumed in the pipeline. The pipeline is flushed based on detecting an error. A portion of the input data is recirculated from the buffer into the pipeline based on a value of the bit counter.Type: GrantFiled: November 26, 2018Date of Patent: March 2, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bulent Abali, Bartholomew Blaner, John J. Reilly
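A sketch of the bit-counter recovery mechanism above (the `StreamPipeline` class and its method names are assumptions): the counter tracks bits entered minus bits consumed, so on an error the pipeline can be flushed and exactly the unconsumed tail replayed from the input buffer.

```python
class StreamPipeline:
    def __init__(self, data_bits):
        self.buffer = list(data_bits)   # copy of all input data
        self.entered = 0                # bits that entered the pipeline
        self.consumed = 0               # bits the pipeline has consumed

    def feed(self, n):
        self.entered += n

    def consume(self, n):
        self.consumed += n

    def flush_and_recirculate(self):
        """On error: flush in-flight bits, return the bits to replay."""
        outstanding = self.entered - self.consumed   # the bit counter
        self.entered = self.consumed                 # pipeline now empty
        return self.buffer[self.consumed:self.consumed + outstanding]
```

After feeding 6 bits of `"10110100"` and consuming 4, a flush replays exactly the 2 in-flight bits.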
-
Patent number: 10915328Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.Type: GrantFiled: December 14, 2018Date of Patent: February 9, 2021Assignee: Intel CorporationInventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
-
Patent number: 10915317Abstract: The present disclosure relates to a computing device with a multiple pipeline architecture. The multiple pipeline architecture comprises a first and a second pipeline that run concurrently, where the first pipeline runs at least one cycle ahead of the second pipeline. Special number detection is utilized on the first pipeline, where a special number is a numerical value which yields a predictable result. Upon the detection of a special number, a computation is optimized.Type: GrantFiled: December 10, 2018Date of Patent: February 9, 2021Assignee: ALIBABA GROUP HOLDING LIMITEDInventors: Liang Han, Xiaowei Jiang
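A toy model of special-number detection for a multiply, using common special cases (0 and 1) as an assumed example of values that yield predictable results; the split into "leading check" and "trailing compute" mirrors the one-cycle-ahead pipeline only conceptually.

```python
def multiply_with_special_detect(a, b):
    """Return (result, path) where path says whether the full multiply ran.

    Leading pipeline: special-number check, one cycle ahead.
    Trailing pipeline: the full computation, skipped when predictable.
    """
    if a == 0 or b == 0:
        return 0.0, "skipped"          # 0 * x is predictable
    if a == 1:
        return float(b), "skipped"     # 1 * x == x
    if b == 1:
        return float(a), "skipped"
    return float(a) * float(b), "computed"
```

Operands containing a special number bypass the multiplier entirely; ordinary operands take the full path.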
-
Patent number: 10908880Abstract: An integrated circuit for processing audio signals from a microphone assembly, combinations thereof and methods therefor, including a multi-issue processor configured to execute multiple instructions concurrently and connectable to a memory with a plurality of locations each represented by a corresponding index. Bit-reversal is performed on a sequence of audio data bits stored in memory by concurrently performing a load or store operation related to a first index and determining whether to perform a load operation for a second index.Type: GrantFiled: October 18, 2019Date of Patent: February 2, 2021Assignee: Knowles Electronics, LLCInventor: Leonardo Rub
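The operation described is the classic bit-reversal permutation used before an FFT. A self-contained sketch (the pairing of a swap's two memory operations stands in, loosely, for the concurrent load/store the multi-issue processor performs):

```python
def bit_reverse_index(i, bits):
    """Reverse the low `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def bit_reverse_permute(data):
    """In-place bit-reversal reordering of a buffer of 2**n samples."""
    n = len(data)
    bits = n.bit_length() - 1
    for i in range(n):
        j = bit_reverse_index(i, bits)
        if j > i:                      # swap each index pair exactly once
            data[i], data[j] = data[j], data[i]
    return data
```

For an 8-sample buffer, index 1 (`001`) swaps with 4 (`100`) and 3 (`011`) with 6 (`110`), while palindromic indices stay put.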
-
Patent number: 10884753Abstract: Aspects include monitoring a number of instructions of a first type dispatched to a first shared port of an issue queue of a processor and determining whether the number of instructions of the first type dispatched to the first shared port exceeds a port selection threshold. An instruction of a third type is dispatched to a second shared port of the issue queue associated with a plurality of instructions of a second type based on determining that the number of instructions of the first type dispatched to the first shared port exceeds the port selection threshold. The instruction of the third type is dispatched to the first shared port of the issue queue associated with a plurality of instructions of the first type based on determining that the number of instructions of the first type dispatched to the first shared port does not exceed the port selection threshold.Type: GrantFiled: November 30, 2017Date of Patent: January 5, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Balaram Sinharoy, Joel A. Silberman, Brian W. Thompto
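A sketch of the port-selection rule above (`PortSelector`, the port names, and the instruction-type labels are illustrative): a third-type instruction goes to the second shared port when the first port's count of first-type instructions exceeds the threshold, and to the first port otherwise.

```python
class PortSelector:
    def __init__(self, threshold):
        self.threshold = threshold
        self.first_port_count = 0      # type-1 instructions seen on port 0

    def dispatch(self, kind):
        if kind == "type1":
            self.first_port_count += 1
            return "port0"             # first shared port
        if kind == "type2":
            return "port1"             # second shared port
        # Third type: steer away from a saturated first port.
        if self.first_port_count > self.threshold:
            return "port1"
        return "port0"
```

After three type-1 dispatches with a threshold of 2, a type-3 instruction is steered to the second port; on a cold selector it stays on the first.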