Patents Examined by Michael Sun
-
Patent number: 11016763
Abstract: Systems, apparatuses, and methods for compacting multiple groups of micro-operations into individual cache lines of a micro-operation cache are disclosed. A processor includes at least a decode unit and a micro-operation cache. When a new group of micro-operations is decoded and ready to be written to the micro-operation cache, the micro-operation cache determines which set is targeted by the new group of micro-operations. If there is a way in this set that can store the new group without evicting any existing group already stored in the way, then the new group is stored into the way with the existing group(s) of micro-operations. Metadata is then updated to indicate that the new group of micro-operations has been written to the way. Additionally, the micro-operation cache manages eviction and replacement policy at the granularity of micro-operation groups rather than at the granularity of cache lines.
Type: Grant
Filed: March 8, 2019
Date of Patent: May 25, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Jagadish B. Kotra, John Kalamatianos
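The compaction idea in this abstract can be illustrated with a minimal sketch. It is not the patented design: the slot count `LINE_CAPACITY`, the `Way` class, the `insert_group` helper, and the oldest-first eviction order are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

LINE_CAPACITY = 8  # assumed number of micro-op slots per way (hypothetical)


@dataclass
class Way:
    # Maps a group tag to its micro-op count; this dict stands in for the
    # per-group metadata the abstract mentions.
    groups: dict = field(default_factory=dict)

    def free_slots(self):
        return LINE_CAPACITY - sum(self.groups.values())


def insert_group(cache_set, tag, size):
    """Store a group in the first way with room; return the tags evicted."""
    if size > LINE_CAPACITY:
        raise ValueError("group does not fit in a single way")
    for way in cache_set:
        if way.free_slots() >= size:
            way.groups[tag] = size  # compact alongside existing groups
            return []
    # No way has room: evict individual groups, not the whole line.
    # Oldest-first order in way 0 is an assumed replacement policy.
    victim = cache_set[0]
    evicted = []
    while victim.free_slots() < size:
        oldest = next(iter(victim.groups))
        del victim.groups[oldest]
        evicted.append(oldest)
    victim.groups[tag] = size
    return evicted
```

In this sketch, a 3-slot group written after a 5-slot group shares the same way, and a later miss evicts only as many groups as needed to make room.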
-
Patent number: 10990393
Abstract: Address-based filtering for load/store speculation includes maintaining a filtering table including table entries associated with ranges of addresses; in response to receiving an ordering check triggering transaction, querying the filtering table using a target address of the ordering check triggering transaction to determine if an instruction dependent upon the ordering check triggering transaction has previously been generated a physical address; and in response to determining that the filtering table lacks an indication that the instruction dependent upon the ordering check triggering transaction has previously been generated a physical address, bypassing a lookup operation in an ordering violation memory structure to determine whether the instruction dependent upon the ordering check triggering transaction is currently in-flight.
Type: Grant
Filed: October 21, 2019
Date of Patent: April 27, 2021
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: John Kalamatianos, Krishnan V. Ramani, Susumu Mashimo
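A rough sketch of the filtering step described here follows. The 4 KiB range size (`RANGE_BITS`), the `AddressFilter` class, and the `ordering_check` helper are hypothetical, and a plain set stands in for the ordering violation memory structure.

```python
RANGE_BITS = 12  # assumed: each table entry covers a 4 KiB address range


class AddressFilter:
    """Tracks which address ranges have had a physical address generated."""

    def __init__(self):
        self.ranges = set()

    def record_address(self, addr):
        self.ranges.add(addr >> RANGE_BITS)

    def may_conflict(self, addr):
        return (addr >> RANGE_BITS) in self.ranges


def ordering_check(filter_table, in_flight, target):
    """Skip the expensive lookup when the filter has no entry for the range."""
    if not filter_table.may_conflict(target):
        return "bypassed"
    # Only now consult the (assumed) ordering violation structure.
    return "violation" if target in in_flight else "no violation"
```

The filter can report a possible conflict for an address that merely shares a range with a recorded one, so a hit still requires the full lookup; only a miss allows the bypass.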
-
Patent number: 10990406
Abstract: An instruction execution device includes a processor. The processor includes an instruction translator, a reorder buffer, an architecture register, and an execution unit. The instruction translator receives a macro-instruction and translates the macro-instruction into a first micro-instruction, a second micro-instruction and a third micro-instruction. The instruction translator marks the first micro-instruction and the second micro-instruction with the same atomic operation flag. The execution unit executes the first micro-instruction to generate a first execution result and to store the first execution result in a temporary register. The execution unit executes the second micro-instruction to generate a second execution result and to store the second execution result in the architecture register. The execution unit executes the third micro-instruction to read the first execution result from the temporary register and to store the first execution result in the architecture register.
Type: Grant
Filed: September 26, 2019
Date of Patent: April 27, 2021
Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
Inventors: Penghao Zou, Zhi Zhang
-
Patent number: 10983794
Abstract: A processor to facilitate register sharing is disclosed. The processor includes a plurality of execution units (EUs), each including a General Purpose Register File (GRF) having a plurality of registers; and register sharing hardware to divide the plurality of registers into a first set of registers dedicated for execution of a first set of threads and a second set of registers shared for execution of a second set of threads.
Type: Grant
Filed: June 17, 2019
Date of Patent: April 20, 2021
Assignee: Intel Corporation
Inventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Weiyu Chen, Konrad Trifunovic, Supratim Pal, Chandra S. Gurram, Jorge E. Parra, Pratik J. Ashar, Tomasz Bujewski
-
Patent number: 10983793
Abstract: The present disclosure is directed to systems and methods of performing one or more broadcast or reduction operations using direct memory access (DMA) control circuitry. The DMA control circuitry executes a modified instruction set architecture (ISA) that facilitates the broadcast distribution of data to a plurality of destination addresses in system memory circuitry. The broadcast instruction may include broadcast of a single data value to each destination address. The broadcast instruction may include broadcast of a data array to each destination address. The DMA control circuitry may also execute a reduction instruction that facilitates the retrieval of data from a plurality of source addresses in system memory and performing one or more operations using the retrieved data. Since the DMA control circuitry, rather than the processor circuitry, performs the broadcast and reduction operations, system speed and efficiency are beneficially enhanced.
Type: Grant
Filed: March 29, 2019
Date of Patent: April 20, 2021
Assignee: Intel Corporation
Inventors: Joshua Fryman, Ankit More, Jason Howard, Robert Pawlowski, Yigit Demir, Nick Pepperling, Fabrizio Petrini, Sriram Aananthakrishnan, Shaden Smith
-
Patent number: 10977038
Abstract: A processing apparatus supporting register renaming is provided with checkpoint circuitry to capture register mapping checkpoints indicative of speculative register mappings between logical registers and physical registers at a given point of speculative execution, and register group tracking circuitry to maintain tracking information for groups of logical registers. The tracking information for a given group indicates whether the given group is a changed group comprising at least one logical register for which a corresponding speculative register mapping has changed since a last checkpoint was captured, or an unchanged group for which none of the logical registers in that group have had their speculative register mappings changed since the last checkpoint was captured. When capturing a new register mapping checkpoint, unchanged groups of logical registers are excluded from the new register mapping checkpoint. This can save power in a register mapping checkpointing scheme.
Type: Grant
Filed: June 19, 2019
Date of Patent: April 13, 2021
Assignee: Arm Limited
Inventor: William Elton Burky
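The group-tracking scheme can be sketched in software as follows. The group size of four registers, the `RenameMap` class, and its method names are assumptions for illustration; the abstract describes hardware circuitry, not this code.

```python
GROUP_SIZE = 4  # assumed number of logical registers per tracked group


class RenameMap:
    def __init__(self, num_logical):
        # Start with an identity mapping: logical register i -> physical i.
        self.mapping = list(range(num_logical))
        self.changed = set()  # indices of groups changed since last checkpoint

    def rename(self, logical, physical):
        self.mapping[logical] = physical
        self.changed.add(logical // GROUP_SIZE)

    def checkpoint(self):
        """Capture only the changed groups, then reset the tracking info."""
        snapshot = {
            g: self.mapping[g * GROUP_SIZE:(g + 1) * GROUP_SIZE]
            for g in sorted(self.changed)
        }
        self.changed.clear()
        return snapshot
```

Renaming one register marks only its group as changed, so the next checkpoint copies four mappings rather than the whole table, and a checkpoint taken with no intervening renames is empty.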
-
Patent number: 10970070
Abstract: An apparatus has processing circuitry to perform, in response to decoding of an iterative-operation instruction by the instruction decoder, an iterative operation comprising at least two iterations of processing where one iteration depends on an operand generated in a previous iteration. Preliminary information generating circuitry performs a preliminary portion of processing for a given iteration to generate preliminary information. Result generating circuitry performs a remaining portion of processing for the given iteration, to generate a result value using the preliminary information. Forwarding circuitry forwards the result value as an operand for a next iteration of the iterative operation, for iterations other than the final iteration. The preliminary information generating circuitry starts performing the preliminary portion for the next iteration in parallel with the result generating circuitry completing the remaining portion for the current iteration, to improve performance.
Type: Grant
Filed: March 29, 2019
Date of Patent: April 6, 2021
Assignee: Arm Limited
Inventors: Nicholas Andrew Pfister, Srinivas Vemuri, David Raymond Lutz
-
Patent number: 10970108
Abstract: The present invention discloses a method and an apparatus for executing a non-maskable interrupt. The method includes: obtaining a secure interrupt request in a non-secure mode, and interrupting an operation of an operating system OS, where the secure interrupt request cannot be masked; entering a secure mode by using the secure interrupt request, and saving, in the secure mode, an interrupt context of an OS status when the operation of the OS is interrupted; returning to the non-secure mode to execute user-defined processing; after the user-defined processing is completed, entering the secure mode again, and resuming the OS status in the secure mode according to the interrupt context; and returning to the non-secure mode again, and continuing to execute an operation of the OS. The method and the apparatus for executing a non-maskable interrupt in embodiments of the present invention can easily implement an NMI mechanism without depending on hardware.
Type: Grant
Filed: October 3, 2019
Date of Patent: April 6, 2021
Assignee: Huawei Technologies Co., Ltd.
Inventors: Jun Ma, Tianhong Ding, Zhaozhe Tong
-
Patent number: 10963404
Abstract: A DIMM is described. The DIMM includes circuitry to simultaneously transfer data of different ranks of memory chips on the DIMM over a same data bus during a same burst write sequence.
Type: Grant
Filed: June 25, 2018
Date of Patent: March 30, 2021
Assignee: Intel Corporation
Inventors: James A. McCall, Rajat Agarwal, George Vergis, Bill Nale
-
Patent number: 10956343
Abstract: Systems and methods are disclosed and include a processor configured to execute instructions stored in a nontransitory computer-readable medium. The instructions include generating first message authentication code (MAC) bytes based on a shared secret key. The instructions include generating first nonce bytes and an authenticated packet based on the first MAC bytes, the first nonce bytes, and a message byte. The instructions include generating a de-whitened tone byte based on the shared secret key. The instructions include generating a message packet that includes the authenticated packet and the de-whitened tone byte. Generating the message packet includes pseudo-randomly identifying a first location of the authenticated packet and inserting the de-whitened tone byte at the first location.
Type: Grant
Filed: October 7, 2019
Date of Patent: March 23, 2021
Assignees: DENSO International America, Inc., DENSO CORPORATION
Inventors: Raymond Michael Stitt, Thomas Peterson, Karl Jager, Kyle Golsch
-
Patent number: 10956168
Abstract: A computer data processing system includes an instruction pipeline having a front end and a back end, a decoding and dispatch unit to dispatch a current instruction; and a pipeline by-pass unit to invoke an out-of-order pipeline by-pass operation. The pipeline by-pass unit by-passes a section of the instruction pipeline such that the current instruction architecturally completes before initiating instruction execution. The computer data processing system further includes a post-completion execution unit that executes the current instruction after the current instruction architecturally completes.
Type: Grant
Filed: March 8, 2019
Date of Patent: March 23, 2021
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Avery Francois, Christian Jacobi, Gregory William Alexander
-
Patent number: 10956160
Abstract: A processor and method are described for a multi-level reservation station.
Type: Grant
Filed: March 27, 2019
Date of Patent: March 23, 2021
Assignee: Intel Corporation
Inventors: Mark Dechene, Srikanth Srinivasan, Matthew Merten, Ammon Christiansen
-
Patent number: 10956166
Abstract: A data processing apparatus includes obtain circuitry that obtains a stream of instructions. The stream of instructions includes a barrier creation instruction and a barrier inhibition instruction. Track circuitry orders sending each instruction in the stream of instructions to processing circuitry based on one or more dependencies. The track circuitry is responsive to the barrier creation instruction to cause the one or more dependencies to include one or more barrier dependencies in which pre-barrier instructions, occurring before the barrier creation instruction in the stream, are sent before post-barrier instructions, occurring after the barrier creation instruction in the stream, are sent. The track circuitry is also responsive to the barrier inhibition instruction to relax the barrier dependencies to permit post-inhibition instructions, occurring after the barrier inhibition instruction in the stream, to be sent before the pre-barrier instructions.
Type: Grant
Filed: March 8, 2019
Date of Patent: March 23, 2021
Assignees: Arm Limited, The Regents of The University of Michigan
Inventors: Vaibhav Gogte, Wei Wang, Stephan Diestelhorst, Peter M Chen, Satish Narayanasamy, Thomas Friedrich Wenisch
-
Patent number: 10949210
Abstract: A computing device, having: a processor; memory; a first cache coupled between the memory and the processor; and a second cache coupled between the memory and the processor. During speculative execution of one or more instructions, effects of the speculative execution are contained within the second cache.
Type: Grant
Filed: July 6, 2018
Date of Patent: March 16, 2021
Assignee: Micron Technology, Inc.
Inventor: Steven Jeffrey Wallach
-
Patent number: 10949362
Abstract: Technologies for facilitating remote memory requests in accelerator devices are disclosed. The accelerator device includes circuitry to receive, from a kernel of the present accelerator device, a request through an application programming interface exposed to a high level software language in which the kernel of the present accelerator device is implemented, to establish a logical communication path between the kernel of the present accelerator device and a target accelerator device kernel, based on one or more physical communication paths. The communication protocol supported by the accelerator device may allow kernels operating on the accelerator device to send memory requests for memory locations at remote devices, with the communication protocol performing all of the operations necessary to carry out the memory request.
Type: Grant
Filed: June 28, 2019
Date of Patent: March 16, 2021
Assignee: Intel Corporation
Inventors: Susanne M. Balle, Evan Custodio, Paul H. Dormitzer, Narayan Ranganathan
-
Patent number: 10942738
Abstract: The present disclosure is directed to systems and methods for performing one or more operations on a two dimensional tile register using an accelerator that includes a tiled matrix multiplication unit (TMU). The processor circuitry includes reservation station (RS) circuitry to communicatively couple the processor circuitry to the TMU. The RS circuitry coordinates the operations performed by the TMU. TMU dispatch queue (TDQ) circuitry in the TMU maintains the operations received from the RS circuitry in the order that the operations are received from the RS circuitry. Since the duration of each operation is not known prior to execution by the TMU, the RS circuitry maintains shadow dispatch queue (RS-TDQ) circuitry that mirrors the operations in the TDQ circuitry.
Type: Grant
Filed: March 29, 2019
Date of Patent: March 9, 2021
Assignee: Intel Corporation
Inventors: Zeev Sperber, Amit Gradstein, Simon Rubanovich, Igor Yanover, Gavri Berger, Eyal Hadas, Saeed Kharouf, Ron Schneider, Sagi Meller, Jose Yallouz
-
Patent number: 10922081
Abstract: Establishing a conditional branch frame barrier is described. A conditional branch in a function epilogue is used to provide frame-specific control. The conditional branch evaluates a return condition to determine whether to return from a callee function to a calling function, or to execute a slow path instead. The return condition is evaluated based on a thread local value. The thread local value is set such that returns to potentially unsafe frames in a call stack are prohibited. The prohibition to return to a potentially unsafe frame may be referred to as a “frame barrier.” Additionally, the thread local value may be used to establish safepointing and/or thread local handshakes, both after execution of a function body and after execution of a loop body.
Type: Grant
Filed: June 19, 2019
Date of Patent: February 16, 2021
Assignee: Oracle International Corporation
Inventor: Erik Österlund
-
Patent number: 10922078
Abstract: A system includes a host processor and at least one storage device coupled to the host processor. The host processor is configured to execute instructions of an instruction set, the instruction set comprising a first move instruction for moving data identified by at least one operand of the first move instruction into each of multiple distinct storage locations. The host processor, in executing the first move instruction, is configured to store the data in a first one of the storage locations identified by one or more additional operands of the first move instruction, and to store the data in a second one of the storage locations identified based at least in part on the first storage location. The instruction set in some embodiments further comprises a second move instruction for moving the data from the multiple distinct storage locations to another storage location.
Type: Grant
Filed: June 18, 2019
Date of Patent: February 16, 2021
Assignee: EMC IP Holding Company LLC
Inventors: Michael Robillard, Adrian Michaud, Dragan Savic
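As a loose illustration of the first move instruction, the sketch below writes one value to two locations, deriving the second from the first. The fixed `MIRROR_OFFSET`, the `dual_move` name, and the dict used as memory are all hypothetical; the abstract does not say how the second location is derived.

```python
MIRROR_OFFSET = 0x100  # hypothetical rule: second location = first + offset


def dual_move(memory, data, first_addr):
    """Store data at first_addr and at a second address derived from it."""
    second_addr = first_addr + MIRROR_OFFSET
    memory[first_addr] = data
    memory[second_addr] = data
    return second_addr
```

The caller supplies only the data and the first location, which matches the abstract's description of the second location being identified from the first rather than from a separate operand.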
-
Patent number: 10915322
Abstract: A processor predicts a number of loop iterations associated with a set of loop instructions. In response to the predicted number of loop iterations exceeding a first loop iteration threshold, the set of loop instructions are executed in a loop mode that includes placing at least one component of an instruction pipeline of the processor in a low-power mode or state and executing the set of loop instructions from a loop buffer. In response to the predicted number of loop iterations being less than or equal to a second loop iteration threshold, the set of instructions are executed in a non-loop mode that includes maintaining at least one component of the instruction pipeline in a powered up state and executing the set of loop instructions from an instruction fetch unit of the instruction pipeline.
Type: Grant
Filed: September 18, 2018
Date of Patent: February 9, 2021
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Arunachalam Annamalai, Marius Evers, Aparna Thyagarajan, Anthony Jarvis
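The two-threshold decision in this abstract might be sketched as below. The threshold values and the `choose_mode` name are assumptions, and the abstract does not say what happens when the prediction falls between the two thresholds, so the sketch reports that case as unspecified rather than guessing.

```python
def choose_mode(predicted_iterations, first_threshold=16, second_threshold=4):
    """Pick an execution mode from a predicted loop iteration count.

    Threshold defaults are illustrative, not taken from the patent.
    """
    if predicted_iterations > first_threshold:
        # Loop mode: run from the loop buffer, power down a pipeline component.
        return "loop"
    if predicted_iterations <= second_threshold:
        # Non-loop mode: keep the pipeline powered, fetch normally.
        return "non-loop"
    return "unspecified"  # the abstract is silent on this middle band
```

With these assumed defaults, a prediction of 32 iterations selects loop mode, 4 selects non-loop mode, and 10 falls in the band the abstract leaves open.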
-
Patent number: 10915326
Abstract: A cache system, having a first cache, a second cache, and a logic circuit coupled to control the first cache and the second cache according to an execution type of a processor. When an execution type of a processor is a first type indicating non-speculative execution of instructions and the first cache is configured to service commands from a command bus for accessing a memory system, the logic circuit is configured to copy a portion of content cached in the first cache to the second cache. The cache system can include a configurable data bit. The logic circuit can be coupled to control the caches according to the bit. Alternatively, the caches can include cache sets. The caches can also include registers associated with the cache sets respectively. The logic circuit can be coupled to control the cache sets according to the registers.
Type: Grant
Filed: July 31, 2019
Date of Patent: February 9, 2021
Assignee: Micron Technology, Inc.
Inventor: Steven Jeffrey Wallach