Scoreboarding, Reservation Station, Or Aliasing Patents (Class 712/217)
-
Patent number: 12190117Abstract: Techniques are provided for allocating registers for a processor. The techniques include identifying a first instruction of an instruction dispatch set that meets all register allocation suppression criteria of a first set of register allocation suppression criteria, suppressing register allocation for the first instruction, identifying a second instruction of the instruction dispatch set that does not meet all register allocation suppression criteria of a second set of register allocation suppression criteria, and allocating a register for the second instruction.Type: GrantFiled: November 26, 2019Date of Patent: January 7, 2025Assignee: Advanced Micro Devices, Inc.Inventors: Neil N. Marketkar, Arun A. Nair
-
Patent number: 12141580Abstract: A processor includes an instruction issue unit that receives a first instruction, and issues the first instruction with a write time, which for a load instruction corresponds to a data cache latency time or to a non-cacheable latency time of a non-cacheable predictor. The non-cacheable predictor includes a tag array and data array with a plurality of entries to predict non-cacheable latency times of non-cacheable load instructions. The non-cacheable predictor can be implemented as a direct map, an N-way associative cache, or a fully associative cache.Type: GrantFiled: April 20, 2022Date of Patent: November 12, 2024Assignee: Simplex Micro, Inc.Inventors: David Witt, Thang Minh Tran
-
Patent number: 12086591Abstract: Techniques and mechanisms for determining a relative order in which a load instruction and a store instruction are to be executed. In an embodiment, a processor detects an address collision event wherein two instructions, corresponding to different respective instruction pointer values, target the same memory address. Based on the address collision event, the processor identifies respective instruction types of the two instructions as an aliasing instruction type pair. The processor further determines a count of decisions each to forego a reversal of an order of execution of instructions. Each decision represented in the count is based on instructions which are each of a different respective instruction type of the aliasing instruction type pair. In another embodiment, the processor determines, based on the count of decisions, whether a later load instruction is to be advanced in an order of instruction execution.Type: GrantFiled: March 26, 2021Date of Patent: September 10, 2024Assignee: Intel CorporationInventors: Sudhanshu Shukla, Jayesh Gaur, Stanislav Shwartsman, Pavel I. Kryukov
-
Patent number: 12079589Abstract: Systems, apparatuses, and methods related to a memory array data structure for posit operations are described. Universal number (unum) bit strings, such as posit bit string operands and posit bit strings representing results of arithmetic and/or logical operations performed using the posit bit string operands may be stored in a memory array. Circuitry deployed in a memory device may access the memory array to retrieve the unum bit string operands and/or the results of the arithmetic and/or logical operations performed using the unum bit string operands from the memory array. For instance, an arithmetic operation and/or a logical operation may be performed using a first unum bit string stored in the memory array and a second unum bit string stored in the memory array. The result of the arithmetic operation and/or the logical operation may be stored in the memory array and subsequently retrieved.Type: GrantFiled: July 7, 2022Date of Patent: September 3, 2024Assignee: Micron Technology, Inc.Inventor: Vijay S. Ramesh
-
Patent number: 12061905Abstract: Transformations in fused multiply-add instructions, including: receiving an extended fused multiply-add (FMA) instruction comprising: a first subset of bits corresponding to each operand of a fused multiply-add (FMA) operation, and a second subset of bits comprising an opcode for the extended FMA instruction, wherein the opcode identifies extended FMA instruction from an instruction set of a plurality of extended FMA instructions each having a different predefined opcode corresponding to a different combination of transformation-operand groupings; and performing, based on the opcode of the extended FMA instruction, the one or more transformations and the FMA operation.Type: GrantFiled: June 20, 2023Date of Patent: August 13, 2024Assignee: GHOST AUTONOMY INC.Inventors: John Hayes, Volkmar Uhlig
-
Patent number: 12045620Abstract: A data processing apparatus is provided that comprises rename circuitry for performing a register rename stage of a pipeline in respect of a stream of operations. Move elimination circuitry performs a move elimination operation on the stream of operations in which a move operation is eliminated and the register rename stage performs an adjustment of an identity of registers in the stream of operations to compensate for the move operation being eliminated and demotion circuitry reverses or inhibits the adjustment in response to one or more conditions being met.Type: GrantFiled: December 17, 2021Date of Patent: July 23, 2024Assignee: Arm LimitedInventors: Yasuo Ishii, Muhammad Umar Farooq, William Elton Burky, Michael Brian Schinzler, Jason Lee Setter, David Gum Lim
-
Patent number: 11989583Abstract: Circuitry comprises two or more clusters of execution units, each cluster comprising one or more execution units to execute processing instructions; and scheduler circuitry to maintain one or more queues of processing instructions, the scheduler circuitry comprising picker circuitry to select a queued processing instruction for issue to an execution unit of one of the clusters of execution units for execution; in which: the scheduler circuitry is configured to maintain dependency data associated with each queued processing instruction, the dependency data for a queued processing instruction indicating any source operands which are required to be available for use in execution of that queued processing instruction and to inhibit issue of that queued processing instruction until all of the required source operands for that queued processing instruction are available and is configured to be responsive to an indication to the scheduler circuitry of the availability of the given operand as a source operand for useType: GrantFiled: March 31, 2021Date of Patent: May 21, 2024Assignee: Arm LimitedInventors: Chris Abernathy, Eric Charles Quinnell, Abhishek Raja, Michael David Achenbach
-
Patent number: 11989560Abstract: The present disclosure provides an instruction execution method, device, and electronic equipment. In the instruction execution method described above, after obtaining an exceptional signal generated by a neural network processor during an operation, the electronic equipment determines an exception processing instruction corresponding to the exceptional signal according to the exceptional signal, then it determines a first instruction queue needed to be executed by the neural network processor, and then it generates a second instruction queue based on the exception processing instruction and the first instruction queue, and finally it controls the neural network processor to execute the second instruction queue, so that errors encountered by the neural network processor can be timely processed, thereby shortening the error processing delay and improving the data processing efficiency of the hardware system in the electronic equipment.Type: GrantFiled: April 7, 2021Date of Patent: May 21, 2024Assignee: BEIJING HORIZON ROBOTICS TECHNOLOGY RESEARCH AND DEVELOPMENT CO., LTD.Inventors: Yitong Zhao, Xing Wei
-
Patent number: 11966328Abstract: A memory module includes register selection logic to select alternate local source and/or destination registers to process PIM commands. The register selection logic uses an address-based register selection approach to select an alternate local source and/or destination register based upon address data specified by a PIM command and a split address maintained by a memory module. The register selection logic may alternatively use a register data-based approach to select an alternate local source and/or destination register based upon data stored in one or more local registers. A PIM-enabled memory module configured with the register selection logic described herein is capable of selecting an alternate local source and/or destination register to process PIM commands at or near the PIM execution unit where the PIM commands are executed.Type: GrantFiled: December 18, 2020Date of Patent: April 23, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Onur Kayiran, Mohamed Assem Ibrahim, Shaizeen Aga
-
Patent number: 11900111Abstract: A device includes a vector register file, a memory, and a processor. The vector register file includes a plurality of vector registers. The memory is configured to store a permutation instruction. The processor is configured to access a periodicity parameter of the permutation instruction. The periodicity parameter indicates a count of a plurality of data sources that contain source data for the permutation instruction. The processor is also configured to execute the permutation instruction to, for each particular element of multiple elements of a first permutation result register of the plurality of vector registers, select a data source of the plurality of data sources based at least in part on the count of the plurality of data sources and populate the particular element based on a value in a corresponding element of the selected data source.Type: GrantFiled: September 24, 2021Date of Patent: February 13, 2024Assignee: QUALCOMM IncorporatedInventors: Srijesh Sudarsanan, Deepak Mathew, Marc Hoffman, Gerald Sweeney, Sundar Rajan Balasubramanian, Hongfeng Dong, Yurong Sun, Seyedmehdi Sadeghzadeh
-
Patent number: 11900122Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution the value of the counter associated therewith is adjusted to indicate that there is hazard related to the primary instruction, and when primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.Type: GrantFiled: July 10, 2023Date of Patent: February 13, 2024Assignee: Imagination Technologies LimitedInventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
-
Patent number: 11842192Abstract: An arithmetic processing device, includes a memory; and a processor coupled to the memory and the processor configured to: execute arithmetic processing which executes a plurality of instructions issued out of order, execute control processing which commits the plurality of instructions for which execution has been completed in order, identify, for each instruction included in the plurality of instructions, a count value which indicates a number of cycles from when execution of the instruction has been completed, and identify, among a plurality of uncommitted instructions, the instruction with the count value which matches the number of cycles in which an error is detected in the arithmetic processing as a specific instruction to be retried.Type: GrantFiled: March 16, 2021Date of Patent: December 12, 2023Assignee: FUJITSU LIMITEDInventors: Yuhei Takata, Shiro Kamoshida
-
Patent number: 11836518Abstract: Techniques for data manipulation using processor graph execution using interrupt conservation are disclosed. Processing elements are configured to implement a data flow graph. The processing elements comprise a multilayer graph execution engine. A data engine is loaded with computational parameters for the multilayer graph execution engine. The data engine is coupled to the multilayer graph execution engine, and the computational parameters supply layer-by-layer execution data to the multilayer graph execution engine for data flow graph execution. A first command FIFO is used for loading the data engine with computational parameters, and a second command FIFO is used for loading the multilayer graph execution engine with layer definition data. An input image is provided for a first layer of the multilayer graph execution engine. The data flow graph is executed using the input image and the computational parameters.Type: GrantFiled: December 10, 2021Date of Patent: December 5, 2023Inventor: David John Simpson
-
Register scoreboard for a microprocessor with a time counter for statically dispatching instructions
Patent number: 11829767Abstract: A processor includes a time counter and a register scoreboard and operates to statically dispatch instructions with preset execution times based on a write time of a register in the register scoreboard and a time count of the time counter provided to an execution pipeline.Type: GrantFiled: February 15, 2022Date of Patent: November 28, 2023Assignee: Simplex Micro, Inc.Inventor: Thang Minh Tran -
Patent number: 11776195Abstract: Examples are described here that can be used to enable a main routine to request subroutines or other related code to be executed with other instantiations of the same subroutine or other related code for parallel execution. A sorting unit can be used to accumulate requests to execute instantiations of the subroutine. The sorting unit can request execution of a number of multiple instantiations of the subroutine corresponding to a number of lanes in a SIMD unit. A call stack can be used to share information to be accessed by a main routine after execution of the subroutine completes.Type: GrantFiled: August 31, 2021Date of Patent: October 3, 2023Assignee: Intel CorporationInventors: John G. Gierach, Karthik Vaidyanathan, Thomas F. Raoux
-
Patent number: 11748104Abstract: Technology for fusing certain load instructions and compare-immediate instructions in a computer processor having a load-store architecture with respect to transferring data between memory and registers of the computer processor. In some embodiments the load and compare-immediate instructions are consecutive. In some embodiments, the instructions are only merged if: (i) the respective RA and RT fields of the two instructions match; (ii) the immediate field of the compare-immediate instruction has a certain value, or falls within a range of certain values; and/or (iii) the instructions are received in a consecutive manner.Type: GrantFiled: July 29, 2020Date of Patent: September 5, 2023Assignee: International Business Machines CorporationInventors: Bryan Lloyd, David A. Hrusecky, Sundeep Chadha, Dung Q. Nguyen, Christian Gerhard Zoellin, Brian W. Thompto, Sheldon Bernard Levenstein, Phillip G. Williams
-
Patent number: 11727530Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.Type: GrantFiled: May 28, 2021Date of Patent: August 15, 2023Assignee: Apple Inc.Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
-
Patent number: 11687345Abstract: Apparatus and methods are disclosed for implementing block-based processors including field programmable gate-array implementations. In one example of the disclosed technology, a block-based processor includes an instruction decoder configured to generate decoded ready dependencies for a transactional block of instructions, where each of the instructions is associated with a different instruction identifier encoded in the transactional block. The processor further includes an instruction scheduler configured to issue an instruction from a set of instructions of the transactional block of instructions. The instruction is issued based on determining that decoded ready state dependencies for an instruction are satisfied. The determining includes accessing storage with the decoded ready dependencies indexed with a respective instruction identifier that is encoded in the transactional block of instructions.Type: GrantFiled: July 29, 2016Date of Patent: June 27, 2023Assignee: Microsoft Technology Licensing, LLCInventors: Aaron L. Smith, Jan S. Gray
-
Patent number: 11687347Abstract: A microprocessor and a method for issuing a load/store instruction is introduced. The microprocessor includes a decode/issue unit, a load/store queue, a scoreboard, and a load/store unit. The scoreboard includes a plurality of scoreboard entries, in which each scoreboard entry includes an unknown bit value and a count value, wherein the unknown bit value or the count value is set when instructions are issued. The decode/issue unit checks for WAR, WAW, and RAW data dependencies from the scoreboard and dispatches load/store instructions to the load/store queue with the recorded scoreboard values. The load/store queue is configured to resolve the data dependencies and dispatch the load/store instructions to the load/store unit for execution.Type: GrantFiled: May 25, 2021Date of Patent: June 27, 2023Assignee: ANDES TECHNOLOGY CORPORATIONInventor: Thang Minh Tran
-
Patent number: 11599359Abstract: A processor in a data processing system includes a master-shadow physical register file and a renaming unit. The master-shadow physical register file has a master storage coupled to shadow storage. The renaming unit is coupled to the master-shadow physical register file. Based on an occurrence of shadow transfer activation conditions verified by the renaming unit, data in the master storage is transferred from the master storage to the shadow storage for storage. Data is transferred from the shadow storage back to the master storage based on the occurrence of a shadow-to-master transfer event, which includes, for example, a flush of the master storage by the processor.Type: GrantFiled: May 18, 2020Date of Patent: March 7, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Arun A. Nair, Ashok T. Venkatachar, Emil Talpes, Srikanth Arekapudi, Rajesh Kumar Arunachalam
-
Patent number: 11593116Abstract: A system and corresponding method unwind instructions in an out-of-order (OoO) processor. The system comprises a mapper. In response to a restart event causing at least one instruction to be unwound, the mapper restores a present integer mapper state and present floating-point (FP) mapper state, used for mapping instructions, to a former integer mapper state and former FP mapper state, respectively. The mapper stores integer snapshots and FP snapshots of the present integer and FP mapper state, respectively, to expedite restoration to the former integer and FP mapper state, respectively. Access to the FP snapshots is blocked, intermittently, as a function of at least one FP present indicator used by the mapper to record presence of FP registers used as destinations in the instructions. Blocking the access, intermittently, improves power efficiency of the OoO processor.Type: GrantFiled: April 30, 2021Date of Patent: February 28, 2023Assignee: Marvell Asia Pte, Ltd.Inventor: David A. Carlson
-
Patent number: 11593115Abstract: The present disclosure discloses an instruction execution device, a processor including the instruction execution device, a system on chip, and a method for executing a data storage instruction in the processor. The method includes: splitting the data storage instruction into a first split instruction and a second split instruction, wherein the first split instruction is associated with an address operand of the data storage instruction, and the second split instruction is associated with a data operand of the data storage instruction; executing the first split instruction to determine a data storage address corresponding to the address operand; executing the second split instruction to acquire data content corresponding to the data operand; and storing the acquired data content to the determined data storage address in a data storage region. The present disclosure further discloses a corresponding instruction execution device, a processor including the execution device and a system on chip.Type: GrantFiled: March 20, 2020Date of Patent: February 28, 2023Assignee: Alibaba Group Holding LimitedInventors: Yimin Lu, Xiaoyan Xiang
-
Patent number: 11561794Abstract: A computer system, processor, programming instructions and/or method of processing data that includes a main register file having a plurality of entries for storing data; an accumulator register file having a plurality of entries for storing data wherein multiple main register file entries are mapped to one accumulator register file entry in the at least one accumulator register file; a logical register mapper to track and map logical registers to main register file entries, and a history buffer. Processing wide data width instructions includes evicting and restoring information from a single primary entry in the logical register mapper through a single read or write port in the logical register mapper without evicting or restoring the remaining other multiple main register file entries mapped in the accumulator register.Type: GrantFiled: May 26, 2021Date of Patent: January 24, 2023Assignee: International Business Machines CorporationInventors: Steven J. Battle, Brian W. Thompto, Dung Q. Nguyen, Cliff Kucharski, Susan E. Eisen, Salma Ayub
-
Patent number: 11531547Abstract: Data processing circuitry comprises out-of-order instruction execution circuitry; register mapping circuitry to map zero or more architectural processor registers relating to execution of that program instruction to respective ones of a set of physical processor registers; commit circuitry to commit, in a program code order, the results of executed program instructions, the commit circuitry being configured to access a data store which stores register tag data to indicate which physical registers mapped by the register mapping circuitry relate to a given program instruction; fault detection circuitry to detect a memory access fault in respect of a vector memory access operation and to generate fault indication data indicative of an element earliest in the element order for which a memory access fault was detected; a fault indication register to store the fault indication data, in which the register mapping circuitry is configured to generate a register mapping for a program instruction for any architectural pType: GrantFiled: May 21, 2021Date of Patent: December 20, 2022Assignee: Arm LimitedInventors: Damian Maiorano, Luca Nassi, Cédric Denis Robert Airaud, Christophe Laurent Carbonne, Jocelyn François Orion Jaubert, Pasquale Ranone
-
Patent number: 11507362Abstract: A system and method for executing a method generating a binary patch file for live patching of an application is disclosed. In one exemplary aspect, the method comprises creating shared object by compiling source code patch file that contains source code of a new function corresponding to an old function, a global external symbol referenced in the source code of the new function, and at least one link to a symbol in an application binary code corresponding to the global external symbol, wherein the shared object contains binary code of the new function for replacing the old function during the live patching, and the result of a compilation of the link, generating metadata usable to facilitate the live patching, creating bindings between calculated relative addresses and the global external symbol referenced by the shared object, and creating the binary patch file by adding metadata to the shared object.Type: GrantFiled: October 5, 2020Date of Patent: November 22, 2022Assignee: Virtuozzo International GmbHInventors: Stanislav Kinsburskiy, Alexey Kobets, Eugene Kolomeetz
-
Patent number: 11494190Abstract: Instruction decoder circuitry decodes processing instructions each generating an output multi-bit data item in a destination architectural register by applying a processing operation to source data item(s) in respective source architectural register(s). The decoder circuitry detects whether an instruction defines a predicated merge operation that propagates a set of zero or more portions of the prevailing contents of the destination architectural register as respective portions of the output multi-bit data item. The portions are defined by predicate data. Register allocation circuitry associates physical registers with the destination architectural register and the source architectural register(s). When detector circuitry detects that an instruction defines a predicated merge operation, the register allocation circuitry associates a further physical register with that instruction to store a copy of the prevailing contents.Type: GrantFiled: March 31, 2021Date of Patent: November 8, 2022Assignee: Arm LimitedInventors: Zachary Allen Kingsbury, Kurt Matthew Fellows, Thomas Gilles Tarridec
-
Patent number: 11481315Abstract: A method includes using a memory address map, locating a first portion of an application in a first memory and loading a second portion of the application from a second memory. The method includes executing in place from the first memory the first portion of the application, during a first period, and by completion of the loading of the second portion of the application from the second memory. The method further includes executing the second portion of the application during a second period, wherein the first period precedes the second period.Type: GrantFiled: September 4, 2020Date of Patent: October 25, 2022Assignee: INFINEON TECHNOLOGIES LLCInventors: Stephan Rosner, Qamrul Hasan, Venkat Natarajan
-
Patent number: 11475282Abstract: Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of compute elements and routers performs flow-based computations on wavelets of data. Some instructions are performed in iterations, such as one iteration per element of a fabric vector or FIFO. When sources for an iteration of an instruction are unavailable, and/or there is insufficient space to store results of the iteration, indicators associated with operands of the instruction are checked to determine whether other work can be performed. In some scenarios, other work cannot be performed and processing stalls. Alternatively, information about the instruction is saved, the other work is performed, and sometime after the sources become available and/or sufficient space to store the results becomes available, the iteration is performed using the saved information.Type: GrantFiled: April 17, 2018Date of Patent: October 18, 2022Assignee: Cerebras Systems Inc.Inventors: Sean Lie, Michael Morrison, Michael Edwin James, Gary R. Lauterbach, Srikanth Arekapudi
-
Patent number: 11455156Abstract: Systems and methods for binary translation of executable code.Type: GrantFiled: May 11, 2020Date of Patent: September 27, 2022Assignee: Parallels International GMBHInventors: Alexey Koryakin, Nikolay Dobrovolskiy, Serguei M. Beloussov
-
Patent number: 11449341Abstract: The invention relates to an electronic device for data processing, which includes an execution unit with a temporary register, a register file, a first feedback path from the data output of the execution unit to the register file, a second feedback path from the data output of the execution unit to the temporary register, a switch configured to connect the first feedback path and/or the second feedback path, and a logic stage coupled to control the switch. The control stage is configured to control the switch to connect the second feedback path if the data output of an execution unit is used as an operand in the subsequent operation of an execution unit.Type: GrantFiled: September 9, 2019Date of Patent: September 20, 2022Assignee: Texas Instruments IncorporatedInventors: Marko Krüger, Steven Bartling, Markus Kösler
-
Patent number: 11429391Abstract: A processor core, a processor, an apparatus, and an instruction processing method are disclosed. The processor core includes: an instruction fetch unit, where the instruction fetch unit includes a speculative execution predictor and the speculative execution predictor compares a program counter of a memory access instruction with a table entry stored in the speculative execution predictor and marks the memory access instruction; a scheduler unit adapted to adjust a send order of marked memory access instructions and send the marked memory access instructions according to the send order; an execution unit adapted to execute the memory access instructions according to the send order. In the instruction fetch unit, a memory access instruction is marked according to a speculative execution prediction result. In the scheduler unit, a send order of memory access instructions is determined according to the marked memory access instruction and the memory access instructions are sent.Type: GrantFiled: August 27, 2020Date of Patent: August 30, 2022Assignee: Alibaba Group Holding LImitedInventors: Dongqi Liu, Chang Liu, Yimin Lu, Tao Jiang, Chaojun Zhao
-
Patent number: 11416252Abstract: A data processing system includes an instruction pipeline containing instruction queue circuitry, fusion circuitry and decoder circuitry. The fusion circuitry serves to identify fusible groups of program instructions within a Y-wide window of program instructions and supply a stream of program instructions including such replacement fused program instructions to a X-wide decoder circuitry which decodes X program instructions in parallel using parallel decoders.Type: GrantFiled: December 27, 2017Date of Patent: August 16, 2022Assignee: Arm LimitedInventors: Vladimir Vasekin, Chiloda Ashan Senarath Pathirane, Jungsoo Kim, Alexei Fedorov
-
Patent number: 11416254Abstract: Systems, apparatuses, and methods for implementing zero cycle load bypass operations are described. A system includes a processor with at least a decode unit, control logic, mapper, and free list. When a load operation is detected, the control logic determines if the load operation qualifies to be converted to a zero cycle load bypass operation. Conditions for qualifying include the load operation being in the same decode group as an older store operation to the same address. Qualifying load operations are converted to zero cycle load bypass operations. A lookup of the free list is prevented for a zero cycle load bypass operation and a destination operand of the load is renamed with a same physical register identifier used for a source operand of the store. Also, the data of the store is bypassed to the load.Type: GrantFiled: December 5, 2019Date of Patent: August 16, 2022Assignee: Apple Inc.Inventors: Deepankar Duggal, Kulin N. Kothari, Conrado Blasco, Muawya M. Al-Otoom
-
Patent number: 11403103Abstract: A microprocessor is shown, in which a branch predictor and an instruction cache are decoupled by a fetch-target queue (FTQ). The branch predictor performs branch prediction for N instruction addresses in parallel in the same cycle, wherein N is an integer greater than 1. In the current cycle, the branch predictor finishes branch prediction for N instruction addresses in parallel and, among the N instruction addresses with finished branch prediction, those that are not bypassed and do not overlap previously-predicted instruction addresses are pushed into the fetch-target queue, to be read out later as an instruction-fetching address for the instruction cache. The previously-predicted instruction addresses are pushed into the fetch-target queue in a previous cycle.Type: GrantFiled: October 13, 2020Date of Patent: August 2, 2022Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.Inventors: Fangong Gong, Mengchen Yang
-
Patent number: 11403107Abstract: Methods, systems, and apparatuses related to re-order buffers and for protection from timing-based security attacks are described. A processor may have functional units configured to execute instructions out of order, a re-order buffer configured to buffer the execution results of instructions for output in order, and a controller configured to randomize data timing in the re-order buffer. For example, the controller can make random adjustments to the capacity of the re-order buffer in buffering and/or sorting execution results and thus randomize data timing in the re-order buffer.Type: GrantFiled: December 5, 2018Date of Patent: August 2, 2022Assignee: Micron Technology, Inc.Inventor: Steven Jeffrey Wallach
-
Patent number: 11327763Abstract: Opportunistic consumer instruction steering based on producer instruction value prediction in a multi-cluster processor is disclosed. A processor provides producer instructions and consumer instructions to a steering circuit that steers the program instructions to clusters of instruction execution circuits. An input value provided to a consumer instruction may be a produced value of a producer instruction, creating a dependency. The steering circuit steers a producer instruction to a first cluster and, in response to receiving the consumer instruction and the predicted value of the producer instruction, provides the predicted value to at least a second cluster and steers the consumer instruction to the second cluster for execution with the predicted value as the input value.Type: GrantFiled: June 11, 2020Date of Patent: May 10, 2022Assignee: Microsoft Technology Licensing, LLCInventors: Arthur Perais, Shivam Priyadarshi, Yusuf Cagatay Tekmen, Rami Mohammad Al Sheikh, Vignyan Reddy Kothinti Naresh
-
Patent number: 11327762Abstract: An instruction prefetching method, a device and a medium are provided. The method includes the following: instructions in a target buffer are precompiled before a processor core fetches a required instruction from the target buffer corresponding to the processor core; if it is determined that a jump instruction exists in the target buffer and a jump target instruction corresponding to the jump instruction is not cached in the target buffer according to a precompiled result, the jump target instruction is prefetched from an icache into a candidate buffer corresponding to the processor core to wait for the processor core to fetch the jump target instruction from the candidate buffer; the target buffer and the candidate buffer are alternately reused during instruction prefetching.Type: GrantFiled: September 29, 2020Date of Patent: May 10, 2022Assignee: KUNLUNXIN TECHNOLOGY (BEIJING) COMPANY LIMITEDInventors: Chao Tang, Xueliang Du, Yingnan Xu
-
Patent number: 11321799Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as single write instead of multiple separate partial writes.Type: GrantFiled: December 24, 2019Date of Patent: May 3, 2022Assignee: Intel CorporationInventors: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen
-
Patent number: 11294683Abstract: Systems and methods are disclosed for duplicate detection for register renaming. For example, a method includes checking a map table for duplicates of a first physical register, wherein the map table stores entries that each map an architectural register of an instruction set architecture to a physical register of a microarchitecture and a duplicate is two or more architectural registers that are mapped to a same physical register; and, responsive to a duplicate of the first physical register in the map table, preventing the first physical register from being added to a free list upon retirement of an instruction that renames an architectural register that was previously mapped to the first physical register to a different physical register, wherein the free list stores entries that indicate which physical registers are available for renaming.Type: GrantFiled: April 17, 2020Date of Patent: April 5, 2022Assignee: SiFive, Inc.Inventor: Joshua Smith
-
Patent number: 11269631Abstract: Extending fused multiply-add instructions, the method comprising: receiving an extended fused multiply-add (FMA) instruction indicating one or more operands of a fused multiply-add (FMA) operation and one or more transformations to be applied to the one or more operands; and performing, based on the extended FMA instruction, the one or more transformations and the FMA operation.Type: GrantFiled: July 29, 2020Date of Patent: March 8, 2022Assignee: GHOST LOCOMOTION INC.Inventors: John Hayes, Volkmar Uhlig
-
Patent number: 11231933Abstract: A method and apparatus for controlling pre-fetching in a processor. A processor includes an execution pipeline and an instruction pre-fetch unit. The execution pipeline is configured to execute instructions. The instruction pre-fetch unit is coupled to the execution pipeline. The instruction pre-fetch unit includes instruction storage to store pre-fetched instructions, and pre-fetch control logic. The pre-fetch control logic is configured to fetch instructions from memory and store the fetched instructions in the instruction storage. The pre-fetch control logic is also configured to provide instructions stored in the instruction storage to the execution pipeline for execution. The pre-fetch control logic is further configured set a maximum number of instruction words to be pre-fetched for execution subsequent to execution of an instruction currently being executed in the execution pipeline.Type: GrantFiled: April 9, 2020Date of Patent: January 25, 2022Assignee: Texas Instruments IncorporatedInventors: Christian Wiencke, Johann Zipperer
-
Patent number: 11210101Abstract: An arithmetic processing device includes: a decoding circuit configured to decode a command; a command execution circuit configured to execute the command decoded by the decoding circuit; a register circuit configured to include a plurality of registers for holding data used by the command execution circuit; an identification information holding circuit configured to store identification information for identifying a register for writing a specific value when the command is a register writing command; a setting circuit configured to hold the specific value; and an operation control circuit configured to execute inhibiting processing when the command is a register reading command, the inhibiting processing including inhibiting an access of the register by the register reading command and selecting the specific value held in the setting circuit.Type: GrantFiled: August 22, 2019Date of Patent: December 28, 2021Assignee: FUJITSU LIMITEDInventors: Ryohei Okazaki, Sota Sakashita, Atushi Fusejima
-
Patent number: 11204769Abstract: A global front end scheduler to schedule instruction sequences to a plurality of virtual cores implemented via a plurality of partitionable engines. The global front end scheduler includes a thread allocation array to store a set of allocation thread pointers to point to a set of buckets in a bucket buffer in which execution blocks for respective threads are placed, a bucket buffer to provide a matrix of buckets, the bucket buffer including storage for the execution blocks, and a bucket retirement array to store a set of retirement thread pointers that track a next execution block to retire for a thread.Type: GrantFiled: January 2, 2020Date of Patent: December 21, 2021Assignee: Intel CorporationInventor: Mohammad Abdallah
-
Patent number: 11163562Abstract: A processing unit, in particular, a microcontroller for a control unit, which includes at least one processing core, one primary memory device, and at least one main connection unit for connecting the at least one processing core to the primary memory device, the processing unit including at least two functional units, the processing unit including at least one functional unit designed as a data flow control unit, which is designed to receive input data, to evaluate the input data, and to generate output data as a function of the evaluation.Type: GrantFiled: October 9, 2018Date of Patent: November 2, 2021Assignee: Robert Bosch GmbHInventor: Nico Bannow
-
Patent number: 11163564Abstract: The present disclosure is directed to methods to generate a packed result array using parallel vector processing, of an input array and a comparison operation. In one aspect, an additive scan operation can be used to generate memory offsets for each successful comparison operation of the input array and to generate a count of the number of data elements satisfying the comparison operation. In another aspect, the input array can be segmented to allow more efficient processing using the vector registers. In another aspect, a vector processing system is disclosed that is operable to receive a data array, a comparison operation, and threshold criteria, and output a packed array, at a specified memory address, comprising of the data elements satisfying the comparison operation.Type: GrantFiled: October 8, 2018Date of Patent: November 2, 2021Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd.Inventors: Charles H. Stewart, Charles R. Bezet
-
Patent number: 11144321Abstract: Examples of techniques for store hit multiple load side register for operand store compare are described herein. An aspect includes, based on detecting a store hit multiple load condition in the processor, updating a register of the processor to hold information corresponding to a first store instruction that triggered the detected store hit multiple load condition. Another aspect includes, based on issuing a second store instruction in the processor, determining whether the second store instruction corresponds to the information in the register. Another aspect includes, based on determining that the second store instruction corresponds to the information in the register, tagging the second store instruction with an operand store compare mark.Type: GrantFiled: February 20, 2019Date of Patent: October 12, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yair Fried, Jonathan Hsieh, Eyal Naor, James Bonanno, Gregory William Alexander
-
Patent number: 11132202Abstract: An apparatus comprises execution circuitry to perform operations on source data values and to generate result data values; issue circuitry comprising one or more issue queues identifying pending operations awaiting performance by the execution circuitry, and selection circuitry to select pending operations to issue to the execution circuitry; data value cache storage comprising first and second cache regions; and cache control circuitry to control the storing to the first cache region of result data values generated by the execution circuitry and the eviction of stored result data values from the first cache region in response to newly generated result data values being stored in the first cache region; the cache control circuitry being configured to store to the second cache region result data values required as source data values for one or more oldest pending operations identified by the one or more issue queues and to inhibit eviction of a given result data value stored in the second cache region until inType: GrantFiled: September 24, 2019Date of Patent: September 28, 2021Assignee: Arm LimitedInventors: Luca Nassi, Rémi Marius Teyssier, Cédric Denis Robert Airaud, Albin Pierrick Tonnerre, Francois Donati, Christophe Carbonne, Damian Maiorano
-
Patent number: 11113068Abstract: Performing flush recovery using parallel walks of sliced reorder buffers (SROBs) is disclosed herein. In one exemplary embodiment, a register mapping circuit provides a rename mapping table (RMT) comprising RMT entries representing logical register number (LRN) to physical register number (PRN) mappings. The register mapping circuit also provides an SROB comprising multiple SROB slices that each corresponds to a respective LRN. Each SROB slice tracks uncommitted instructions that write to the LRN corresponding to that SROB slice, and maintains those instructions in program order with respect to each other. Upon detecting an uncommitted instruction writing to an LRN, the register mapping circuit allocates an SROB slice entry in the SROB slice corresponding to the LRN. When an pipeline flush from a target instruction occurs, the register mapping circuit restores RMT entries of the RMT to their prior mapping states based on parallel walks of the SROB slices of the SROB.Type: GrantFiled: August 6, 2020Date of Patent: September 7, 2021Assignee: Microsoft Technology Licensing, LLCInventors: Yusuf Cagatay Tekmen, Rodney Wayne Smith, Kiran Ravi Seth, Shivam Priyadarshi
-
Patent number: 11086626Abstract: Circuitry comprises decode circuitry to decode program instructions including producer instructions and consumer instructions, a consumer instruction requiring, as an input operand, a result generated by execution of a producer instruction; and execution circuitry to execute the program instructions; in which: the decode circuitry is configured to control operation of the execution circuitry in response to hint data associated with a given producer instruction and indicating, for the given producer instruction, a number of consumer instructions which require, as an input operand, a result generated by the given producer instruction.Type: GrantFiled: October 24, 2019Date of Patent: August 10, 2021Assignee: Arm LimitedInventors: Roko Grubisic, Giacomo Gabrielli, Matthew James Horsnell, Syed Ali Mustafa Zaidi
-
Patent number: 11086630Abstract: A computer system includes a dispatch stage configured to dispatch a plurality of instructions in a program order, and an issue stage configured to issue at least one instruction among the plurality of instructions. The computer system further includes an execution stage configured to execute the at least one instruction to generate a finish report and to determine the at least one instruction is one of an exception-free instruction or an exception instruction. In response to determining the exception-free instruction, a first finish report associated with the exception-free instruction is output to a completion stage. In response to determining the exception instruction, a second finish report associated with the exception instruction is output to an exception unit so as to halt output of the second finish report to the completion stage.Type: GrantFiled: February 27, 2020Date of Patent: August 10, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Kenneth L. Ward, Susan Eisen, Christopher M. Mueller, Glenn O. Kincaid, Dhivya Jeganathan