Patents Examined by Corey S Faherty
-
Patent number: 12379931
Abstract: A compute node capable of enhanced performance and/or energy savings is proposed. The proposed compute node may check whether a last instruction of a first group—retrieved in a first decode cycle—is potentially a fusible instruction. If so, the proposed compute node may refrain from decoding the last instruction in the first decode cycle. Instead, the proposed compute node may determine if a first instruction of a second group of instructions retrieved in a second decode cycle (subsequent to the first decode cycle) is fusible with the last instruction of the first group. If so, the two instructions may be fused to a single micro-operation.
Type: Grant
Filed: October 19, 2023
Date of Patent: August 5, 2025
Assignee: Ampere Computing LLC
Inventors: Benjamin Crawford Chaffin, Bret Toll, Jacob Daniel Morgan, Michael Spradling, David Nuechterlein
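The mechanism described in the abstract can be sketched in software. The following is an illustrative model only, not the patented hardware: the fusible pairs, instruction names, and function are all hypothetical, chosen to show how holding back a group's last instruction enables fusion across decode-group boundaries.

```python
# Hypothetical sketch of cross-decode-group fusion. A group's last
# instruction is held back (not decoded) if it could start a fusible
# pair; the next group's first instruction is then checked against it.
FUSIBLE_PAIRS = {("cmp", "beq"), ("cmp", "bne")}  # illustrative pairs
FUSIBLE_FIRSTS = {first for first, _ in FUSIBLE_PAIRS}

def decode_groups(groups):
    """Decode instruction groups into micro-ops, fusing across groups."""
    micro_ops = []
    held = None  # last instruction of the prior group, decode deferred
    for group in groups:
        insns = list(group)
        if held is not None:
            if insns and (held, insns[0]) in FUSIBLE_PAIRS:
                micro_ops.append(held + "+" + insns[0])  # fused micro-op
                insns = insns[1:]
            else:
                micro_ops.append(held)  # could not fuse; decode normally
            held = None
        if insns and insns[-1] in FUSIBLE_FIRSTS:
            held = insns.pop()  # defer decode to the next cycle
        micro_ops.extend(insns)
    if held is not None:
        micro_ops.append(held)  # flush at end of stream
    return micro_ops
```

Here a `cmp` ending one group fuses with a `beq` starting the next, which a decoder that always decodes full groups in isolation would miss.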
-
Patent number: 12379927
Abstract: Techniques for scale and reduction of BF16 data elements are described. An exemplary instruction has fields for an opcode, an identification of a location of a first packed data source operand, an identification of a location of a second packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operands, a floating point scale operation of a BF16 data element of the first packed data source by multiplying the data element by a power of 2 value, wherein a value of the exponent of the power of 2 value is a floor value of a BF16 data element of the second packed data source, and store a result of the floating point scale operation into a corresponding data element position of the packed data destination operand.
Type: Grant
Filed: August 31, 2021
Date of Patent: August 5, 2025
Assignee: Intel Corporation
Inventors: Menachem Adelman, Alexander Heinecke, Robert Valentine, Zeev Sperber, Amit Gradstein, Mark Charney, Evangelos Georganas, Dhiraj Kalamkar, Christopher Hughes, Cristina Anderson
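The per-element scale operation in the abstract computes src1[i] × 2^floor(src2[i]). A minimal sketch, using plain Python floats as stand-ins for BF16 lanes (real BF16 rounding and special-value handling are omitted):

```python
import math

def scalef_bf16(src1, src2):
    """Per-element floating-point scale: src1[i] * 2**floor(src2[i]).
    Python floats stand in for BF16 packed-data elements."""
    return [a * (2.0 ** math.floor(b)) for a, b in zip(src1, src2)]
```

For example, with src2 = [3.2, -1.5] the effective exponents are floor(3.2) = 3 and floor(-1.5) = -2, so the elements of src1 are multiplied by 8 and 0.25 respectively.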
-
Patent number: 12379928
Abstract: This application relates to the field of computer technologies, and discloses methods and apparatuses, for example, for rectifying a weak memory ordering problem. An example method includes: determining a read/write instruction set in to-be-repaired code; classifying instructions in the read/write instruction set to determine a target instruction; and inserting a memory barrier instruction between a previous read/write instruction of the target instruction and the target instruction. The read/write instruction set includes a read instruction and/or a write instruction in the to-be-repaired code, and an instruction in the read/write instruction set is used for memory access.
Type: Grant
Filed: May 18, 2023
Date of Patent: August 5, 2025
Assignee: Huawei Technologies Co., Ltd.
Inventors: Di Yu, Yandong Lv, Rutao Zhang
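The barrier-insertion step can be illustrated as a simple pass over an instruction stream. This is a hypothetical sketch, not the patented classification method: the `is_target` predicate stands in for whatever analysis selects target instructions, and `ldr`/`str`/`dmb` are used as example AArch64-style mnemonics.

```python
def insert_barriers(instructions, is_target):
    """Insert a memory barrier ("dmb") between each target instruction
    and the read/write instruction that immediately precedes it."""
    out = []
    prev_is_mem = False
    for insn in instructions:
        if is_target(insn) and prev_is_mem:
            out.append("dmb")  # order the prior access before this one
        out.append(insn)
        prev_is_mem = insn.startswith(("ldr", "str"))
    return out
```

A barrier is only emitted when the preceding instruction actually accesses memory, so unrelated arithmetic does not accumulate barriers.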
-
Patent number: 12340224
Abstract: Embodiments of instructions are detailed herein including one or more of 1) a branch fence instruction, prefix, or variants (BFENCE); 2) a predictor fence instruction, prefix, or variants (PFENCE); 3) an exception fence instruction, prefix, or variants (EFENCE); 4) an address computation fence instruction, prefix, or variants (AFENCE); 5) a register fence instruction, prefix, or variants (RFENCE); and, additionally, modes that apply the above semantics to some or all ordinary instructions.
Type: Grant
Filed: May 4, 2023
Date of Patent: June 24, 2025
Assignee: Intel Corporation
Inventors: Robert S. Chappell, Jason W. Brandt, Alan Cox, Asit Mallick, Joseph Nuzman, Arjan Van De Ven
-
Patent number: 12333307
Abstract: An approach is provided for managing near-memory processing commands (“PIM commands”) from multiple processor threads in a manner to prevent interference and maintain correctness at near-memory processing elements. A memory controller uses thread identification information and last command information to issue a PIM command sequence from a first processor thread, directed to a PIM-enabled memory element, while deferring the issuance of PIM command sequences from other processor threads, directed to the same PIM-enabled memory element. After the last PIM command in the PIM command sequence for the first processor thread has been issued, a PIM command sequence for another processor thread is issued, and so on. The approach allows multiple processor threads to concurrently issue fine grained PIM commands to the same PIM-enabled memory element without having to be aware of address-to-memory element mapping, and without having to coordinate with other threads.
Type: Grant
Filed: June 29, 2022
Date of Patent: June 17, 2025
Assignee: Advanced Micro Devices, Inc.
Inventors: Johnathan Alsop, Laurent S. White, Shaizeen Aga
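The issue policy above — drain one thread's sequence to its last command before switching threads — can be modeled as follows. This is an illustrative scheduler sketch under assumed data shapes (each command is a `(thread_id, op, is_last)` tuple), not the patented memory-controller design.

```python
from collections import deque

def issue_pim(queues):
    """Issue PIM commands from per-thread queues, one complete command
    sequence at a time; other threads are deferred until the current
    thread's sequence ends (its is_last command has issued)."""
    issued = []
    pending = deque(queues)  # round-robin over threads with work
    while pending:
        q = pending.popleft()
        while q:
            tid, op, is_last = q.popleft()
            issued.append((tid, op))
            if is_last:
                break  # sequence complete; let another thread issue
        if q:
            pending.append(q)  # thread has further sequences queued
    return issued
```

Because a sequence is never interleaved with another thread's commands, each thread sees its own PIM commands execute atomically at the memory element without explicit inter-thread coordination.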
-
Patent number: 12321782
Abstract: In a storage system in which processor cores are exclusively allocated to run process threads of individual emulations, the allocations of cores to emulations are dynamically reconfigured based on forecasted workload. A workload configuration model is created by testing different core allocation permutations with different workloads. The best performing permutations are stored in the model as workload configurations. The workload configurations are characterized by counts of tasks required to service the workloads. Actual task counts are monitored during normal operation and used to forecast changes in actual task counts. The forecasted task counts are compared with the task counts of the workload configurations of the model to select the best match. Allocation of cores is reconfigured to the best match workload configuration.
Type: Grant
Filed: June 2, 2023
Date of Patent: June 3, 2025
Assignee: Dell Products L.P.
Inventors: Owen Martin, Ramesh Doddaiah, Michael Scharland
-
Patent number: 12314724
Abstract: An arithmetic processing device executes instructions by pipeline processing. In the arithmetic processing device, a branch predictor includes a prediction holder holding a prediction value of a consecutively taken branch count, a current taken branch count, and a prediction value of a remaining taken branch count for conditional branch instructions. A branch instruction issuance scheduler outputs an instruction refetch request upon a branch prediction miss of a conditional branch instruction held at an entry other than a head of a branch instruction completion queue holding branch instructions to be completed, and a repair request to the branch predictor for conditional branch instructions held between the entry and the head of the queue. In response to the requests, the branch predictor updates the prediction value of the remaining taken branch count for the conditional branch instructions corresponding to the repair request.
Type: Grant
Filed: June 29, 2023
Date of Patent: May 27, 2025
Assignee: FUJITSU LIMITED
Inventor: Ryohei Okazaki
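The core bookkeeping — a predicted consecutively-taken count, a current taken count, and the remaining count derived from them — can be sketched as below. This is a heavily simplified illustration of the counter idea only; the class and method names are hypothetical, and the patented repair of in-flight predictions on a mispredict is not modeled.

```python
class TakenCountPredictor:
    """Predict a loop-closing branch 'taken' until the predicted number
    of consecutive taken outcomes has been observed (illustrative)."""
    def __init__(self, predicted_taken_count):
        self.predicted = predicted_taken_count  # consecutively-taken count
        self.current = 0                        # taken outcomes seen so far

    @property
    def remaining(self):
        return self.predicted - self.current    # remaining taken branch count

    def predict(self):
        return self.remaining > 0               # taken while iterations remain

    def resolve(self, taken):
        if taken:
            self.current += 1
        else:
            self.current = 0                    # loop exited; restart count
```

With a predicted count of 2, the predictor says "taken" twice and then "not taken", matching a loop that iterates twice per entry.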
-
Patent number: 12299444
Abstract: A system includes a memory and a processor coupled to the memory. The processor executes an instruction set having a word size. The processor includes arithmetic processing circuitry, which, in operation, executes arithmetic operations on operands having the word size. The arithmetic processing circuitry includes an arithmetic logic circuit (ALU) having an operand size smaller than the word size of the instruction set. The ALU, in operation, generates partial results of the arithmetic operations. A multiplexing network coupled to inputs of the ALU provides portions of the operands to the ALU. A shift register having the word size of the instruction set accumulates partial results generated by the ALU over a plurality of clock cycles and outputs results of the arithmetic operations based on the accumulated partial results.
Type: Grant
Filed: May 24, 2023
Date of Patent: May 13, 2025
Assignee: STMicroelectronics International N.V.
Inventor: Sofiane Landi
-
Patent number: 12299576
Abstract: Disclosed is a neural network-based inference method and apparatus. The neural network-based inference method includes compressing a matrix comprising processing elements corresponding to an operation of a neural network, balancing workloads related to the operation by reordering the compressed matrix based on the workloads, and performing inference based on the reordered matrix.
Type: Grant
Filed: June 9, 2021
Date of Patent: May 13, 2025
Assignees: Samsung Electronics Co., Ltd., University of Zurich
Inventors: Chang Gao, Shih-Chii Liu, Tobi Delbruck, Xi Chen
-
Patent number: 12293185
Abstract: Apparatuses and methods for branch prediction are provided. Branch prediction circuitry generates predictions with respect to branch instructions of whether those branches will be taken or not-taken. Hypervector generation circuitry assigns an arbitrary hypervector in deterministic dependence on an address of each branch instruction, wherein each hypervector comprises at least 500 bits. Upon the resolution of a branch, a corresponding hypervector is added to a stored taken hypervector or a stored not-taken hypervector in dependence on the resolution of the branch. The branch prediction circuitry generates a prediction for a branch instruction in dependence on a mathematical distance metric of a hypervector generated for that branch instruction from the stored taken hypervector or the not-taken hypervector.
Type: Grant
Filed: November 26, 2020
Date of Patent: May 6, 2025
Assignee: Arm Limited
Inventors: Ilias Vougioukas, Andreas Lars Sandberg, Nikos Nikoleris
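The hyperdimensional scheme in the abstract can be illustrated in a few lines: a deterministic ±1 hypervector is derived from each branch address, resolved outcomes are accumulated into taken/not-taken class vectors, and prediction compares similarity to each class. This is a toy model under assumed details (dot-product similarity, a PRNG seeded by the PC as the deterministic generator), not the patented circuitry.

```python
import random

DIM = 512  # the abstract requires hypervectors of at least 500 bits

def hypervector(pc):
    """Arbitrary but deterministic +/-1 hypervector for a branch address."""
    rng = random.Random(pc)  # seeded by PC, so the mapping is reproducible
    return [rng.choice((-1, 1)) for _ in range(DIM)]

class HDPredictor:
    def __init__(self):
        self.taken = [0] * DIM      # accumulated taken hypervector
        self.not_taken = [0] * DIM  # accumulated not-taken hypervector

    def train(self, pc, taken):
        hv = hypervector(pc)
        acc = self.taken if taken else self.not_taken
        for i, v in enumerate(hv):
            acc[i] += v  # add the branch's hypervector to its class

    def predict(self, pc):
        hv = hypervector(pc)
        # Dot product as the (inverse) distance metric: larger = closer.
        sim_t = sum(a * b for a, b in zip(hv, self.taken))
        sim_n = sum(a * b for a, b in zip(hv, self.not_taken))
        return sim_t >= sim_n
```

Because random ±1 hypervectors are nearly orthogonal in high dimensions, a branch's own contributions dominate its similarity to the class it was trained into, while other branches contribute only noise.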
-
Patent number: 12293187
Abstract: Disclosed in some examples are methods, systems, devices, and machine-readable mediums which provide for more efficient CGRA execution by assigning different initiation intervals to different PEs executing a same code base. The initiation intervals may be a multiple of each other, and the PE with the lowest initiation interval may be used to execute instructions of the code that are to be executed at a greater frequency than other instructions that may be assigned to PEs with higher initiation intervals.
Type: Grant
Filed: November 30, 2023
Date of Patent: May 6, 2025
Assignee: Micron Technology, Inc.
Inventors: Douglas Vanesko, Tony M. Brewer
-
Patent number: 12288062
Abstract: Disclosed embodiments relate to instructions for fused multiply-add (FMA) operations with variable-precision inputs. In one example, a processor to execute an asymmetric FMA instruction includes fetch circuitry to fetch an FMA instruction having fields to specify an opcode, a destination, and first and second source vectors having first and second widths, respectively, decode circuitry to decode the fetched FMA instruction, and a single instruction multiple data (SIMD) execution circuit to process as many elements of the second source vector as fit into an SIMD lane width by multiplying each element by a corresponding element of the first source vector, and accumulating a resulting product with previous contents of the destination, wherein the SIMD lane width is one of 16 bits, 32 bits, and 64 bits, the first width is one of 4 bits and 8 bits, and the second width is one of 1 bit, 2 bits, and 4 bits.
Type: Grant
Filed: December 28, 2023
Date of Patent: April 29, 2025
Assignee: Intel Corporation
Inventors: Dipankar Das, Naveen K. Mellempudi, Mrinmay Dutta, Arun Kumar, Dheevatsa Mudigere, Abhisek Kundu
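The multiply-accumulate step of such an asymmetric FMA can be sketched element-wise. This is an illustrative model with hypothetical names and example widths (8-bit × 2-bit signed), not the instruction's architectural definition; packing of narrow elements into SIMD lanes is not modeled.

```python
def asym_fma(dest, src1, src2, src1_bits=8, src2_bits=2):
    """Multiply each narrow element of src2 by the matching element of
    src1 and accumulate into dest. Inputs are assumed to already fit
    the stated signed bit widths."""
    lo1, hi1 = -(1 << (src1_bits - 1)), (1 << (src1_bits - 1)) - 1
    lo2, hi2 = -(1 << (src2_bits - 1)), (1 << (src2_bits - 1)) - 1
    assert all(lo1 <= v <= hi1 for v in src1), "src1 element out of range"
    assert all(lo2 <= v <= hi2 for v in src2), "src2 element out of range"
    return [d + a * b for d, a, b in zip(dest, src1, src2)]
```

With 2-bit signed elements the second source is restricted to {-2, -1, 0, 1}, which is why so many of them fit in one SIMD lane — the motivation for the asymmetric widths in low-precision neural-network inference.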
-
Patent number: 12288066
Abstract: Techniques are disclosed that relate to fusing operations for execution of certain instructions. A processor may include a first execution circuit, of a first type, coupled to a first register file, a second execution circuit, of a second type, coupled to a second register file, and a load/store circuit coupled to the first and second register files. The load/store circuit includes an issue port configured to receive an instruction operation for execution, a memory execution circuit configured to execute memory access operations, and a register transfer execution circuit. The register transfer execution circuit is configured to execute instruction operations specifying data transfer from the first register file to the second register file and an operation to be performed using the data, and the load/store circuit is configured to direct a given instruction operation from the issue port to one of the memory execution circuit or the register transfer execution circuit.
Type: Grant
Filed: May 18, 2023
Date of Patent: April 29, 2025
Assignee: Apple Inc.
Inventors: Zhaoxiang Jin, Francesco Spadini, Skanda K. Srinivasa, Milos Becvar
-
Patent number: 12288073
Abstract: An apparatus is provided for limiting the effective utilisation of an instruction fetch queue. The instruction fetch entries are used to control the prefetching of instructions from memory, such that those instructions are stored in an instruction cache prior to being required by execution circuitry while executing a program. By limiting the effective utilisation of the instruction fetch queue, fewer instructions will be prefetched and fewer instructions will be allocated to the instruction cache, thus causing fewer evictions from the instruction cache. In the event that the instruction fetch entries are for instructions that are unnecessary to the program, the pollution of the instruction cache with these unnecessary instructions can be mitigated.
Type: Grant
Filed: April 3, 2023
Date of Patent: April 29, 2025
Assignee: Arm Limited
Inventors: Chang Joo Lee, Jason Lee Setter, Julia Kay Lanier, Michael Brian Schinzler, Yasuo Ishii
-
Patent number: 12288098
Abstract: Approaches presented herein provide for the optimization of tasks performed for an operation such as the rendering of an image. A Frame Interceptor (FI) can generate a resource dependency graph (RDG) by intercepting API calls during the rendering process and determining dependencies. FI can analyze the RDG to identify potential optimizations, such as may correspond to reordering or parallel execution of certain tasks. FI can automatically test optimizations to determine whether sufficient improvement is obtained. This testing can be performed in real time by replacing the originally intercepted API calls with the newly ordered API calls generated by FI. FI can then issue a report that indicates information such as the changes made, the time taken to render the image, and potentially the fact that the images were determined to be identical.
Type: Grant
Filed: June 7, 2023
Date of Patent: April 29, 2025
Assignee: Nvidia Corporation
Inventor: Michael Murphy
-
Patent number: 12282377
Abstract: A hardware controller within a core of a processor is described. The hardware controller includes telemetry logic to generate telemetry data that indicates an activity state of the core; core stall detection logic to determine, based on the telemetry data from the telemetry logic, whether the core is in an idle loop state; and a power controller that, in response to the core stall detection logic determining that the core is in the idle loop state, is to decrease a power mode of the core from a first power mode associated with a first set of power settings to a second power mode associated with a second set of power settings.
Type: Grant
Filed: June 25, 2021
Date of Patent: April 22, 2025
Assignee: Intel Corporation
Inventors: Pritesh P. Shah, Suresh Chemudupati, Alexander Gendler, David Hunt, Christopher M. Macnamara, Ofer Nathan, Adwait Purandare, Ankush Varma
-
Patent number: 12254320
Abstract: Disclosed are a method and system for processing an instruction timeout, a device, and a storage medium. The method includes: in response to a timeout of an original instruction sent by a host end reaching a first threshold value, sending an abort instruction, and detecting whether the abort instruction times out; in response to the abort instruction timing out and the timeout of the original instruction reaching a second threshold value, sending a reset instruction to reset a target end; in response to the reset instruction timing out and the timeout of the original instruction reaching a maximum threshold value, removing the target end, and determining whether the original instruction is blocked at the target end; and in response to the original instruction not being blocked at the target end, returning an instruction error prompt to the host end.
Type: Grant
Filed: September 29, 2021
Date of Patent: March 18, 2025
Assignee: INSPUR SUZHOU INTELLIGENT TECHNOLOGY CO., LTD.
Inventor: Zhongke Yuan
-
Patent number: 12254319
Abstract: Systems, methods, and apparatuses relating to circuitry to implement toggle point insertion for a clustered decode pipeline are described.
Type: Grant
Filed: September 24, 2021
Date of Patent: March 18, 2025
Assignee: Intel Corporation
Inventors: Sundararajan Ramakrishnan, Jonathan Combs, Martin J. Licht, Santhosh Srinath
-
Patent number: 12248789
Abstract: Techniques are provided for executing wavefronts. The techniques include: at a first time for issuing instructions for execution, performing first identifying, including identifying that sufficient processing resources exist to execute a first set of instructions together within a processing lane; in response to the first identifying, executing the first set of instructions together; at a second time for issuing instructions for execution, performing second identifying, including identifying that no instructions are available for which sufficient processing resources exist for execution together within the processing lane; and in response to the second identifying, executing an instruction independently of any other instruction.
Type: Grant
Filed: April 28, 2023
Date of Patent: March 11, 2025
Assignee: Advanced Micro Devices, Inc.
Inventor: Maxim V. Kazakov
-
Patent number: 12248788
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache.
Type: Grant
Filed: March 10, 2022
Date of Patent: March 11, 2025
Assignee: NVIDIA Corporation
Inventors: Prakash Bangalore Prabhakar, Gentaro Hirota, Ronny Krashinsky, Ze Long, Brian Pharris, Rajballav Dash, Jeff Tuckey, Jerome F. Duluk, Jr., Lacky Shah, Luke Durant, Jack Choquette, Eric Werness, Naman Govil, Manan Patel, Shayani Deb, Sandeep Navada, John Edmondson, Greg Palmer, Wish Gandhi, Ravi Manyam, Apoorv Parle, Olivier Giroux, Shirish Gadre, Steve Heinrich