Patents Examined by Eric Coleman
  • Patent number: 12639398
    Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.
    Type: Grant
    Filed: June 27, 2024
    Date of Patent: May 26, 2026
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
  • Patent number: 12608211
    Abstract: A processor with instruction storage configured to store processor instructions, data storage configured to store processor data representing an array, the array including plural data elements, a controller, and an instruction pipeline. The instruction pipeline includes: a load stage circuit configured to load an array element from the data storage, a compare stage circuit configured to compare the array element to a reference value, a store stage circuit configured to store a set of results that includes a result of the comparison of the array element to the reference value, and a loop hit detect stage circuit configured to determine whether any of the set of results is associated with a hit on the reference value.
    Type: Grant
    Filed: May 22, 2024
    Date of Patent: April 21, 2026
    Assignee: Texas Instruments Incorporated
    Inventors: Alan Davis, Venkatesh Natarajan, Alexander Tessarolo
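    Illustrative sketch (not part of the patent record): the abstract above describes load, compare, store, and hit-detect stages. The Python below is a sequential software analogy of that flow, not the claimed pipeline; the function name `search_array` and its return shape are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def search_array(data: Sequence[int], reference: int) -> Tuple[List[bool], bool]:
    """Hypothetical software analogy of the load/compare/store/hit-detect flow."""
    results: List[bool] = []
    for element in data:            # "load" stage: fetch one array element
        hit = element == reference  # "compare" stage: test against the reference value
        results.append(hit)         # "store" stage: record the comparison result
    any_hit = any(results)          # "hit detect" stage: did any result hit?
    return results, any_hit
```

    In hardware the stages would overlap across elements; the loop here only shows the order of operations for a single element.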
  • Patent number: 12591633
    Abstract: A processing device includes a two-dimensional array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The device further includes interconnections among the two-dimensional array of processing elements to provide direct communication among neighboring processing elements of the two-dimensional array of processing elements. A processing element of the two-dimensional array of processing elements is connected to a first neighbor processing element that is immediately adjacent the processing element in a first dimension of the two-dimensional array. The processing element is further connected to a second neighbor processing element that is immediately adjacent the processing element in a second dimension of the two-dimensional array.
    Type: Grant
    Filed: September 10, 2024
    Date of Patent: March 31, 2026
    Assignee: UNTETHER AI CORPORATION
    Inventor: William Martin Snelgrove
  • Patent number: 12591539
    Abstract: An array of interconnected processing elements is modelled as a graph of nodes. Each layer of the graph represents a possible arrangement of data elements within the array of interconnected processing elements. An edge between nodes of adjacent layers of the graph represents a movement of a data element between the nodes. Constraints are set for a starting arrangement of the data elements stored in the array, an ending arrangement of the data elements stored in the array, and a limit for each node of the graph to have one input edge from a previous layer and one output edge to a subsequent layer. The model and constraints are processed with an integer programming solver to obtain a program of movements of data elements among the interconnected processing elements. The program implements a rearrangement of the data elements from the starting arrangement to the ending arrangement.
    Type: Grant
    Filed: January 26, 2024
    Date of Patent: March 31, 2026
    Assignee: UNTETHER AI CORPORATION
    Inventors: John Kitamura, Andrew Vincent Rock, William Martin Snelgrove
  • Patent number: 12585467
    Abstract: A Data Flow processor (DFP or DFPU) including a plurality of loop registers to effectively control the processing of multi-dimensional data; and a plurality of stride registers that are operable for precise addressing of data, enabling comprehensive management of data fetching, execution, accumulation, and result write-back processes; and wherein the processor is configured to be operable for high-dimensional computing.
    Type: Grant
    Filed: May 14, 2024
    Date of Patent: March 24, 2026
    Inventors: Joshua Huang, Hsilin Huang
  • Patent number: 12579098
    Abstract: Methods, apparatus, systems, and articles of manufacture to process web-scale graphs are disclosed. An example apparatus comprises: at least one memory; instructions; and processor circuitry to execute the instructions to: retrieve a compute based tile (CBT) from a first external memory, the CBT to include source and destination nodes of a graph; assign a stripe of the CBT to a single instruction multiple data compute unit, the stripe including a first tile and a second tile, the first tile to include first destination nodes and first source nodes, the second tile to include the first destination nodes and second source nodes; retrieve source node embeddings of the stripe based on a node identifier to source node embedding lookup; and provide the source node embeddings to the single instruction multiple data compute unit.
    Type: Grant
    Filed: March 31, 2022
    Date of Patent: March 17, 2026
    Assignee: Intel Corporation
    Inventors: Tarjinder Singh Munday, Vidhya Thyagarajan, Santebennur Ramakrishnachar Sridhar, Jagan Jeyaraj, C Ranga Sumiran
  • Patent number: 12572357
    Abstract: Systems and methods related to vector mask buffers in a vector instruction execution pipeline are disclosed herein. The vector instruction execution pipeline may include several lanes. Each lane may include a vector register file, a vector mask buffer, and a functional processing unit. The vector register file may store operand data and the vector mask buffer may store a vector mask associated with the operand data. In a lane, the operand data may be read from the register file into a functional processing unit, and the vector mask may be read from the vector mask buffer to the functional processing unit. The functional processing unit may process the operand data based on the vector mask. The lane-specific vector mask buffers improve the efficiency of the vector instruction execution pipeline by storing the vector masks proximate to where the vector masks will be used.
    Type: Grant
    Filed: June 20, 2024
    Date of Patent: March 10, 2026
    Inventors: Dongjie Xie, Alexander C. Rucker
  • Patent number: 12572359
    Abstract: Techniques for performing square root or reciprocal square root calculations on FP8 data elements in response to an instruction are described. An example of an instruction is one that includes fields for an opcode, an identification of a location of a packed data source operand, and an identification of a packed data destination operand, wherein the opcode is to indicate that execution circuitry is to perform, for each data element position of the packed data source operand, a calculation of a square root value of a FP8 data element in that position and store a result of each square root into a corresponding data element position of the packed data destination operand.
    Type: Grant
    Filed: October 1, 2022
    Date of Patent: March 10, 2026
    Assignee: Intel Corporation
    Inventors: Alexander Heinecke, Menachem Adelman, Evangelos Georganas, Amit Gradstein, Christopher Hughes, Naveen Mellempudi, Simon Rubanovich, Uri Sherman, Zeev Sperber
  • Patent number: 12566608
    Abstract: Disclosed is an apparatus comprising: instruction decoding circuitry; data storage; and processing circuitry to process data responsive to an instruction decoded by instruction decoding circuitry configured to, responsive to a data transfer instruction specifying a data source and a region of the source to perform data transfer, control processing circuitry to: when the data transfer operation comprises an out-of-bounds memory access corresponding to an attempt to read data outside the indicated region of source storage, read data not associated with the out-of-bounds memory access from source storage and write data not associated with the out-of-bounds memory access to a first portion of target storage by overwriting preloaded values stored in the first portion of the target storage; and omit writing to a different second portion of the target storage data associated with the out-of-bounds memory access to preserve preloaded values stored in the second portion of target storage.
    Type: Grant
    Filed: May 10, 2024
    Date of Patent: March 3, 2026
    Assignee: Arm Limited
    Inventors: John David Robson, Kévin Petit
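    Illustrative sketch (not part of the patent record): the behaviour described above, where in-bounds elements overwrite preloaded target values and out-of-bounds elements leave them untouched, can be modelled in a few lines of Python. The function `bounded_transfer`, the list-based storage, and the idea that the region is `source[0:region_len]` are all assumptions for illustration.

```python
from typing import List


def bounded_transfer(source: List[int], region_len: int,
                     start: int, target: List[int]) -> List[int]:
    """Copy len(target) elements starting at source[start] into target.

    Assumption: the permitted region is source[0:region_len]. Reads that
    fall outside it are skipped, so the preloaded value already held in
    the matching target slot is preserved.
    """
    for i in range(len(target)):
        src_index = start + i
        if 0 <= src_index < region_len:
            target[i] = source[src_index]  # in bounds: overwrite preloaded value
        # out of bounds: omit the write and keep the preloaded value
    return target
```

    With a target preloaded with a padding value such as 0 or -1, a transfer that straddles the region edge yields real data plus padding without a separate fill pass.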
  • Patent number: 12554495
    Abstract: An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.
    Type: Grant
    Filed: September 6, 2024
    Date of Patent: February 17, 2026
    Assignee: Intel Corporation
    Inventors: Eran Shifer, Mostafa Hagog, Eliyahu Turiel
  • Patent number: 12554491
    Abstract: A computing system including a hardware accelerator configured to receive vector processing instructions from a processor. The vector processing instructions include an initial read address, an input increment size, and a vector processing operation. During vector processing iterations performed at vector processor tiles included in a vector processor tile array, the hardware accelerator reads vector elements into respective vector processor tiles in an input stream. The vector elements are read into the vector processor tiles from locations in memory that start at the initial read address and advance by the input increment size at successive vector processing iterations. At each of the vector processor tiles, the hardware accelerator computes a vector processing result at least in part by performing the vector processing operation on the vector element read into the vector processor tile. The hardware accelerator outputs the vector processing results from the vector processor tiles in an output stream.
    Type: Grant
    Filed: May 13, 2024
    Date of Patent: February 17, 2026
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Joel Ephraim Bud, Segev Ravgad
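    Illustrative sketch (not part of the patent record): the addressing scheme above starts at an initial read address and advances by a fixed increment on each iteration. The Python below models only that addressing, simplified to one element per iteration; the name `run_vector_op`, the flat list standing in for memory, and the `num_iterations` parameter are assumptions.

```python
from typing import Callable, List, Sequence


def run_vector_op(memory: Sequence[int], initial_read_address: int,
                  input_increment: int, num_iterations: int,
                  op: Callable[[int], int]) -> List[int]:
    """Simplified model: one element is read and processed per iteration."""
    output_stream: List[int] = []
    for iteration in range(num_iterations):
        # The address advances by the increment size at each successive iteration.
        address = initial_read_address + iteration * input_increment
        element = memory[address]           # element arrives on the input stream
        output_stream.append(op(element))   # a tile applies the operation
    return output_stream
```

    For example, with `initial_read_address=1` and `input_increment=3`, three iterations read addresses 1, 4, and 7.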
  • Patent number: 12554497
    Abstract: An apparatus, system, and method for efficiently processing pairs of operations repeatedly used in applications. In various implementations, a computing system includes a parallel data processing circuit with multiple compute circuits. Each of the compute circuits includes multiple lanes of execution, each with a corresponding arithmetic logic unit (ALU). The ALU supports executing a single fused conditional ternary instruction that replaces two separate instructions that provide two operations (comparison and add). When executing the fused conditional ternary instruction, the ALU does not retrieve the intermediate result from the scalar register file, the vector register file, or bypass circuitry located externally from the ALU. Rather, the ALU generates the intermediate result and uses the intermediate result without routing the intermediate result externally from ALU.
    Type: Grant
    Filed: March 15, 2024
    Date of Patent: February 17, 2026
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Steven Isaac Reeves
  • Patent number: 12554499
    Abstract: In one embodiment, a computing system may set data to a first group of registers. The first group of registers may be configured to be accessed during a single operation cycle. The system may set a number of patterns to a second group of registers. Each pattern of the number of patterns may include an array of index for the data stored in the first group of registers. The system may select, for a first vector register associated with a vector engine, a first pattern from the patterns stored in the second group of registers. The system may load a first portion of the data from the first group of registers to the first vector register based on the first pattern selected for the first vector register from the patterns stored in the second group of registers.
    Type: Grant
    Filed: November 30, 2023
    Date of Patent: February 17, 2026
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Tomonari Tohara, Vignesh Vivekraja, Alagappan Valliappan, Andrey Bushev, Javid Jaffari
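    Illustrative sketch (not part of the patent record): the abstract above describes selecting a stored index pattern and using it to load data from one register group into a vector register. In software terms this resembles a gather. The function `load_with_pattern` and the list representation of both register groups are assumptions.

```python
from typing import List, Sequence


def load_with_pattern(data_registers: Sequence[int],
                      patterns: Sequence[Sequence[int]],
                      pattern_id: int) -> List[int]:
    """Fill a vector register by gathering data through a selected index pattern."""
    pattern = patterns[pattern_id]  # select one pattern from the second register group
    # Each pattern entry is an index into the first (data) register group.
    return [data_registers[index] for index in pattern]
```

    Because the pattern is chosen per vector register, the same data registers can feed several differently arranged vectors without rewriting the data.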
  • Patent number: 12547411
    Abstract: A method performed in a processor, includes: receiving, in the processor, a branch instruction in the processing; determining, by the processor, an address of an instruction after the branch instruction as a candidate for speculative execution, the address including an object identification and an offset; and determining, by the processor, whether or not to perform speculative execution of the instruction after the branch instruction based on the object identification of the address.
    Type: Grant
    Filed: July 17, 2023
    Date of Patent: February 10, 2026
    Inventor: Steven Jeffrey Wallach
  • Patent number: 12541483
    Abstract: An integrated circuit (IC) chip receives an input signal on a bus connecting a number of IC chips in series. The IC chip is one of the number of IC chips. The IC chip performs a combining operation and an inverting operation on a signal produced by the IC chip and the input signal to generate an output signal. The IC chip sends the output signal to a next chip of the number of IC chips on the bus.
    Type: Grant
    Filed: June 30, 2023
    Date of Patent: February 3, 2026
    Assignee: Auradine, Inc.
    Inventors: David Carlson, Tao Xu
  • Patent number: 12541366
    Abstract: A method comprising comparing pairs of elements in an array. The method comprises generating two new vectors from an original array in memory, each comprising one part of each pair to be compared. The two new vectors are compared to generate a mask which indicates which of each pair of elements is less. Based on the mask, elements of the vectors can be swapped as necessary. This could be using a XOR algorithm or a merge algorithm. The vectors are then written back to the original array in memory. This process can be repeated on elements of an array as part of a bitonic sorting algorithm.
    Type: Grant
    Filed: December 9, 2023
    Date of Patent: February 3, 2026
    Assignee: Imagination Technologies Limited
    Inventor: Fabrizio Cabaleiro
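    Illustrative sketch (not part of the patent record): one compare-and-swap pass of the kind described above, written in plain Python rather than vector instructions. It builds two vectors from the pairs, derives a mask, swaps with XOR where the mask is set, and writes back. The function name `compare_exchange` and the explicit `pairs` argument are assumptions; the patent's actual pair selection and instruction sequence are not reproduced here.

```python
from typing import List, Sequence, Tuple


def compare_exchange(array: List[int], pairs: Sequence[Tuple[int, int]]) -> List[int]:
    """One masked compare-and-swap pass over the given index pairs."""
    # Two new vectors, each holding one member of every pair.
    first = [array[i] for i, _ in pairs]
    second = [array[j] for _, j in pairs]
    # Mask is True where the second element is less, so a swap is needed.
    mask = [b < a for a, b in zip(first, second)]
    for k, swap in enumerate(mask):
        lane_mask = -1 if swap else 0              # all ones or all zeros
        diff = (first[k] ^ second[k]) & lane_mask  # XOR swap, gated by the mask
        first[k] ^= diff
        second[k] ^= diff
    # Write both vectors back to their original positions.
    for k, (i, j) in enumerate(pairs):
        array[i] = first[k]
        array[j] = second[k]
    return array
```

    Repeating such passes with the pair patterns of a bitonic network would sort the array; this sketch covers only a single pass in ascending order.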
  • Patent number: 12536106
    Abstract: A method for sorting of a vector in a processor is provided that includes performing, by the processor in response to a vector sort instruction, sorting of values stored in lanes of the vector to generate a sorted vector, wherein the values are sorted in an order indicated by the vector sort instruction, and storing the sorted vector in a storage location.
    Type: Grant
    Filed: July 3, 2024
    Date of Patent: January 27, 2026
    Assignee: Texas Instruments Incorporated
    Inventors: Timothy David Anderson, Mujibur Rahman
  • Patent number: 12536133
    Abstract: A stacked processor-plus-memory device includes a processing die with an array of processing elements of an artificial neural network. Each processing element multiplies a first operand—e.g. a weight—by a second operand to produce a partial result to a subsequent processing element. To prepare for these computations, a sequencer loads the weights into the processing elements as a sequence of operands that step through the processing elements, each operand stored in the corresponding processing element. The operands can be sequenced directly from memory to the processing elements or can be stored first in cache. The processing elements include streaming logic that disregards interruptions in the stream of operands.
    Type: Grant
    Filed: April 2, 2024
    Date of Patent: January 27, 2026
    Assignee: Rambus Inc.
    Inventors: Steven C. Woo, Michael Raymond Miller
  • Patent number: 12536020
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.
    Type: Grant
    Filed: February 5, 2024
    Date of Patent: January 27, 2026
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil, Raanan Sade
  • Patent number: 12530195
    Abstract: Techniques for instructions for min-max operations are described. An example apparatus comprises decoder circuitry to decode a single instruction, the single instruction to include fields for identifiers of a first source operand, a second source operand, and a destination operand, a field for an immediate operand, and a field for an opcode, the opcode to indicate execution circuitry is to perform a min-max operation, and execution circuitry to execute the decoded instruction according to the opcode to perform the min-max operation to determine a particular operation of five or more minimum and maximum operations in accordance with a value of the immediate operand, perform the determined particular operation on the identified first source operand and the identified second source operand to return a result, and store the result into the identified destination operand. Other examples are described and claimed.
    Type: Grant
    Filed: May 18, 2022
    Date of Patent: January 20, 2026
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Amit Gradstein, Cristina Anderson, Marius Cornea-Hasegan
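    Illustrative sketch (not part of the patent record): the idea of one instruction whose immediate operand selects among several minimum and maximum operations can be modelled as a dispatch table. The immediate-to-operation mapping below is entirely hypothetical and is not the encoding used by any real instruction set; the names `OPS` and `min_max` are assumptions as well.

```python
import math
from typing import Callable, Dict


def _min_ignore_nan(a: float, b: float) -> float:
    """Minimum that returns the other operand when one operand is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


# Hypothetical mapping from immediate value to operation (illustration only).
OPS: Dict[int, Callable[[float, float], float]] = {
    0: min,                                        # minimum
    1: max,                                        # maximum
    2: lambda a, b: a if abs(a) <= abs(b) else b,  # minimum by magnitude
    3: lambda a, b: a if abs(a) >= abs(b) else b,  # maximum by magnitude
    4: _min_ignore_nan,                            # minimum that ignores NaN
}


def min_max(src1: float, src2: float, imm: int) -> float:
    """Pick the operation named by the immediate, then apply it to both sources."""
    operation = OPS[imm]
    return operation(src1, src2)
```

    Folding the choice into an immediate lets one opcode cover a family of related operations instead of assigning a separate opcode to each.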