Patents Examined by Eric Coleman
  • Patent number: 12386620
    Abstract: The invention discloses a processor and a method for executing an instruction with a processor. The processor comprises a set of tiny register files, each of which is connected correspondingly to one of the set of register files and is configured to temporarily store the operand and the output result of the instruction executed by the plurality of physical threads; and an operand collector, which is connected to the set of register files and to the set of tiny register files and is configured to read the operand of the instruction executed by the plurality of physical threads from the set of register files and/or from the set of tiny register files and write the output result of the instruction executed by the plurality of physical threads to the set of register files and/or to the set of tiny register files.
    Type: Grant
    Filed: December 9, 2021
    Date of Patent: August 12, 2025
    Assignee: METAX INTEGRATED CIRCUITS (SHANGHAI) CO., LTD.
    Inventor: Ying Li
  • Patent number: 12373182
    Abstract: The technology disclosed provides a system that comprises a processor with compute units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
    Type: Grant
    Filed: December 27, 2022
    Date of Patent: July 29, 2025
    Assignee: SambaNova Systems, Inc.
    Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger
  • Patent number: 12360941
    Abstract: Disclosed in some examples are methods, systems, programmable atomic units, and machine-readable mediums that provide an exception as a response to the calling processor. That is, the programmable atomic unit will send a response to the calling processor. The calling processor will recognize that the exception has been raised and will handle the exception. Because the calling processor knows which process triggered the exception, the calling processor (e.g., the Operating System) can take appropriate action, such as terminating the calling process. The calling processor may be a same processor as that executing the programmable atomic transaction, or a different processor (e.g., on a different chiplet).
    Type: Grant
    Filed: October 24, 2023
    Date of Patent: July 15, 2025
    Assignee: Micron Technology, Inc.
    Inventor: Tony Brewer
  • Patent number: 12353916
    Abstract: Provided is a method for performing computations near memory, the method including receiving, at a processor core of a storage device, a request to perform a first function on first data, the first function including a first operation and a second operation, performing, by a first processor-core acceleration engine of the storage device, the first operation on the first data, based on first processor-core custom instructions, to generate first result data, and performing, by a first co-processor acceleration engine of the storage device, the second operation on the first result data, based on first co-processor custom instructions.
    Type: Grant
    Filed: June 2, 2023
    Date of Patent: July 8, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jonghyeon Kim, Soogil Jeong
  • Patent number: 12346695
    Abstract: Techniques for copying a subset of status flags from a control and status register to a flags register in response to an instruction are described. An exemplary instruction includes a field for an opcode, the opcode to indicate execution circuitry is to copy from a first register a saturation flag value, an overflow value, and a carry value to a second register into one or more instructions of a different instruction set.
    Type: Grant
    Filed: September 25, 2021
    Date of Patent: July 1, 2025
    Assignee: Intel Corporation
    Inventors: Vedvyas Shanbhogue, Robert Valentine, Mark Charney, Venkateswara Madduri
  • Patent number: 12340226
    Abstract: Apparatus and methods for vector instruction cracking after scalar dispatch are described. An integrated circuit includes a primary pipeline and a vector pipeline. The primary pipeline is configured to determine a type of instruction, responsive to a determination that the instruction is a vector instruction, create a reorder buffer entry in a reorder buffer for the vector instruction prior to out-of-order processing in the primary pipeline, and send the vector instruction to a vector pipeline. The vector pipeline is configured to process the vector instruction.
    Type: Grant
    Filed: September 18, 2023
    Date of Patent: June 24, 2025
    Assignee: SiFive, Inc.
    Inventor: Kathlene Rose Magnus
  • Patent number: 12327139
    Abstract: An apparatus for accelerating neural networks includes: a memory for storing graph input data including vertices and edges; an aggregation engine that processes the accumulation of features and generates feature vectors by taking the graph input data and performing an aggregation operation on the graph input data; an on-chip cache for caching the feature vectors; and a combination engine that generates a systolic array for matrix multiplications based on the feature vectors taken from the on-chip cache and weights taken from the memory.
    Type: Grant
    Filed: September 27, 2023
    Date of Patent: June 10, 2025
    Assignee: UNIVERSITY INDUSTRY FOUNDATION, YONSEI UNIVERSITY
    Inventors: Youngsok Kim, Jinho Lee, Mingi Yoo, Jaeyong Song, Jounghoo Lee
  • Patent number: 12327123
    Abstract: An embodiment of an integrated circuit may comprise a return stack buffer (RSB), a speculative return stack buffer (SRSB), and circuitry coupled to the RSB and the SRSB, the circuitry to track a count until the SRSB is empty at a time of a prediction by a branch prediction unit, and return an output from the branch prediction unit that corresponds to one of the RSB and the SRSB based at least in part on the count until the SRSB is empty. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: June 21, 2021
    Date of Patent: June 10, 2025
    Assignee: Intel Corporation
    Inventors: Mathew Lowes, Martin Licht
  • Patent number: 12314217
    Abstract: Techniques are disclosed for the use of a hybrid architecture that combines a programmable processing array and a hardware accelerator. The hybrid architecture dedicates the most computationally intensive blocks to the hardware accelerator, while maintaining flexibility for additional computations to be performed by the programmable processing array. An interface is also described for coupling the processing array to the hardware accelerator, which achieves a division of functionality and connects the programmable processing array components to the hardware accelerator components without sacrificing flexibility. This results in a balance between power/area and flexibility.
    Type: Grant
    Filed: December 23, 2021
    Date of Patent: May 27, 2025
    Assignee: Intel Corporation
    Inventors: Zoran Zivkovic, Kameran Azadet, Kannan Rajamani, Thomas Smith
  • Patent number: 12299413
    Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
    Type: Grant
    Filed: January 16, 2024
    Date of Patent: May 13, 2025
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
  • Patent number: 12293184
    Abstract: An illegal address mask method for cores of a DSP includes: S1, initializing a core of a DSP; S2, configuring a start address register and an end address register, and taking an address range defined by the start address register and the end address register as a masked address range; configuring a first comparator and a second comparator to send out illegal address decision signals for instructions within the masked address range; S3, acquiring a PC pointer, and determining whether the PC pointer is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; if not, performing pre-decoding to obtain a memory access instruction; and S4, determining whether an address of the memory access instruction is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; otherwise, completing a memory access operation.
    Type: Grant
    Filed: December 27, 2024
    Date of Patent: May 6, 2025
    Assignee: Jiangsu Huachuang Microsystem Company Limited
    Inventors: Haibin Zhou, Guoqiang He, Wenjun Han, Ming Hao
  • Patent number: 12287756
    Abstract: A systolic array cell is described, the cell including two general-purpose arithmetic logic units (ALUs) and a register file. A plurality of the cells may be configured in a matrix or array, such that the output of the first ALU in a first cell is provided to a second cell to the right of the first cell, and the output of the second ALU in the first cell is provided to a third cell below the first cell. The two ALUs in each cell of the array allow for processing of a different instruction in each cycle.
    Type: Grant
    Filed: October 4, 2023
    Date of Patent: April 29, 2025
    Assignee: GOOGLE LLC
    Inventors: Reginald Clifford Young, Trevor Gale, Sushma Honnavara-Prasad, Paolo Mantovani
  • Patent number: 12288068
    Abstract: An instruction simulation device and a method thereof are provided. The instruction simulation device includes a processor. The processor includes an instruction decoder which generates format information of a ready-for-execution instruction. The processor determines whether the ready-for-execution instruction currently executed by the processor is a compatible instruction or an extended instruction based on the format information of the ready-for-execution instruction. If the ready-for-execution instruction is an extended instruction under the new instruction set or the extended instruction set, the processor converts the ready-for-execution instruction into a simulation program corresponding to the extended instruction, and simulates an execution result of the ready-for-execution instruction by executing the simulation program. The simulation program is composed of at least one compatible instruction of the processor.
    Type: Grant
    Filed: September 12, 2023
    Date of Patent: April 29, 2025
    Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.
    Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
  • Patent number: 12282773
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
    Type: Grant
    Filed: December 8, 2023
    Date of Patent: April 22, 2025
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
  • Patent number: 12271339
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, and include memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: October 9, 2023
    Date of Patent: April 8, 2025
    Assignee: Groq, Inc.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 12260215
    Abstract: In-memory computing circuits can be used to determine distances between vectors. Such circuits can be used for machine learning applications. Examples include obtaining at least one dimension of a query vector wherein the dimension includes one or more bits and comparing respective bits of the dimension to corresponding bits of at least one dimension of a reference vector. This obtains a control signal dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector. The control signal can then be used to control a pulse modifying circuit such that a modification applied to a pulse signal is dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector.
    Type: Grant
    Filed: May 22, 2023
    Date of Patent: March 25, 2025
    Assignee: Nokia Technologies Oy
    Inventor: Marijan Herceg
  • Patent number: 12248429
    Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least a respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one-dimensional paths and to transmit data around each of the two embedded one-dimensional paths, each embedded one-dimensional path using all processing nodes of the computer in such a manner that the two embedded one-dimensional paths operate simultaneously without sharing links.
    Type: Grant
    Filed: March 17, 2023
    Date of Patent: March 11, 2025
    Assignee: GRAPHCORE LIMITED
    Inventor: Simon Knowles
  • Patent number: 12242894
    Abstract: A device can be used to implement a neural network in hardware. The device can include a processor, a memory, and a neural network accelerator. The neural network accelerator can be configured to implement, in hardware, a neural network by using a residue number system (RNS). At least one function of the neural network can have a corresponding approximation in the RNS, and the at least one function can be provided by implementing the corresponding approximation in hardware.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: March 4, 2025
    Assignee: Khalifa University of Science and Technology
    Inventors: Athanasios Stouraitis, Sakellariou Vasileios, Vasileios Paliouras, Ioannis Kouretas, Hani Saleh
  • Patent number: 12236238
    Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double-precision multiplier or a 48-bit output, wherein the MAD instruction is to generate an output in a single clock cycle of the processor.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: February 25, 2025
    Assignee: INTEL CORPORATION
    Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
  • Patent number: 12229558
    Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
    Type: Grant
    Filed: September 22, 2023
    Date of Patent: February 18, 2025
    Assignee: Intel Corporation
    Inventor: Ahmad Yasin
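The latency-balancing idea in the abstract of patent 12373182 (delaying the faster producer's output so that both operands reach the consumer stage in the same cycle) can be sketched as a small Python model. This is a hypothetical illustration, not SambaNova's implementation: `ProducerStage`, `register_delays`, and `consumer_issue_cycle` are invented names, latencies are assumed to be whole cycles, and both producers are assumed to start in the same cycle.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class ProducerStage:
    name: str     # hypothetical label for the hardware stage (must be unique)
    latency: int  # stage latency in cycles, set by operation type and operand format


def register_delays(first: ProducerStage, second: ProducerStage) -> dict[str, int]:
    """Extra register-storage cycles to add after each producer so that both
    outputs arrive at the consumer stage in the same cycle."""
    arrival = max(first.latency, second.latency)
    return {first.name: arrival - first.latency,
            second.name: arrival - second.latency}


def consumer_issue_cycle(first: ProducerStage, second: ProducerStage) -> int:
    """Cycle in which the consumer stage can start, assuming both producers start at cycle 0."""
    delays = register_delays(first, second)
    t_first = first.latency + delays[first.name]
    t_second = second.latency + delays[second.name]
    assert t_first == t_second, "delays must equalize the arrival times"
    return t_first
```

For example, with a hypothetical 4-cycle `mul` stage and a 1-cycle `add` stage, `register_delays` returns `{"mul": 0, "add": 3}`: three cycles of register storage go on the add path, and the consumer issues in cycle 4.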
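Steps S2 to S4 of the illegal address mask method in patent 12293184 reduce to two range checks: one on the PC and one on each memory access address. The Python sketch below models that flow under stated assumptions: `MaskRange`, `check_step`, and `IllegalAddress` are hypothetical names, the end address is treated as inclusive, and raising an exception stands in for the illegal address decision signal that stops the operation.

```python
from dataclasses import dataclass
from typing import Optional


class IllegalAddress(Exception):
    """Stands in for the illegal address decision signal that stops the operation."""


@dataclass
class MaskRange:
    start: int  # value held in the start address register
    end: int    # value held in the end address register (assumed inclusive)

    def contains(self, addr: int) -> bool:
        # First comparator: addr >= start; second comparator: addr <= end.
        return self.start <= addr <= self.end


def check_step(mask: MaskRange, pc: int, mem_addr: Optional[int] = None) -> str:
    """S3: check the PC; S4: if pre-decoding found a memory access, check its address."""
    if mask.contains(pc):
        raise IllegalAddress(f"PC {pc:#x} lies in the masked range")
    if mem_addr is None:
        return "no memory access"
    if mask.contains(mem_addr):
        raise IllegalAddress(f"access to {mem_addr:#x} lies in the masked range")
    return "memory access completed"


# S2: configure the masked range (hypothetical addresses).
mask = MaskRange(start=0x1000, end=0x1FFF)
```

Here a fetch at `0x0200` that loads from `0x3000` completes, while a PC of `0x1000` or a load from `0x1800` raises `IllegalAddress`.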
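The bit-comparison scheme in patent 12260215 can be modeled in software as a Hamming distance encoded in a pulse. In this hypothetical sketch, each per-bit comparison yields a control signal, and every mismatching bit widens the pulse by a fixed amount. The base width, the per-mismatch delay, and the equal weighting of all bit positions are assumptions for illustration, not values from the patent; `mismatch_signals` and `pulse_width` are invented names.

```python
from __future__ import annotations


def mismatch_signals(query_dim: int, ref_dim: int, bits: int) -> list[bool]:
    """Per-bit control signals: True where the query bit differs from the reference bit."""
    return [((query_dim >> i) & 1) != ((ref_dim >> i) & 1) for i in range(bits)]


def pulse_width(query: list[int], reference: list[int], bits: int,
                base_width: float = 1.0, delay_per_mismatch: float = 0.5) -> float:
    """Widen a pulse by a fixed delay for every mismatching bit across all dimensions.

    The added width is proportional to the Hamming distance between the two vectors.
    """
    width = base_width
    for q, r in zip(query, reference):
        for differs in mismatch_signals(q, r, bits):
            if differs:  # the control signal selects the delaying path
                width += delay_per_mismatch
    return width
```

With 4-bit dimensions, comparing the query `[0b1010, 0b0001]` against the reference `[0b1000, 0b0001]` finds one differing bit, so the pulse grows from 1.0 to 1.5.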
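Patent 12242894 builds a neural network accelerator on a residue number system (RNS). The sketch below shows only the arithmetic core that makes an RNS attractive for multiply-accumulate work: additions and multiplications act independently on each residue, and the Chinese Remainder Theorem recovers the integer result. The moduli (7, 8, 9) are an arbitrary pairwise-coprime choice, all function names are hypothetical, and the RNS approximations of non-linear functions mentioned in the abstract are not modeled.

```python
from math import prod

MODULI = (7, 8, 9)  # hypothetical pairwise-coprime moduli; dynamic range M = 504


def to_rns(x: int, moduli=MODULI) -> tuple:
    """Encode a non-negative integer below prod(moduli) as its residues."""
    return tuple(x % m for m in moduli)


def rns_add(a: tuple, b: tuple, moduli=MODULI) -> tuple:
    # Each residue channel adds independently: no carries between channels.
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a: tuple, b: tuple, moduli=MODULI) -> tuple:
    # Each residue channel multiplies independently.
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(r: tuple, moduli=MODULI) -> int:
    """Chinese Remainder Theorem reconstruction into the range [0, M)."""
    M = prod(moduli)
    total = 0
    for ri, m in zip(r, moduli):
        Mi = M // m
        total += ri * Mi * pow(Mi, -1, m)  # pow(x, -1, m) is the modular inverse (Python 3.8+)
    return total % M


def rns_dot(weights: list, activations: list) -> int:
    """Multiply-accumulate entirely in RNS, converting back only once at the end."""
    acc = to_rns(0)
    for w, x in zip(weights, activations):
        acc = rns_add(acc, rns_mul(to_rns(w), to_rns(x)))
    return from_rns(acc)
```

For example, `rns_dot([3, 5, 2], [4, 6, 7])` returns 56, matching 3*4 + 5*6 + 2*7. Results must stay below M = 504 (or be offset for signed values) to decode correctly.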
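Patent 12236238 chains MAD instructions with a 48-bit output to multiply large integers. One plausible reading, assumed here rather than taken from the patent, is schoolbook multiplication over 24-bit limbs: each partial product plus the running limb and the carry is at most (2^24 - 1)^2 + 2(2^24 - 1) = 2^48 - 1, so a single 48-bit MAD result per limb pair never overflows. `mad48`, `to_limbs`, and `multiply` are hypothetical names.

```python
from __future__ import annotations

LIMB_BITS = 24                                # assumed limb width: 24 x 24 -> 48-bit product
LIMB_MASK = (1 << LIMB_BITS) - 1
MAD_OUTPUT_MASK = (1 << (2 * LIMB_BITS)) - 1  # largest value a 48-bit MAD result can hold


def to_limbs(x: int) -> list[int]:
    """Split a non-negative integer into little-endian 24-bit limbs."""
    limbs = []
    while x:
        limbs.append(x & LIMB_MASK)
        x >>= LIMB_BITS
    return limbs or [0]


def mad48(a: int, b: int, c: int) -> int:
    """Model one MAD: a * b + c, which must fit in a 48-bit output."""
    result = a * b + c
    assert result <= MAD_OUTPUT_MASK, "result would overflow a 48-bit output"
    return result


def multiply(x: int, y: int) -> int:
    """Schoolbook large-integer multiplication as a chain of 48-bit MADs."""
    a, b = to_limbs(x), to_limbs(y)
    r = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            t = mad48(ai, bj, r[i + j] + carry)  # addend is at most 2 * (2**24 - 1)
            r[i + j] = t & LIMB_MASK             # low 24 bits stay in this limb
            carry = t >> LIMB_BITS               # high 24 bits move to the next limb
        r[i + len(b)] = carry
    return sum(limb << (LIMB_BITS * k) for k, limb in enumerate(r))
```

Because every intermediate limb stays below 2^24, the overflow assertion in `mad48` documents the bound rather than guarding a reachable case.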