Patents Examined by Eric Coleman
-
Patent number: 12386620
Abstract: The invention discloses a processor and a method for executing an instruction with a processor. The processor comprises a set of tiny register files, each of which is connected correspondingly to one of the set of register files and is configured to temporarily store the operand and the output result of the instruction executed by the plurality of physical threads; and an operand collector, which is connected to the set of register files and to the set of tiny register files and is configured to read the operand of the instruction executed by the plurality of physical threads from the set of register files and/or from the set of tiny register files and write the output result of the instruction executed by the plurality of physical threads to the set of register files and/or to the set of tiny register files.
Type: Grant
Filed: December 9, 2021
Date of Patent: August 12, 2025
Assignee: METAX INTEGRATED CIRCUITS (SHANGHAI) CO., LTD.
Inventor: Ying Li
-
Patent number: 12373182
Abstract: The technology disclosed provides a system that comprises a processor with computing units on an integrated circuit substrate. The processor is configured to map a program across multiple hardware stages with each hardware stage executing a corresponding operation of the program at a different stage latency dependent on an operation type and an operand format. The system further comprises a runtime logic that configures the compute units with configuration data. The configuration data causes first and second producer hardware stages in a given compute unit to execute first and second data processing operations and produce first and second outputs at first and second stage latencies, and synchronizes consumption of the first and second outputs by a consumer hardware stage in the given compute unit for execution of a third data processing operation by introducing a register storage delay that compensates for a difference between the first and second stage latencies.
Type: Grant
Filed: December 27, 2022
Date of Patent: July 29, 2025
Assignee: SambaNova Systems, Inc.
Inventors: Weiwei Chen, Raghu Prabhakar, David Alan Koeplinger
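The latency-balancing idea in 12373182 can be illustrated with a small cycle-level model. This is a sketch under stated assumptions, not the patented runtime logic: each producer stage is modeled as a chain of pipeline registers (the hypothetical `DelayLine` class), the two producer operations (`x + 1`, `x * x`) and the consumer operation (their sum) are made up for illustration, and the compensating delay is simply the latency difference added after the faster producer.

```python
from collections import deque


class DelayLine:
    """Chain of pipeline registers: a value pushed in emerges `depth` cycles later."""

    def __init__(self, depth):
        self.depth = depth
        self.regs = deque([None] * depth)

    def step(self, value):
        if self.depth == 0:
            return value
        self.regs.append(value)
        return self.regs.popleft()


def balance_delays(lat_first, lat_second):
    """Return (extra_first, extra_second): register stages added after each producer."""
    diff = lat_first - lat_second
    return (0, diff) if diff > 0 else (-diff, 0)


def simulate(inputs, lat_first, lat_second):
    extra_first, extra_second = balance_delays(lat_first, lat_second)
    # Producer pipeline plus its compensating registers, modeled as one delay line.
    path_first = DelayLine(lat_first + extra_first)
    path_second = DelayLine(lat_second + extra_second)
    depth = max(lat_first, lat_second)
    results = []
    # Feed one input per cycle, then flush with bubbles so every result drains out.
    for x in list(inputs) + [None] * depth:
        a = path_first.step(None if x is None else (x, x + 1))   # first op (hypothetical)
        b = path_second.step(None if x is None else (x, x * x))  # second op (hypothetical)
        if a is not None and b is not None:
            # Consumer stage: both operands must derive from the same input element.
            assert a[0] == b[0], "misaligned producer outputs"
            results.append(a[1] + b[1])  # third op consumes both outputs
    return results


print(simulate([1, 2, 3], lat_first=3, lat_second=1))  # [3, 7, 13]
```

Without the extra registers on the faster path, the consumer would pair the second producer's output for one input with the first producer's output for an earlier input; the assertion in the consumer stage exists to catch exactly that misalignment.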
-
Patent number: 12360941
Abstract: Disclosed in some examples are methods, systems, programmable atomic units, and machine-readable mediums that provide an exception as a response to the calling processor. That is, the programmable atomic unit will send a response to the calling processor. The calling processor will recognize that the exception has been raised and will handle the exception. Because the calling processor knows which process triggered the exception, the calling processor (e.g., the Operating System) can take appropriate action, such as terminating the calling process. The calling processor may be the same processor as the one executing the programmable atomic transaction, or a different processor (e.g., on a different chiplet).
Type: Grant
Filed: October 24, 2023
Date of Patent: July 15, 2025
Assignee: Micron Technology, Inc.
Inventor: Tony Brewer
-
Patent number: 12353916
Abstract: Provided is a method for performing computations near memory, the method including receiving, at a processor core of a storage device, a request to perform a first function on first data, the first function including a first operation and a second operation, performing, by a first processor-core acceleration engine of the storage device, the first operation on the first data, based on first processor-core custom instructions, to generate first result data, and performing, by a first co-processor acceleration engine of the storage device, the second operation on the first result data, based on first co-processor custom instructions.
Type: Grant
Filed: June 2, 2023
Date of Patent: July 8, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jonghyeon Kim, Soogil Jeong
-
Patent number: 12346695
Abstract: Techniques for copying a subset of status flags from a control and status register to a flags register in response to an instruction are described. An exemplary instruction includes a field for an opcode, the opcode to indicate execution circuitry is to copy from a first register a saturation flag value, an overflow value, and a carry value to a second register into one or more instructions of a different instruction set.
Type: Grant
Filed: September 25, 2021
Date of Patent: July 1, 2025
Assignee: Intel Corporation
Inventors: Vedvyas Shanbhogue, Robert Valentine, Mark Charney, Venkateswara Madduri
-
Patent number: 12340226
Abstract: Apparatus and methods for vector instruction cracking after scalar dispatch are described. An integrated circuit includes a primary pipeline and a vector pipeline. The primary pipeline is configured to determine a type of an instruction and, responsive to a determination that the instruction is a vector instruction, create a reorder buffer entry in a reorder buffer for the vector instruction prior to out-of-order processing in the primary pipeline, and send the vector instruction to the vector pipeline. The vector pipeline is configured to process the vector instruction.
Type: Grant
Filed: September 18, 2023
Date of Patent: June 24, 2025
Assignee: SiFive, Inc.
Inventor: Kathlene Rose Magnus
-
Patent number: 12327139
Abstract: An apparatus for accelerating neural networks, includes: a memory for storing graph input data including vertices and edges; an aggregation engine that processes the accumulation of features and generates feature vectors by taking the graph input data and performing an aggregation operation on the graph input data; an on-chip cache for caching the feature vectors; and a combination engine that generates a systolic array for matrix multiplications based on the feature vectors taken from the on-chip cache and weights taken from the memory.
Type: Grant
Filed: September 27, 2023
Date of Patent: June 10, 2025
Assignee: UNIVERSITY INDUSTRY FOUNDATION, YONSEI UNIVERSITY
Inventors: Youngsok Kim, Jinho Lee, Mingi Yoo, Jaeyong Song, Jounghoo Lee
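The two phases named in 12327139 (aggregation over graph edges, then combination as a dense matrix multiply with weights) correspond to the usual graph-neural-network layer split. The sketch below assumes GCN-style sum aggregation with a self-loop and no normalization; the function names and the toy graph are illustrative, not taken from the patent. `combine` is the phase the abstract maps onto a systolic array.

```python
def aggregate(num_vertices, edges, features):
    """Aggregation phase: each vertex sums its own features with its in-neighbours'."""
    dim = len(features[0])
    out = [list(features[v]) for v in range(num_vertices)]  # self-loop (assumption)
    for src, dst in edges:
        for k in range(dim):
            out[dst][k] += features[src][k]
    return out


def combine(aggregated, weights):
    """Combination phase: dense (vertices x dim) @ (dim x out_dim) matrix multiply."""
    inner, cols = len(weights), len(weights[0])
    return [
        [sum(row[k] * weights[k][j] for k in range(inner)) for j in range(cols)]
        for row in aggregated
    ]


edges = [(0, 1), (1, 2), (2, 0)]            # directed edges src -> dst
features = [[1, 0], [0, 1], [1, 1]]         # one feature vector per vertex
weights = [[1, 2], [3, 4]]                  # layer weights read from memory
agg = aggregate(3, edges, features)         # [[2, 1], [1, 1], [1, 2]]
print(combine(agg, weights))                # [[5, 8], [4, 6], [7, 10]]
```

Keeping the aggregated vectors in an intermediate list stands in for the on-chip cache between the two engines: the irregular, edge-driven phase finishes first, and the regular matrix phase then reads its results.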
-
Patent number: 12327123
Abstract: An embodiment of an integrated circuit may comprise a return stack buffer (RSB), a speculative return stack buffer (SRSB), and circuitry coupled to the RSB and the SRSB, the circuitry to track a count until the SRSB is empty at a time of a prediction by a branch prediction unit, and return an output from the branch prediction unit that corresponds to one of the RSB and the SRSB based at least in part on the count until the SRSB is empty. Other embodiments are disclosed and claimed.
Type: Grant
Filed: June 21, 2021
Date of Patent: June 10, 2025
Assignee: Intel Corporation
Inventors: Mathew Lowes, Martin Licht
-
Patent number: 12314217
Abstract: Techniques are disclosed for the use of a hybrid architecture that combines a programmable processing array and a hardware accelerator. The hybrid architecture dedicates the most computationally intensive blocks to the hardware accelerator, while maintaining flexibility for additional computations to be performed by the programmable processing array. An interface is also described for coupling the processing array to the hardware accelerator, which achieves a division of functionality and connects the programmable processing array components to the hardware accelerator components without sacrificing flexibility. This results in a balance between power/area and flexibility.
Type: Grant
Filed: December 23, 2021
Date of Patent: May 27, 2025
Assignee: Intel Corporation
Inventors: Zoran Zivkovic, Kameran Azadet, Kannan Rajamani, Thomas Smith
-
Patent number: 12299413
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
Type: Grant
Filed: January 16, 2024
Date of Patent: May 13, 2025
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
-
Patent number: 12293184
Abstract: An illegal address mask method for cores of a DSP includes: S1, initializing a core of a DSP; S2, configuring a start address register and an end address register, and taking an address range defined by the start address register and the end address register as a masked address range; configuring a first comparator and a second comparator to send out illegal address decision signals for instructions within the masked address range; S3, acquiring a PC pointer, and determining whether the PC pointer is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; if not, performing pre-decoding to obtain a memory access instruction; and S4, determining whether an address of the memory access instruction is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; otherwise, completing a memory access operation.
Type: Grant
Filed: December 27, 2024
Date of Patent: May 6, 2025
Assignee: Jiangsu Huachuang Microsystem Company Limited
Inventors: Haibin Zhou, Guoqiang He, Wenjun Han, Ming Hao
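Steps S2 to S4 of 12293184 amount to two range checks against the same start/end register pair: one on the PC, and one on the address of any memory access that pre-decoding finds. A minimal software model follows, assuming an inclusive range and a hypothetical `decode` callback that stands in for the pre-decoder (it returns the accessed address, or `None` for non-memory instructions).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AddressMask:
    start: int  # start address register (S2)
    end: int    # end address register (S2); inclusive bound is an assumption

    def in_masked_range(self, addr: int) -> bool:
        # The two comparators: addr >= start and addr <= end.
        return self.start <= addr <= self.end


def execute_step(mask: AddressMask, pc: int,
                 decode: Callable[[int], Optional[int]]) -> str:
    """Return 'illegal' (operation stopped) or 'ok' for one instruction."""
    if mask.in_masked_range(pc):                 # S3: PC inside the masked range
        return "illegal"
    mem_addr = decode(pc)                        # S3: pre-decode for a memory access
    if mem_addr is not None and mask.in_masked_range(mem_addr):  # S4
        return "illegal"
    return "ok"                                  # S4: complete the memory access


mask = AddressMask(0x1000, 0x1FFF)
print(execute_step(mask, 0x0100, lambda pc: 0x1800))  # illegal (data address masked)
print(execute_step(mask, 0x0100, lambda pc: 0x2000))  # ok
```

Checking the PC before pre-decoding means an instruction fetched from the masked region is rejected without ever being decoded, which mirrors the ordering of S3 and S4 in the abstract.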
-
Patent number: 12287756
Abstract: A systolic array cell is described, the cell including two general-purpose arithmetic logic units (ALUs) and a register file. A plurality of the cells may be configured in a matrix or array, such that the output of the first ALU in a first cell is provided to a second cell to the right of the first cell, and the output of the second ALU in the first cell is provided to a third cell below the first cell. The two ALUs in each cell of the array allow for processing of a different instruction in each cycle.
Type: Grant
Filed: October 4, 2023
Date of Patent: April 29, 2025
Assignee: GOOGLE LLC
Inventors: Reginald Clifford Young, Trevor Gale, Sushma Honnavara-Prasad, Paolo Mantovani
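The rightward and downward forwarding described in 12287756 is the dataflow a classic output-stationary systolic matrix multiply relies on. The cycle-level sketch below uses that textbook scheme as one possible workload, which is an assumption: the patent's cells hold general-purpose ALUs and are not limited to this. Each cell accumulates `a * b`, forwards `a` to the right, and forwards `b` downward. Rows of `A` and columns of `B` are fed in skewed by one cycle per row or column, so `A[i][k]` and `B[k][j]` meet in cell `(i, j)` at cycle `i + j + k`.

```python
def systolic_matmul(A, B):
    n, m, p = len(A), len(A[0]), len(B[0])
    acc = [[0] * p for _ in range(n)]           # output-stationary accumulators
    a_reg = [[None] * p for _ in range(n)]      # value each cell forwards to the right
    b_reg = [[None] * p for _ in range(n)]      # value each cell forwards downward
    for t in range(n + m + p - 2):              # last product happens at t = n+m+p-3
        new_a = [[None] * p for _ in range(n)]
        new_b = [[None] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                if j == 0:                      # left edge: skewed feed of row i of A
                    k = t - i
                    a_in = A[i][k] if 0 <= k < m else None
                else:                           # otherwise from the left neighbour
                    a_in = a_reg[i][j - 1]
                if i == 0:                      # top edge: skewed feed of column j of B
                    k = t - j
                    b_in = B[k][j] if 0 <= k < m else None
                else:                           # otherwise from the cell above
                    b_in = b_reg[i - 1][j]
                if a_in is not None and b_in is not None:
                    acc[i][j] += a_in * b_in
                new_a[i][j] = a_in
                new_b[i][j] = b_in
        a_reg, b_reg = new_a, new_b             # registers latch at the cycle boundary
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Double-buffering the forwarding registers (`new_a`/`new_b`) models the one-cycle hop between neighbouring cells. Updating in place would let a value traverse a whole row in a single cycle and break the skew.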
-
Patent number: 12288068
Abstract: An instruction simulation device and a method thereof are provided. The instruction simulation device includes a processor. The processor includes an instruction decoder which generates format information of a ready-for-execution instruction. The processor determines whether the ready-for-execution instruction currently executed by the processor is a compatible instruction or an extended instruction based on the format information of the ready-for-execution instruction. If the ready-for-execution instruction is an extended instruction under the new instruction set or the extended instruction set, the processor converts the ready-for-execution instruction into a simulation program corresponding to the extended instruction, and simulates an execution result of the ready-for-execution instruction by executing the simulation program. The simulation program is composed of at least one compatible instruction of the processor.
Type: Grant
Filed: September 12, 2023
Date of Patent: April 29, 2025
Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.
Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
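The control flow in 12288068 can be modeled in software as follows. Compatible instructions execute directly, and an extended instruction is replaced by a simulation program made only of compatible instructions. In this sketch, everything concrete is hypothetical: the two-operation "compatible" set (`add`, `mul`), the extended `fma` instruction (`d = d + a * b`), and the scratch register `tmp`. The membership test on the opcode stands in for the decoder's format information.

```python
def run_native(regs, op, dst, src1, src2):
    """Execute a compatible instruction directly."""
    if op == "add":
        regs[dst] = regs[src1] + regs[src2]
    elif op == "mul":
        regs[dst] = regs[src1] * regs[src2]
    else:
        raise ValueError(f"not a compatible instruction: {op}")


# Simulation programs: each extended instruction expands into compatible ones.
# "fma dst, a, b" (dst = dst + a * b) is a hypothetical extended instruction.
SIMULATION_PROGRAMS = {
    "fma": lambda dst, a, b: [("mul", "tmp", a, b), ("add", dst, dst, "tmp")],
}


def execute(regs, instr):
    op, dst, src1, src2 = instr
    is_extended = op in SIMULATION_PROGRAMS  # stands in for decoder format information
    if not is_extended:
        run_native(regs, op, dst, src1, src2)
        return
    # Simulate the extended instruction by running its compatible-instruction program.
    for step in SIMULATION_PROGRAMS[op](dst, src1, src2):
        run_native(regs, *step)


regs = {"r0": 1, "r1": 2, "r2": 3, "tmp": 0}
execute(regs, ("fma", "r0", "r1", "r2"))
print(regs["r0"])  # 7
```

A real implementation would also need to hide the scratch register from the program being run. In this sketch, `tmp` is simply assumed to be reserved.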
-
Patent number: 12282773
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
Type: Grant
Filed: December 8, 2023
Date of Patent: April 22, 2025
Assignee: Intel Corporation
Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
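The abstract of 12282773 does not specify the "description retrieved from the memory address". As an illustration only, the sketch below decodes a 64-byte description modeled on the LDTILECFG memory format that Intel documents for AMX. That format has a palette ID in byte 0, a start row in byte 1, reserved bytes 2 to 15, sixteen 16-bit bytes-per-row values at offset 16, and sixteen 8-bit row counts at offset 48. Whether the patent's claims require this exact layout is an assumption, and `make_tile_config` is a hypothetical helper.

```python
import struct

NUM_TILE_SLOTS = 16


def make_tile_config(palette_id, shapes, start_row=0):
    """Build a 64-byte description from (rows, bytes_per_row) pairs, one per tile."""
    blob = bytearray(64)
    blob[0], blob[1] = palette_id, start_row
    for t, (rows, colsb) in enumerate(shapes):
        struct.pack_into("<H", blob, 16 + 2 * t, colsb)  # bytes per row, little-endian
        blob[48 + t] = rows                              # number of rows
    return bytes(blob)


def parse_tile_config(blob):
    """Decode the description into (palette_id, start_row, [(rows, bytes_per_row), ...])."""
    if len(blob) != 64:
        raise ValueError("tile configuration must be 64 bytes")
    if any(blob[2:16]):
        raise ValueError("reserved bytes 2..15 must be zero")
    colsb = struct.unpack_from("<16H", blob, 16)
    rows = struct.unpack_from("<16B", blob, 48)
    return blob[0], blob[1], list(zip(rows, colsb))


palette, start, tiles = parse_tile_config(make_tile_config(1, [(16, 64), (8, 32)]))
print(palette, start, tiles[:3])  # 1 0 [(16, 64), (8, 32), (0, 0)]
```

Describing the configuration in memory rather than in instruction fields lets a single instruction set the shape of every tile at once.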
-
Patent number: 12271339
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor and include memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: October 9, 2023
Date of Patent: April 8, 2025
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 12260215
Abstract: In-memory computing circuits can be used to determine distances between vectors. Such circuits can be used for machine learning applications. Examples include obtaining at least one dimension of a query vector wherein the dimension includes one or more bits and comparing respective bits of the dimension to corresponding bits of at least one dimension of a reference vector. This obtains a control signal dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector. The control signal can then be used to control a pulse modifying circuit such that a modification applied to a pulse signal is dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector.
Type: Grant
Filed: May 22, 2023
Date of Patent: March 25, 2025
Assignee: Nokia Technologies Oy
Inventor: Marijan Herceg
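A behavioural model of the bit-compare-then-modify-pulse idea in 12260215 follows. It assumes the simplest encoding: every mismatching bit stretches the pulse by one fixed delay unit, so the final pulse width encodes the Hamming distance between the query and reference vectors. The uniform per-bit delay, the function names, and the parameters are all assumptions. The patent may weight bits differently or modify the pulse in another way.

```python
def compare_bits(query_dim, ref_dim, width):
    """Per-bit control signals: True where the query bit differs from the reference bit."""
    return [((query_dim >> b) & 1) != ((ref_dim >> b) & 1) for b in range(width)]


def modulate_pulse(pulse_width, controls, unit_delay):
    """Pulse-modifying stage: each asserted control signal adds one delay unit."""
    return pulse_width + unit_delay * sum(controls)


def vector_distance(query, reference, width, base_width=0, unit_delay=1):
    """Pass the pulse through one modifying stage per dimension of the vectors."""
    pulse = base_width
    for q, r in zip(query, reference):
        pulse = modulate_pulse(pulse, compare_bits(q, r, width), unit_delay)
    return pulse


# 0b1010 vs 0b1000 differ in 1 bit; 0b1111 vs 0b0000 differ in 4 bits.
print(vector_distance([0b1010, 0b1111], [0b1000, 0b0000], width=4))  # 5
```

With a nonzero `base_width`, the distance is recovered by subtracting the unmodified pulse width and dividing by `unit_delay`, which is how a time-domain readout would typically interpret it.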
-
Patent number: 12248429
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least a respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one-dimensional paths and to transmit data around each of the two embedded one-dimensional paths, each embedded one-dimensional path using all processing nodes of the computer in such a manner that the two embedded one-dimensional paths operate simultaneously without sharing links.
Type: Grant
Filed: March 17, 2023
Date of Patent: March 11, 2025
Assignee: GRAPHCORE LIMITED
Inventor: Simon Knowles
-
Patent number: 12242894
Abstract: A device can be used to implement a neural network in hardware. The device can include a processor, a memory, and a neural network accelerator. The neural network accelerator can be configured to implement, in hardware, a neural network by using a residue number system (RNS). At least one function of the neural network can have a corresponding approximation in the RNS system, and the at least one function can be provided by implementing the corresponding approximation in hardware.
Type: Grant
Filed: March 31, 2023
Date of Patent: March 4, 2025
Assignee: Khalifa University of Science and Technology
Inventors: Athanasios Stouraitis, Sakellariou Vasileios, Vasileios Paliouras, Ioannis Kouretas, Hani Saleh
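A residue number system (RNS), as used in 12242894, represents an integer by its remainders modulo a set of pairwise-coprime moduli. Additions and multiplications then run independently per channel, with no carries between channels, and the Chinese Remainder Theorem recovers the integer. The sketch covers only the linear multiply-accumulate part of a neuron. The moduli `(7, 11, 13)` are hypothetical and give a dynamic range of 1001. The approximated nonlinear functions the abstract mentions are not modeled.

```python
from math import prod

MODULI = (7, 11, 13)  # hypothetical pairwise-coprime moduli; dynamic range 7*11*13 = 1001


def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_rns(r, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction into [0, prod(moduli))."""
    big_m = prod(moduli)
    total = 0
    for ri, mi in zip(r, moduli):
        partial = big_m // mi
        total += ri * partial * pow(partial, -1, mi)  # modular inverse (Python 3.8+)
    return total % big_m


def rns_dot(weights, inputs, moduli=MODULI):
    """Neuron pre-activation: multiply-accumulate carried out channel by channel."""
    acc = to_rns(0, moduli)
    for w, x in zip(weights, inputs):
        acc = rns_add(acc, rns_mul(to_rns(w, moduli), to_rns(x, moduli), moduli), moduli)
    return acc


print(from_rns(rns_dot([3, 5, 2], [10, 4, 7])))  # 64 (= 30 + 20 + 14)
```

Results are only correct while the true value stays below the product of the moduli. This is why RNS designs choose their moduli to cover the accumulator's range.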
-
Patent number: 12236238
Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48-bit output, wherein the MAD instruction is to generate an output in a single clock cycle of the processor.
Type: Grant
Filed: June 25, 2021
Date of Patent: February 25, 2025
Assignee: INTEL CORPORATION
Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
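The "chain of multiplication operations" in 12236238 corresponds to schoolbook multiplication over fixed-width limbs. Each step is one multiply-and-add: a limb product, plus the partial sum already stored at that position, plus the incoming carry. With 24-bit limbs (an assumption chosen to match the 48-bit output named in the abstract), that sum is at most (2^24 - 1)^2 + 2(2^24 - 1) = 2^48 - 1, so every MAD result fits in 48 bits. The code checks this bound with an assertion. `mad` here is a plain Python helper, not an actual GPU instruction.

```python
LIMB_BITS = 24
MASK = (1 << LIMB_BITS) - 1


def to_limbs(x):
    """Split a non-negative integer into little-endian 24-bit limbs."""
    limbs = []
    while x:
        limbs.append(x & MASK)
        x >>= LIMB_BITS
    return limbs or [0]


def mad(a, b, c):
    """Multiply-and-add a * b + c, the operation each step of the chain issues."""
    return a * b + c


def big_mul(x, y):
    xs, ys = to_limbs(x), to_limbs(y)
    out = [0] * (len(xs) + len(ys))
    for i, a in enumerate(xs):
        carry = 0
        for j, b in enumerate(ys):
            t = mad(a, b, out[i + j] + carry)
            assert t < 1 << 48, "MAD result must fit in 48 bits"
            out[i + j] = t & MASK       # low 24 bits stay in this limb
            carry = t >> LIMB_BITS      # high 24 bits move to the next limb
        out[i + len(ys)] = carry        # this slot is still zero at this point
    result = 0
    for limb in reversed(out):
        result = (result << LIMB_BITS) | limb
    return result


x, y = 2**100 + 12345, 3**60 + 7
print(big_mul(x, y) == x * y)  # True
```

The carry never exceeds 24 bits, so it can be folded into the addend of the next MAD instead of needing a separate wide add. This is what keeps the whole chain within a 48-bit datapath.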
-
Patent number: 12229558
Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
Type: Grant
Filed: September 22, 2023
Date of Patent: February 18, 2025
Assignee: Intel Corporation
Inventor: Ahmad Yasin