Patents Examined by Eric Coleman
  • Patent number: 12293184
    Abstract: An illegal address mask method for cores of a DSP includes: S1, initializing a core of a DSP; S2, configuring a start address register and an end address register, and taking an address range defined by the start address register and the end address register as a masked address range; configuring a first comparator and a second comparator to send out illegal address decision signals for instructions within the masked address range; S3, acquiring a PC pointer, and determining whether the PC pointer is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; if not, performing pre-decoding to obtain a memory access instruction; and S4, determining whether an address of the memory access instruction is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; otherwise, completing a memory access operation.
    Type: Grant
    Filed: December 27, 2024
    Date of Patent: May 6, 2025
    Assignee: Jiangsu Huachuang Microsystem Company Limited
    Inventors: Haibin Zhou, Guoqiang He, Wenjun Han, Ming Hao
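    Illustrative sketch (not taken from the patent): the two-stage check in steps S3 and S4 of the abstract above can be modelled as a pair of range comparisons. The names `MaskedRange` and `check_access`, and the choice of an inclusive end address, are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MaskedRange:
    """Hypothetical model of the start/end address registers (step S2)."""
    start: int
    end: int  # assumed inclusive; the abstract does not specify

    def contains(self, address: int) -> bool:
        # Two comparisons, standing in for the first and second comparators.
        return self.start <= address <= self.end


def check_access(mask: MaskedRange, pc: int, mem_address: Optional[int]) -> str:
    """Return 'illegal' if the PC or the memory address is masked, else 'ok'."""
    # S3: test the PC pointer first.
    if mask.contains(pc):
        return "illegal"
    # S4: test the address of the pre-decoded memory access instruction, if any.
    if mem_address is not None and mask.contains(mem_address):
        return "illegal"
    return "ok"
```

    For example, with a masked range of 0x1000 to 0x1FFF, a PC of 0x2000 that loads from 0x1800 is rejected at the second stage.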
  • Patent number: 12287756
    Abstract: A systolic array cell is described, the cell including two general-purpose arithmetic logic units (ALUs) and a register file. A plurality of the cells may be configured in a matrix or array, such that the output of the first ALU in a first cell is provided to a second cell to the right of the first cell, and the output of the second ALU in the first cell is provided to a third cell below the first cell. The two ALUs in each cell of the array allow for processing of a different instruction in each cycle.
    Type: Grant
    Filed: October 4, 2023
    Date of Patent: April 29, 2025
    Assignee: GOOGLE LLC
    Inventors: Reginald Clifford Young, Trevor Gale, Sushma Honnavara-Prasad, Paolo Mantovani
  • Patent number: 12288068
    Abstract: An instruction simulation device and a method thereof are provided. The instruction simulation device includes a processor. The processor includes an instruction decoder which generates format information of a ready-for-execution instruction. The processor determines whether the ready-for-execution instruction currently executed by the processor is a compatible instruction or an extended instruction based on the format information of the ready-for-execution instruction. If the ready-for-execution instruction is an extended instruction under the new instruction set or the extended instruction set, the processor converts the ready-for-execution instruction into a simulation program corresponding to the extended instruction, and simulates an execution result of the ready-for-execution instruction by executing the simulation program. The simulation program is composed of at least one compatible instruction of the processor.
    Type: Grant
    Filed: September 12, 2023
    Date of Patent: April 29, 2025
    Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.
    Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
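    Illustrative sketch (not taken from the patent): the idea of replacing an extended instruction with a simulation program built from compatible instructions can be shown with a toy interpreter. The instruction names (`add`, `mul`, `fma`), the scratch register `_t`, and the tuple encoding are all hypothetical.

```python
# Hypothetical tiny ISA: 'add' and 'mul' are natively supported ("compatible");
# 'fma' stands in for an extended instruction that must be simulated.
COMPATIBLE = {"add", "mul"}

# Each extended opcode maps to a builder that returns a sequence of
# compatible instructions (the "simulation program").
SIMULATION_PROGRAMS = {
    # fma dst, a, b, c  ->  dst = a * b + c, using scratch register "_t"
    "fma": lambda dst, a, b, c: [("mul", "_t", a, b), ("add", dst, "_t", c)],
}


def execute(instr, regs):
    """Execute one instruction against the register dictionary `regs`."""
    op, dst, *src = instr
    if op in COMPATIBLE:
        x, y = regs[src[0]], regs[src[1]]
        regs[dst] = x + y if op == "add" else x * y
    elif op in SIMULATION_PROGRAMS:
        # Extended instruction: run its simulation program instead.
        for sub in SIMULATION_PROGRAMS[op](dst, *src):
            execute(sub, regs)
    else:
        raise ValueError(f"unknown instruction: {op}")
```

    The scratch register keeps the result correct even when the destination is also a source operand.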
  • Patent number: 12282773
    Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers, are discussed.
    Type: Grant
    Filed: December 8, 2023
    Date of Patent: April 22, 2025
    Assignee: Intel Corporation
    Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
  • Patent number: 12271339
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: October 9, 2023
    Date of Patent: April 8, 2025
    Assignee: Groq, Inc.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 12260215
    Abstract: In-memory computing circuits can be used to determine distances between vectors. Such circuits can be used for machine learning applications. Examples include obtaining at least one dimension of a query vector wherein the dimension includes one or more bits and comparing respective bits of the dimension to corresponding bits of at least one dimension of a reference vector. This obtains a control signal dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector. The control signal can then be used to control a pulse modifying circuit such that a modification applied to a pulse signal is dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector.
    Type: Grant
    Filed: May 22, 2023
    Date of Patent: March 25, 2025
    Assignee: Nokia Technologies Oy
    Inventor: Marijan Herceg
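    Illustrative sketch (not taken from the patent): the bitwise comparison that produces the control signal can be modelled in software as an XNOR per bit. How the pulse is actually modified is not stated in the abstract; the rule below (each mismatching bit lengthens the pulse by a fixed `delay`) is purely an assumption, as are the function names.

```python
from typing import List


def match_signals(query: int, reference: int, width: int) -> List[bool]:
    """Per-bit control signals, least significant bit first: True where bits agree."""
    return [((query >> i) & 1) == ((reference >> i) & 1) for i in range(width)]


def modified_pulse_width(query: int, reference: int, width: int,
                         base: float = 1.0, delay: float = 0.25) -> float:
    """Assumed modification rule: every mismatching bit adds `delay` to the pulse."""
    mismatches = sum(not m for m in match_signals(query, reference, width))
    return base + delay * mismatches
```

    Under this assumed rule the final pulse width grows with the Hamming distance between the two values, which is one way a distance could be read out.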
  • Patent number: 12248429
    Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least one respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.
    Type: Grant
    Filed: March 17, 2023
    Date of Patent: March 11, 2025
    Assignee: GRAPHCORE LIMITED
    Inventor: Simon Knowles
  • Patent number: 12242894
    Abstract: A device can be used to implement a neural network in hardware. The device can include a processor, a memory, and a neural network accelerator. The neural network accelerator can be configured to implement, in hardware, a neural network by using a residue number system (RNS). At least one function of the neural network can have a corresponding approximation in the RNS system, and the at least one function can be provided by implementing the corresponding approximation in hardware.
    Type: Grant
    Filed: March 31, 2023
    Date of Patent: March 4, 2025
    Assignee: Khalifa University of Science and Technology
    Inventors: Athanasios Stouraitis, Sakellariou Vasileios, Vasileios Paliouras, Ioannis Kouretas, Hani Saleh
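    Illustrative sketch (not taken from the patent): a residue number system represents an integer by its remainders modulo several pairwise coprime moduli, so addition and multiplication proceed independently per modulus. The moduli (3, 5, 7) and the function names are chosen for this example; the function approximations mentioned in the abstract are not modelled.

```python
from math import prod

MODULI = (3, 5, 7)  # pairwise coprime; values are unique modulo 3 * 5 * 7 = 105


def to_rns(x):
    """Encode an integer as a tuple of residues."""
    return tuple(x % m for m in MODULI)


def rns_add(a, b):
    """Add channel by channel, with no carries between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a, b):
    """Multiply channel by channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues):
    """Decode using the Chinese remainder theorem."""
    big_m = prod(MODULI)
    total = 0
    for r, m in zip(residues, MODULI):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m
```

    Results are only correct while they stay below the product of the moduli (105 here), which is why practical designs pick larger moduli sets.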
  • Patent number: 12236238
    Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction is to generate an output in a single clock cycle of the processor.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: February 25, 2025
    Assignee: Intel Corporation
    Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
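    Illustrative sketch (not taken from the patent): a large integer multiplication can be decomposed into a chain of multiply-add steps on fixed-width limbs, as in schoolbook multiplication. The 32-bit limb size and all function names are assumptions; the double precision multiplier and 48 bit output mentioned in the abstract are not modelled.

```python
LIMB_BITS = 32
MASK = (1 << LIMB_BITS) - 1


def mad(a, b, c):
    """Multiply-add on limbs: return (low limb, high limb) of a * b + c."""
    t = a * b + c
    return t & MASK, t >> LIMB_BITS


def to_limbs(x, n):
    """Split a non-negative integer into n limbs, least significant first."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Recombine limbs (least significant first) into one integer."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def big_mul(a_limbs, b_limbs):
    """Schoolbook product expressed as a chain of MAD operations."""
    out = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            # a * b + out[i + j] + carry always fits in two limbs.
            out[i + j], carry = mad(a, b, out[i + j] + carry)
        out[i + len(b_limbs)] = carry
    return out
```

    Each inner step depends on the carry from the previous one, which is the chain structure the abstract refers to.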
  • Patent number: 12229558
    Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
    Type: Grant
    Filed: September 22, 2023
    Date of Patent: February 18, 2025
    Assignee: Intel Corporation
    Inventor: Ahmad Yasin
  • Patent number: 12222894
    Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
    Type: Grant
    Filed: July 13, 2023
    Date of Patent: February 11, 2025
    Assignee: Groq, Inc.
    Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
  • Patent number: 12223328
    Abstract: Examples of the present disclosure provide apparatuses and methods related to generating and executing a control flow. An example apparatus can include a first device configured to generate control flow instructions, and a second device including an array of memory cells, an execution unit to execute the control flow instructions, and a controller configured to control an execution of the control flow instructions on data stored in the array.
    Type: Grant
    Filed: August 10, 2023
    Date of Patent: February 11, 2025
    Inventors: Kyle B. Wheeler, Richard C. Murphy, Troy A. Manning, Dean A. Klein
  • Patent number: 12217056
    Abstract: A method for processing a tensor is described including obtaining a first register for a number of items in the tensor. One or more second registers for a number of items in a first and a second axis of the tensor are obtained. A stride in the first and the second axis is obtained. A next item in the tensor is obtained using the stride in the first axis and a first offset register, when the first register indicates the tensor has additional items to process and the second registers indicate the next item resides in the first axis. A next item in the tensor is obtained using the stride in the first axis and the second axis, the first offset register, and a second offset register. The first register and a second register are modified. The first and the second offset registers are modified.
    Type: Grant
    Filed: January 25, 2024
    Date of Patent: February 4, 2025
    Assignee: Celestial AI Inc.
    Inventor: Philip Winterbottom
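    Illustrative sketch (not taken from the patent): walking a two-axis tensor with item counters, per-axis strides, and two offset registers can be modelled as a generator of flat memory offsets. The mapping of the abstract's registers onto the variables below, and the name `iter_offsets`, are assumptions for this example.

```python
def iter_offsets(n0, n1, stride0, stride1):
    """Yield flat offsets for a tensor with n0 outer items of n1 inner items each."""
    remaining = n0 * n1   # counter for items left in the whole tensor
    left_inner = n1       # counter for items left along the inner axis
    off0 = 0              # offset register for the outer axis
    off1 = 0              # offset register for the inner axis
    while remaining > 0:
        yield off0 + off1
        remaining -= 1
        left_inner -= 1
        if left_inner > 0:
            # Next item is on the same inner run: advance one stride.
            off1 += stride1
        else:
            # Inner run exhausted: reset it and step the outer axis.
            off1 = 0
            off0 += stride0
            left_inner = n1
```

    Swapping the strides walks the same buffer in transposed order without moving any data: `list(iter_offsets(3, 2, 1, 3))` visits offsets 0, 3, 1, 4, 2, 5.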
  • Patent number: 12210871
    Abstract: A method includes: adding a first conversion operator to the model, to convert data input into the first conversion operator into a general format, where the general format is a data format supported by all the plurality of calculation circuits; and modifying another operator that is in the model and that is after the first conversion operator, to cause formats of both input data and output data of the another operator to be the general format.
    Type: Grant
    Filed: June 29, 2023
    Date of Patent: January 28, 2025
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Wenlong Xie, Linmu Wang, Xiaopeng Du
  • Patent number: 12204897
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: January 21, 2025
    Assignee: NVIDIA CORPORATION
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Patent number: 12205025
    Abstract: The present application discloses a processor video memory optimization method and apparatus for deep learning training tasks, and relates to the technical field of artificial intelligence. In the method, by determining an optimal path for transferring a computing result, the computing result of a first computing unit is transferred to a second computing unit by using the optimal path. Thus, occupying the video memory is avoided, and meanwhile, a problem of low utilization rate of the computing unit of a GPU caused by video memory swaps is avoided, so that training speed of most tasks is hardly reduced.
    Type: Grant
    Filed: March 24, 2021
    Date of Patent: January 21, 2025
    Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
    Inventors: Haifeng Wang, Xiaoguang Hu, Dianhai Yu
  • Patent number: 12204898
    Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
    Type: Grant
    Filed: August 30, 2023
    Date of Patent: January 21, 2025
    Assignee: Intel Corporation
    Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
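    Illustrative sketch (not taken from the patent): a matrix multiplication that records how far it got, so it can resume after an interruption, can be modelled with a flat index over result elements. The `budget` parameter stands in for an interruption, and the granularity of one result element per step is an assumption.

```python
def matmul_resumable(a, b, c, progress=0, budget=None):
    """Multiply a (m x k) by b (k x n) into c, resuming at flat index `progress`.

    Stops after `budget` result elements (a stand-in for an interruption)
    and returns the new progress indicator; m * n means finished.
    """
    m, k, n = len(a), len(b), len(b[0])
    total = m * n
    stop = total if budget is None else min(total, progress + budget)
    for idx in range(progress, stop):
        i, j = divmod(idx, n)
        # Each result element is stored as soon as it is complete.
        c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return stop
```

    A caller saves the returned value when interrupted and passes it back as `progress` to continue without redoing completed elements.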
  • Patent number: 12189571
    Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: January 7, 2025
    Assignee: Intel Corporation
    Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
  • Patent number: 12182064
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: August 8, 2023
    Date of Patent: December 31, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
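    Illustrative sketch (not taken from the patent): a column whose processing elements are spread over several independent busses can be modelled by keeping one running partial sum per bus. The round-robin assignment of rows to busses and the name `column_partial_sums` are assumptions for this example.

```python
def column_partial_sums(weights, inputs, num_buses=2):
    """Return the final partial sum carried on each columnar bus.

    Processing element r multiplies weights[r] by inputs[r] and adds the
    product to the partial sum on bus r % num_buses, so elements on other
    busses never touch that partial sum.
    """
    bus_sums = [0] * num_buses
    for row, (w, x) in enumerate(zip(weights, inputs)):
        bus = row % num_buses  # assumed round-robin bus assignment
        bus_sums[bus] += w * x
    return bus_sums
```

    Adding the per-bus results at the bottom of the column recovers the full dot product, while each bus passes through only a fraction of the processing elements.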
  • Patent number: 12182570
    Abstract: Systems, methods, and apparatuses to support packed data convolution instructions with shift control and width control are described.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: December 31, 2024
    Assignee: Intel Corporation
    Inventors: Deepti Aggarwal, Michael Espig, Robert Valentine, Sumit Mohan, Prakaram Joshi, Richard Winterton