Patents Examined by Eric Coleman
-
Patent number: 12293184
Abstract: An illegal address mask method for cores of a DSP includes: S1, initializing a core of a DSP; S2, configuring a start address register and an end address register, and taking an address range defined by the start address register and the end address register as a masked address range; configuring a first comparator and a second comparator to send out illegal address decision signals for instructions within the masked address range; S3, acquiring a PC pointer, and determining whether the PC pointer is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; if not, performing pre-decoding to obtain a memory access instruction; and S4, determining whether an address of the memory access instruction is located in the masked address range; if so, sending out an illegal address decision signal to stop an operation; otherwise, completing a memory access operation.
Type: Grant
Filed: December 27, 2024
Date of Patent: May 6, 2025
Assignee: Jiangsu Huachuang Microsystem Company Limited
Inventors: Haibin Zhou, Guoqiang He, Wenjun Han, Ming Hao
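The check in steps S2–S4 reduces to two range comparisons. A minimal software sketch of that logic (register values and function names are invented for illustration; the patent describes a hardware mechanism):

```python
from typing import Optional

# Hypothetical register contents for the demo (the real values are whatever
# software writes into the start/end address registers).
MASK_START = 0x1000
MASK_END = 0x2000

def in_masked_range(addr: int) -> bool:
    """Model the two comparators: flag any address inside [start, end]."""
    return MASK_START <= addr <= MASK_END

def check_instruction(pc: int, mem_addr: Optional[int] = None) -> str:
    """S3/S4 as one function: reject a PC or memory-access address in the mask."""
    if in_masked_range(pc):
        return "illegal"   # S3: PC in masked range -> illegal address signal
    if mem_addr is not None and in_masked_range(mem_addr):
        return "illegal"   # S4: access address in masked range -> illegal
    return "ok"            # otherwise, complete the memory access
```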
-
Patent number: 12287756
Abstract: A systolic array cell is described, the cell including two general-purpose arithmetic logic units (ALUs) and a register file. A plurality of the cells may be configured in a matrix or array, such that the output of the first ALU in a first cell is provided to a second cell to the right of the first cell, and the output of the second ALU in the first cell is provided to a third cell below the first cell. The two ALUs in each cell of the array allow for processing of a different instruction in each cycle.
Type: Grant
Filed: October 4, 2023
Date of Patent: April 29, 2025
Assignee: GOOGLE LLC
Inventors: Reginald Clifford Young, Trevor Gale, Sushma Honnavara-Prasad, Paolo Mantovani
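As a rough illustration of the dataflow only (not GOOGLE's design; the cell semantics below are assumed), each cell can be modeled as producing one value for the cell to its right and one for the cell below it in the same cycle:

```python
def cell_step(left_in: int, top_in: int, weight: int):
    """Toy model of one cycle in a two-ALU cell.
    The first ALU multiplies the left input by a stored weight; its result is
    forwarded to the cell on the right. The second ALU adds that product to
    the top input; its result is forwarded to the cell below."""
    right_out = left_in * weight   # first ALU -> neighbor to the right
    down_out = top_in + right_out  # second ALU -> neighbor below
    return right_out, down_out
```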
-
Patent number: 12288068
Abstract: An instruction simulation device and a method thereof are provided. The instruction simulation device includes a processor. The processor includes an instruction decoder which generates format information of a ready-for-execution instruction. The processor determines whether the ready-for-execution instruction currently executed by the processor is a compatible instruction or an extended instruction based on the format information of the ready-for-execution instruction. If the ready-for-execution instruction is an extended instruction under a new instruction set or an extended instruction set, the processor converts the ready-for-execution instruction into a simulation program corresponding to the extended instruction, and simulates an execution result of the ready-for-execution instruction by executing the simulation program. The simulation program is composed of at least one compatible instruction of the processor.
Type: Grant
Filed: September 12, 2023
Date of Patent: April 29, 2025
Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.
Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
-
Patent number: 12282773
Abstract: Embodiments detailed herein relate to matrix (tile) operations. For example, decode circuitry to decode an instruction having fields for an opcode and a memory address; and execution circuitry to execute the decoded instruction to set a tile configuration for the processor to utilize tiles in matrix operations based on a description retrieved from the memory address, wherein a tile is a set of 2-dimensional registers.
Type: Grant
Filed: December 8, 2023
Date of Patent: April 22, 2025
Assignee: Intel Corporation
Inventors: Menachem Adelman, Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Rinat Rappoport, Jesus Corbal, Dan Baum, Alexander F. Heinecke, Elmoustapha Ould-Ahmed-Vall, Yuri Gebil, Raanan Sade
-
Patent number: 12271339
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: October 9, 2023
Date of Patent: April 8, 2025
Assignee: Groq, Inc.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 12260215
Abstract: In-memory computing circuits can be used to determine distances between vectors. Such circuits can be used for machine learning applications. Examples include obtaining at least one dimension of a query vector wherein the dimension includes one or more bits and comparing respective bits of the dimension to corresponding bits of at least one dimension of a reference vector. This obtains a control signal dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector. The control signal can then be used to control a pulse modifying circuit such that a modification applied to a pulse signal is dependent upon whether the bits of the dimension of the query vector are the same as corresponding bits of the dimension of the reference vector.
Type: Grant
Filed: May 22, 2023
Date of Patent: March 25, 2025
Assignee: Nokia Technologies Oy
Inventor: Marijan Herceg
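The per-bit comparison can be sketched in software; treating the accumulated mismatch count as the quantity that drives the pulse modification is an assumption made purely for this demo:

```python
def bit_match_signals(query: int, reference: int, width: int):
    """Per-bit control signals: True where the query and reference bits agree."""
    return [((query >> i) & 1) == ((reference >> i) & 1) for i in range(width)]

def mismatch_count(query: int, reference: int, width: int) -> int:
    """A Hamming-distance-style measure driven by the control signals
    (standing in for the pulse modifications the circuit would apply)."""
    return sum(not same for same in bit_match_signals(query, reference, width))
```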
-
Patent number: 12248429
Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least a respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layers by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one-dimensional paths and to transmit data around each of the two embedded one-dimensional paths, each embedded one-dimensional path using all processing nodes of the computer in such a manner that the two embedded one-dimensional paths operate simultaneously without sharing links.
Type: Grant
Filed: March 17, 2023
Date of Patent: March 11, 2025
Assignee: GRAPHCORE LIMITED
Inventor: Simon Knowles
-
Patent number: 12242894
Abstract: A device can be used to implement a neural network in hardware. The device can include a processor, a memory, and a neural network accelerator. The neural network accelerator can be configured to implement, in hardware, a neural network by using a residue number system (RNS). At least one function of the neural network can have a corresponding approximation in the RNS, and the at least one function can be provided by implementing the corresponding approximation in hardware.
Type: Grant
Filed: March 31, 2023
Date of Patent: March 4, 2025
Assignee: Khalifa University of Science and Technology
Inventors: Athanasios Stouraitis, Sakellariou Vasileios, Vasileios Paliouras, Ioannis Kouretas, Hani Saleh
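In an RNS, a value is stored as its residues modulo a set of pairwise-coprime moduli, and addition and multiplication act channel-wise on the residues. A tiny software model (the moduli are a textbook choice, not necessarily the patent's):

```python
MODULI = (3, 5, 7)  # pairwise coprime; dynamic range = 3 * 5 * 7 = 105

def to_rns(x: int):
    """Encode an integer as its tuple of residues."""
    return tuple(x % m for m in MODULI)

def rns_mul(a, b):
    """Multiply two RNS values channel-wise: no carries between channels."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_rns(r):
    """Decode via the Chinese Remainder Theorem (search form, fine for a demo)."""
    return next(x for x in range(3 * 5 * 7) if to_rns(x) == tuple(r))
```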
-
Patent number: 12236238
Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48-bit output, wherein the MAD instruction is to generate an output in a single clock cycle of the processor.
Type: Grant
Filed: June 25, 2021
Date of Patent: February 25, 2025
Assignee: INTEL CORPORATION
Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
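The multiplication chain the abstract refers to is the classic limb-by-limb schoolbook pattern, where each inner step is a multiply-and-add. A sketch with assumed 16-bit limbs (the patent's operand widths may differ):

```python
LIMB_BITS = 16
MASK = (1 << LIMB_BITS) - 1

def to_limbs(x: int, n: int):
    """Split a non-negative integer into n fixed-width limbs, little-endian."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]

def limb_multiply(a_limbs, b_limbs):
    """Schoolbook multiply; each inner step is one MAD: acc = out + a*b + carry."""
    out = [0] * (len(a_limbs) + len(b_limbs))
    for i, a in enumerate(a_limbs):
        carry = 0
        for j, b in enumerate(b_limbs):
            acc = out[i + j] + a * b + carry   # the multiply-and-add step
            out[i + j] = acc & MASK
            carry = acc >> LIMB_BITS
        out[i + len(b_limbs)] += carry
    return out

def from_limbs(limbs):
    """Reassemble the limbs into a single integer."""
    return sum(l << (LIMB_BITS * i) for i, l in enumerate(limbs))
```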
-
Patent number: 12229558
Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
Type: Grant
Filed: September 22, 2023
Date of Patent: February 18, 2025
Assignee: Intel Corporation
Inventor: Ahmad Yasin
-
Patent number: 12222894
Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.
Type: Grant
Filed: July 13, 2023
Date of Patent: February 11, 2025
Assignee: GROQ, INC.
Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 12223328
Abstract: Examples of the present disclosure provide apparatuses and methods related to generating and executing a control flow. An example apparatus can include a first device configured to generate control flow instructions, and a second device including an array of memory cells, an execution unit to execute the control flow instructions, and a controller configured to control an execution of the control flow instructions on data stored in the array.
Type: Grant
Filed: August 10, 2023
Date of Patent: February 11, 2025
Inventors: Kyle B. Wheeler, Richard C. Murphy, Troy A. Manning, Dean A. Klein
-
Patent number: 12217056
Abstract: A method for processing a tensor is described, including obtaining a first register for a number of items in the tensor. One or more second registers for a number of items in a first and a second axis of the tensor are obtained. A stride in the first and the second axis is obtained. A next item in the tensor is obtained using the stride in the first axis and a first offset register, when the first register indicates the tensor has additional items to process and the second registers indicate the next item resides in the first axis. A next item in the tensor is obtained using the stride in the first axis and the second axis, the first offset register, and a second offset register. The first register and a second register are modified. The first and the second offset registers are modified.
Type: Grant
Filed: January 25, 2024
Date of Patent: February 4, 2025
Assignee: Celestial AI Inc.
Inventor: Philip Winterbottom
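The register-driven walk can be modeled with per-axis counters, strides, and offset registers. A sketch of a 2-D traversal (all names and the exact update order are invented for illustration):

```python
def walk_2d(items_total: int, axis0_count: int, stride0: int, stride1: int):
    """Yield the flat offset of each successive tensor item.
    stride0 advances within the first axis; stride1 jumps to the next
    position along the second axis when the first axis is exhausted."""
    offset0 = 0   # first offset register (position within the first axis)
    offset1 = 0   # second offset register (position along the second axis)
    in_axis = 0   # counter mirroring the per-axis item register
    for _ in range(items_total):
        yield offset1 + offset0
        in_axis += 1
        if in_axis < axis0_count:   # next item still resides in the first axis
            offset0 += stride0
        else:                       # wrap: reset and advance the second axis
            in_axis = 0
            offset0 = 0
            offset1 += stride1
```

For a 2x3 tensor stored with a row stride of 4 (i.e. padded rows), the walk skips the padding element at offset 3.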
-
Patent number: 12210871
Abstract: A method includes: adding a first conversion operator to the model, to convert data input into the first conversion operator into a general format, where the general format is a data format supported by all the plurality of calculation circuits; and modifying another operator that is in the model and that is after the first conversion operator, to cause formats of both input data and output data of the another operator to be the general format.
Type: Grant
Filed: June 29, 2023
Date of Patent: January 28, 2025
Assignee: Huawei Technologies Co., Ltd.
Inventors: Wenlong Xie, Linmu Wang, Xiaopeng Du
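A minimal sketch of that graph rewrite, treating the model as a linear list of operators (operator and format names are invented; real models are graphs, not lists):

```python
GENERAL = "NCHW"  # hypothetical general format supported by every unit

def insert_conversion(ops):
    """ops: list of {"name": ..., "fmt": ...} dicts in execution order.
    Prepend a conversion operator, then retag every downstream operator so
    its input and output formats are the general format."""
    converted = [{"name": "convert_to_general", "fmt": GENERAL}]
    for op in ops:
        converted.append({"name": op["name"], "fmt": GENERAL})
    return converted
```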
-
Patent number: 12204897
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
Type: Grant
Filed: November 30, 2022
Date of Patent: January 21, 2025
Assignee: NVIDIA CORPORATION
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Patent number: 12205025
Abstract: The present application discloses a processor video memory optimization method and apparatus for deep learning training tasks, and relates to the technical field of artificial intelligence. In the method, an optimal path for transferring a computing result is determined, and the computing result of a first computing unit is transferred to a second computing unit over that path. This avoids occupying video memory and avoids the low GPU computing-unit utilization caused by video memory swaps, so the training speed of most tasks is hardly reduced.
Type: Grant
Filed: March 24, 2021
Date of Patent: January 21, 2025
Assignee: Beijing Baidu Netcom Science Technology Co., Ltd.
Inventors: Haifeng Wang, Xiaoguang Hu, Dianhai Yu
-
Patent number: 12204898
Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator is to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.
Type: Grant
Filed: August 30, 2023
Date of Patent: January 21, 2025
Assignee: Intel Corporation
Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
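The resume-after-interruption idea can be sketched in software, using the number of completed result rows as the assumed form of the completion progress indicator (the patent does not specify its encoding):

```python
def matmul_resumable(a, b, result, progress, max_rows=None):
    """Multiply rows of `a` by `b` into `result`, starting at row `progress`.
    `max_rows` simulates an interruption after that many rows. Returns the
    updated progress indicator; it equals len(a) when the multiply is done."""
    n, cols, inner = len(a), len(b[0]), len(b)
    stop = n if max_rows is None else min(n, progress + max_rows)
    for i in range(progress, stop):   # resume where the last run stopped
        for j in range(cols):
            result[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return stop                       # the stored completion progress indicator
```

A run that is "interrupted" after one row can later be resumed from the stored indicator without redoing the finished rows.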
-
Patent number: 12189571
Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.
Type: Grant
Filed: June 25, 2021
Date of Patent: January 7, 2025
Assignee: Intel Corporation
Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
-
Patent number: 12182064
Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus, and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations, provide the output partial sum to a next processing element of the given columnar bus, without the output partial sum being processed by a processing element of the column located between the two processing elements that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
Type: Grant
Filed: August 8, 2023
Date of Patent: December 31, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
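Functionally, splitting a column across buses interleaves the accumulation: each element adds only to the running sum of its own bus and bypasses elements on the other buses. A toy model with an assumed round-robin bus assignment:

```python
def column_partial_sums(weights, activation, num_buses=2):
    """Accumulate weight * activation down one column, keeping one running
    partial sum per bus. Element i contributes only to bus i % num_buses,
    so the per-bus chains proceed independently and in parallel."""
    sums = [0] * num_buses
    for i, w in enumerate(weights):
        sums[i % num_buses] += w * activation  # MAC on this element's bus only
    return sums  # per-bus partial sums; the column total is their sum
```

Combining the per-bus results at the bottom of the column recovers the same total a single-bus column would produce.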
-
Patent number: 12182570
Abstract: Systems, methods, and apparatuses to support packed data convolution instructions with shift control and width control are described.
Type: Grant
Filed: June 25, 2021
Date of Patent: December 31, 2024
Assignee: Intel Corporation
Inventors: Deepti Aggarwal, Michael Espig, Robert Valentine, Sumit Mohan, Prakaram Joshi, Richard Winterton