Patents Examined by Eric Coleman
-
Patent number: 11720367Abstract: A method performed in a processor, includes: receiving, in the processor, a branch instruction in the processing; determining, by the processor, an address of an instruction after the branch instruction as a candidate for speculative execution, the address including an object identification and an offset; and determining, by the processor, whether or not to perform speculative execution of the instruction after the branch instruction based on the object identification of the address.Type: GrantFiled: March 29, 2022Date of Patent: August 8, 2023Inventor: Steven Jeffrey Wallach
-
Patent number: 11714642Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory. For example, support for a loading instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.Type: GrantFiled: March 28, 2022Date of Patent: August 1, 2023Assignee: Intel CorporationInventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil
-
Patent number: 11714639Abstract: A data processing device has an instruction decoder, a control logic unit, and ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective, data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit.Type: GrantFiled: December 29, 2021Date of Patent: August 1, 2023Assignee: RENESAS ELECTRONICS CORPORATIONInventors: Sugako Ohtani, Hiroyuki Kondo
-
Patent number: 11709795Abstract: Disclosed is an electronic device which includes a main processor, and a systolic array processor, and the systolic array processor includes processing elements, a kernel data memory that provides a kernel data set to the processing elements, a data memory that provides an input data set to the processing elements, and a controller that provides commands to the processing elements. The main processor translates source codes associated with the systolic array processor into commands of the systolic array processor, calculates a switching activity value based on the commands, and stores the translated commands and the switching activity value to a machine learning module, which is based on the systolic array processor.Type: GrantFiled: November 12, 2021Date of Patent: July 25, 2023Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTEInventor: Jaehoon Chung
-
Patent number: 11709794Abstract: Two or more die are stacked together in a stacked integrated circuit device. Each of the processors on these die is able to communicate with other processors on its die by sending data over the switching fabric of its respective die. The mechanism for sending data between processors on the same die (i.e. intradie communication) is reused for sending data between processors on different die (i.e. interdie communication). The reuse of the mechanism is enabled by assigning each processor a vertical neighbour on its opposing die. Each processor has an interdie connection that connects it to the output exchange bus of its neighbour. A processor is able to borrow the output exchange bus of its neighbour by sending data along the output exchange bus of its neighbour.Type: GrantFiled: October 19, 2021Date of Patent: July 25, 2023Assignee: GRAPHCORE LIMITEDInventors: Stephen Felix, Richard Luke Southwell Osborne, Alan Graham Alexander
-
Patent number: 11704046Abstract: A method of clearing of registers and logic designs with AND and OR logics to propagate the zero values provided to write enable signal buses upon the execution of clear instruction of more than one registers, allowing more than one architecturally visible registers to be cleared with one signal instruction regardless of the values of data buses.Type: GrantFiled: April 18, 2022Date of Patent: July 18, 2023Assignee: Texas Instruments IncorporatedInventors: Timothy David Anderson, Duc Quang Bui, Soujanya Narnur
-
Patent number: 11698787Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.Type: GrantFiled: June 29, 2021Date of Patent: July 11, 2023Assignee: INTEL CORPORATIONInventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
-
Patent number: 11687837Abstract: A system to support a machine learning (ML) operation comprises a core configured to receive and interpret commands into a set of instructions for the ML operation and a memory unit configured to maintain data for the ML operation. The system further comprises an inference engine having a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform tasks of the ML operation on the data in the OCM. The system also comprises an instruction streaming engine configured to distribute the instructions to the processing tiles to control their operations and to synchronize data communication between the core and the inference engine so that data transmitted between them correctly reaches the corresponding processing tiles while ensuring coherence of data shared and distributed among the core and the OCMs.Type: GrantFiled: June 27, 2022Date of Patent: June 27, 2023Assignee: Marvell Asia Pte LtdInventors: Avinash Sodani, Gopal Nalamalapu
-
Patent number: 11675568Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.Type: GrantFiled: December 14, 2020Date of Patent: June 13, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
-
Patent number: 11675587Abstract: A method for changing a processor instruction randomly, covertly, and uniquely, so that the reverse process can restore it faithfully to its original form, making it virtually impossible for a malicious user to know how the bits are changed, preventing them from using a buffer overflow attack to write code with the same processor instruction changes into said processor's memory with the goal of taking control of the processor. When the changes are reversed prior to the instruction being executed, reverting the instruction back to its original value, malicious code placed in memory will be randomly altered so that when it is executed by the processor it produces chaotic, random behavior that will not allow control of the processor to be compromised, eventually producing a processing error that will cause the processor to either shut down the software process where the code exists to reload, or reset.Type: GrantFiled: August 13, 2021Date of Patent: June 13, 2023Inventor: Forrest L. Pierson
-
Patent number: 11669490Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.Type: GrantFiled: November 3, 2021Date of Patent: June 6, 2023Assignee: INTEL CORPORATIONInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
-
Patent number: 11669489Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.Type: GrantFiled: September 30, 2021Date of Patent: June 6, 2023Assignee: International Business Machines CorporationInventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan
-
Patent number: 11669463Abstract: A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.Type: GrantFiled: January 31, 2022Date of Patent: June 6, 2023Assignee: Texas Instruments IncorporatedInventors: Soujanya Narnur, Timothy David Anderson, Mujibur Rahman, Duc Quang Bui
-
Patent number: 11669586Abstract: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicative of execution circuitry is to multiply from the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.Type: GrantFiled: February 25, 2022Date of Patent: June 6, 2023Assignee: Intel CorporationInventors: Gregory Henry, Alexander Heinecke
-
Patent number: 11650819Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described.Type: GrantFiled: December 13, 2019Date of Patent: May 16, 2023Assignee: Intel CorporationInventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall
-
Patent number: 11645226Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.Type: GrantFiled: March 17, 2022Date of Patent: May 9, 2023Assignee: Groq, Inc.Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson
-
Patent number: 11636174Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.Type: GrantFiled: November 16, 2021Date of Patent: April 25, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
-
Patent number: 11625584Abstract: Examples described herein relate to a neural network whose weights from a matrix are selected from a set of weights stored in a memory on-chip with a processing engine for generating multiply and carry operations. The number of weights in the set of weights stored in the memory can be less than a number of weights in the matrix thereby reducing an amount of memory used to store weights in a matrix. The weights in the memory can be generated in training using gradients from back propagation. Weights in the memory can be selected using a tabulation hash calculation on entries in a table.Type: GrantFiled: June 17, 2019Date of Patent: April 11, 2023Assignee: Intel CorporationInventors: Raghavan Kumar, Gregory K. Chen, Huseyin Ekin Sumbul, Phil Knag, Ram Krishnamurthy
-
Patent number: 11625356Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layer by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.Type: GrantFiled: March 24, 2021Date of Patent: April 11, 2023Assignee: GRAPHCORE LIMITEDInventor: Simon Knowles
-
Patent number: 11625244Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.Type: GrantFiled: June 22, 2021Date of Patent: April 11, 2023Assignee: Intel CorporationInventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran