Patents Examined by Eric Coleman

Securing conditional speculative instruction execution

Patent number: 11720367

Abstract: A method performed in a processor, includes: receiving, in the processor, a branch instruction in the processing; determining, by the processor, an address of an instruction after the branch instruction as a candidate for speculative execution, the address including an object identification and an offset; and determining, by the processor, whether or not to perform speculative execution of the instruction after the branch instruction based on the object identification of the address.

Type: Grant

Filed: March 29, 2022

Date of Patent: August 8, 2023

Inventor: Steven Jeffrey Wallach

Systems, methods, and apparatuses for tile store

Patent number: 11714642

Abstract: Embodiments detailed herein relate to matrix operations. In particular, the storing of a matrix (tile) to memory. For example, support for a store instruction is described in at least a form of decode circuitry to decode an instruction having fields for an opcode, a source matrix operand identifier, and destination memory information, and execution circuitry to execute the decoded instruction to store each data element of configured rows of the identified source matrix operand to memory based on the destination memory information.

Type: Grant

Filed: March 28, 2022

Date of Patent: August 1, 2023

Assignee: Intel Corporation

Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil

Data processing device

Patent number: 11714639

Abstract: A data processing device has an instruction decoder, a control logic unit, and an ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit.

Type: Grant

Filed: December 29, 2021

Date of Patent: August 1, 2023

Assignee: RENESAS ELECTRONICS CORPORATION

Inventors: Sugako Ohtani, Hiroyuki Kondo
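The idea in this abstract, choosing a cycle count from the effective width of the operands, can be sketched as a small software model. This is only an illustration: the abstract does not say how width maps to cycles, so the 8-bit-per-cycle slice, the 32-bit word, and the function names `effective_width` and `cycles_for` are all assumptions.

```python
def effective_width(value: int, word_bits: int = 32) -> int:
    """Number of significant bits in an unsigned operand (at least 1)."""
    value &= (1 << word_bits) - 1  # keep only the bits that fit the word
    return max(1, value.bit_length())


def cycles_for(a: int, b: int, slice_bits: int = 8, word_bits: int = 32) -> int:
    """Cycles needed if the ALU handles `slice_bits` bits per cycle (assumed)."""
    width = max(effective_width(a, word_bits), effective_width(b, word_bits))
    return -(-width // slice_bits)  # ceiling division
```

Under these assumptions, small operands such as 3 and 5 finish in one cycle, while a full 32-bit operand takes four.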
Electronic device including main processor and systolic array processor and operating method of electronic device

Patent number: 11709795

Abstract: Disclosed is an electronic device which includes a main processor, and a systolic array processor, and the systolic array processor includes processing elements, a kernel data memory that provides a kernel data set to the processing elements, a data memory that provides an input data set to the processing elements, and a controller that provides commands to the processing elements. The main processor translates source codes associated with the systolic array processor into commands of the systolic array processor, calculates a switching activity value based on the commands, and stores the translated commands and the switching activity value to a machine learning module, which is based on the systolic array processor.

Type: Grant

Filed: November 12, 2021

Date of Patent: July 25, 2023

Assignee: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE

Inventor: Jaehoon Chung

Exchange between stacked die

Patent number: 11709794

Abstract: Two or more die are stacked together in a stacked integrated circuit device. Each of the processors on these die is able to communicate with other processors on its die by sending data over the switching fabric of its respective die. The mechanism for sending data between processors on the same die (i.e. intradie communication) is reused for sending data between processors on different die (i.e. interdie communication). The reuse of the mechanism is enabled by assigning each processor a vertical neighbour on its opposing die. Each processor has an interdie connection that connects it to the output exchange bus of its neighbour. A processor is able to borrow the output exchange bus of its neighbour by sending data along the output exchange bus of its neighbour.

Type: Grant

Filed: October 19, 2021

Date of Patent: July 25, 2023

Assignee: GRAPHCORE LIMITED

Inventors: Stephen Felix, Richard Luke Southwell Osborne, Alan Graham Alexander

Quick clearing of registers

Patent number: 11704046

Abstract: A method of clearing registers, and logic designs using AND and OR logic to propagate zero values provided to write enable signal buses upon execution of a clear instruction targeting more than one register, allowing more than one architecturally visible register to be cleared with a single instruction regardless of the values of the data buses.

Type: Grant

Filed: April 18, 2022

Date of Patent: July 18, 2023

Assignee: Texas Instruments Incorporated

Inventors: Timothy David Anderson, Duc Quang Bui, Soujanya Narnur

Interruptible and restartable matrix multiplication instructions, processors, methods, and systems

Patent number: 11698787

Abstract: A processor of an aspect includes a decode unit to decode a matrix multiplication instruction. The matrix multiplication instruction is to indicate a first memory location of a first source matrix, is to indicate a second memory location of a second source matrix, and is to indicate a third memory location where a result matrix is to be stored. The processor also includes an execution unit coupled with the decode unit. The execution unit, in response to the matrix multiplication instruction, is to multiply a portion of the first and second source matrices prior to an interruption, and store a completion progress indicator in response to the interruption. The completion progress indicator to indicate an amount of progress in multiplying the first and second source matrices, and storing corresponding result data to the third memory location, that is to have been completed prior to the interruption.

Type: Grant

Filed: June 29, 2021

Date of Patent: July 11, 2023

Assignee: INTEL CORPORATION

Inventors: Edward T. Grochowski, Asit K. Mishra, Robert Valentine, Mark J. Charney, Simon C. Steely, Jr.
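The completion progress indicator described in this abstract can be pictured with a software analogy. The sketch below is not the patented hardware mechanism: it assumes progress is tracked in whole result rows, and `matmul_restartable` and the `interrupted` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def matmul_restartable(
    a: Matrix,
    b: Matrix,
    c: Matrix,
    start_row: int = 0,
    interrupted: Callable[[int], bool] = lambda row: False,
) -> int:
    """Compute rows of a @ b into c, starting at start_row.

    Returns the number of result rows completed so far, which plays the
    role of the progress indicator and can be passed back as start_row.
    """
    inner, cols = len(b), len(b[0])
    for i in range(start_row, len(a)):
        if interrupted(i):
            return i  # rows [0, i) are already stored in c
        c[i] = [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
    return len(a)
```

A caller would store the returned value when an interruption occurs and later call the function again with `start_row` set to it, so completed rows are not recomputed.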
Architecture to support synchronization between core and inference engine for machine learning

Patent number: 11687837

Abstract: A system to support a machine learning (ML) operation comprises a core configured to receive and interpret commands into a set of instructions for the ML operation and a memory unit configured to maintain data for the ML operation. The system further comprises an inference engine having a plurality of processing tiles, each comprising an on-chip memory (OCM) configured to maintain data for local access by components in the processing tile and one or more processing units configured to perform tasks of the ML operation on the data in the OCM. The system also comprises an instruction streaming engine configured to distribute the instructions to the processing tiles to control their operations and to synchronize data communication between the core and the inference engine so that data transmitted between them correctly reaches the corresponding processing tiles while ensuring coherence of data shared and distributed among the core and the OCMs.

Type: Grant

Filed: June 27, 2022

Date of Patent: June 27, 2023

Assignee: Marvell Asia Pte Ltd

Inventors: Avinash Sodani, Gopal Nalamalapu

Dual vector arithmetic logic unit

Patent number: 11675568

Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general-purpose register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.

Type: Grant

Filed: December 14, 2020

Date of Patent: June 13, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor

Enhanced protection of processors from a buffer overflow attack

Patent number: 11675587

Abstract: A method for changing a processor instruction randomly, covertly, and uniquely, so that the reverse process can restore it faithfully to its original form, making it virtually impossible for a malicious user to know how the bits are changed, preventing them from using a buffer overflow attack to write code with the same processor instruction changes into said processor's memory with the goal of taking control of the processor. When the changes are reversed prior to the instruction being executed, reverting the instruction back to its original value, malicious code placed in memory will be randomly altered so that when it is executed by the processor it produces chaotic, random behavior that will not allow control of the processor to be compromised, eventually producing a processing error that will cause the processor to either shut down the software process where the code exists to reload, or reset.

Type: Grant

Filed: August 13, 2021

Date of Patent: June 13, 2023

Inventor: Forrest L. Pierson

Computing efficient cross channel operations in parallel computing machines using systolic arrays

Patent number: 11669490

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

Type: Grant

Filed: November 3, 2021

Date of Patent: June 6, 2023

Assignee: INTEL CORPORATION

Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram

Sparse systolic array design

Patent number: 11669489

Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.

Type: Grant

Filed: September 30, 2021

Date of Patent: June 6, 2023

Assignee: International Business Machines Corporation

Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan
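The skip step in this abstract (drop zero operands, tag the rest with an index) can be illustrated in a few lines. This is a minimal software sketch, not the hardware design: `skip_zero_operands` and `sparse_dot` are hypothetical names, and a plain dot product stands in for the work a processing element would do.

```python
from typing import List, Sequence, Tuple


def skip_zero_operands(operands: Sequence[float]) -> List[Tuple[int, float]]:
    """Keep only nonzero operands, each paired with its original index."""
    return [(i, v) for i, v in enumerate(operands) if v != 0]


def sparse_dot(weights: Sequence[float], operands: Sequence[float]) -> float:
    """Dot product that only touches the weights matching nonzero operands."""
    return sum(weights[i] * v for i, v in skip_zero_operands(operands))
```

The index is what lets the consumer pick the matching weight once the zero entries have been removed from the stream.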
Method and apparatus for permuting streamed data elements

Patent number: 11669463

Abstract: A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.

Type: Grant

Filed: January 31, 2022

Date of Patent: June 6, 2023

Assignee: Texas Instruments Incorporated

Inventors: Soujanya Narnur, Timothy David Anderson, Mujibur Rahman, Duc Quang Bui

Systems and methods for combining low-mantissa units to achieve and exceed FP64 emulation of matrix multiplication

Patent number: 11669586

Abstract: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicative of execution circuitry is to multiply from the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.

Type: Grant

Filed: February 25, 2022

Date of Patent: June 6, 2023

Assignee: Intel Corporation

Inventors: Gregory Henry, Alexander Heinecke
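The general idea of building a higher-precision dot product from lower-precision pieces can be sketched in software. The example below is an assumption-laden stand-in, not the patented instruction: it splits each binary64 value into two binary32 pieces (the abstract does not name the low-precision format), and `to_f32`, `split`, and `dot_emulated` are hypothetical helper names.

```python
import struct
from typing import Sequence, Tuple


def to_f32(x: float) -> float:
    """Round a binary64 value to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split(x: float) -> Tuple[float, float]:
    """Split x into a binary32 head and a binary32 correction term."""
    hi = to_f32(x)
    lo = to_f32(x - hi)
    return hi, lo


def dot_emulated(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Dot product built only from products of low-precision pieces."""
    total = 0.0
    for x, y in zip(xs, ys):
        xh, xl = split(x)
        yh, yl = split(y)
        # Each product of two 24-bit significands is exact in binary64.
        total += xh * yh + xh * yl + xl * yh + xl * yl
    return total
```

Multiplying only the rounded heads loses roughly half the significant digits, while summing all four partial products recovers most of them.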
Apparatuses, methods, and systems for instructions to multiply floating-point values of about one

Patent number: 11650819

Abstract: Systems, methods, and apparatuses relating to instructions to multiply floating-point values of about one are described.

Type: Grant

Filed: December 13, 2019

Date of Patent: May 16, 2023

Assignee: Intel Corporation

Inventors: Mohamed Elmalaki, Elmoustapha Ould-Ahmed-Vall

Compiler operations for tensor streaming processor

Patent number: 11645226

Abstract: Embodiments are directed to a processor having a functional slice architecture. The processor is divided into tiles (or functional units) organized into a plurality of functional slices. The functional slices are configured to perform specific operations within the processor, which includes memory slices for storing operand data and arithmetic logic slices for performing operations on received operand data (e.g., vector processing, matrix manipulation). The processor includes a plurality of functional slices of a module type, each functional slice having a plurality of tiles. The processor further includes a plurality of data transport lanes for transporting data in a direction indicated in a corresponding instruction. The processor also includes a plurality of instruction queues, each instruction queue associated with a corresponding functional slice of the plurality of functional slices, wherein the instructions in the instruction queues comprise a functional slice specific operation code.

Type: Grant

Filed: March 17, 2022

Date of Patent: May 9, 2023

Assignee: Groq, Inc.

Inventors: Dennis Charles Abts, Jonathan Alexander Ross, John Thompson, Gregory Michael Thorson

Scalable sparse matrix multiply acceleration using systolic arrays with feedback inputs

Patent number: 11636174

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

Type: Grant

Filed: November 16, 2021

Date of Patent: April 25, 2023

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George

Reconfigurable memory compression techniques for deep neural networks

Patent number: 11625584

Abstract: Examples described herein relate to a neural network whose weights from a matrix are selected from a set of weights stored in a memory on-chip with a processing engine for generating multiply and carry operations. The number of weights in the set of weights stored in the memory can be less than a number of weights in the matrix thereby reducing an amount of memory used to store weights in a matrix. The weights in the memory can be generated in training using gradients from back propagation. Weights in the memory can be selected using a tabulation hash calculation on entries in a table.

Type: Grant

Filed: June 17, 2019

Date of Patent: April 11, 2023

Assignee: Intel Corporation

Inventors: Raghavan Kumar, Gregory K. Chen, Huseyin Ekin Sumbul, Phil Knag, Ram Krishnamurthy
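Tabulation hashing, which this abstract names as the way weights are selected from a smaller stored set, is a standard technique and can be sketched briefly. The details here are assumptions for illustration: a 32-bit key formed from the matrix position, four 256-entry tables, and the hypothetical names `make_tables`, `tabulation_hash`, and `virtual_weight`.

```python
import random
from typing import List, Sequence


def make_tables(num_bytes: int = 4, seed: int = 0) -> List[List[int]]:
    """One table of 256 random 32-bit words per key byte."""
    rng = random.Random(seed)
    return [[rng.getrandbits(32) for _ in range(256)] for _ in range(num_bytes)]


def tabulation_hash(key: int, tables: List[List[int]]) -> int:
    """XOR together one table entry per byte of the key."""
    h = 0
    for i, table in enumerate(tables):
        h ^= table[(key >> (8 * i)) & 0xFF]
    return h


def virtual_weight(row: int, col: int, num_cols: int,
                   shared_weights: Sequence[float],
                   tables: List[List[int]]) -> float:
    """Look up the weight for matrix position (row, col) in the shared set."""
    key = row * num_cols + col
    return shared_weights[tabulation_hash(key, tables) % len(shared_weights)]
```

With eight shared weights, for example, a 16 x 16 matrix of 256 positions is represented by eight stored values plus the hash tables, which is the memory saving the abstract describes.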
Network computer with two embedded rings

Patent number: 11625356

Abstract: A computer comprising a plurality of interconnected processing nodes arranged in a configuration in which multiple layers of interconnected nodes are arranged along an axis, each layer comprising at least four processing nodes connected in a non-axial ring by at least respective intralayer link between each pair of neighbouring processing nodes, wherein each of the at least four processing nodes in each layer is connected to a respective corresponding node in one or more adjacent layer by a respective interlayer link, the computer being programmed to provide in the configuration two embedded one dimensional paths and to transmit data around each of the two embedded one dimensional paths, each embedded one dimensional path using all processing nodes of the computer in such a manner that the two embedded one dimensional paths operate simultaneously without sharing links.

Type: Grant

Filed: March 24, 2021

Date of Patent: April 11, 2023

Assignee: GRAPHCORE LIMITED

Inventor: Simon Knowles

Native support for execution of get exponent, get mantissa, and scale instructions within a graphics processing unit via reuse of fused multiply-add execution unit hardware logic

Patent number: 11625244

Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.

Type: Grant

Filed: June 22, 2021

Date of Patent: April 11, 2023

Assignee: Intel Corporation

Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
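To show why get exponent, get mantissa, and scale operations help with emulating functions such as logarithm, here is a small software sketch using Python's `math.frexp` and `math.ldexp`. The conventions are assumptions rather than the patent's definitions: the mantissa is normalized to [1, 2), only finite positive inputs are handled, and `get_exp`, `get_mant`, `scale`, and `log_emulated` are hypothetical names.

```python
import math


def get_exp(x: float) -> int:
    """Unbiased exponent e such that |x| = m * 2**e with 1 <= m < 2."""
    return math.frexp(x)[1] - 1


def get_mant(x: float) -> float:
    """Mantissa m in [1, 2) for positive finite x (sign is preserved)."""
    return math.frexp(x)[0] * 2.0


def scale(x: float, n: int) -> float:
    """Return x * 2**n."""
    return math.ldexp(x, n)


def log_emulated(x: float, terms: int = 12) -> float:
    """Natural log by range reduction: ln(x) = e*ln(2) + ln(m)."""
    e, m = get_exp(x), get_mant(x)
    s = (m - 1.0) / (m + 1.0)       # ln(m) = 2 * atanh(s), with 0 <= s < 1/3
    s2 = s * s
    acc, power = 0.0, s
    for k in range(terms):          # fixed trip count, no data-dependent branch
        acc += power / (2 * k + 1)
        power *= s2
    return e * math.log(2.0) + 2.0 * acc
```

The loop runs a fixed number of times for every input, which mirrors the branch-free style the abstract mentions; zero, infinity, and NaN inputs would need the separate special-case handling that the abstract assigns to a pre-processing stage.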
