Patents Examined by Eric Coleman
  • Patent number: 11308026
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus and perform arithmetic operations on the input. Each processing element can generate an output partial sum based on the arithmetic operations and provide the input to the next processing element of the given row-oriented bus, without the input being processed by any intervening processing element of the row that uses a different row-oriented bus. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Vasanta Kumar Palisetti, Thomas Elmer, Kiran K Seshadri, FNU Arun Kumar
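A minimal sketch of the row-bus idea described in the abstract above (the function and bus assignment are illustrative assumptions, not taken from the patent): processing elements alternate between independent row busses, so an input travelling on one bus skips the PEs that belong to the other bus.

```python
# Toy model of one systolic-array row with multiple independent "row busses".
# With two busses, PEs 0 and 2 sit on bus 0 while PEs 1 and 3 sit on bus 1;
# an input on bus 0 is never processed by the bus-1 PE between its stops.

def route_row(inputs, weights, num_busses=2):
    """Each PE multiplies the input arriving on its bus by its local weight
    and emits a partial sum; the input passes unmodified to the next PE
    on the *same* bus."""
    partial_sums = [0.0] * len(weights)
    for bus in range(num_busses):
        # PEs assigned to this bus, in row order, striding over other busses.
        for pe in range(bus, len(weights), num_busses):
            partial_sums[pe] = inputs[bus] * weights[pe]
    return partial_sums
```

Because the two busses never touch each other's PEs, both can advance in the same cycle, which is the parallelization the abstract refers to.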
  • Patent number: 11307861
    Abstract: A method performed in a processor includes: receiving, in the processor, a branch instruction; determining, by the processor, an address of an instruction after the branch instruction as a candidate for speculative execution, the address including an object identification and an offset; and determining, by the processor, whether or not to perform speculative execution of the instruction after the branch instruction based on the object identification of the address.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
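One way to picture the decision the abstract describes, as a toy policy (the same-object rule below is an assumption for illustration; the patent only says the decision is based on the object identification): addresses carry an (object ID, offset) pair, and speculation is gated on the object ID.

```python
# Addresses are modeled as (object_id, offset) tuples. The policy sketched
# here allows speculative execution only when the candidate instruction
# lies in the same object as the branch -- an illustrative gate, not the
# patent's exact rule.

def may_speculate(branch_addr, candidate_addr):
    branch_obj, _ = branch_addr
    candidate_obj, _ = candidate_addr
    return branch_obj == candidate_obj  # cross-object candidates: no speculation
```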
  • Patent number: 11308027
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to the next processing element of the given columnar bus, without the output partial sum being processed by any intervening processing element of the column that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11288069
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: March 29, 2022
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil
  • Patent number: 11275588
    Abstract: Embodiments are described of an apparatus comprising a decoder to decode an instruction having fields for an opcode and a destination operand, and execution circuitry to execute the decoded instruction to save processor state components to an area located at a destination memory address specified by the destination operand, wherein the size of the area is defined by at least one indication of an execution of an instruction operating on a specified group of processor states.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: March 15, 2022
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark J. Charney, Rinat Rappoport, Vivekananthan Sanjeepan
  • Patent number: 11275712
    Abstract: In an embodiment, a method for processing data in a single instruction multiple data (SIMD) computer architecture is provided. A processing element (PE) may determine, based on a masking instruction, a predication state indicative of one of a conditional predication mode and an absolute predication mode. The PE may receive a predicated instruction and determine, based on the value of a head bit of a predication mask and on the value indicative of the predication state, whether to commit a computation corresponding to execution of the predicated instruction. In another embodiment, a SIMD controller stores loops and sections of a program as a separate instruction stream record for generating the memory address of the next instruction. For data streams, the SIMD controller records information for each data memory access that references the same register files that are used by the instruction streams.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: March 15, 2022
    Assignee: NORTHROP GRUMMAN SYSTEMS CORPORATION
    Inventors: Paul Kenton Tschirhart, Brian Konigsburg
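The two predication modes in the abstract above can be sketched as follows (a simplified model under stated assumptions: in "conditional" mode each lane consults its own mask bit, while in "absolute" mode the head bit of the mask gates every lane; the exact semantics in the patent may differ):

```python
def execute_predicated(values, op, mask, mode):
    """Apply `op` per lane, committing results according to the mode.
    'conditional': lane i commits only if mask[i] is set.
    'absolute':    the head bit of the mask gates all lanes uniformly."""
    out = list(values)
    head = mask[0]
    for i, v in enumerate(values):
        commit = mask[i] if mode == "conditional" else head
        if commit:
            out[i] = op(v)  # committed lanes take the computed result
    return out              # uncommitted lanes keep their old values
```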
  • Patent number: 11269637
    Abstract: In some examples, a system includes a first processor, a second processor, and a storage medium to store first information comprising machine-readable instructions executable by the second processor. The first processor is to validate the machine-readable instructions using an iterative validation process involving a plurality of iterations at different times, where each respective iteration of the plurality of iterations includes issuing a respective indication to the second processor to compute a value based on a respective subset of the first information, wherein the respective indication includes respective subset information identifying the respective subset, wherein the respective subset information differs from different subset information included in another indication issued in another iteration of the plurality of iterations, and the different subset information identifies a different subset of the first information.
    Type: Grant
    Filed: July 23, 2020
    Date of Patent: March 8, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Justin York
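The iterative, varying-subset validation scheme above can be sketched as a challenge-response loop (a minimal illustration; the use of SHA-256 and byte-index subsets is an assumption for the sketch, not from the patent):

```python
import hashlib

def expected_digest(instructions, subset):
    """Hash only the bytes of `instructions` named by the subset indices."""
    h = hashlib.sha256()
    for i in subset:
        h.update(instructions[i:i + 1])
    return h.hexdigest()

def validate_round(instructions, device_compute, subset):
    """One iteration: send the subset info to the second processor (modeled
    as the `device_compute` callback) and compare its value against the
    value computed locally over the same subset."""
    return device_compute(subset) == expected_digest(instructions, subset)
```

Varying the subset between iterations prevents a compromised second processor from simply replaying a previously correct answer.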
  • Patent number: 11269635
    Abstract: The document generally describes hardware for computing multiple orders of statistical moments. In one aspect, a system includes multiple stages of compute units. A first stage includes a first sequence of compute units, which includes a first compute unit configured to compute a first raw statistical moment for a first portion of data points in the time series of data points and one or more first additional compute units that are each configured to compute a respective first statistical moment for the first portion of data points. Each additional stage includes a second sequence of compute units for computing statistical moments for a respective second portion of the time series of data points. Each additional stage includes a second compute unit configured to compute the first raw statistical moment for the respective second portion of the time series of data points and one or more second additional compute units.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: March 8, 2022
    Assignee: Accenture Global Solutions Limited
    Inventor: Eric Tristan Lemoine
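What one stage of the moment-computing hardware produces can be sketched in software (an illustrative model only; each order-k "compute unit" is modeled as one term of the list comprehension):

```python
def raw_moments(window, orders=(1, 2, 3, 4)):
    """One 'stage' over one portion of the time series: the order-k unit
    sums x**k over the portion and normalizes by the count, yielding the
    k-th raw statistical moment."""
    n = len(window)
    return [sum(x ** k for x in window) / n for k in orders]
```

In hardware, each order runs in its own compute unit so all moments for a portion are produced in parallel; additional stages cover further portions of the stream.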
  • Patent number: 11263008
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, embodiments of broadcasting elements are described. For example, some embodiments describe broadcasting a scalar, a row, or a column to all configured data element positions of a destination matrix (tile).
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: March 1, 2022
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Alexander Heinecke, Barukh Ziv, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Stanislav Shwartsman
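The three broadcast shapes named in the abstract (scalar, row, column) can be sketched on a tile modeled as a list of lists (illustrative helper names, not instruction mnemonics from the patent):

```python
def broadcast_scalar(s, rows, cols):
    """Fill every configured element position of the tile with scalar s."""
    return [[s] * cols for _ in range(rows)]

def broadcast_row(row, rows):
    """Replicate one row into every row of the tile."""
    return [list(row) for _ in range(rows)]

def broadcast_col(col, cols):
    """Replicate one column across every column of the tile."""
    return [[v] * cols for v in col]
```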
  • Patent number: 11263015
    Abstract: Described herein are systems and methods for microarchitectural sensitive tag flow. For example, some methods include detecting dependence of data stored in a second data storage circuitry on the first instruction, where the first instruction will output a value to be stored in the second data storage circuitry, and wherein the second data storage circuitry is associated with a third tag indicating whether the second data storage circuitry has been designated as storing sensitive data; responsive to the dependence of data stored in the second data storage circuitry on the first instruction, checking whether the second tag indicates a sensitive instruction; and, responsive to the second tag indicating a sensitive instruction, updating the third tag to indicate that data stored in the second data storage circuitry has been designated as sensitive.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: March 1, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Shubhendu Sekhar Mukherjee
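The tag-propagation rule in the abstract above resembles classic taint tracking and can be sketched as follows (a simplified model; propagating sensitivity from source registers as well as from the instruction's own tag is an added assumption for the sketch):

```python
def execute_and_tag(tags, instr):
    """tags: register name -> sensitive flag (the abstract's 'third tag').
    instr: (dest, srcs, is_sensitive), where is_sensitive models the
    instruction's own sensitivity tag (the 'second tag').
    The destination register is marked sensitive when a sensitive
    instruction (or sensitive source data) produces its value."""
    dest, srcs, is_sensitive = instr
    tags[dest] = is_sensitive or any(tags.get(s, False) for s in srcs)
    return tags
```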
  • Patent number: 11263291
    Abstract: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicating that execution circuitry is to multiply data from the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: March 1, 2022
    Assignee: Intel Corporation
    Inventors: Gregory Henry, Alexander Heinecke
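The reduced-precision dot product above can be sketched by truncating each operand to bfloat16-like precision before multiplying, while accumulating at full precision (the choice of bfloat16 is an illustrative assumption; the patent covers reduced precision generally):

```python
import struct

def to_bf16(x):
    """Truncate a float32 to bfloat16 precision by keeping its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def dot_low_precision(a, b):
    """Multiply element pairs at reduced precision, then sum the products
    at full (double) precision, as the abstract describes."""
    return sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b))
```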
  • Patent number: 11256506
    Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventor: Ahmad Yasin
  • Patent number: 11256503
    Abstract: A processing device includes an array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The processing device further includes interconnections among the array of processing elements to provide direct communication among neighboring processing elements of the array of processing elements. A processing element of the array of processing elements may be connected to a first neighbor processing element that is immediately adjacent the processing element. The processing element may be further connected to a second neighbor processing element that is immediately adjacent the first neighbor processing element. A processing element of the array of processing elements may be connected to a neighbor processing element via an input selector to selectively take output of the neighbor processing element as input to the processing element. A computing device may include such processing devices in an arrangement of banks.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: February 22, 2022
    Assignee: UNTETHER AI CORPORATION
    Inventors: Trevis Chandler, William Martin Snelgrove, Darrick John Wiebe
  • Patent number: 11256510
    Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
    Type: Grant
    Filed: October 8, 2020
    Date of Patent: February 22, 2022
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Jeffrey T. Brady
  • Patent number: 11243905
    Abstract: A data path block circuit is disclosed. The data path block circuit includes a data path circuit having logic circuits, each configured to perform a data path operation to generate a result based on first and second operands. The data path block circuit also includes a first operand multiplexer, having inputs, each connected to one of a first register file, including a quantity of read and write ports, and a second register file, including a different quantity of read and write ports. The data path block circuit also includes a second operand multiplexer, having inputs, each connected to one of the first register file and the second register file. At least one of the first and second operand multiplexers includes a data input connected to the first register file. At least one of the first and second operand multiplexers includes a data input connected to the second register file.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: February 8, 2022
    Assignee: SHENZHEN GOODIX TECHNOLOGY CO., LTD.
    Inventor: Jaehoon Heo
  • Patent number: 11237831
    Abstract: A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: February 1, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Soujanya Narnur, Timothy David Anderson, Mujibur Rahman, Duc Quang Bui
  • Patent number: 11237833
    Abstract: The present invention discloses an instruction processing apparatus, comprising a first register adapted to store first source data, a second register adapted to store second source data, a third register adapted to store accumulated data, a decoder adapted to receive and decode a multiply-accumulate instruction, and an execution unit. The multiply-accumulate instruction indicates that the first register serves as a first operand, the second register serves as a second operand, and the third register serves as a third operand, and further indicates a shift flag.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: February 1, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Jiahui Luo, Zhijian Chen, Yubo Guo, Wenmeng Zhang
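A toy model of a multiply-accumulate instruction with a shift flag (an illustrative fixed-point interpretation; the patent does not specify this exact semantics):

```python
def mac_shift(a, b, acc, shift):
    """Fixed-point multiply-accumulate: the product of the two source
    operands is shifted right by `shift` bits before being added to the
    accumulator operand."""
    return acc + ((a * b) >> shift)
```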
  • Patent number: 11237909
    Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: February 1, 2022
    Assignee: International Business Machines Corporation
    Inventors: Robert F. Enenkel, Christopher Anand, Adele Olejarz, Lucas Dutton
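The reduce/approximate/restore structure in the abstract above is the classic recipe for elementary-function evaluation. A sketch for exp(x) (the choice of exp and the degree-5 polynomial are illustrative assumptions; the patent covers the hardware instructions generally): x is split as x = k·ln 2 + r, exp(r) is approximated by a short polynomial, and the result is restored by scaling with 2**k.

```python
import math

def exp_via_range_reduction(x):
    """Compute exp(x) via range reduction and restoration:
    reduce x to r with |r| <= ln(2)/2, approximate exp(r) by a
    truncated Taylor polynomial, then restore by multiplying by 2**k."""
    k = round(x / math.log(2))                  # range reduction factor
    r = x - k * math.log(2)                     # reduced argument
    poly = 1 + r + r*r/2 + r**3/6 + r**4/24 + r**5/120  # approx of exp(r)
    return math.ldexp(poly, k)                  # restoration: poly * 2**k
```

In hardware, the final scaling step maps naturally onto a fused multiply-add or multiply instruction, as the abstract describes.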
  • Patent number: 11231935
    Abstract: Vectorized sorted-set intersection is performed using conflict-detection single instruction, multiple data (SIMD) instructions. A first ordered subset of values of a first ordered set of distinct values and a second ordered subset of values of a second ordered set of distinct values is loaded into a register. A first value in the register that matches another value in the register (i.e., common values) is identified by performing an SIMD instruction. The first value is then stored in a result set representing the intersection of the first ordered set of distinct values and the second ordered set of distinct values.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: January 25, 2022
    Assignee: Oracle International Corporation
    Inventors: Benjamin Schlegel, Pit Fender, Matthias Brantner, Hassan Chafi
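For reference, the scalar operation the patent vectorizes is a plain two-pointer intersection of sorted sets (sketch below); the patent's contribution is doing the "find values common to both loaded subsets" step with a conflict-detection SIMD instruction over a register holding both subsets, rather than one comparison at a time.

```python
def sorted_set_intersect(a, b):
    """Two-pointer intersection of two sorted sets of distinct values."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])  # common value goes to the result set
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```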
  • Patent number: 11232062
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given columnar bus can receive an input from a prior element of the given columnar bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: January 25, 2022
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer