Patents Examined by Eric Coleman
  • Patent number: 11308026
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each row of the systolic array can include multiple busses enabling independent transmission of inputs along the respective bus. Each processing element of a given row-oriented bus can receive an input from a prior element of the given row-oriented bus and perform arithmetic operations on the input. Each processing element can generate an output partial sum based on the arithmetic operations and provide the input to the next processing element of the given row-oriented bus, without the input being processed by any intervening processing element of the row that uses a different row-oriented bus. Use of row-oriented busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Vasanta Kumar Palisetti, Thomas Elmer, Kiran K Seshadri, FNU Arun Kumar
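A minimal sketch of the row-bus idea described in the abstract above (the function and bus assignment are illustrative assumptions, not taken from the patent): processing elements alternate between independent row busses, so an input travelling on one bus skips the PEs that belong to the other bus.

```python
# Toy model of one systolic-array row with multiple independent "row busses".
# With two busses, PEs 0 and 2 sit on bus 0 while PEs 1 and 3 sit on bus 1;
# an input on bus 0 is never processed by the bus-1 PE between its stops.

def route_row(inputs, weights, num_busses=2):
    """Each PE multiplies the input arriving on its bus by its local weight
    and emits a partial sum; the input passes unmodified to the next PE
    on the *same* bus."""
    partial_sums = [0.0] * len(weights)
    for bus in range(num_busses):
        # PEs assigned to this bus, in row order, striding over other busses.
        for pe in range(bus, len(weights), num_busses):
            partial_sums[pe] = inputs[bus] * weights[pe]
    return partial_sums
```

Because the two busses never touch each other's PEs, both can advance in the same cycle, which is the parallelization the abstract refers to.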
  • Patent number: 11307861
    Abstract: A method performed in a processor includes: receiving, in the processor, a branch instruction; determining, by the processor, an address of an instruction after the branch instruction as a candidate for speculative execution, the address including an object identification and an offset; and determining, by the processor, whether or not to perform speculative execution of the instruction after the branch instruction based on the object identification of the address.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Steven Jeffrey Wallach
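One way to picture the decision the abstract describes, as a toy policy (the same-object rule below is an assumption for illustration; the patent only says the decision is based on the object identification): addresses carry an (object ID, offset) pair, and speculation is gated on the object ID.

```python
# Addresses are modeled as (object_id, offset) tuples. The policy sketched
# here allows speculative execution only when the candidate instruction
# lies in the same object as the branch -- an illustrative gate, not the
# patent's exact rule.

def may_speculate(branch_addr, candidate_addr):
    branch_obj, _ = branch_addr
    candidate_obj, _ = candidate_addr
    return branch_obj == candidate_obj  # cross-object candidates: no speculation
```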
  • Patent number: 11308027
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element of a given columnar bus can receive an input partial sum from a prior element of the given columnar bus and perform arithmetic operations on the input partial sum. Each processing element can generate an output partial sum based on the arithmetic operations and provide the output partial sum to the next processing element of the given columnar bus, without the output partial sum being processed by any intervening processing element of the column that uses a different columnar bus. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: April 19, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer
  • Patent number: 11288069
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, the loading of a matrix (tile) from memory.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: March 29, 2022
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Menachem Adelman, Elmoustapha Ould-Ahmed-Vall, Bret L. Toll, Milind B. Girkar, Zeev Sperber, Mark J. Charney, Rinat Rappoport, Jesus Corbal, Stanislav Shwartsman, Igor Yanover, Alexander F. Heinecke, Barukh Ziv, Dan Baum, Yuri Gebil
  • Patent number: 11275588
    Abstract: Embodiments are described of an apparatus comprising a decoder to decode an instruction having fields for an opcode and a destination operand, and execution circuitry to execute the decoded instruction to save processor state components to an area located at a destination memory address specified by the destination operand, wherein the size of the area is defined by at least one indication of an execution of an instruction operating on a specified group of processor states.
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: March 15, 2022
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Mark J. Charney, Rinat Rappoport, Vivekananthan Sanjeepan
  • Patent number: 11275712
    Abstract: In an embodiment, a method for processing data in a single instruction multiple data (SIMD) computer architecture is provided. A processing element (PE) may determine, based on a masking instruction, a predication state indicative of one of a conditional predication mode and an absolute predication mode. The PE may receive a predicated instruction and determine, based on the value of a head bit of a predication mask and on the value indicative of the predication state, whether to commit a computation corresponding to execution of the predicated instruction. In another embodiment, a SIMD controller stores loops and sections of a program as a separate instruction stream record for generating the memory address of the next instruction. For data streams, the SIMD controller records information for each data memory access that references the same register files that are used by the instruction streams.
    Type: Grant
    Filed: August 20, 2020
    Date of Patent: March 15, 2022
    Assignee: NORTHROP GRUMMAN SYSTEMS CORPORATION
    Inventors: Paul Kenton Tschirhart, Brian Konigsburg
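The two predication modes in the abstract above can be sketched as follows (a simplified model under stated assumptions: in "conditional" mode each lane consults its own mask bit, while in "absolute" mode the head bit of the mask gates every lane; the exact semantics in the patent may differ):

```python
def execute_predicated(values, op, mask, mode):
    """Apply `op` per lane, committing results according to the mode.
    'conditional': lane i commits only if mask[i] is set.
    'absolute':    the head bit of the mask gates all lanes uniformly."""
    out = list(values)
    head = mask[0]
    for i, v in enumerate(values):
        commit = mask[i] if mode == "conditional" else head
        if commit:
            out[i] = op(v)  # committed lanes take the computed result
    return out              # uncommitted lanes keep their old values
```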
  • Patent number: 11269637
    Abstract: In some examples, a system includes a first processor, a second processor, and a storage medium to store first information comprising machine-readable instructions executable by the second processor. The first processor is to validate the machine-readable instructions using an iterative validation process involving a plurality of iterations at different times, where each respective iteration of the plurality of iterations includes issuing a respective indication to the second processor to compute a value based on a respective subset of the first information, wherein the respective indication includes respective subset information identifying the respective subset, wherein the respective subset information differs from different subset information included in another indication issued in another iteration of the plurality of iterations, and the different subset information identifies a different subset of the first information.
    Type: Grant
    Filed: July 23, 2020
    Date of Patent: March 8, 2022
    Assignee: Hewlett Packard Enterprise Development LP
    Inventor: Justin York
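The iterative, varying-subset validation scheme above can be sketched as a challenge-response loop (a minimal illustration; the use of SHA-256 and byte-index subsets is an assumption for the sketch, not from the patent):

```python
import hashlib

def expected_digest(instructions, subset):
    """Hash only the bytes of `instructions` named by the subset indices."""
    h = hashlib.sha256()
    for i in subset:
        h.update(instructions[i:i + 1])
    return h.hexdigest()

def validate_round(instructions, device_compute, subset):
    """One iteration: send the subset info to the second processor (modeled
    as the `device_compute` callback) and compare its value against the
    value computed locally over the same subset."""
    return device_compute(subset) == expected_digest(instructions, subset)
```

Varying the subset between iterations prevents a compromised second processor from simply replaying a previously correct answer.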
  • Patent number: 11269635
    Abstract: The document generally describes hardware for computing multiple orders of statistical moments. In one aspect, a system includes multiple stages of compute units. A first stage includes a first sequence of compute units, which includes a first compute unit configured to compute a first raw statistical moment for a first portion of data points in the time series of data points and one or more first additional compute units that are each configured to compute a respective first statistical moment for the first portion of data points. Each additional stage includes a second sequence of compute units for computing statistical moments for a respective second portion of the time series of data points. Each additional stage includes a second compute unit configured to compute the first raw statistical moment for the respective second portion of the time series of data points and one or more second additional compute units.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: March 8, 2022
    Assignee: Accenture Global Solutions Limited
    Inventor: Eric Tristan Lemoine
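What one stage of the moment-computing hardware produces can be sketched in software (an illustrative model only; each order-k "compute unit" is modeled as one term of the list comprehension):

```python
def raw_moments(window, orders=(1, 2, 3, 4)):
    """One 'stage' over one portion of the time series: the order-k unit
    sums x**k over the portion and normalizes by the count, yielding the
    k-th raw statistical moment."""
    n = len(window)
    return [sum(x ** k for x in window) / n for k in orders]
```

In hardware, each order runs in its own compute unit so all moments for a portion are produced in parallel; additional stages cover further portions of the stream.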
  • Patent number: 11263008
    Abstract: Embodiments detailed herein relate to matrix operations. In particular, embodiments of broadcasting elements are described. For example, some embodiments describe broadcasting a scalar, a row, or a column to all configured data element positions of a destination matrix (tile).
    Type: Grant
    Filed: July 1, 2017
    Date of Patent: March 1, 2022
    Assignee: Intel Corporation
    Inventors: Robert Valentine, Zeev Sperber, Mark J. Charney, Bret L. Toll, Jesus Corbal, Alexander Heinecke, Barukh Ziv, Dan Baum, Elmoustapha Ould-Ahmed-Vall, Stanislav Shwartsman
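The three broadcast shapes named in the abstract (scalar, row, column) can be sketched on a tile modeled as a list of lists (illustrative helper names, not instruction mnemonics from the patent):

```python
def broadcast_scalar(s, rows, cols):
    """Fill every configured element position of the tile with scalar s."""
    return [[s] * cols for _ in range(rows)]

def broadcast_row(row, rows):
    """Replicate one row into every row of the tile."""
    return [list(row) for _ in range(rows)]

def broadcast_col(col, cols):
    """Replicate one column across every column of the tile."""
    return [[v] * cols for v in col]
```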
  • Patent number: 11263015
    Abstract: Described herein are systems and methods for microarchitectural sensitive tag flow. For example, some methods include detecting dependence of data stored in a second data storage circuitry on the first instruction, where the first instruction will output a value to be stored in the second data storage circuitry, and wherein the second data storage circuitry is associated with a third tag indicating whether the second data storage circuitry has been designated as storing sensitive data; responsive to the dependence of data stored in the second data storage circuitry on the first instruction, checking whether the second tag indicates a sensitive instruction; and, responsive to the second tag indicating a sensitive instruction, updating the third tag to indicate that data stored in the second data storage circuitry has been designated as sensitive.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: March 1, 2022
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Shubhendu Sekhar Mukherjee
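The tag-propagation rule in the abstract above resembles classic taint tracking and can be sketched as follows (a simplified model; propagating sensitivity from source registers as well as from the instruction's own tag is an added assumption for the sketch):

```python
def execute_and_tag(tags, instr):
    """tags: register name -> sensitive flag (the abstract's 'third tag').
    instr: (dest, srcs, is_sensitive), where is_sensitive models the
    instruction's own sensitivity tag (the 'second tag').
    The destination register is marked sensitive when a sensitive
    instruction (or sensitive source data) produces its value."""
    dest, srcs, is_sensitive = instr
    tags[dest] = is_sensitive or any(tags.get(s, False) for s in srcs)
    return tags
```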
  • Patent number: 11263291
    Abstract: The present disclosure relates to an apparatus that includes decoding circuitry that decodes a single instruction. The single instruction includes an identifier of a first source operand, an identifier of a second source operand, an identifier of a destination, and an opcode indicating that execution circuitry is to multiply data from the identified first source operand and the identified second source operand and store a result in the identified destination. Additionally, the apparatus includes execution circuitry to execute the single decoded instruction to calculate a dot product by calculating a plurality of products using data elements of the identified first and second operands using values less precise than the identified first and second source operands, summing the calculated products, and storing the summed products in the destination.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: March 1, 2022
    Assignee: Intel Corporation
    Inventors: Gregory Henry, Alexander Heinecke
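The reduced-precision dot product above can be sketched by truncating each operand to bfloat16-like precision before multiplying, while accumulating at full precision (the choice of bfloat16 is an illustrative assumption; the patent covers reduced precision generally):

```python
import struct

def to_bf16(x):
    """Truncate a float32 to bfloat16 precision by keeping its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

def dot_low_precision(a, b):
    """Multiply element pairs at reduced precision, then sum the products
    at full (double) precision, as the abstract describes."""
    return sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b))
```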
  • Patent number: 11256506
    Abstract: A processor includes a front end, an execution unit, a retirement stage, a counter, and a performance monitoring unit. The front end includes logic to receive an event instruction to enable supervision of a front end event that will delay execution of instructions. The execution unit includes logic to set a register with parameters for supervision of the front end event. The front end further includes logic to receive a candidate instruction and match the candidate instruction to the front end event. The counter includes logic to generate the front end event upon retirement of the candidate instruction.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventor: Ahmad Yasin
  • Patent number: 11256503
    Abstract: A processing device includes an array of processing elements, each processing element including an arithmetic logic unit to perform an operation. The processing device further includes interconnections among the array of processing elements to provide direct communication among neighboring processing elements of the array of processing elements. A processing element of the array of processing elements may be connected to a first neighbor processing element that is immediately adjacent the processing element. The processing element may be further connected to a second neighbor processing element that is immediately adjacent the first neighbor processing element. A processing element of the array of processing elements may be connected to a neighbor processing element via an input selector to selectively take output of the neighbor processing element as input to the processing element. A computing device may include such processing devices in an arrangement of banks.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: February 22, 2022
    Assignee: UNTETHER AI CORPORATION
    Inventors: Trevis Chandler, William Martin Snelgrove, Darrick John Wiebe
  • Patent number: 11256510
    Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterates through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.
    Type: Grant
    Filed: October 8, 2020
    Date of Patent: February 22, 2022
    Assignee: Apple Inc.
    Inventors: Andrew M. Havlir, Jeffrey T. Brady
  • Patent number: 11243905
    Abstract: A data path block circuit is disclosed. The data path block circuit includes a data path circuit having logic circuits, each configured to perform a data path operation to generate a result based on first and second operands. The data path block circuit also includes a first operand multiplexer, having inputs, each connected to one of a first register file, including a quantity of read and write ports, and a second register file, including a different quantity of read and write ports. The data path block circuit also includes a second operand multiplexer, having inputs, each connected to one of the first register file and the second register file. At least one of the first and second operand multiplexers includes a data input connected to the first register file. At least one of the first and second operand multiplexers includes a data input connected to the second register file.
    Type: Grant
    Filed: July 28, 2020
    Date of Patent: February 8, 2022
    Assignee: SHENZHEN GOODIX TECHNOLOGY CO., LTD.
    Inventor: Jaehoon Heo
  • Patent number: 11237831
    Abstract: A method is provided that includes receiving, in a permute network, a plurality of data elements for a vector instruction from a streaming engine, and mapping, by the permute network, the plurality of data elements to vector locations for execution of the vector instruction by a vector functional unit in a vector data path of a processor.
    Type: Grant
    Filed: May 20, 2020
    Date of Patent: February 1, 2022
    Assignee: Texas Instruments Incorporated
    Inventors: Soujanya Narnur, Timothy David Anderson, Mujibur Rahman, Duc Quang Bui
  • Patent number: 11237833
    Abstract: The present invention discloses an instruction processing apparatus, comprising a first register adapted to store first source data, a second register adapted to store second source data, a third register adapted to store accumulated data, a decoder adapted to receive and decode a multiply-accumulate instruction, and an execution unit. The multiply-accumulate instruction indicates that the first register serves as a first operand, the second register serves as a second operand, and the third register serves as a third operand, and further indicates a shift flag.
    Type: Grant
    Filed: April 10, 2020
    Date of Patent: February 1, 2022
    Assignee: Alibaba Group Holding Limited
    Inventors: Jiahui Luo, Zhijian Chen, Yubo Guo, Wenmeng Zhang
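A toy model of a multiply-accumulate instruction with a shift flag (an illustrative fixed-point interpretation; the patent does not specify this exact semantics):

```python
def mac_shift(a, b, acc, shift):
    """Fixed-point multiply-accumulate: the product of the two source
    operands is shifted right by `shift` bits before being added to the
    accumulator operand."""
    return acc + ((a * b) >> shift)
```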
  • Patent number: 11237909
    Abstract: A method, computer program product, and a computer system are disclosed for processing information using hardware instructions in a processor of a computer system by performing a hardware reduction instruction using an input to calculate at least one range reduction factor of the input; performing a hardware restoration instruction using the input to calculate at least one range restoration factor of the input; and performing a final fused multiply add (FMA) type of hardware instruction or a multiply (FM) hardware instruction by combining an approximation based on a value reduced by the at least one range reduction factor with the at least one range restoration factor.
    Type: Grant
    Filed: August 21, 2020
    Date of Patent: February 1, 2022
    Assignee: International Business Machines Corporation
    Inventors: Robert F. Enenkel, Christopher Anand, Adele Olejarz, Lucas Dutton
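The reduce/approximate/restore structure in the abstract above is the classic recipe for elementary-function evaluation. A sketch for exp(x) (the choice of exp and the degree-5 polynomial are illustrative assumptions; the patent covers the hardware instructions generally): x is split as x = k·ln 2 + r, exp(r) is approximated by a short polynomial, and the result is restored by scaling with 2**k.

```python
import math

def exp_via_range_reduction(x):
    """Compute exp(x) via range reduction and restoration:
    reduce x to r with |r| <= ln(2)/2, approximate exp(r) by a
    truncated Taylor polynomial, then restore by multiplying by 2**k."""
    k = round(x / math.log(2))                  # range reduction factor
    r = x - k * math.log(2)                     # reduced argument
    poly = 1 + r + r*r/2 + r**3/6 + r**4/24 + r**5/120  # approx of exp(r)
    return math.ldexp(poly, k)                  # restoration: poly * 2**k
```

In hardware, the final scaling step maps naturally onto a fused multiply-add or multiply instruction, as the abstract describes.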
  • Patent number: 11231935
    Abstract: Vectorized sorted-set intersection is performed using conflict-detection single instruction, multiple data (SIMD) instructions. A first ordered subset of values of a first ordered set of distinct values and a second ordered subset of values of a second ordered set of distinct values is loaded into a register. A first value in the register that matches another value in the register (i.e., common values) is identified by performing an SIMD instruction. The first value is then stored in a result set representing the intersection of the first ordered set of distinct values and the second ordered set of distinct values.
    Type: Grant
    Filed: April 13, 2020
    Date of Patent: January 25, 2022
    Assignee: Oracle International Corporation
    Inventors: Benjamin Schlegel, Pit Fender, Matthias Brantner, Hassan Chafi
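For reference, the scalar operation the patent vectorizes is a plain two-pointer intersection of sorted sets (sketch below); the patent's contribution is doing the "find values common to both loaded subsets" step with a conflict-detection SIMD instruction over a register holding both subsets, rather than one comparison at a time.

```python
def sorted_set_intersect(a, b):
    """Two-pointer intersection of two sorted sets of distinct values."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])  # common value goes to the result set
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```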
  • Patent number: 11232062
    Abstract: Systems and methods are provided to enable parallelized multiply-accumulate operations in a systolic array. Each column of the systolic array can include multiple busses enabling independent transmission of input partial sums along the respective bus. Each processing element can include a plurality of interconnects to receive a plurality of inputs corresponding to the multiple busses. Each processing element of a given columnar bus can receive an input from a prior element of the given columnar bus at an active bus position and perform arithmetic operations on the input. Each processing element can further receive a plurality of inputs at passive bus positions and provide the plurality of inputs to subsequent processing elements without the plurality of inputs being processed by the processing element. Use of columnar busses can enable parallelization to increase speed or enable increased latency at individual processing elements.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: January 25, 2022
    Inventors: Thomas A Volpe, Sundeep Amirineni, Thomas Elmer