Patents Examined by Eric Coleman
  • Patent number: 11520726
    Abstract: A processor comprises a plurality of processing units on an integrated circuit interconnected by an exchange. The exchange has groups of exchange paths extending between first and second portions of the integrated circuit. Each group has a first exchange block in the first portion and a second exchange block in the second portion. The processor has a first external interface in the first portion, a second external interface in the second portion, and a routing bus which routes packets between the external interfaces and the exchange blocks. The first external interface exchanges packets between the integrated circuit and a host. The second interface exchanges packets between the integrated circuit and another integrated circuit. Errors may be trapped when packets are wrongly addressed. A network of such processors is also provided.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: December 6, 2022
    Assignee: GRAPHCORE LIMITED
    Inventor: Hachem Yassine
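    Illustrative sketch (editor's addition, not part of the patent): a minimal Python model of the routing behavior described above, in which a routing bus steers packets between the exchange blocks and the two external interfaces and traps an error when a packet is wrongly addressed. All names and the address map are assumptions.
      HOST_IF, CHIP_IF = "host_interface", "chip_interface"
      EXCHANGE_BLOCKS = {0: "exchange_block_0", 1: "exchange_block_1"}

      class MisroutedPacket(Exception):
          """Trapped when a packet carries an address with no route."""

      def route(packet):
          dest = packet["dest"]
          if dest in EXCHANGE_BLOCKS:
              return EXCHANGE_BLOCKS[dest]  # on-chip exchange path
          if dest == "host":
              return HOST_IF                # first external interface (host)
          if dest == "chip":
              return CHIP_IF                # second interface (other IC)
          raise MisroutedPacket("wrongly addressed packet: %r" % (dest,))

      print(route({"dest": 0}))             # -> exchange_block_0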
  • Patent number: 11520587
    Abstract: Systems and methods for instruction decoding using hash tables. An example method of constructing a decoding tree comprises: generating an aggregated vector of differentiating bit scores representing at least a subset of a set of processor instructions; identifying, based on the aggregated vector of differentiating bit scores, one or more opcode bit positions; and constructing a hash table implementing a current level of a decoding tree representing the subset of the set of processor instructions, wherein the hash table is indexed by one or more opcode bits identified by the one or more opcode bit positions.
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: December 6, 2022
    Assignee: Parallels International GmbH
    Inventors: Alexey Koryakin, Nikolay Dobrovolskiy
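    Illustrative sketch (editor's addition): one level of the decoding-tree construction in Python. Each opcode bit position is scored by how evenly it splits the instruction subset (this scoring heuristic is an assumption, not taken from the patent), the highest-scoring positions are selected, and a hash table indexed by those opcode bits buckets the instructions for the next level of the tree.
      def differentiating_scores(encodings, width=8):
          scores = []
          for bit in range(width):
              ones = sum((enc >> bit) & 1 for enc in encodings)
              # a bit that splits the subset evenly differentiates best
              scores.append(min(ones, len(encodings) - ones))
          return scores

      def build_level(encodings, width=8, nbits=2):
          scores = differentiating_scores(encodings, width)
          positions = sorted(range(width), key=lambda b: -scores[b])[:nbits]
          table = {}
          for enc in encodings:
              key = tuple((enc >> b) & 1 for b in positions)
              table.setdefault(key, []).append(enc)  # next-level subset
          return positions, table

      positions, table = build_level([0b0001, 0b0101, 0b1001, 0b1101])
      print(positions, table)  # two opcode bits index four one-entry buckets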
  • Patent number: 11507377
    Abstract: An arithmetic processing circuit includes a fetch unit configured to generate fetch addresses, an address table configured to store a branch address and a first tag for each of a plurality of indexes, the indexes being a first bit string extracted from a fetch address by including at least one bit among instruction address bits whose values vary within one fetch line, the first tag being a second bit string situated at higher bit positions than the first bit string, an upper tag storage unit configured to store a second tag situated at higher bit positions than the first tag, and a branch determination unit configured to supply to the fetch unit the branch address retrieved from the address table, upon determining that the first tag retrieved from the address table and the second tag in the upper tag storage unit match respective portions of the fetch address.
    Type: Grant
    Filed: October 5, 2021
    Date of Patent: November 22, 2022
    Assignee: FUJITSU LIMITED
    Inventor: Ryohei Okazaki
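    Illustrative sketch (editor's addition): how a fetch address might be split into the index, first tag, and second (upper) tag the abstract describes, with a branch address supplied only when both tags match. The field widths are assumptions.
      INDEX_BITS, TAG1_BITS = 6, 10

      def split(addr):
          index = addr & ((1 << INDEX_BITS) - 1)
          tag1 = (addr >> INDEX_BITS) & ((1 << TAG1_BITS) - 1)
          tag2 = addr >> (INDEX_BITS + TAG1_BITS)  # upper tag
          return index, tag1, tag2

      address_table = {}  # index -> (first tag, branch address)
      upper_tag = 0       # single second tag in the upper tag storage unit

      def predict(fetch_addr):
          index, tag1, tag2 = split(fetch_addr)
          entry = address_table.get(index)
          # supply the branch address only if both tags match
          if entry and entry[0] == tag1 and upper_tag == tag2:
              return entry[1]
          return None     # no prediction; fetch continues sequentially

      index, tag1, tag2 = split(0x12345678)
      address_table[index] = (tag1, 0xDEADBEEF)
      upper_tag = tag2
      print(hex(predict(0x12345678)))  # -> 0xdeadbeef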
  • Patent number: 11507375
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: November 22, 2022
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
  • Patent number: 11507814
    Abstract: Disclosed herein are a system, a method, and a device for improving the power efficiency of a neural network implemented in an AI chip. In a neural network, the large number of multiply-and-accumulate computations can result in frequent toggles or transitions in the states of logic circuits in the AI chip, and such frequent toggles can cause large overall power consumption. In one aspect, to minimize the number of toggles, the sequence or order of computations can be rearranged. In one approach, the total Hamming distances of the weights or input strings under different arrangements or sequences are determined, and an arrangement or sequence of weights or input strings with a reduced or minimum total Hamming distance is identified.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: November 22, 2022
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Meng Li, Yilei Li
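    Illustrative sketch (editor's addition): the core idea in plain Python. The total Hamming distance between consecutive weight words approximates how many logic toggles the sequence causes; reordering so that consecutive words stay similar reduces it. The greedy nearest-neighbour heuristic is an illustrative choice, not the patent's algorithm.
      def hamming(a, b):
          return bin(a ^ b).count("1")

      def total_hamming(seq):
          return sum(hamming(x, y) for x, y in zip(seq, seq[1:]))

      def reorder(weights):
          remaining = list(weights)
          order = [remaining.pop(0)]
          while remaining:  # always append the closest remaining word
              nxt = min(remaining, key=lambda w: hamming(order[-1], w))
              order.append(nxt)
              remaining.remove(nxt)
          return order

      ws = [0b1111, 0b0000, 0b1110, 0b0001]
      print(total_hamming(ws), total_hamming(reorder(ws)))  # 11 -> 5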
  • Patent number: 11501145
    Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
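    Illustrative sketch (editor's addition): the strided gather the instruction fields encode, in Python. Starting at an address derived from the weight element's coordinates, `count` input elements are read, with `skip` elements skipped between adjacent reads; both numbers follow from the convolution stride. Names are assumptions.
      def gather(memory, first_addr, count, skip):
          step = 1 + skip  # distance between elements actually read
          return [memory[first_addr + i * step] for i in range(count)]

      memory = list(range(100))  # stand-in for input data in memory
      # e.g. a stride-2 convolution: read 4 elements, skipping 1 between reads
      print(gather(memory, first_addr=10, count=4, skip=1))  # [10, 12, 14, 16]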
  • Patent number: 11494194
    Abstract: An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Eran Shifer, Mostafa Hagog, Eliyahu Turiel
  • Patent number: 11494624
    Abstract: Systems and methods for accelerating computation of an artificial neural network (ANN) are provided. An example method comprises: receiving, by processing units coupled with arithmetic units and accumulation units, a first plurality of first values and a second plurality of second values associated with one or more neurons of the ANN; generating, by the processing units, a plurality of pairs, wherein each pair of the plurality of pairs has a first value of the first plurality and a second value of the second plurality, and the first value and the second value satisfy criteria; performing, by the arithmetic units, mathematical operations on pairs of the plurality of pairs to obtain results; accumulating, by the accumulation units, the results to obtain accumulated results; and determining, by the processing units and based on the accumulated results, an output of the neurons.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: November 8, 2022
    Assignee: MIPSOLOGY SAS
    Inventors: Ludovic Larzul, Sebastien Delerse
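    Illustrative sketch (editor's addition): the pair-then-accumulate flow in Python, using "both values nonzero" as an assumed example of the criteria, so that multiplications with a zero operand are skipped.
      def pair_values(activations, weights):
          return [(a, w) for a, w in zip(activations, weights)
                  if a != 0 and w != 0]

      def neuron_output(activations, weights, bias=0.0):
          pairs = pair_values(activations, weights)  # processing units
          products = [a * w for a, w in pairs]       # arithmetic units
          return sum(products, bias)                 # accumulation units

      print(neuron_output([0.0, 1.5, 0.0, 2.0], [0.3, 0.0, 0.5, 1.0]))  # 2.0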
  • Patent number: 11481327
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies a loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: January 12, 2021
    Date of Patent: October 25, 2022
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
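    Illustrative sketch (editor's addition): an address generator for plural nested loops, in Python. The stream template is modelled as a plain list of (loop count, loop dimension) pairs, standing in for the packed bits that the format definition field apportions.
      from itertools import product

      def stream_addresses(base, loops):
          # loops: outermost-first list of (count, dimension) pairs
          counts = [c for c, _ in loops]
          dims = [d for _, d in loops]
          for idx in product(*(range(c) for c in counts)):
              yield base + sum(i * d for i, d in zip(idx, dims))

      # two nested loops: outer runs 3 times with stride 16, inner 4 with stride 2
      print(list(stream_addresses(0x1000, [(3, 16), (4, 2)]))[:6])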
  • Patent number: 11481612
    Abstract: Some embodiments provide a neural network inference circuit (NNIC) for executing a neural network that includes multiple computation nodes at multiple layers. Each of a set of the computation nodes includes a dot product of input values and weight values. The NNIC includes dot product cores, each of which includes (i) partial dot product computation circuits to compute dot products between input values and weight values and (ii) memories to store the weight values and input values for a layer of the neural network. The input values for a particular layer of the neural network are stored in the memories of multiple cores. A starting memory location in a first core for the input values of the layer stored in the first core is the same as a starting memory location for the input values in each of the other cores that store the input values for the layer.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: October 25, 2022
    Assignee: PERCEIVE CORPORATION
    Inventors: Kenneth Duong, Jung Ko, Steven L. Teig
  • Patent number: 11475283
    Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: October 18, 2022
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Sung Hee Park
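    Illustrative sketch (editor's addition): one way to model the accumulator organization, with one batch of accumulators per subset of output channels and, within each batch, one partial sum per output depth plane. All sizes are assumptions.
      OUT_DEPTH_PLANES, CHANNELS_PER_BATCH, NUM_BATCHES = 4, 8, 2

      # batches[b][d][c] holds output channel b*CHANNELS_PER_BATCH + c
      # at output depth plane d
      batches = [[[0.0] * CHANNELS_PER_BATCH for _ in range(OUT_DEPTH_PLANES)]
                 for _ in range(NUM_BATCHES)]

      def accumulate(batch, depth_plane, partial):
          # after a processing cycle, fold the multiply-add results for one
          # work unit into the batch owning this subset of output channels
          for c, value in enumerate(partial):
              batches[batch][depth_plane][c] += value

      accumulate(0, 2, [1.0] * CHANNELS_PER_BATCH)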
  • Patent number: 11468002
    Abstract: A computing device includes an array of processing elements mutually connected to perform single instruction multiple data (SIMD) operations, memory cells connected to each processing element to store data related to the SIMD operations, and a cache connected to each processing element to cache data related to the SIMD operations. Caches of adjacent processing elements are connected. The same or another computing device includes rows of mutually connected processing elements to share data. The computing device further includes a row arithmetic logic unit (ALU) at each row of processing elements. The row ALU of a respective row is configured to perform an operation with processing elements of the respective row.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: October 11, 2022
    Assignee: UNTETHER AI CORPORATION
    Inventors: William Martin Snelgrove, Jonathan Scobbie
  • Patent number: 11467831
    Abstract: Systems and/or methods can include a ring-based inverter chain that constructs multi-bit flip-flops that store time. The time flip-flops serve as storage units and enable pipeline operations. Single cells used in time-series analysis, such as dynamic time warping, are rendered by the time-domain circuits. The circuits include time flip-flops, Min circuits, and ABS circuits. The matrix can be constructed from the single cells.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: October 11, 2022
    Assignee: Northwestern University
    Inventors: Jie Gu, Zhengyu Chen
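    Illustrative sketch (editor's addition): the recurrence those cells implement is the standard dynamic-time-warping one, a local cost |x - y| (the ABS circuit) plus the minimum of three neighbouring matrix entries (the Min circuit). A plain-Python rendering of the cell and the matrix it builds:
      def dtw_matrix(xs, ys):
          inf = float("inf")
          n, m = len(xs), len(ys)
          D = [[inf] * (m + 1) for _ in range(n + 1)]
          D[0][0] = 0.0
          for i in range(1, n + 1):
              for j in range(1, m + 1):
                  cost = abs(xs[i - 1] - ys[j - 1])  # ABS circuit
                  D[i][j] = cost + min(D[i - 1][j],  # Min circuit
                                       D[i][j - 1],
                                       D[i - 1][j - 1])
          return D

      print(dtw_matrix([1, 2, 3], [1, 3, 3])[-1][-1])  # DTW distance: 1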
  • Patent number: 11468306
    Abstract: A storage system includes a host device and a storage device. The host device provides first input data for a data storage function and second input data for an artificial intelligence (AI) function. The storage device stores the first input data from the host device, and performs AI calculation based on the second input data to generate calculation result data. The storage device includes a first processor, a first nonvolatile memory, a second processor and a second nonvolatile memory. The first processor controls an operation of the storage device. The first nonvolatile memory stores the first input data. The second processor performs the AI calculation, and is distinct from the first processor. The second nonvolatile memory stores weight data associated with the AI calculation, and is distinct from the first nonvolatile memory.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: October 11, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaehun Jang, Hongrak Son, Changkyu Seol, Hyejeong So, Hwaseok Oh, Pilsang Yoon, Jinsoo Lim
  • Patent number: 11468304
    Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives defines different conditions to be satisfied in order to perform the set operation.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
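    Illustrative sketch (editor's addition): the two primitives modelled with Python threading, where `set` marks the event register and `wait` suspends the engine's operation until the event is set. Class and method names are assumptions.
      import threading

      class EventRegister:
          def __init__(self):
              self._event = threading.Event()

          def set(self):                   # "set" primitive
              self._event.set()

          def wait_then(self, operation):  # "wait" primitive gating an op
              self._event.wait()           # suspend until the event is set
              operation()

      reg = EventRegister()
      t = threading.Thread(target=reg.wait_then,
                           args=(lambda: print("engine starts"),))
      t.start()
      reg.set()                            # releases the waiting engine
      t.join()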
  • Patent number: 11461097
    Abstract: A content-addressable processing engine, also referred to herein as CAPE, is provided. Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. CAPE provides a general-purpose PIM microarchitecture that provides acceleration of vector operations while being programmable with standard reduced instruction set computing (RISC) instructions, such as RISC-V instructions with standard vector extensions. CAPE can be implemented as a standalone core that specializes in associative computing, and that can be integrated in a tiled multicore chip alongside other types of compute engines. Certain embodiments of CAPE achieve average speedups of 14× (up to 254×) over an area-equivalent out-of-order processor core tile with three levels of caches across a diverse set of representative applications.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: October 4, 2022
    Assignee: CORNELL UNIVERSITY
    Inventors: José F. Martínez, Helena Caminal, Kailin Yang, Khalid Al-Hawaj, Christopher Batten
  • Patent number: 11436187
    Abstract: Methods, systems, programmable atomic units, and machine-readable mediums are disclosed that provide an exception as a response to the calling processor. That is, the programmable atomic unit will send a response to the calling processor. The calling processor will recognize that the exception has been raised and will handle the exception. Because the calling processor knows which process triggered the exception, the calling processor (e.g., the operating system) can take appropriate action, such as terminating the calling process. The calling processor may be the same processor as the one executing the programmable atomic transaction, or a different processor (e.g., on a different chiplet).
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: September 6, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Tony Brewer
  • Patent number: 11436477
    Abstract: A processor-implemented data processing method includes: generating compressed data of first matrix data based on information of a distance between valid elements included in the first matrix data; fetching second matrix data based on the compressed data; and generating output matrix data based on the compressed data and the second matrix data.
    Type: Grant
    Filed: April 24, 2020
    Date of Patent: September 6, 2022
    Assignees: Samsung Electronics Co., Ltd., SEOUL NATIONAL UNIVERSITY R&DB FOUNDATION
    Inventors: Yuhwan Ro, Byeongho Kim, Jaehyun Park, Jungho Ahn, Minbok Wi, Sunjung Lee, Eojin Lee, Wonkyung Jung, Jongwook Chung, Jaewan Choi
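    Illustrative sketch (editor's addition): a delta encoding of the valid (here, nonzero) elements of the first matrix, storing each value with its distance from the previous valid element so that only the needed rows of the second matrix are fetched. The flat-row representation is an assumption.
      def compress(row):
          out, last = [], -1
          for i, v in enumerate(row):
              if v != 0:
                  out.append((i - last, v))  # (distance to previous valid, value)
                  last = i
          return out

      def sparse_dot(compressed, dense_rows):
          pos, acc = -1, None
          for dist, v in compressed:
              pos += dist  # fetch only the second-matrix rows that matter
              term = [v * x for x in dense_rows[pos]]
              acc = term if acc is None else [a + b for a, b in zip(acc, term)]
          return acc

      c = compress([0, 3, 0, 0, 2])       # [(2, 3), (3, 2)]
      print(sparse_dot(c, [[1, 1]] * 5))  # [5, 5]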
  • Patent number: 11436011
    Abstract: A processor-implemented method includes: determining a first multiplication matrix and a second multiplication matrix, based on an input multiplicand matrix and an input multiplier matrix that are generated from an input signal; determining a matrix to be restored, based on the first multiplication matrix and the second multiplication matrix; determining a matrix restoration constraint value, based on the matrix to be restored; determining a multiplication result of the input multiplicand matrix and the input multiplier matrix, based on the matrix restoration constraint value and the matrix to be restored; and analyzing the input signal based on the multiplication result.
    Type: Grant
    Filed: February 10, 2021
    Date of Patent: September 6, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Bochao Dang, Hao Wang
  • Patent number: 11429850
    Abstract: A circuit arrangement includes an array of multiply-accumulate (MAC) circuits, wherein each MAC circuit includes a cache configured for storage of a plurality of kernels. The MAC circuits are configured to receive a first set of data elements of an input feature map (IFM) at a first rate. The MAC circuits are configured to perform first MAC operations on the first set of the data elements and a first one of the kernels associated with a first output feature map (OFM) depth index during a first MAC cycle, wherein the rate of MAC cycles is faster than the first rate. The MAC circuits are configured to perform second MAC operations on the first set of the data elements and a second one of the kernels associated with a second OFM depth index during a second MAC cycle that consecutively follows the first MAC cycle.
    Type: Grant
    Filed: July 19, 2018
    Date of Patent: August 30, 2022
    Assignee: XILINX, INC.
    Inventors: Xiaoqian Zhang, Ephrem C. Wu, David Berman
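    Illustrative sketch (editor's addition): the rate matching in Python. Because MAC cycles run faster than IFM data arrives, each set of IFM elements is reused across consecutive MAC cycles, once per cached kernel (one kernel per OFM depth index). Shapes and names are assumptions.
      def mac_array(ifm_sets, kernels):
          ofm = [[0.0] * len(kernels) for _ in ifm_sets]
          for t, elements in enumerate(ifm_sets):   # arrives at the first rate
              for d, kernel in enumerate(kernels):  # one MAC cycle per kernel
                  ofm[t][d] = sum(x * w for x, w in zip(elements, kernel))
          return ofm

      print(mac_array([[1.0, 2.0]], [[0.5, 0.5], [1.0, -1.0]]))  # [[1.5, -1.0]]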