Patents Examined by Eric Coleman
  • Patent number: 11544213
    Abstract: A neural processor is provided. The neural processor includes a matrix device which is configured to generate an output feature map by processing a standard convolution operation and which has a systolic array architecture, and accelerators with an adder-tree structure which are configured to process depth-wise convolution operations for each element of the output feature map corresponding to lanes of the matrix device.
    Type: Grant
    Filed: July 7, 2021
    Date of Patent: January 3, 2023
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Dongyoung Kim, Jung Ho Ahn, Sunjung Lee, Jaewan Choi
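
A depth-wise convolution, the operation the adder-tree accelerators above are built for, applies one small filter per channel instead of mixing channels the way a standard convolution does. The following minimal NumPy sketch illustrates only the arithmetic, not the patent's hardware; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise convolution: one KxK filter per input channel.

    x:       (C, H, W) input feature map
    kernels: (C, K, K) one filter per channel
    Returns  (C, H-K+1, W-K+1) output (no padding, stride 1).
    """
    c, h, w = x.shape
    _, k, _ = kernels.shape
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):                  # each channel uses only its own filter
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                # an adder tree would sum these K*K products in hardware
                out[ch, i, j] = np.sum(x[ch, i:i+k, j:j+k] * kernels[ch])
    return out

# Example: 8 channels, 16x16 map, 3x3 filters
y = depthwise_conv2d(np.random.rand(8, 16, 16), np.random.rand(8, 3, 3))
print(y.shape)  # (8, 14, 14)
```
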
  • Patent number: 11537859
    Abstract: Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: December 27, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steve Esser, Myron D. Flickner, Jeffrey McKinstry, Dharmendra S. Modha, Jun Sawada, Brian Taba
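
The abstract does not specify how variable precision is realized; as a software analogy only, the sketch below quantizes operands to a chosen bit width before a vector-matrix product, showing the accuracy/cost trade such hardware could exploit. All names are hypothetical.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(x))
    return np.round(x * scale) / scale

def vm_mul(vector, matrix, bits):
    """Vector-matrix product evaluated at a chosen precision."""
    return quantize(vector, bits) @ quantize(matrix, bits)

v = np.random.randn(4)
m = np.random.randn(4, 3)
print(vm_mul(v, m, 8))  # near the full-precision result
print(vm_mul(v, m, 2))  # coarse, lower-cost result
```
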
  • Patent number: 11537398
    Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: December 27, 2022
    Assignee: Intel Corporation
    Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
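
A scalar model of the packed comparison the abstract describes: each byte position of two fixed-width string operands is compared, and the per-element results are stored as a mask. The 16-byte operand width and equality-only mode below are assumptions; real packed string-compare instructions support several comparison modes.

```python
def compare_strings(a: str, b: str, width: int = 16) -> list[bool]:
    """Element-wise equality mask over two fixed-width text operands,
    roughly what a packed string-compare instruction produces in one step."""
    a = a.ljust(width, "\0")[:width]   # pad/truncate to the operand width
    b = b.ljust(width, "\0")[:width]
    return [ca == cb for ca, cb in zip(a, b)]

mask = compare_strings("hello world", "hello_world")
print(mask)  # True at matching byte positions, False at the differing one
```
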
  • Patent number: 11526355
    Abstract: Examples of the present disclosure provide apparatuses and methods for smallest value element or largest value element determination in memory. An example method comprises: storing an elements vector comprising a plurality of elements in a group of memory cells coupled to an access line of an array; performing, using sensing circuitry coupled to the array, a logical operation using a first vector and a second vector as inputs, with a result of the logical operation being stored in the array as a result vector; updating the result vector responsive to performing a plurality of subsequent logical operations using the sensing circuitry; and providing an indication of which of the plurality of elements have one of a smallest value and a largest value.
    Type: Grant
    Filed: June 4, 2021
    Date of Patent: December 13, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
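
One classic way to find the largest element using only vector-wide logical operations, in the spirit of the abstract, is a bit-serial scan from the most significant bit down that repeatedly updates a result mask. The sketch below models that general technique in plain Python; it is an illustration, not Micron's circuit-level method.

```python
def index_of_largest(elements, bits=8):
    """Bit-serial largest-element search, MSB first: at each bit position,
    if any remaining candidate has a 1, drop candidates with a 0.
    Mirrors how sensing circuitry can update a result vector in place."""
    candidates = [True] * len(elements)          # the evolving result vector
    for bit in range(bits - 1, -1, -1):
        hits = [c and (e >> bit) & 1 for c, e in zip(candidates, elements)]
        if any(hits):                            # logical OR across the vector
            candidates = hits                    # AND-update of the result vector
    return [i for i, c in enumerate(candidates) if c]

print(index_of_largest([23, 200, 7, 200, 54]))   # [1, 3]: ties both reported
```
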
  • Patent number: 11520587
    Abstract: Systems and methods for instruction decoding using hash tables. An example method of constructing a decoding tree comprises: generating an aggregated vector of differentiating bit scores representing at least a subset of a set of processor instructions; identifying, based on the aggregated vector of differentiating bit scores, one or more opcode bit positions; and constructing a hash table implementing a current level of a decoding tree representing the subset of the set of processor instructions, wherein the hash table is indexed by one or more opcode bits identified by the one or more opcode bit positions.
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: December 6, 2022
    Assignee: Parallels International GmbH
    Inventors: Alexey Koryakin, Nikolay Dobrovolskiy
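
One level of such a decoding tree can be modeled as a dictionary keyed by the bits extracted at the chosen opcode positions; entries that still collide would be pushed down to a deeper level. The toy ISA below is invented for illustration, and the bit positions are assumed to have been chosen by the patent's differentiating-bit scoring, which is not modeled here.

```python
def extract_bits(word, positions):
    """Concatenate the bits of `word` at the given positions into a key."""
    key = 0
    for p in positions:
        key = (key << 1) | ((word >> p) & 1)
    return key

def build_decode_table(instructions, positions):
    """One level of a decoding tree: a hash table indexed by selected opcode
    bits. `instructions` maps encoding -> mnemonic; colliding entries would
    be resolved by a deeper table level."""
    table = {}
    for encoding, name in instructions.items():
        table.setdefault(extract_bits(encoding, positions), []).append(name)
    return table

# Toy 8-bit ISA; assume bits 7..6 best differentiate the opcodes
isa = {0b00_000001: "add", 0b01_000001: "sub",
       0b10_000001: "load", 0b11_000001: "store"}
table = build_decode_table(isa, positions=[7, 6])
print(table[extract_bits(0b10_000001, [7, 6])])  # ['load']
```
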
  • Patent number: 11520726
    Abstract: A processor comprises a plurality of processing units on an integrated circuit interconnected by an exchange. The exchange has a group of exchange paths extending between first and second portions of the integrated circuit. Each group has a first exchange block in the first portion and a second exchange block in the second portion. The processor has a first external interface in the first portion, a second external interface in the second portion, and a routing bus which routes packets between the external interfaces and the exchange blocks. The first external interface exchanges packets between the integrated circuit and a host. The second interface exchanges packets between the integrated circuit and another integrated circuit. Errors may be trapped when packets are wrongly addressed. A network of such processors is also provided.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: December 6, 2022
    Assignee: GRAPHCORE LIMITED
    Inventor: Hachem Yassine
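
A toy software analogy of the routing-and-trapping behavior: packets addressed to a known interface are delivered, while a wrongly addressed packet raises an error instead of being silently dropped. Class and method names are hypothetical.

```python
class RoutingBus:
    """Toy packet router: known destinations map to handlers; a wrongly
    addressed packet is trapped as an error instead of being delivered."""
    def __init__(self):
        self.routes = {}

    def attach(self, address, handler):
        self.routes[address] = handler

    def send(self, address, payload):
        handler = self.routes.get(address)
        if handler is None:
            raise ValueError(f"trapped: packet addressed to unknown {address!r}")
        handler(payload)

bus = RoutingBus()
bus.attach("host_if", lambda p: print("to host:", p))
bus.attach("chip_if", lambda p: print("to peer chip:", p))
bus.send("host_if", b"\x01\x02")
# bus.send("bad_if", b"...")  # would raise: trapped as wrongly addressed
```
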
  • Patent number: 11507814
    Abstract: Disclosed herein are a system, a method, and a device for improving the power efficiency of a neural network implemented in an AI chip. In a neural network, the large number of multiply-and-accumulate computations can cause frequent toggles, or transitions of state, in the logic circuits of the AI chip, and such frequent toggles produce a large overall power consumption. In one aspect, to minimize the number of toggles, the sequence or order of computations can be rearranged. In one approach, the total Hamming distances of weights or input strings under different arrangements or sequences are identified, and an arrangement or sequence with a reduced or minimum total Hamming distance is selected.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: November 22, 2022
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Meng Li, Yilei Li
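
The reordering idea lends itself to a simple greedy sketch: repeatedly append the remaining value nearest in Hamming distance to the last one emitted, so consecutive operands differ in fewer bits and cause fewer toggles. This greedy heuristic is an illustrative stand-in, not the patent's optimization procedure.

```python
def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")

def total_hamming(order):
    """Sum of Hamming distances between consecutive values."""
    return sum(hamming(x, y) for x, y in zip(order, order[1:]))

def reorder_greedy(values):
    """Greedy reordering: always take the value closest (in Hamming
    distance) to the last one, reducing bit toggles between items."""
    pool = list(values)
    order = [pool.pop(0)]
    while pool:
        nxt = min(pool, key=lambda v: hamming(order[-1], v))
        pool.remove(nxt)
        order.append(nxt)
    return order

weights = [0b1111, 0b0000, 0b1110, 0b0001]
print(total_hamming(weights))                  # 11 toggled bits as given
print(total_hamming(reorder_greedy(weights)))  # 5 after reordering
```
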
  • Patent number: 11507375
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: November 22, 2022
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
  • Patent number: 11507377
    Abstract: An arithmetic processing circuit includes a fetch unit configured to generate fetch addresses, an address table configured to store a branch address and a first tag for each of a plurality of indexes, the indexes being a first bit string extracted from a fetch address by including at least one bit among instruction address bits whose values vary within one fetch line, the first tag being a second bit string situated at higher bit positions than the first bit string, an upper tag storage unit configured to store a second tag situated at higher bit positions than the first tag, and a branch determination unit configured to supply to the fetch unit the branch address retrieved from the address table, upon determining that the first tag retrieved from the address table and the second tag in the upper tag storage unit match respective portions of the fetch address.
    Type: Grant
    Filed: October 5, 2021
    Date of Patent: November 22, 2022
    Assignee: FUJITSU LIMITED
    Inventor: Ryohei Okazaki
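
The point of the shared upper tag is that the high-order address bits, which rarely differ between nearby branches, are stored once rather than per table entry. A toy Python model follows; the 4-bit index and 8-bit first-tag widths are arbitrary assumptions, not the patent's field sizes.

```python
INDEX_BITS, TAG_BITS = 4, 8   # toy field widths

class BranchAddressTable:
    """Toy branch-address table: a fetch address splits into
    index | first tag | upper tag; a hit needs both tag pieces to match."""
    def __init__(self):
        self.table = {}        # index -> (first_tag, branch_target)
        self.upper_tag = 0     # single shared upper-tag register

    def install(self, fetch_addr, target):
        index = fetch_addr & ((1 << INDEX_BITS) - 1)
        first = (fetch_addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)
        self.upper_tag = fetch_addr >> (INDEX_BITS + TAG_BITS)
        self.table[index] = (first, target)

    def lookup(self, fetch_addr):
        entry = self.table.get(fetch_addr & ((1 << INDEX_BITS) - 1))
        if entry is None:
            return None
        first, target = entry
        if first != (fetch_addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1):
            return None                           # first tag mismatch
        if self.upper_tag != fetch_addr >> (INDEX_BITS + TAG_BITS):
            return None                           # upper tag mismatch
        return target

bat = BranchAddressTable()
bat.install(0x12345, 0x400)
print(hex(bat.lookup(0x12345) or 0))  # 0x400: both tag pieces match
```
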
  • Patent number: 11501145
    Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
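
The two numbers the instruction encodes reduce, in software terms, to a strided gather: read a count of elements starting at a first address, skipping a fixed number of elements between neighbors. A minimal sketch with hypothetical names:

```python
def gather_inputs(memory, first_address, count, skip):
    """Gather `count` elements starting at `first_address`, skipping
    `skip` elements between neighbors -- the access pattern a strided
    convolution needs from memory."""
    step = skip + 1
    return [memory[first_address + i * step] for i in range(count)]

# Stride-2 convolution row: every other element starting at offset 3
memory = list(range(100))
print(gather_inputs(memory, first_address=3, count=5, skip=1))  # [3, 5, 7, 9, 11]
```
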
  • Patent number: 11494194
    Abstract: An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Eran Shifer, Mostafa Hagog, Eliyahu Turiel
  • Patent number: 11494624
    Abstract: Systems and methods for accelerating computation of an artificial neural network (ANN) are provided. An example method comprises receiving, by processing units coupled with arithmetic units and accumulation units, a first plurality of first values and a second plurality of second values associated with one or more neurons of the ANN, generating, by the processing units, a plurality of pairs, wherein each pair of the plurality of pairs has a first value of the first plurality and a second value of the second plurality and the first value and the second value satisfy criteria, performing, by the arithmetic units, mathematical operations on pairs of the plurality of pairs to obtain results, accumulating, by the accumulation units, the results to obtain accumulated results, and determining, by the processing units and based on the accumulated results, an output of the neurons.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: November 8, 2022
    Assignee: MIPSOLOGY SAS
    Inventors: Ludovic Larzul, Sebastien Delerse
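
If the pairing criterion is taken to be that both values are non-zero (one plausible reading; the abstract only says the values "satisfy criteria"), the flow reduces to a zero-skipping dot product:

```python
def sparse_dot(inputs, weights):
    """Dot product that only feeds the arithmetic units pairs whose values
    satisfy a criterion -- here, both non-zero -- then accumulates."""
    pairs = [(x, w) for x, w in zip(inputs, weights) if x != 0 and w != 0]
    acc = 0
    for x, w in pairs:   # arithmetic units: multiply
        acc += x * w     # accumulation units: sum
    return acc

print(sparse_dot([0, 2, 0, 3], [5, 4, 7, 0]))  # only (2, 4) survives -> 8
```
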
  • Patent number: 11481327
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies a loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: January 12, 2021
    Date of Patent: October 25, 2022
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
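
The address-generator portion of such a streaming engine can be sketched as nested loops, each with a count and a dimension (stride); the format field that repartitions template bits between counts and dimensions is not modeled here, and the parameter names are hypothetical.

```python
def stream_addresses(base, loop_counts, loop_dims, elem_size=1):
    """Yield element addresses for nested loops (index 0 = innermost).
    loop_counts[i] iterations in loop i; loop_dims[i] is that loop's stride."""
    def walk(level, addr):
        if level < 0:
            yield addr
            return
        for i in range(loop_counts[level]):
            yield from walk(level - 1, addr + i * loop_dims[level] * elem_size)
    yield from walk(len(loop_counts) - 1, base)

# Two nested loops: outer strides by 16, inner by 1 -> 3 rows of 4 elements
print([hex(a) for a in stream_addresses(0x100, loop_counts=[4, 3], loop_dims=[1, 16])])
```
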
  • Patent number: 11481612
    Abstract: Some embodiments provide a neural network inference circuit (NNIC) for executing a neural network that includes multiple computation nodes at multiple layers. Each of a set of the computation nodes includes a dot product of input values and weight values. The NNIC includes dot product cores, each of which includes (i) partial dot product computation circuits to compute dot products between input values and weight values and (ii) memories to store the weight values and input values for a layer of the NN. The input values for a particular layer of the NN are stored in the memories of multiple cores. A starting memory location in a first core for the input values of the layer stored in the first core is the same as a starting memory location for the input values in each of the other cores that store the input values for the layer.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: October 25, 2022
    Assignee: PERCEIVE CORPORATION
    Inventors: Kenneth Duong, Jung Ko, Steven L. Teig
  • Patent number: 11475283
    Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: October 18, 2022
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Sung Hee Park
  • Patent number: 11467831
    Abstract: Systems and/or methods can include a ring-based inverter chain that constructs multi-bit flip-flops that store time. The time flip-flops serve as storage units and enable pipeline operations. Single cells used in time series analysis, such as dynamic time warping, are rendered by the time-domain circuits. The circuits include time flip-flops, Min circuits, and ABS circuits. A dynamic time warping matrix can be constructed from the single cells.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: October 11, 2022
    Assignee: Northwestern University
    Inventors: Jie Gu, Zhengyu Chen
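
The Min and ABS blocks named in the abstract correspond directly to the two operations in the classic dynamic time warping recurrence D[i][j] = |a[i]-b[j]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1]). Below is a plain-Python version of that standard recurrence, illustrating the computation the time-domain cells evaluate rather than the circuits themselves.

```python
def dtw(a, b):
    """Dynamic time warping distance via the classic recurrence;
    abs() and min() are the ABS and Min steps of each cell."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

print(dtw([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0: second series just repeats a sample
```
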
  • Patent number: 11468304
    Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives defines different conditions to be satisfied in order to perform the set operation.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
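
The set/wait semantics map naturally onto a software event object: an execution engine suspends on wait until some other agent performs set. The sketch below is a threading analogy only; the patent's primitives carry additional set conditions not modeled here.

```python
import threading
import time

event = threading.Event()        # stands in for the "event register"

def engine(op):
    event.wait()                 # wait primitive: suspend until the event is set
    print("executing", op)

worker = threading.Thread(target=engine, args=("conv2d",))
worker.start()
time.sleep(0.1)                  # the engine is suspended here
event.set()                      # set primitive: releases the waiting engine
worker.join()
```
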
  • Patent number: 11468306
    Abstract: A storage system includes a host device and a storage device. The host device provides first input data for data storage function and second input data for artificial intelligence (AI) function. The storage device stores the first input data from the host device, and performs AI calculation based on the second input data to generate calculation result data. The storage device includes a first processor, a first nonvolatile memory, a second processor and a second nonvolatile memory. The first processor controls an operation of the storage device. The first nonvolatile memory stores the first input data. The second processor performs the AI calculation, and is distinguished from the first processor. The second nonvolatile memory stores weight data associated with the AI calculation, and is distinguished from the first nonvolatile memory.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: October 11, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaehun Jang, Hongrak Son, Changkyu Seol, Hyejeong So, Hwaseok Oh, Pilsang Yoon, Jinsoo Lim
  • Patent number: 11468002
    Abstract: A computing device includes an array of processing elements mutually connected to perform single instruction multiple data (SIMD) operations, memory cells connected to each processing element to store data related to the SIMD operations, and a cache connected to each processing element to cache data related to the SIMD operations. Caches of adjacent processing elements are connected. The same or another computing device includes rows of mutually connected processing elements to share data. The computing device further includes a row arithmetic logic unit (ALU) at each row of processing elements. The row ALU of a respective row is configured to perform an operation with processing elements of the respective row.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: October 11, 2022
    Assignee: UNTETHER AI CORPORATION
    Inventors: William Martin Snelgrove, Jonathan Scobbie
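
The row ALU idea in miniature: each row of processing elements feeds one reduction unit that combines the row's values. A sum is used below, though the row operation could be any reduction; this is purely illustrative.

```python
def row_reduce(rows, op=sum):
    """One result per row: the row ALU combines the values held by the
    processing elements of its own row (sum here; any reduction works)."""
    return [op(row) for row in rows]

grid = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(row_reduce(grid))  # [10, 26]
```
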
  • Patent number: 11461097
    Abstract: A content-addressable processing engine, also referred to herein as CAPE, is provided. Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. CAPE provides a general-purpose PIM microarchitecture that provides acceleration of vector operations while being programmable with standard reduced instruction set computing (RISC) instructions, such as RISC-V instructions with standard vector extensions. CAPE can be implemented as a standalone core that specializes in associative computing, and that can be integrated in a tiled multicore chip alongside other types of compute engines. Certain embodiments of CAPE achieve average speedups of 14× (up to 254×) over an area-equivalent out-of-order processor core tile with three levels of caches across a diverse set of representative applications.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: October 4, 2022
    Assignee: CORNELL UNIVERSITY
    Inventors: José F. Martínez, Helena Caminal, Kailin Yang, Khalid Al-Hawaj, Christopher Batten