Patents Examined by Eric Coleman
  • Patent number: 11544213
    Abstract: A neural processor is provided. The neural processor includes a matrix device which is configured to generate an output feature map by processing a standard convolution operation and which has a systolic array architecture, and accelerators with an adder-tree structure which are configured to process depth-wise convolution operations for each element of the output feature map corresponding to lanes of the matrix device.
    Type: Grant
    Filed: July 7, 2021
    Date of Patent: January 3, 2023
    Assignees: Samsung Electronics Co., Ltd., Seoul National University R&DB Foundation
    Inventors: Dongyoung Kim, Jung Ho Ahn, Sunjung Lee, Jaewan Choi
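
A depth-wise convolution, the operation the adder-tree accelerators above are built for, applies one small filter per channel instead of mixing channels the way a standard convolution does. The following minimal NumPy sketch illustrates only the arithmetic, not the patent's hardware; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise convolution: one KxK filter per input channel.

    x:       (C, H, W) input feature map
    kernels: (C, K, K) one filter per channel
    Returns  (C, H-K+1, W-K+1) output (no padding, stride 1).
    """
    c, h, w = x.shape
    _, k, _ = kernels.shape
    out = np.zeros((c, h - k + 1, w - k + 1))
    for ch in range(c):                  # each channel uses only its own filter
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                # an adder tree would sum these K*K products in hardware
                out[ch, i, j] = np.sum(x[ch, i:i+k, j:j+k] * kernels[ch])
    return out

# Example: 8 channels, 16x16 map, 3x3 filters
y = depthwise_conv2d(np.random.rand(8, 16, 16), np.random.rand(8, 3, 3))
print(y.shape)  # (8, 14, 14)
```
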
  • Patent number: 11537859
    Abstract: Neural inference chips are provided. A neural core of the neural inference chip comprises a vector-matrix multiplier; a vector processor; and an activation unit operatively coupled to the vector processor. The vector-matrix multiplier, vector processor, and/or activation unit is adapted to operate at variable precision.
    Type: Grant
    Filed: December 6, 2019
    Date of Patent: December 27, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Andrew S. Cassidy, Rathinakumar Appuswamy, John V. Arthur, Pallab Datta, Steve Esser, Myron D. Flickner, Jeffrey McKinstry, Dharmendra S. Modha, Jun Sawada, Brian Taba
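
The abstract does not specify how variable precision is realized; as a software analogy only, the sketch below quantizes operands to a chosen bit width before a vector-matrix product, showing the accuracy/cost trade such hardware could exploit. All names are hypothetical.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(x))
    return np.round(x * scale) / scale

def vm_mul(vector, matrix, bits):
    """Vector-matrix product evaluated at a chosen precision."""
    return quantize(vector, bits) @ quantize(matrix, bits)

v = np.random.randn(4)
m = np.random.randn(4, 3)
print(vm_mul(v, m, 8))  # near the full-precision result
print(vm_mul(v, m, 2))  # coarse, lower-cost result
```
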
  • Patent number: 11537398
    Abstract: Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: December 27, 2022
    Assignee: Intel Corporation
    Inventors: Michael A. Julier, Jeffrey D. Gray, Srinivas Chennupaty, Sean P. Mirkes, Mark P. Seconi
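
A scalar model of the packed comparison the abstract describes: each byte position of two fixed-width string operands is compared, and the per-element results are stored as a mask. The 16-byte operand width and equality-only mode below are assumptions; real packed string-compare instructions support several comparison modes.

```python
def compare_strings(a: str, b: str, width: int = 16) -> list[bool]:
    """Element-wise equality mask over two fixed-width text operands,
    roughly what a packed string-compare instruction produces in one step."""
    a = a.ljust(width, "\0")[:width]   # pad/truncate to the operand width
    b = b.ljust(width, "\0")[:width]
    return [ca == cb for ca, cb in zip(a, b)]

mask = compare_strings("hello world", "hello_world")
print(mask)  # True at matching byte positions, False at the differing one
```
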
  • Patent number: 11526355
    Abstract: Examples of the present disclosure provide apparatuses and methods for smallest value element or largest value element determination in memory. An example method comprises: storing an elements vector comprising a plurality of elements in a group of memory cells coupled to an access line of an array; performing, using sensing circuitry coupled to the array, a logical operation using a first vector and a second vector as inputs, with a result of the logical operation being stored in the array as a result vector; updating the result vector responsive to performing a plurality of subsequent logical operations using the sensing circuitry; and providing an indication of which of the plurality of elements have one of a smallest value and a largest value.
    Type: Grant
    Filed: June 4, 2021
    Date of Patent: December 13, 2022
    Assignee: Micron Technology, Inc.
    Inventor: Sanjay Tiwari
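
One classic way to find the largest element using only vector-wide logical operations, in the spirit of the abstract, is a bit-serial scan from the most significant bit down that repeatedly updates a result mask. The sketch below models that general technique in plain Python; it is an illustration, not Micron's circuit-level method.

```python
def index_of_largest(elements, bits=8):
    """Bit-serial largest-element search, MSB first: at each bit position,
    if any remaining candidate has a 1, drop candidates with a 0.
    Mirrors how sensing circuitry can update a result vector in place."""
    candidates = [True] * len(elements)          # the evolving result vector
    for bit in range(bits - 1, -1, -1):
        hits = [c and (e >> bit) & 1 for c, e in zip(candidates, elements)]
        if any(hits):                            # logical OR across the vector
            candidates = hits                    # AND-update of the result vector
    return [i for i, c in enumerate(candidates) if c]

print(index_of_largest([23, 200, 7, 200, 54]))   # [1, 3]: ties both reported
```
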
  • Patent number: 11520587
    Abstract: Systems and methods for instruction decoding using hash tables. An example method of constructing a decoding tree comprises: generating an aggregated vector of differentiating bit scores representing at least a subset of a set of processor instructions; identifying, based on the aggregated vector of differentiating bit scores, one or more opcode bit positions; and constructing a hash table implementing a current level of a decoding tree representing the subset of the set of processor instructions, wherein the hash table is indexed by one or more opcode bits identified by the one or more opcode bit positions.
    Type: Grant
    Filed: May 17, 2021
    Date of Patent: December 6, 2022
    Assignee: Parallels International GmbH
    Inventors: Alexey Koryakin, Nikolay Dobrovolskiy
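
One level of such a decoding tree can be modeled as a dictionary keyed by the bits extracted at the chosen opcode positions; entries that still collide would be pushed down to a deeper level. The toy ISA below is invented for illustration, and the bit positions are assumed to have been chosen by the patent's differentiating-bit scoring, which is not modeled here.

```python
def extract_bits(word, positions):
    """Concatenate the bits of `word` at the given positions into a key."""
    key = 0
    for p in positions:
        key = (key << 1) | ((word >> p) & 1)
    return key

def build_decode_table(instructions, positions):
    """One level of a decoding tree: a hash table indexed by selected opcode
    bits. `instructions` maps encoding -> mnemonic; colliding entries would
    be resolved by a deeper table level."""
    table = {}
    for encoding, name in instructions.items():
        table.setdefault(extract_bits(encoding, positions), []).append(name)
    return table

# Toy 8-bit ISA; assume bits 7..6 best differentiate the opcodes
isa = {0b00_000001: "add", 0b01_000001: "sub",
       0b10_000001: "load", 0b11_000001: "store"}
table = build_decode_table(isa, positions=[7, 6])
print(table[extract_bits(0b10_000001, [7, 6])])  # ['load']
```
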
  • Patent number: 11520726
    Abstract: A processor comprises a plurality of processing units on an integrated circuit interconnected by an exchange. The exchange has a group of exchange paths extending between first and second portions of the integrated circuit. Each group has a first exchange block in the first portion and a second exchange block in the second portion. The processor has a first external interface in the first portion, a second external interface in the second portion, and a routing bus which routes packets between the external interfaces and the exchange blocks. The first external interface exchanges packets between the integrated circuit and a host. The second interface exchanges packets between the integrated circuit and another integrated circuit. Errors may be trapped when packets are wrongly addressed. A network of such processors is also provided.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: December 6, 2022
    Assignee: GRAPHCORE LIMITED
    Inventor: Hachem Yassine
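
A toy software analogy of the routing-and-trapping behavior: packets addressed to a known interface are delivered, while a wrongly addressed packet raises an error instead of being silently dropped. Class and method names are hypothetical.

```python
class RoutingBus:
    """Toy packet router: known destinations map to handlers; a wrongly
    addressed packet is trapped as an error instead of being delivered."""
    def __init__(self):
        self.routes = {}

    def attach(self, address, handler):
        self.routes[address] = handler

    def send(self, address, payload):
        handler = self.routes.get(address)
        if handler is None:
            raise ValueError(f"trapped: packet addressed to unknown {address!r}")
        handler(payload)

bus = RoutingBus()
bus.attach("host_if", lambda p: print("to host:", p))
bus.attach("chip_if", lambda p: print("to peer chip:", p))
bus.send("host_if", b"\x01\x02")
# bus.send("bad_if", b"...")  # would raise: trapped as wrongly addressed
```
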
  • Patent number: 11507814
    Abstract: Disclosed herein are a system, a method, and a device for improving the power efficiency of a neural network implemented in an AI chip. In a neural network, the large number of multiply-and-accumulate computations can cause frequent toggles, or transitions of state, in the logic circuits of the AI chip, and such frequent toggles produce a large overall power consumption. In one aspect, to minimize the number of toggles, the sequence or order of computations can be rearranged. In one approach, the total Hamming distances of weights or input strings under different arrangements or sequences are identified, and an arrangement or sequence with a reduced or minimum total Hamming distance is selected.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: November 22, 2022
    Assignee: Meta Platforms Technologies, LLC
    Inventors: Meng Li, Yilei Li
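
The reordering idea lends itself to a simple greedy sketch: repeatedly append the remaining value nearest in Hamming distance to the last one emitted, so consecutive operands differ in fewer bits and cause fewer toggles. This greedy heuristic is an illustrative stand-in, not the patent's optimization procedure.

```python
def hamming(a, b):
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")

def total_hamming(order):
    """Sum of Hamming distances between consecutive values."""
    return sum(hamming(x, y) for x, y in zip(order, order[1:]))

def reorder_greedy(values):
    """Greedy reordering: always take the value closest (in Hamming
    distance) to the last one, reducing bit toggles between items."""
    pool = list(values)
    order = [pool.pop(0)]
    while pool:
        nxt = min(pool, key=lambda v: hamming(order[-1], v))
        pool.remove(nxt)
        order.append(nxt)
    return order

weights = [0b1111, 0b0000, 0b1110, 0b0001]
print(total_hamming(weights))                  # 11 toggled bits as given
print(total_hamming(reorder_greedy(weights)))  # 5 after reordering
```
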
  • Patent number: 11507375
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively coupled to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: November 22, 2022
    Assignee: INTEL CORPORATION
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
  • Patent number: 11507377
    Abstract: An arithmetic processing circuit includes a fetch unit configured to generate fetch addresses, an address table configured to store a branch address and a first tag for each of a plurality of indexes, the indexes being a first bit string extracted from a fetch address by including at least one bit among instruction address bits whose values vary within one fetch line, the first tag being a second bit string situated at higher bit positions than the first bit string, an upper tag storage unit configured to store a second tag situated at higher bit positions than the first tag, and a branch determination unit configured to supply to the fetch unit the branch address retrieved from the address table, upon determining that the first tag retrieved from the address table and the second tag in the upper tag storage unit match respective portions of the fetch address.
    Type: Grant
    Filed: October 5, 2021
    Date of Patent: November 22, 2022
    Assignee: FUJITSU LIMITED
    Inventor: Ryohei Okazaki
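
The point of the shared upper tag is that the high-order address bits, which rarely differ between nearby branches, are stored once rather than per table entry. A toy Python model follows; the 4-bit index and 8-bit first-tag widths are arbitrary assumptions, not the patent's field sizes.

```python
INDEX_BITS, TAG_BITS = 4, 8   # toy field widths

class BranchAddressTable:
    """Toy branch-address table: a fetch address splits into
    index | first tag | upper tag; a hit needs both tag pieces to match."""
    def __init__(self):
        self.table = {}        # index -> (first_tag, branch_target)
        self.upper_tag = 0     # single shared upper-tag register

    def install(self, fetch_addr, target):
        index = fetch_addr & ((1 << INDEX_BITS) - 1)
        first = (fetch_addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1)
        self.upper_tag = fetch_addr >> (INDEX_BITS + TAG_BITS)
        self.table[index] = (first, target)

    def lookup(self, fetch_addr):
        entry = self.table.get(fetch_addr & ((1 << INDEX_BITS) - 1))
        if entry is None:
            return None
        first, target = entry
        if first != (fetch_addr >> INDEX_BITS) & ((1 << TAG_BITS) - 1):
            return None                           # first tag mismatch
        if self.upper_tag != fetch_addr >> (INDEX_BITS + TAG_BITS):
            return None                           # upper tag mismatch
        return target

bat = BranchAddressTable()
bat.install(0x12345, 0x400)
print(hex(bat.lookup(0x12345) or 0))  # 0x400: both tag pieces match
```
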
  • Patent number: 11501145
    Abstract: In one example, a neural network accelerator executes instructions to: load a first weight data element of an array of weight data elements from a memory into a systolic array; extract, from the instructions, information indicating a first number of input data elements to be obtained from a first address of the memory and a second number of input data elements to be skipped between adjacent input data elements to be obtained, the first address being based on first coordinates of the first weight data element, and the first and second numbers being based on a stride of a convolution operation; based on the information, obtain first input data elements from the first address of the memory; and control the systolic array to perform first computations based on the first weight data element and the first input data elements to generate first output data elements of an output data array.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: November 15, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jeffrey T. Huynh, Ron Diamant
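
The two numbers the instruction encodes reduce, in software terms, to a strided gather: read a count of elements starting at a first address, skipping a fixed number of elements between neighbors. A minimal sketch with hypothetical names:

```python
def gather_inputs(memory, first_address, count, skip):
    """Gather `count` elements starting at `first_address`, skipping
    `skip` elements between neighbors -- the access pattern a strided
    convolution needs from memory."""
    step = skip + 1
    return [memory[first_address + i * step] for i in range(count)]

# Stride-2 convolution row: every other element starting at offset 3
memory = list(range(100))
print(gather_inputs(memory, first_address=3, count=5, skip=1))  # [3, 5, 7, 9, 11]
```
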
  • Patent number: 11494194
    Abstract: An apparatus of an aspect includes a plurality of cores and shared core extension logic coupled with each of the plurality of cores. The shared core extension logic has shared data processing logic that is shared by each of the plurality of cores. Instruction execution logic, for each of the cores, in response to a shared core extension call instruction, is to call the shared core extension logic. The call is to have data processing performed by the shared data processing logic on behalf of a corresponding core. Other apparatus, methods, and systems are also disclosed.
    Type: Grant
    Filed: March 29, 2021
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Eran Shifer, Mostafa Hagog, Eliyahu Turiel
  • Patent number: 11494624
    Abstract: Systems and methods for accelerating computation of an artificial neural network (ANN) are provided. An example method comprises receiving, by processing units coupled with arithmetic units and accumulation units, a first plurality of first values and a second plurality of second values associated with one or more neurons of the ANN, generating, by the processing units, a plurality of pairs, wherein each pair of the plurality of pairs has a first value of the first plurality and a second value of the second plurality and the first value and the second value satisfy criteria, performing, by the arithmetic units, mathematical operations on pairs of the plurality of pairs to obtain results, accumulating, by the accumulation units, the results to obtain accumulated results, and determining, by the processing units and based on the accumulated results, an output of the neurons.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: November 8, 2022
    Assignee: MIPSOLOGY SAS
    Inventors: Ludovic Larzul, Sebastien Delerse
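
If the pairing criterion is taken to be that both values are non-zero (one plausible reading; the abstract only says the values "satisfy criteria"), the flow reduces to a zero-skipping dot product:

```python
def sparse_dot(inputs, weights):
    """Dot product that only feeds the arithmetic units pairs whose values
    satisfy a criterion -- here, both non-zero -- then accumulates."""
    pairs = [(x, w) for x, w in zip(inputs, weights) if x != 0 and w != 0]
    acc = 0
    for x, w in pairs:   # arithmetic units: multiply
        acc += x * w     # accumulation units: sum
    return acc

print(sparse_dot([0, 2, 0, 3], [5, 4, 7, 0]))  # only (2, 4) survives -> 8
```
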
  • Patent number: 11481327
    Abstract: A streaming engine employed in a digital data processor specifies a fixed read-only data stream defined by plural nested loops. An address generator produces addresses of data elements for the nested loops. A stream head register stores the data elements next to be supplied to functional units for use as operands. A stream template specifies a loop count and loop dimension for each nested loop. A format definition field in the stream template specifies the number of loops and the stream template bits devoted to the loop counts and loop dimensions. This permits the same bits of the stream template to be interpreted differently, enabling a trade-off between the number of loops supported and the size of the loop counts and loop dimensions.
    Type: Grant
    Filed: January 12, 2021
    Date of Patent: October 25, 2022
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
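
The address-generator portion of such a streaming engine can be sketched as nested loops, each with a count and a dimension (stride); the format field that repartitions template bits between counts and dimensions is not modeled here, and the parameter names are hypothetical.

```python
def stream_addresses(base, loop_counts, loop_dims, elem_size=1):
    """Yield element addresses for nested loops (index 0 = innermost).
    loop_counts[i] iterations in loop i; loop_dims[i] is that loop's stride."""
    def walk(level, addr):
        if level < 0:
            yield addr
            return
        for i in range(loop_counts[level]):
            yield from walk(level - 1, addr + i * loop_dims[level] * elem_size)
    yield from walk(len(loop_counts) - 1, base)

# Two nested loops: outer strides by 16, inner by 1 -> 3 rows of 4 elements
print([hex(a) for a in stream_addresses(0x100, loop_counts=[4, 3], loop_dims=[1, 16])])
```
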
  • Patent number: 11481612
    Abstract: Some embodiments provide a neural network inference circuit (NNIC) for executing a neural network that includes multiple computation nodes at multiple layers. Each of a set of the computation nodes includes a dot product of input values and weight values. The NNIC includes dot product cores, each of which includes (i) partial dot product computation circuits to compute dot products between input values and weight values and (ii) memories to store the weight values and input values for a layer of the NN. The input values for a particular layer of the NN are stored in the memories of multiple cores. A starting memory location in a first core for the input values of the layer stored in the first core is the same as a starting memory location for the input values in each of the other cores that store the input values for the layer.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: October 25, 2022
    Assignee: PERCEIVE CORPORATION
    Inventors: Kenneth Duong, Jung Ko, Steven L. Teig
  • Patent number: 11475283
    Abstract: Embodiments of the present disclosure relate to a neural engine of a neural processor circuit having multiple multiply-add circuits and an accumulator circuit coupled to the multiply-add circuits. The multiply-add circuits perform multiply-add operations of a three dimensional convolution on a work unit of input data using a kernel to generate at least a portion of output data in a processing cycle. The accumulator circuit includes multiple batches of accumulators. Each batch of accumulators receives and stores, after the processing cycle, the portion of the output data for each output depth plane of multiple output depth planes. A corresponding batch of accumulators stores, after the processing cycle, the portion of the output data for a subset of the output channels and for each output depth plane.
    Type: Grant
    Filed: October 24, 2019
    Date of Patent: October 18, 2022
    Assignee: Apple Inc.
    Inventors: Christopher L. Mills, Sung Hee Park
  • Patent number: 11467831
    Abstract: Systems and/or methods can include a ring-based inverter chain that constructs multi-bit flip-flops that store time. The time flip-flops serve as storage units and enable pipeline operations. Single cells used in time series analysis, such as dynamic time warping, are rendered by the time-domain circuits. The circuits include time flip-flops, Min circuits, and ABS circuits. A dynamic time warping matrix can be constructed from the single cells.
    Type: Grant
    Filed: December 18, 2019
    Date of Patent: October 11, 2022
    Assignee: Northwestern University
    Inventors: Jie Gu, Zhengyu Chen
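
The Min and ABS blocks named in the abstract correspond directly to the two operations in the classic dynamic time warping recurrence D[i][j] = |a[i]-b[j]| + min(D[i-1][j], D[i][j-1], D[i-1][j-1]). Below is a plain-Python version of that standard recurrence, illustrating the computation the time-domain cells evaluate rather than the circuits themselves.

```python
def dtw(a, b):
    """Dynamic time warping distance via the classic recurrence;
    abs() and min() are the ABS and Min steps of each cell."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]

print(dtw([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0: second series just repeats a sample
```
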
  • Patent number: 11468304
    Abstract: In one example, a hardware accelerator comprises an event register that stores an event; a hardware execution engine; and a controller configured to: extract, from an instruction, parameters of an operation to be performed by the hardware execution engine, and a synchronization primitive of a plurality of synchronization primitives for the event; and based on the synchronization primitive, perform at least one of: controlling a start time of the operation at the hardware execution engine, or determining whether to access the event register. The synchronization primitives include a set operation to set the event and/or a wait operation to suspend the operation at the hardware execution engine until the event is set. The plurality of synchronization primitives defines different conditions to be satisfied in order to perform the set operation.
    Type: Grant
    Filed: November 26, 2019
    Date of Patent: October 11, 2022
    Assignee: Amazon Technologies, Inc.
    Inventor: Ron Diamant
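
The set/wait semantics map naturally onto a software event object: an execution engine suspends on wait until some other agent performs set. The sketch below is a threading analogy only; the patent's primitives carry additional set conditions not modeled here.

```python
import threading
import time

event = threading.Event()        # stands in for the "event register"

def engine(op):
    event.wait()                 # wait primitive: suspend until the event is set
    print("executing", op)

worker = threading.Thread(target=engine, args=("conv2d",))
worker.start()
time.sleep(0.1)                  # the engine is suspended here
event.set()                      # set primitive: releases the waiting engine
worker.join()
```
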
  • Patent number: 11468306
    Abstract: A storage system includes a host device and a storage device. The host device provides first input data for data storage function and second input data for artificial intelligence (AI) function. The storage device stores the first input data from the host device, and performs AI calculation based on the second input data to generate calculation result data. The storage device includes a first processor, a first nonvolatile memory, a second processor and a second nonvolatile memory. The first processor controls an operation of the storage device. The first nonvolatile memory stores the first input data. The second processor performs the AI calculation, and is distinguished from the first processor. The second nonvolatile memory stores weight data associated with the AI calculation, and is distinguished from the first nonvolatile memory.
    Type: Grant
    Filed: June 19, 2020
    Date of Patent: October 11, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Jaehun Jang, Hongrak Son, Changkyu Seol, Hyejeong So, Hwaseok Oh, Pilsang Yoon, Jinsoo Lim
  • Patent number: 11468002
    Abstract: A computing device includes an array of processing elements mutually connected to perform single instruction multiple data (SIMD) operations, memory cells connected to each processing element to store data related to the SIMD operations, and a cache connected to each processing element to cache data related to the SIMD operations. Caches of adjacent processing elements are connected. The same or another computing device includes rows of mutually connected processing elements to share data. The computing device further includes a row arithmetic logic unit (ALU) at each row of processing elements. The row ALU of a respective row is configured to perform an operation with processing elements of the respective row.
    Type: Grant
    Filed: February 26, 2021
    Date of Patent: October 11, 2022
    Assignee: UNTETHER AI CORPORATION
    Inventors: William Martin Snelgrove, Jonathan Scobbie
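
The row ALU idea in miniature: each row of processing elements feeds one reduction unit that combines the row's values. A sum is used below, though the row operation could be any reduction; this is purely illustrative.

```python
def row_reduce(rows, op=sum):
    """One result per row: the row ALU combines the values held by the
    processing elements of its own row (sum here; any reduction works)."""
    return [op(row) for row in rows]

grid = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(row_reduce(grid))  # [10, 26]
```
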
  • Patent number: 11461097
    Abstract: A content-addressable processing engine, also referred to herein as CAPE, is provided. Processing-in-memory (PIM) architectures attempt to overcome the von Neumann bottleneck by combining computation and storage logic into a single component. CAPE provides a general-purpose PIM microarchitecture that provides acceleration of vector operations while being programmable with standard reduced instruction set computing (RISC) instructions, such as RISC-V instructions with standard vector extensions. CAPE can be implemented as a standalone core that specializes in associative computing, and that can be integrated in a tiled multicore chip alongside other types of compute engines. Certain embodiments of CAPE achieve average speedups of 14× (up to 254×) over an area-equivalent out-of-order processor core tile with three levels of caches across a diverse set of representative applications.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: October 4, 2022
    Assignee: CORNELL UNIVERSITY
    Inventors: José F. Martínez, Helena Caminal, Kailin Yang, Khalid Al-Hawaj, Christopher Batten